blog · 6 posts
Writing from the inside.
Engineering deep-dives, research notes, and the occasional rant. Updated roughly every other week.
What our $4,300/month LLM bill taught us about caching
We routed everything through a semantic cache and turned a 47% cache-hit rate into a 71% one. Here's exactly what we changed.
Claude Opus 4.7 vs GPT-5: a side-by-side on our internal evals
Four task categories, 1,200 prompts, blind-rated. The headline result isn't what the benchmarks suggest.
Shipping the Vibe Pulse — a real-time AI leaderboard
From idea to launch in 11 days. The stack, the data pipeline, and the three things that almost killed it.
Astro + edge functions: our 60-page marketing site recipe
Why we moved the public site from Next.js to Astro, and the four trade-offs we live with.
Why we publish our prompts
Transparency is a moat, not a leak. A short defense of open prompt libraries.
Evals that survive contact with reality
Most public benchmarks are useless in production. Here's the eval rig we actually run before every model swap.