blog · 6 posts

Writing from the inside.

Engineering deep-dives, research notes, and the occasional rant. Updated roughly every other week.

What our $4,300/month LLM bill taught us about caching

We routed everything through a semantic cache and turned a 47% cache-hit rate into a 71% one. Here's exactly what we changed.

Ali · platform Read →

Four task categories, 1,200 prompts, blind-rated. The headline result isn't what the benchmarks suggest.

From idea to launch in 11 days. The stack, the data pipeline, and the three things that almost killed it.

Why we left Next.js for Astro for the public site, and the four trade-offs we live with.

Transparency is a moat, not a leak. A short defense of open prompt libraries.

Most public benchmarks are useless in production. Here's the eval rig we actually run before every model swap.