Tag
5 posts

June 5, 2026
Most engineers reason about LLMs in words, characters, or messages. The model sees none of that. It sees tokens, and every token is compute that someone's GPU has to run. This post traces what a token actually is, why output costs 3–10x more than input, the five-step journey of an API call, and the four cost levers most teams never pull.

May 31, 2026
Traditional RAG chops documents into arbitrary chunks, embeds them, and hopes cosine similarity finds the right one. PageIndex throws that out: it builds a hierarchical table-of-contents tree and lets an LLM reason its way to the right section, the way a human expert flips to the right chapter. No embeddings, no vector DB. It scored 98.7% accuracy on the FinanceBench benchmark.

April 27, 2026
On April 21, 2026, SpaceX announced an option to acquire Cursor for $60 billion later this year. This post unpacks the actual architecture that makes Cursor feel like a senior engineer — local indexing, Merkle-tree change detection, Tree-sitter AST chunking, Turbopuffer vector search, and the Composer agent model — then breaks down the three strategic reasons this deal is really about who owns the AI stack.

April 9, 2026
On March 31, 2026, Anthropic accidentally shipped the full Claude Code source — 512,000 lines of TypeScript — inside an npm package. The resulting clean-room rewrite became the fastest-growing GitHub repo in history. Here's exactly what the architecture reveals, and what you can take from it as an engineer.

November 5, 2025
Token-Oriented Object Notation (TOON) is a compact, LLM-optimized alternative to JSON for serializing structured, mostly flat/tabular data. By removing repeated field names, quotes and redundant punctuation, TOON reduces token usage by roughly 30–60% in real-world AI workflows, leading to lower API bills, larger usable context windows, and often better model retrieval accuracy.