Interactive
Instruments
Some arguments are easier to poke than to read. These are interactive versions of claims I have made in the writing and the bets. Each one runs entirely in your browser; nothing leaves the page.
- 01 Memory-bandwidth roofline: Is a decode workload memory-bound or compute-bound? Plug in the model, batch size, precision, and GPU, and watch the operating point cross the ridge.
- 02 Activation probe cost: What does running an activation probe at inference actually cost? Compare a linear probe against the forward pass it rides on, by model and probe scope.
- 03 The manifold dial: Hyper-connections make composite forward gain explode with depth. A few Sinkhorn iterations project the mixing matrix onto the doubly-stochastic manifold and bound that gain. Drag the dial.
- 04 Speculative decoding speedup: A cheap draft model proposes tokens; the target verifies them in one pass. How much speedup that buys depends on the acceptance rate, the draft length, and the draft-to-target cost ratio, and there is an optimal draft length.
- 05 Quality-diversity archive: Optimize one number and search collapses onto the single best exploit. MAP-Elites keeps the best solution in every cell of a behavior space, mapping a diverse archive of vulnerabilities. Run both on the same budget and watch the difference.
- 06 Convergence is not correctness: An agentic loop retries until a verifier says done, so it converges on whatever passes the check. Whether that is correct is decided by the verifier, not by convergence. Watch refinement and reward-hacking pull against each other.
- 07 Speculation under load: Speculative decoding looks free on an idle GPU and becomes a tax on a busy fleet. Three policies share one capacity budget: plain decoding, fixed-length speculation, and a DSpark-style scheduler that shrinks its verify window as utilization climbs.
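For a feel of the arithmetic behind instrument 04, here is a minimal sketch of the standard expected-speedup estimate for speculative decoding. It is not the widget's code. It assumes each drafted token is accepted independently with the same probability `alpha`, and that `cost_ratio` is the cost of one draft step relative to one target pass; the function names and the `max_gamma` search cap are hypothetical choices for illustration.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimated speedup over plain decoding.

    alpha      -- per-token acceptance probability (assumed i.i.d.), 0 <= alpha < 1
    gamma      -- draft length: tokens proposed per verification pass
    cost_ratio -- cost of one draft step divided by cost of one target pass
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 0 or cost_ratio < 0:
        raise ValueError("gamma and cost_ratio must be non-negative")
    # Expected tokens emitted per verification pass (geometric series,
    # including the one token the target always contributes).
    expected_tokens = (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)
    # Cost of one round, in units of a single target pass.
    round_cost = gamma * cost_ratio + 1.0
    return expected_tokens / round_cost


def best_draft_length(alpha: float, cost_ratio: float, max_gamma: int = 32) -> int:
    """Brute-force the draft length that maximizes the estimate above."""
    return max(
        range(max_gamma + 1),
        key=lambda g: expected_speedup(alpha, g, cost_ratio),
    )


if __name__ == "__main__":
    alpha, cost_ratio = 0.8, 0.1
    g = best_draft_length(alpha, cost_ratio)
    print(g, round(expected_speedup(alpha, g, cost_ratio), 3))
```

With `gamma = 0` the estimate reduces to exactly 1, which is plain decoding. Longer drafts add tokens with diminishing returns while the draft cost grows linearly, which is why an interior optimum appears.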
Each instrument is a self-contained widget, also embedded in the post it belongs to. More as I build them.