Dec 18, 2025 Why Steering Vectors Beat Prompting (And When They Don't)
Jun 15, 2025 From 11% to 88% Peak Bandwidth: Writing Custom Triton Kernels for LLM Inference
Mar 20, 2025 Making LLMs Faster: My Deep Dive into Speculative Decoding