Reading List
A collected list of research papers, tech blogs, and videos that I follow.
First empirical demonstration of activation-level sandbagging detection. Linear probes achieve 90-96% accuracy across Mistral, Gemma, and Qwen models. Key finding: sandbagging representations are model-specific, and steering can reduce sandbagging by 20%.
I tested activation steering on 4 agent behaviors across 3 models. The results surprised me.
A deep dive into building distributed LLM evaluation infrastructure that actually scales: architectural decisions, trade-offs, and lessons learned.
A practical framework for evaluating your multi-agent context management strategy. From ad-hoc string concatenation to self-evolving context systems, where does your architecture stand?
A hands-on exploration of writing custom GPU kernels with OpenAI Triton, going from PyTorch's 11% bandwidth utilization to 88% on RMSNorm.
A deep dive into implementing speculative decoding from scratch, with benchmarks on GPT-2 and extensions to diffusion models.
Let's talk tech! I'll post everything from polished pieces to spur-of-the-moment thoughts. And if you've got ideas for posts or want to collaborate, let's connect!