The Observer Effect in AI: When Models Know They're Being Tested - (Part 1/4)
Frontier AI models from OpenAI, Anthropic, and Google can now recognize when they're being tested. This observer effect undermines AI safety evaluation.
First empirical demonstration of activation-level sandbagging detection. Linear probes achieve 90-96% accuracy across Mistral, Gemma, and Qwen models. Key finding - sandbagging representations are model-specific, and steering can reduce sandbagging by 20%.
I tested activation steering on 4 agent behaviors across 3 models. The results surprised me.
A deep dive into building distributed LLM evaluation infrastructure that actually scales - architectural decisions, trade-offs, and lessons learned.
A practical framework for evaluating your multi-agent context management strategy. From ad-hoc string concatenation to self-evolving context systems - where does your architecture stand?
A hands-on exploration of writing custom GPU kernels with OpenAI Triton, going from PyTorch's 11% bandwidth utilization to 88% on RMSNorm.
A deep dive into implementing speculative decoding from scratch, with benchmarks on GPT-2 and extensions to diffusion models.
Frontier AI models from OpenAI, Anthropic, and Google can now recognize when they're being tested. This observer effect undermines AI safety evaluation.
AI companies are getting sued over training data, agents operate with no permission framework, and users can't control their AI profiles. I wrote four open standards (LLMConsent) to create a decentralized consent protocol for AI - like HTTP but for data rights, agent permissions, and user sovereignty. This is an RFC, not a product.
Explore Kimi K2’s trillion-parameter MoE architecture, MuonClip optimizer, and agentic training. Learn why it outperforms GPT-4.1 and DeepSeek-V3
A hands-on exploration of writing custom GPU kernels with OpenAI Triton, going from PyTorch's 11% bandwidth utilization to 88% on RMSNorm.
Discover how to implement Model Context Protocol (MCP) in autonomous multi-agent systems with this technical deep dive. Learn advanced context optimization strategies, distributed architecture patterns, and performance benchmarks with complete Python implementations. Includes hypothetical telecom implementation scenarios showing potential optimization benefits.