Subhadip Mitra · Google Cloud

Data platforms. AI systems. The infrastructure between them.

Head of Data & Analytics and Site Lead at Google Cloud, leading Data & AI innovation and transformation across Southeast Asia. I publish research on multi-agent systems, inference optimization, and AI safety, and write the practitioner's notes behind it, from inside real production systems.

21 publications · 9 packages · Latest: ICLR 2026 · Spark-LLM-Eval

01 Now

Building

ICLR 2026 paper on LLM safety, and compute primitives for orbital environments.

Exploring

New ventures at the frontier; conversations with research labs and founders.

Writing

Fused MoE kernels, circuit tracing in production, and bets on model honesty.

What I'm up to →

02 Latest

Beating FP16 with 4-bit Weights: A Portable W4A16 GEMM in Triton

§ DEEP-LEARNING · Jul 2026

I wrote a 4-bit weight-only GEMM in pure Triton. The existing fast W4A16 kernels are all CUDA-only; this one runs on both NVIDIA and AMD. It beats cuBLAS FP16 by 1.1 to 1.3x in the decode regime, and the road there was mostly me bein...

Read the essay →

03 Selected writing

§ AI-SAFETY · Jul 2026

The Activation-Cone Blind Spot, or Why Your Jailbreak Defense Can't See Prefilling

Prompt-time activation defenses stop GCG and AutoDAN cold, then fail half the time against pr...

§ INTERPRETABILITY · Jul 2026

What Runtime Interpretability Actually Costs, Part 1: The Case for Measuring It

Everyone assumes activation probes are too expensive to run in production. I ran the numbers ...

§ DEEPSEEK · Jun 2026

DeepSeek DSpark: Speculation Is a Scheduling Problem

What DeepSeek's DSpark and DeepSpec release actually changes for LLM inference: suffix decay,...

§ AGENTS · Jun 2026

Loop Engineering: Convergence Is Not Correctness

A loop always converges. That it reached a stable 'done' state tells you nothing about whethe...

§ LLM · Apr 2026

Attention Is All You Bid: Advertising in Embedding Space

Embedding space is the new ad real estate. Mapping LLM ad auctions, RAG poisoning, GEO, and a...

All essays in the archive →

04 Research focus

INFERENCE · Interactive

Memory-bandwidth roofline

Memory-bound or compute-bound, and how KV cache and kernel efficiency move the wall.

ADVERSARIAL · Interactive

Quality-diversity archive

Why MAP-Elites maps diverse vulnerabilities that a single objective misses.

All instruments →

06 Selected publications

2026

Closing the Activation-Cone Blind Spot: Response-Time Probing and Unified Defense

arXiv

PDF →

2026

Cross-Generational Transfer of Adversarial Attacks Reveals Non-Monotonic Safety Alignment in LLMs

arXiv

PDF →

2026

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA

arXiv

PDF →

2026

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

arXiv

PDF →

2026

Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety

ICLR 2026 Workshop AIWILD

PDF →

All publications →

Multi-agent systems

Inference optimization

AI safety & interpretability

Distributed systems