Activation probe cost
What does running an activation probe at inference actually cost? Compare a linear probe against the forward pass it rides on, by model and probe scope.
Sources: What runtime interpretability actually costs · Code
Bet 12 / Runtime interpretability
What does an activation probe actually cost?
On paper a probe is a rounding error: one dot product against a hidden state the forward pass already computed. So the "too expensive to run in production" folklore cannot be about the math. It is about the implementation, and the gap between a naive hook and a batched async readout is the whole story.
Where the latency actually goes (log scale) — the arithmetic floor vs the two implementations:
How this is computed
Per generated token: forward pass ≈ 2 × params FLOPs; each linear readout is 2 × d_model FLOPs. The arithmetic floor is that ratio. The bytes version tells the same story: the probe weights for a thousand readouts are single-digit megabytes against the ~14 GB of model weights moved per token.
- Naive hook: a Python forward hook breaks CUDA graph capture, forcing the decode path back to eager. An open vLLM internals plug-in pencils a decode-time mode at ~25% throughput; naive hooks run an order of magnitude over baseline. Per-readout device-to-host copies add synchronization points on top.
- Batched async: pack all probes at a layer into one weight matrix, do a single small matmul inside the graph, accumulate in a device buffer, export asynchronously. Cost stays in low single digits.
The cost was never in the math. That is the argument of the post. Overhead figures here are illustrative bands anchored to those measured numbers.