Bets
These are beliefs I hold that could be wrong. Not obvious truths or safe consensus positions - actual bets where I'm taking a side that reasonable people might disagree with.
I'm writing them down because it's easy to have vague opinions and retrofit explanations later. Each one carries a thesis, the condition that would prove me wrong, and a horizon. Specific, falsifiable claims are harder to hide from. If I'm wrong about something here, I'd like to know.
Wrong if the safety work that measurably prevents real incidents turns out to be the untestable kind - norms, governance, interpretability-as-philosophy - while shipped classifiers and guards make little difference.
Wrong if roofline analysis of mainstream decode workloads on next-gen accelerators lands in the compute-bound region at typical serving batch sizes.
Wrong if production alignment in 2028 is still dominated by preference optimization, and probes, steering, and circuit-level interventions remain research demos rather than deployed guards.
Wrong if the strongest deep-tech orgs systematically separate leadership from hands-on work and consistently out-ship player-coach orgs.
Wrong if consent stays a compliance checkbox and no widely-adopted runtime consent and provenance primitives emerge the way authentication and authorization stacks did.
Wrong if enterprise differentiation comes to hinge mainly on raw model quality, with eval, serving, and data glue commoditized enough that swapping in the best model is the dominant lever.
Wrong if LLM-in-the-loop pipelines stay niche and rule-based ETL still runs the majority of production data engineering.
Wrong if general-purpose robotics is unblocked mainly by better high-level reasoning, with perception-to-actuation latency and data no longer the binding constraint.
Wrong if single-model, long-context, tool-augmented systems outperform orchestrated multi-agent setups on most enterprise workloads and become the default pattern.
Wrong if reasoning models generalize to soft domains - legal, medical, strategy - without better verification, for example self-verification that survives independent audit.
Wrong if retrieval in embedding space never develops bidding or marketplace dynamics, and GEO (generative engine optimization) and RAG poisoning stay fringe rather than becoming a governed ad market.
Wrong if a careful end-to-end benchmark shows always-on probing adds more than ~5-10% to p50 latency, or meaningfully cuts throughput, on a standard serving stack, and there is no reasonable engineering path to get below that.
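
A minimal sketch of how the latency half of that last condition could be checked. Everything here is an assumption for illustration: the function names p50_overhead and bet_survives are hypothetical, the 10% threshold is just the upper end of the ~5-10% range, and the latencies are made-up numbers, not measurements. It does not cover the throughput half of the condition.

```python
from statistics import median


def p50_overhead(baseline_ms, probed_ms):
    """Relative increase in median (p50) latency with probing enabled."""
    base = median(baseline_ms)
    return (median(probed_ms) - base) / base


def bet_survives(baseline_ms, probed_ms, threshold=0.10):
    """True if the p50 overhead stays at or below the threshold.

    threshold=0.10 is an assumed reading of "~5-10%", not a fixed rule.
    """
    return p50_overhead(baseline_ms, probed_ms) <= threshold


# Made-up per-request latencies in milliseconds, for illustration only.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]   # probing off
probed = [104.0, 106.0, 103.0, 105.0, 107.0]   # probing on

overhead = p50_overhead(baseline, probed)
print(f"p50 overhead: {overhead:.1%}")
print("bet survives:", bet_survives(baseline, probed))
```

A real check would need many more samples, the same hardware and batch settings for both runs, and a separate throughput comparison before either outcome says much about the bet.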