Nvidia's Vera Rubin platform: 336B transistors, 5x Blackwell inference
Nvidia's Vera Rubin platform centres on the Rubin R100 GPU: 336B transistors, 288GB HBM4, 22 TB/s memory bandwidth. Nvidia claims 5x Blackwell's inference performance at 10x lower cost per token.
Nvidia announced Vera Rubin at GTC 2026 on March 16. Built around the Rubin R100 GPU and a custom 88-core Vera CPU, the platform is the next architectural step after Blackwell [CNBC].
── What shipped ──
Rubin R100 GPU specs:
- TSMC 3nm process
- Dual-die design with two reticle-sized compute chiplets
- 336 billion transistors combined (1.6x Blackwell's 208B)
- 288GB HBM4 memory delivering 22 TB/s bandwidth (about 2.75x Blackwell's 8 TB/s on HBM3e)
Vera CPU: 88-core custom Arm-based design tightly coupled to Rubin GPUs.
Performance claims: 5x the inference performance of Blackwell at 10x lower cost per token [Nvidia newsroom].
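The two headline numbers are tied together by a simple identity: cost per token is the system's hourly cost divided by the tokens it produces per hour. The sketch below uses hypothetical placeholder prices and throughputs, not Nvidia figures, and assumes both claims describe the same workload. Under that assumption, 5x throughput at an unchanged hourly cost yields 5x cheaper tokens; the full 10x also needs the hourly cost to halve. The helper name is illustrative.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost per million generated tokens for a system billed by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical baseline: placeholder numbers, not Nvidia or cloud list prices.
baseline = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=10_000)

# Same hourly cost, 5x throughput: cost per token falls 5x, not 10x.
five_x = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=50_000)

# Reaching 10x at 5x throughput would also require the hourly cost to halve.
ten_x = cost_per_million_tokens(hourly_cost_usd=50.0, tokens_per_second=50_000)

print(f"baseline:             ${baseline:.3f} per M tokens")
print(f"5x throughput:        ${five_x:.3f} per M tokens ({baseline / five_x:.1f}x cheaper)")
print(f"+ half the hourly cost: ${ten_x:.3f} per M tokens ({baseline / ten_x:.1f}x cheaper)")
```

If the two figures come from different benchmark setups, this identity does not reconcile them.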
── Why it matters ──
The cost-per-token claim is the most consequential. AI inference cost is the rate-limiting input for every commercial AI product, from API pricing to product margins. A genuine 10x cost-per-token improvement at the silicon layer would let frontier providers cut prices materially while preserving margin, or hold prices and bank the saving as profit.
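The pricing choice above is plain gross-margin arithmetic. A minimal sketch with a hypothetical per-million-token price and cost (placeholders, not any provider's real economics): scaling price and cost down by the same factor leaves the margin percentage unchanged, while holding the price lets the whole saving flow to margin. The function name is illustrative.

```python
def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of price."""
    return (price - cost) / price

# Hypothetical per-million-token economics: placeholders, not real provider data.
price, cost = 10.0, 6.0
print(f"today:          {gross_margin(price, cost):.0%}")

# Option 1: pass a 10x saving through. Price falls 10x, margin percentage is unchanged.
print(f"cut price 10x:  {gross_margin(price / 10, cost / 10):.0%}")

# Option 2: hold the price and keep the saving. Margin expands sharply.
print(f"hold price:     {gross_margin(price, cost / 10):.0%}")
```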
The catch is that Nvidia's per-token claims are typically measured on best-case workloads (large batch sizes, optimised quantisation, specific model shapes). Real-world cost reductions across diverse production workloads are likely to be smaller; historically they have landed at 30–60% of the headline number. Even at the low end, that is a step change.
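To see why even the low end is a step change, this sketch reads "30–60% of the headline number" as a fraction of the 10x multiplier; that interpretation is an assumption, not a stated methodology. It gives a realised 3x to 6x improvement, or roughly 67% to 83% off the per-token cost. The helper name is illustrative.

```python
HEADLINE_FACTOR = 10.0  # Nvidia's claimed cost-per-token improvement over Blackwell

def realised_factor(headline: float, capture: float) -> float:
    """Improvement realised if only a fraction `capture` of the headline multiplier survives."""
    return headline * capture

for capture in (0.30, 0.60):
    factor = realised_factor(HEADLINE_FACTOR, capture)
    saving = 1 - 1 / factor  # fraction of the per-token cost eliminated
    print(f"capture {capture:.0%}: {factor:.1f}x cheaper, {saving:.0%} lower cost per token")
```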
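The HBM4 transition is the underrated piece. Inference on long-context models is increasingly memory-bandwidth-bound, not FLOP-bound: during autoregressive decoding, each generated token requires reading the model weights and the sequence's growing KV cache out of HBM, so token throughput tracks memory bandwidth more closely than compute. The 22 TB/s figure addresses the actual bottleneck for current-generation frontier models.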
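A back-of-envelope bound shows why bandwidth is the lever. If each decoded token must stream the weights plus that sequence's KV cache out of HBM once, single-stream decode speed cannot exceed bandwidth divided by bytes read per token. The model and cache sizes below are hypothetical, the function name is illustrative, and the bound ignores batching, compute limits and interconnect.

```python
def decode_tokens_per_sec_ceiling(bandwidth_tb_s: float, weight_gb: float, kv_cache_gb: float) -> float:
    """Upper bound on single-stream decode speed when every token reads weights + KV cache once."""
    bytes_per_token = (weight_gb + kv_cache_gb) * 1e9
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical workload: a 70B-parameter model at 8-bit weights (~70 GB) plus a 30 GB
# KV cache for a long context. Illustrative only.
for name, bw in (("Blackwell HBM3e, 8 TB/s", 8.0), ("Rubin HBM4, 22 TB/s", 22.0)):
    print(f"{name}: <= {decode_tokens_per_sec_ceiling(bw, 70.0, 30.0):.0f} tokens/s")
```

Batching amortises the weight reads across concurrent requests, but each sequence's KV cache is still read separately, which is why long-context serving stays bandwidth-heavy. Under this simple model the ceiling scales linearly with bandwidth, so the move from 8 to 22 TB/s lifts it by the same 2.75x.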
For AI infrastructure buyers (clouds, neoclouds, sovereign-AI programs), Rubin compresses the upgrade cycle. Blackwell deployments are still scaling out across hyperscalers; Rubin lands during that rollout. Expect 2026 to be a transition year with mixed Blackwell/Rubin clusters.
── Editor's take ──
Rubin is on time, on spec, and on the trajectory Nvidia laid out at last year's GTC. There is no architectural surprise here; the surprise would have been a slip. The competitive question is whether AMD's MI400 series and rival accelerators (Huawei Ascend, Tenstorrent, AWS Trainium 2) close any of the gap. Independent benchmarks are expected in Q3.