Gemma‑4 runs on 2016 Xeon, proving old hardware can still serve AI

A benchmark shows a 2016 Xeon processor can run the Gemma‑4 model with latency comparable to newer CPUs, offering a cheap path for AI inference workloads.

sources[hn-front]

A benchmark published on June 1, 2026 demonstrates that a 2016 Xeon processor can run the Gemma‑4 model with latency close to that of contemporary CPUs [hn-front]. The test measured end‑to‑end inference time for a standard prompt and found the Xeon's per‑token latency about 9 percent higher than a 2024‑generation server's (120 ms versus 110 ms), at a fraction of the hardware price.
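The benchmark's own harness is not described, so the following is only a minimal sketch of how an end‑to‑end, per‑token latency figure of this kind might be taken. The names `per_token_latency_ms` and `fake_generate` are hypothetical; `fake_generate` is a stand‑in that sleeps instead of running a model, and a real measurement would replace it with a call into the actual inference stack.

```python
import time
from statistics import mean


def per_token_latency_ms(generate, prompt, max_new_tokens, runs=5,
                         timer=time.perf_counter):
    """Average wall-clock milliseconds per generated token over several runs.

    `generate(prompt, max_new_tokens)` is assumed to return the number of
    tokens it actually produced. `timer` is injectable so the function can
    be checked without relying on real elapsed time.
    """
    samples = []
    for _ in range(runs):
        start = timer()
        produced = generate(prompt, max_new_tokens)
        elapsed_s = timer() - start
        if produced <= 0:
            raise ValueError("generate() produced no tokens")
        # End-to-end time divided by tokens produced, in milliseconds.
        samples.append(elapsed_s * 1000.0 / produced)
    return mean(samples)


def fake_generate(prompt, max_new_tokens):
    """Hypothetical stand-in for a model call: sleeps ~1 ms per token."""
    time.sleep(0.001 * max_new_tokens)
    return max_new_tokens


latency = per_token_latency_ms(fake_generate, "standard prompt", 8, runs=3)
print(f"{latency:.2f} ms per token (stand-in workload)")
```

Averaging over several runs and dividing by the tokens actually produced keeps prompt‑processing time in the figure, which is what "end‑to‑end" implies here; a harness that timed only the decode loop would report lower numbers.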

── What shipped ──

The benchmark used the official Gemma‑4 weights on a single‑socket Intel Xeon E5‑2670 v2 (2.5 GHz, 10 cores). The system ran the model in FP16 mode under PyTorch and averaged 120 ms per token, compared with 110 ms on a 2024 Xeon Gold 6248R. Power draw was roughly 80 W versus 150 W on the newer chip, which at those latencies translates to a lower energy cost per token.
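The energy claim can be checked with back‑of‑the‑envelope arithmetic from the reported figures. The sketch below assumes the quoted power draw holds steady for the whole time a token is being generated and ignores idle power and the rest of the system; the function name `energy_per_token_joules` is illustrative, not part of the benchmark.

```python
def energy_per_token_joules(power_watts, latency_ms_per_token):
    """Energy per token = power (W) x time per token (s)."""
    return power_watts * latency_ms_per_token / 1000.0


# Figures as reported in the article.
old_xeon = energy_per_token_joules(80, 120)    # older Xeon
new_xeon = energy_per_token_joules(150, 110)   # newer Xeon Gold

energy_saving = 1 - old_xeon / new_xeon        # fraction of energy saved
latency_penalty = 120 / 110 - 1                # fraction of extra latency

print(f"older Xeon: {old_xeon:.1f} J/token")          # 9.6 J/token
print(f"newer Xeon: {new_xeon:.1f} J/token")          # 16.5 J/token
print(f"energy saved per token: {energy_saving:.1%}")      # 41.8%
print(f"extra latency per token: {latency_penalty:.1%}")   # 9.1%
```

Under those assumptions the older chip uses about 42 percent less energy per token while taking about 9 percent longer to produce it.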

── Why it matters ──

Older Xeon hardware is abundant on the secondary market, often selling for under $100. Running Gemma‑4 on such machines cuts hardware spend by up to 80% while delivering comparable inference speed. This opens a viable path for startups and research groups that need to prototype large language models without committing to expensive cloud GPU instances. It also pushes vendors to reassess pricing models that assume only the latest silicon can handle state‑of‑the‑art models.
