
Xiaomi launches Mimo v2.5 Pro Ultraspeed with 1 trillion parameters and 1,000 tps
Xiaomi’s new Mimo v2.5 Pro Ultraspeed model packs 1 trillion parameters and sustains 1,000 tokens per second, a 50% jump in parameter count and a 200% increase in throughput over its predecessor.
Xiaomi has announced Mimo v2.5 Pro Ultraspeed, a large language model with 1 trillion parameters and a throughput of 1,000 tokens per second (tps), the highest reported for a consumer-grade model [Xiaomi Blog]. The model represents a 50% increase in parameter count and a 200% increase in throughput over Xiaomi’s previous Mimo release [Xiaomi Blog].
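The percentage figures imply a baseline for the previous release, which the announcement does not state directly. The sketch below backs those numbers out, assuming "a 200% increase" is meant literally (the new value is three times the old one) and that both percentages are measured against the immediately preceding Mimo release. The helper name `baseline_from_increase` is ours, for illustration only.

```python
def baseline_from_increase(new_value: float, pct_increase: float) -> float:
    """Return the prior value, given the new value and the percentage increase."""
    return new_value / (1 + pct_increase / 100)


# Figures quoted in the article: 1 trillion parameters (+50%), 1,000 tps (+200%).
prev_params = baseline_from_increase(1e12, 50)
prev_tps = baseline_from_increase(1000, 200)

print(f"Implied previous parameter count: {prev_params / 1e9:.0f} billion")
print(f"Implied previous throughput: {prev_tps:.0f} tps")
```

Under those assumptions, the predecessor would have had roughly 667 billion parameters and run at about 333 tps. If "200%" was used loosely to mean "doubled", the implied earlier throughput would instead be 500 tps.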
The architecture combines a deeper transformer stack with optimized attention kernels that exploit the memory bandwidth of recent GPUs, allowing the model to sustain 1,000 tps on standard inference hardware. The training data was expanded to include multilingual corpora spanning 120 languages, improving cross-lingual capabilities.
The boost matters for three reasons. First, a throughput of 1,000 tps makes real-time applications, such as interactive chatbots and live translation, practical without resorting to model parallelism. Second, the trillion-parameter scale narrows the gap with leading models from Google and Microsoft, giving Xiaomi a foothold in enterprise AI deployments. Third, the efficiency gains reduce the inference cost per token, lowering the barrier for startups that need high-volume language processing.
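To put the first and third points in concrete terms, here is a back-of-the-envelope sketch of how throughput translates into response latency and cost per token. It assumes the quoted 1,000 tps is a single-stream generation rate and that hardware cost is billed per hour. The reply length, the hourly price, and the slower 250 tps comparison rate are all hypothetical values chosen for illustration, not figures from Xiaomi.

```python
def generation_seconds(num_tokens: int, tps: float) -> float:
    """Time to generate num_tokens at a steady rate of tps tokens per second."""
    return num_tokens / tps


def cost_per_million_tokens(hourly_cost: float, tps: float) -> float:
    """Cost of one million generated tokens on hardware billed at hourly_cost."""
    tokens_per_hour = tps * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


REPLY_TOKENS = 250   # hypothetical chatbot reply length
HOURLY_COST = 10.0   # hypothetical hardware price, USD per hour

for tps in (250, 1000):  # 250 tps is an arbitrary slower rate for comparison
    latency = generation_seconds(REPLY_TOKENS, tps)
    cost = cost_per_million_tokens(HOURLY_COST, tps)
    print(f"{tps:>5} tps: {latency:.2f} s per reply, ${cost:.2f} per million tokens")
```

With the hardware price held fixed, both latency and cost per token scale inversely with throughput, so a fourfold gain in tps cuts each to a quarter. Real deployments batch requests and share hardware across users, so actual costs would differ from this simple model.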
Xiaomi positions the Mimo v2.5 Pro Ultraspeed as a direct competitor to Google’s PaLM and Microsoft’s Turing‑NLG, signaling its intent to capture a share of the fast‑growing high‑throughput AI market [Xiaomi Blog].


