Meta's Llama 4 family: 10M-token context, MoE architecture, fully open
TX_011AI

Llama 4 ships with two open-weight models: Scout (17B active / 109B total, 10M context) and Maverick (17B active / 400B total). MoE replaces the dense transformer. Largest open context window on the market.

Meta released the Llama 4 family on April 5, 2025, marking the line's transition from dense transformers to mixture-of-experts. Two open-weight models shipped; one larger model is still in training [Meta AI blog].

── What shipped ──

Llama 4 Scout is 17B active parameters across 109B total, with 16 experts and a 10M-token context window — the largest of any open-weight model available [Llama site].

Llama 4 Maverick is 17B active parameters across 400B total, with 128 experts. It matches GPT-4o on MMLU (87.2 vs 87.0) and beats it on multilingual benchmarks.

Llama 4 Behemoth, the flagship, is still training. No public release window.

Both shipping models are downloadable from llama.com and Hugging Face, with deployment partnerships across cloud and data platforms.

── Why it matters ──

Two stories. First, MoE is now the default architecture across every frontier lab — Mistral, DeepSeek, Anthropic, OpenAI, and now Meta. Dense transformers at this scale appear to be over.
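
To make the active-versus-total framing concrete, here is a toy top-k expert router in plain NumPy. It is a sketch of the general MoE pattern rather than Meta's implementation; the expert count, hidden sizes, and k=2 routing below are illustrative values, not Llama 4's config.

```python
import numpy as np

# Toy mixture-of-experts layer: E expert FFNs, but each token only uses the top-k of them.
# Illustrative of the general MoE pattern; sizes, expert count, and k are made up.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 16, 2

experts_w1 = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
experts_w2 = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02
router_w = rng.normal(size=(d_model, n_experts)) * 0.02

def moe_layer(x):                      # x: (tokens, d_model)
    logits = x @ router_w              # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]               # k best experts per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)  # softmax over chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top[t, slot]
            h = np.maximum(x[t] @ experts_w1[e], 0.0)            # expert FFN (ReLU for simplicity)
            out[t] += gates[t, slot] * (h @ experts_w2[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)          # (4, 64): each token touched only 2 of the 16 experts
```

The total parameter count includes every expert; the active figure counts only the experts a given token is routed to, which is how Scout lands at 17B active out of 109B total.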

Second, the 10M context window on Scout is a serious capability for an open-weight model. Until now, that kind of context has been gated behind frontier-API pricing. Self-hosted Scout makes it free, modulo the inference hardware to run a 109B-parameter model.
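
A back-of-envelope sketch of what that hardware caveat means: the KV cache alone grows linearly with context length. The layer count, KV-head count, and head dimension below are placeholder values for illustration, not Scout's published architecture.

```python
# Rough KV-cache sizing for a long-context decoder. Config values are illustrative
# placeholders, not Llama 4 Scout's actual architecture.
def kv_cache_bytes(seq_len, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_val=2):
    # 2x for keys and values; bytes_per_val=2 assumes fp16/bf16 cache entries
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

for ctx in (128_000, 1_000_000, 10_000_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>10,} tokens -> ~{gib:,.0f} GiB of KV cache per sequence")
```

With those placeholder numbers, a single 10M-token sequence wants roughly two terabytes of cache, which is why long-context serving in practice leans on cache quantisation, paging, or offload.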

For shipping engineers, the practical effect is that long-document workloads — full-codebase ingestion, multi-document RAG, agent contexts that don't need aggressive summarisation — are now possible on owned infrastructure if you have the GPUs.
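
As a workload sketch, the snippet below packs a repository into one long prompt and sanity-checks it against a 10M-token budget. The four-characters-per-token heuristic is a rough assumption for illustration, not a property of the Llama 4 tokenizer.

```python
from pathlib import Path

# Rough sketch: concatenate a codebase into a single long-context prompt and check
# that it fits a 10M-token budget. ~4 characters per token is a crude heuristic.
CONTEXT_BUDGET_TOKENS = 10_000_000
CHARS_PER_TOKEN = 4

def pack_repo(root: str, suffixes=(".py", ".md", ".toml")) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"\n### FILE: {path}\n{path.read_text(errors='ignore')}")
    return "".join(parts)

prompt = pack_repo(".")
approx_tokens = len(prompt) // CHARS_PER_TOKEN
print(f"~{approx_tokens:,} tokens of a {CONTEXT_BUDGET_TOKENS:,}-token budget")
```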

── Editor's take ──

The signal is that Meta is still committed to genuinely open weights at the frontier. With xAI absorbed into SpaceX and Anthropic on a $200B Google compute hook, "open frontier" has fewer credible flag-bearers than it did a year ago. Meta keeping Llama open-weight is, for now, the load-bearing pillar of the open ecosystem.
