signal_tag · 2_broadcasts
#mixture-of-experts
// 2 transmissions tagged with #mixture-of-experts
TX_011 · 15:00 · AI
Meta's Llama 4 family: 10M-token context, MoE architecture, open weights
Llama 4 ships with two open-weight models: Scout (17B active / 109B total, 10M context) and Maverick (17B active / 400B total). Both use a mixture-of-experts architecture in place of the dense transformers of earlier Llama generations. Scout's 10M-token window is the largest context of any open-weight model to date.
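// sketch: why an MoE keeps only a fraction of its parameters "active" per token. A minimal top-k-routed MoE feed-forward layer; the dimensions, expert count, and routing scheme below are illustrative assumptions, not Llama 4's actual configuration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Minimal sparse MoE feed-forward layer with top-k routing (illustrative)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward block; all are stored,
        # but only top_k of them run for any given token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # per-token expert choices
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoEFeedForward(d_model=64, d_ff=256, n_experts=8, top_k=2)
    tokens = torch.randn(4, 64)
    print(layer(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per token
```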
TX_045 · 11:00 · AI
Mistral Large 3 ships as 41B-active sparse MoE under Apache 2.0
The Mistral 3 family launches with three small dense models (3B, 8B, 14B) plus Mistral Large 3, a sparse MoE with 41B active and 675B total parameters, all under Apache 2.0. Large 3 ranks #2 among open-source non-reasoning models on LMArena.
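// sketch: back-of-the-envelope arithmetic for the active-vs-total parameter split in a sparse MoE. The shared/expert sizes and expert count below are hypothetical, chosen only to show the calculation; they are not Mistral Large 3's published configuration.
```python
def moe_param_counts(shared: float, expert: float, n_experts: int, top_k: int):
    """Return (total, active) parameter counts in the same unit as the inputs."""
    total = shared + n_experts * expert   # every expert is stored in memory
    active = shared + top_k * expert      # only top_k experts run per token
    return total, active


# Hypothetical split: 10B shared params, 5B per expert, 128 experts, top-2 routing.
total, active = moe_param_counts(shared=10e9, expert=5e9, n_experts=128, top_k=2)
print(f"total ~ {total / 1e9:.0f}B, active ~ {active / 1e9:.0f}B per token")
# total ~ 650B, active ~ 20B per token
```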