VibeThinker 3B model beats Opus 4.5 on reasoning benchmarks

The VibeThinker paper on arXiv introduces a 3‑billion‑parameter model that outperforms Opus 4.5 on reasoning tasks using a new SFT+GRPO fine‑tuning pipeline. The result shows smaller models can rival larger ones when trained with the right technique.

Sources: [arXiv Paper]

VibeThinker, a 3‑billion‑parameter language model, is described in a recent arXiv preprint that details a novel fine‑tuning pipeline called SFT+GRPO [arXiv Paper]. The authors report that the model surpasses Opus 4.5 on a suite of reasoning benchmarks, despite the latter's larger size [arXiv Paper].

What shipped

The VibeThinker release includes the full model checkpoint, training scripts, and evaluation results. SFT+GRPO combines supervised fine‑tuning (SFT) with Group Relative Policy Optimization (GRPO), a reinforcement‑learning method that scores each sampled response against the other responses generated for the same prompt instead of relying on a separate value model. The paper reports state‑of‑the‑art scores on multiple reasoning datasets, which the authors present as evidence of the method's efficacy.
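The article does not reproduce the paper's exact training objective. As a minimal sketch, assuming GRPO here means the widely used Group Relative Policy Optimization, its two core pieces are a group‑relative advantage (each response's reward normalized against its siblings for the same prompt) and a PPO‑style clipped surrogate. The function names, the `eps` and `clip_eps` defaults, and the omission of GRPO's KL penalty term are illustrative assumptions, not VibeThinker's code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of responses sampled for the same prompt.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    The group itself serves as the baseline, so no learned value model is needed.
    """
    if not rewards:
        raise ValueError("need at least one reward per group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one response (to be maximized).

    Hypothetical helper: `ratio` is pi_new(response) / pi_old(response).
    Clipping the ratio to [1 - clip_eps, 1 + clip_eps] limits how far a
    single update can move the policy. The KL penalty GRPO adds is omitted.
    """
    clipped_ratio = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Binary correctness rewards for four sampled answers to one prompt.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in advantages])
    print(clipped_surrogate(ratio=1.5, advantage=advantages[0]))
```

With rewards `[1.0, 0.0, 0.0, 1.0]`, correct answers receive advantages close to +1 and incorrect ones close to -1, so the update favors responses that beat their siblings. Normalizing within the group is what lets this style of training skip the separate critic network that standard PPO needs.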

Why it matters

The result suggests that parameter count is not the sole driver of performance: targeted fine‑tuning can close the gap between small and large models, which lowers compute costs at deployment. Because SFT+GRPO is a generic training recipe, it could be applied to other architectures, potentially raising the baseline for many downstream tasks. More broadly, the work underscores the value of continued research into fine‑tuning strategies, which may yield outsized gains without scaling model size.
