
VibeThinker 3B model beats Opus 4.5 on reasoning benchmarks
The VibeThinker paper on arXiv introduces a 3-billion-parameter model that the authors report outperforms Opus 4.5 on reasoning tasks using an SFT+GRPO fine-tuning pipeline. The result suggests smaller models can rival larger ones when trained with the right technique.
VibeThinker, a 3-billion-parameter language model, was released in a recent arXiv preprint that details a fine-tuning pipeline called SFT+GRPO [arXiv Paper]. The authors report that the model surpasses Opus 4.5 on a suite of reasoning benchmarks, despite the latter's larger size [arXiv Paper].
What shipped
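The VibeThinker release includes the full model checkpoint, training scripts, and evaluation results. SFT+GRPO combines supervised fine-tuning with Group Relative Policy Optimization (GRPO), a reinforcement learning method that scores each sampled response relative to the other responses drawn for the same prompt. The combination is intended to help the model adapt more effectively to task-specific data. The paper reports state-of-the-art scores on multiple reasoning datasets, which the authors present as evidence of the method's efficacy.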
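GRPO is commonly expanded as Group Relative Policy Optimization, and its core step is easy to show in a few lines. The sketch below is not taken from the VibeThinker paper. It assumes the standard formulation, in which several responses are sampled for one prompt and each reward is normalized against the group's mean and standard deviation. The function name, the reward values, and the use of population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation is used here; real
    implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average get a positive advantage and the rest get a negative one, so no separate value network is needed to provide a baseline.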
Why it matters
The result suggests that parameter count is not the sole driver of performance: targeted fine-tuning can narrow the gap between small and large models, which cuts compute costs for deployment. Because SFT+GRPO is a generic training recipe, it could be applied to other architectures and may raise the baseline for many downstream tasks. The work also underscores the value of continued research into fine-tuning strategies, which may yield outsized gains without scaling model size.


