signal_tag · 1_broadcasts

#sft

// 1 transmission tagged with #sft

VibeThinker 3B model beats Opus 4.5 on reasoning benchmarks

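The VibeThinker paper on arXiv introduces a 3‑billion‑parameter model that is reported to outperform Opus 4.5 on reasoning benchmarks. The model is trained with a new fine‑tuning pipeline that combines supervised fine‑tuning (SFT) with Group Relative Policy Optimization (GRPO). The result suggests that smaller models can rival larger ones when trained with the right technique.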