WasmAgent adds trace-to-training compliance pipeline

WasmAgent’s new compliance engine records full agent runs, validates them against a typed schema, and exports the data for supervised fine‑tuning and direct preference optimization without human labeling.

Sources: [DevTo] [GitHub]

WasmAgent released a compliance‑evaluation pipeline that captures full agent run traces and exports them as typed ComplianceEvalRecord objects for direct use in supervised fine‑tuning (SFT) and direct preference optimization (DPO) training, eliminating the need for human annotation [DevTo][GitHub].
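To make the "typed record to training data" step concrete, here is a minimal sketch of validating records and exporting the passing ones as SFT examples. The article does not list the fields of ComplianceEvalRecord, so the EvalRecordSketch shape, the isEvalRecord guard, and the toSftJsonl helper are all hypothetical names invented for illustration; only the three mode names come from the release notes.

```typescript
// Hypothetical record shape for illustration only; the real
// ComplianceEvalRecord fields are not described in the article.
type Mode = "direct" | "prompt_retry" | "full_pcl";

interface EvalRecordSketch {
  mode: Mode;
  prompt: string;
  output: string;
  passed: boolean; // verdict from the compliance verifier
}

const MODES: readonly string[] = ["direct", "prompt_retry", "full_pcl"];

// Runtime type guard: rejects anything that does not match the sketch shape.
function isEvalRecord(value: unknown): value is EvalRecordSketch {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  const mode = r["mode"];
  return (
    typeof mode === "string" &&
    MODES.includes(mode) &&
    typeof r["prompt"] === "string" &&
    typeof r["output"] === "string" &&
    typeof r["passed"] === "boolean"
  );
}

// Keep only valid, passing records and emit one JSON object per line (JSONL).
function toSftJsonl(values: unknown[]): string {
  return values
    .filter(isEvalRecord)
    .filter((r) => r.passed)
    .map((r) => JSON.stringify({ prompt: r.prompt, completion: r.output }))
    .join("\n");
}
```

Filtering on the verifier verdict before export is the design choice that stands in for human review here: only outputs the verifier accepted become SFT targets.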

What shipped

The library adds three execution modes: direct, prompt_retry, and full_pcl. In an IFEval benchmark with Qwen2.5‑1.5B‑Q4 (3 seeds × 50 samples), prompt_retry achieved a 46.0% pass rate with a spread of ±2.0 pp, while full_pcl raised the pass rate to 54.7% and narrowed the spread to ±1.2 pp, an 8.7 pp gain and a 40% reduction in spread [DevTo].
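The two headline deltas follow directly from the reported figures. The short check below reproduces the arithmetic; the variable names are illustrative and the inputs are the numbers quoted above, not new measurements.

```typescript
// Figures as reported in the article [DevTo]; pass rates in percent, spread in pp.
const promptRetry = { passRate: 46.0, spread: 2.0 };
const fullPcl = { passRate: 54.7, spread: 1.2 };

// Round to one decimal place to hide binary floating-point noise.
const round1 = (x: number): number => Math.round(x * 10) / 10;

// Absolute gain in percentage points.
const gainPp = round1(fullPcl.passRate - promptRetry.passRate);

// Relative reduction in spread, as a percentage of the prompt_retry spread.
const spreadReductionPct = round1(
  ((promptRetry.spread - fullPcl.spread) / promptRetry.spread) * 100,
);

console.log(`gain: ${gainPp} pp, spread reduction: ${spreadReductionPct}%`);
```

Note that the gain is an absolute difference in percentage points, while the spread reduction is a relative change.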

full_pcl logs every repair attempt, so a failed direct output followed by a patch and a successful regeneration becomes DPO training data: the successful regeneration serves as the positive example and each earlier failure as a negative. The compliance verifier supplies the reward signal, so no human‑generated preference data is required.
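The pairing logic described above can be sketched in a few lines. This is an illustration under assumed data shapes, not WasmAgent's implementation: Attempt, RepairTrace, DpoPair, and toDpoPairs are hypothetical names, and the sketch assumes attempts are stored in chronological order with a pass/fail verdict from the verifier.

```typescript
// Hypothetical shapes for illustration; not WasmAgent's actual types.
interface Attempt {
  output: string;
  passed: boolean; // verdict from the compliance verifier
}

interface RepairTrace {
  prompt: string;
  attempts: Attempt[]; // chronological: direct output first, then repairs
}

interface DpoPair {
  prompt: string;
  chosen: string; // the first attempt that passed verification
  rejected: string; // one earlier attempt that failed
}

// Pair the first passing attempt with every failure that preceded it.
// Returns [] if nothing passed, or if the first attempt passed (no contrast).
function toDpoPairs(trace: RepairTrace): DpoPair[] {
  const failures: string[] = [];
  for (const attempt of trace.attempts) {
    if (attempt.passed) {
      return failures.map((rejected) => ({
        prompt: trace.prompt,
        chosen: attempt.output,
        rejected,
      }));
    }
    failures.push(attempt.output);
  }
  return [];
}
```

A trace with two failed attempts followed by a success yields two pairs that share the same chosen output. Traces that never pass produce no pairs, since there is no verified positive to prefer.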

The repository’s test suite, run with bun test packages/compliance/, reports 113 passing tests and zero failures. The package also includes utilities such as RolloutForkRunner (four parallel forks) and RolloutRanker, which deterministically selects the best rollout [GitHub].
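As a rough picture of what "parallel forks plus deterministic selection" can look like, consider the sketch below. It does not use or reproduce the RolloutForkRunner or RolloutRanker APIs, whose signatures the article does not give; runForks, rankRollouts, the Rollout shape, and the generate callback are all assumptions made for illustration.

```typescript
// Hypothetical rollout shape; not the library's actual type.
interface Rollout {
  forkId: number;
  output: string;
  score: number; // e.g. fraction of compliance checks passed
}

// Run `forks` generations concurrently. `generate` is a caller-supplied
// function (an assumption of this sketch) that produces one scored output.
async function runForks(
  prompt: string,
  generate: (prompt: string, forkId: number) => Promise<{ output: string; score: number }>,
  forks = 4,
): Promise<Rollout[]> {
  const ids = Array.from({ length: forks }, (_, i) => i);
  return Promise.all(
    ids.map(async (forkId) => ({ forkId, ...(await generate(prompt, forkId)) })),
  );
}

// Deterministic ranking: highest score first, ties broken by lowest forkId,
// so the same set of rollouts always yields the same winner.
function rankRollouts(rollouts: Rollout[]): Rollout[] {
  return [...rollouts].sort((a, b) => b.score - a.score || a.forkId - b.forkId);
}
```

The explicit tie-break is what makes selection reproducible here: without it, two forks with equal scores could swap places depending on completion order.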

Why it matters

By turning every run into a validated record, WasmAgent removes the costly human‑in‑the‑loop step that dominates RLHF pipelines, enabling teams to generate training data at scale without hiring annotators. The narrower spread in full_pcl mode means more predictable pass rates, which supports tighter service‑level guarantees for production agents. Capturing the entire repair loop, from failed attempts through patches to the passing output, gives DPO graded supervision signals at a depth of trace data unavailable in conventional RLHF setups.

Editor’s take

WasmAgent’s trace‑to‑training stack drives the marginal cost of each additional data point toward zero while improving signal quality through systematic variance reduction. Frameworks that lack comparable trace APIs risk losing developers who prioritize fully automated pipelines.
