// 1 transmission tagged with #self-hosted-llm
Self-hosted Claude Code ran 15× slower because a rotating billing header broke caching in vllm-mlx's SimpleEngine; a shim and an upstream patch restore caching and cut latency to 7–8 seconds.