TX_797693· AI
Self-hosted Claude Code speedup: caching fix eliminates 15× slowdown
Self-hosted Claude Code ran 15× slower because a rotating billing header broke caching in vllm‑mlx’s SimpleEngine; a shim and upstream patch restore caching and cut latency to 7‑8 seconds.