OpenAI's first custom inference chip built by Broadcom

OpenAI announced a custom AI inference chip manufactured by Broadcom, marking the company’s entry into bespoke silicon for its models. The partnership aims to boost inference efficiency while keeping costs in check.

Sources: TechCrunch

OpenAI unveiled a custom chip built by Broadcom that is purpose‑designed for AI inference workloads, according to TechCrunch. The announcement signals OpenAI’s shift from relying solely on off‑the‑shelf GPUs to developing silicon that matches the performance profile of its large language models.

── What shipped ──

Details on the chip’s architecture remain sparse, but the hardware is intended to accelerate the forward pass of transformer models during serving. By tapping Broadcom’s established chip design and manufacturing supply chain, OpenAI sidesteps the long lead times associated with building a chip from scratch while still gaining control over key design parameters such as memory bandwidth and power envelope. The move echoes Google’s TPU program, which showed that custom silicon can deliver order‑of‑magnitude improvements in inference latency and cost per token.
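To illustrate why memory bandwidth is one of the parameters worth controlling, here is a rough back‑of‑envelope sketch. It assumes the simplest possible model of token generation: each decode step streams every model weight from memory once, so throughput is capped by bandwidth divided by model size. The function names and any numbers passed to them are hypothetical and are not figures from OpenAI, Broadcom, or TechCrunch; the estimate also ignores KV‑cache traffic and compute limits.

```python
def decode_tokens_per_second(param_count, bytes_per_param,
                             mem_bandwidth_bytes_per_s, batch_size=1):
    """Rough upper bound on decode throughput when weight reads dominate.

    Assumption: every decode step reads all weights from memory once and
    yields one new token for each sequence in the batch.
    """
    weight_bytes = param_count * bytes_per_param
    steps_per_second = mem_bandwidth_bytes_per_s / weight_bytes
    return steps_per_second * batch_size


def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Convert an hourly hardware cost into a cost per one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    tps = decode_tokens_per_second(
        param_count=1e9,                  # 1B parameters (illustrative)
        bytes_per_param=2,                # 16-bit weights
        mem_bandwidth_bytes_per_s=1e12,   # 1 TB/s (illustrative)
    )
    print(f"tokens/s upper bound: {tps:.1f}")
    print(f"USD per 1M tokens: {cost_per_million_tokens(1.8, tps):.2f}")
```

Under this simplified model, doubling memory bandwidth or halving the bytes stored per weight doubles the throughput ceiling, which is why a chip tuned for serving might weight those parameters differently than a general‑purpose GPU.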

── Implications ──

The partnership gives OpenAI immediate access to Broadcom’s chip‑building expertise, allowing the company to iterate on silicon without standing up a full in‑house chip operation. For the AI hardware market, the announcement adds another heavyweight to a field already populated by NVIDIA’s GPUs, Google’s TPUs, and emerging ASICs from startups. As more AI firms adopt bespoke chips, the competitive landscape could fragment, spurring faster innovation in both hardware design and software optimization. OpenAI’s decision also underscores a broader industry trend: as models grow, the economics of inference become a decisive factor, prompting firms to seek tighter integration between model architecture and the hardware it runs on.

The chip is slated for internal rollout later this year, with OpenAI planning to evaluate performance gains before considering broader deployment. [TechCrunch]
