
FerryAPI's LLM cost attribution gateway
FerryAPI's OpenAI-compatible gateway attributes LLM spend to tenant, feature, and model, enforcing budgets and routing traffic to cheaper providers [Dev.to][FerryAPI].
FerryAPI released an OpenAI-compatible gateway that adds multi-provider cost attribution and budget enforcement for SaaS applications [Dev.to]. The gateway provides a single /v1 endpoint that proxies calls to OpenAI, Anthropic, Gemini, Bedrock, or any OpenAI-compatible host [FerryAPI]. It supports scoped API keys so each customer, app, or workflow can have an isolated token. Every request is logged with tenant_id, feature name, thread_id, model, input and output token counts, and raw cost. Built-in budget rules let operators set soft alerts at 50% of a quota, auto-switch to a cheaper provider at 80%, and hard-cap or fall back to a free route at 100% [Dev.to].
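The three budget tiers described above can be sketched as a simple threshold check. This is an illustrative sketch only, not FerryAPI's implementation: the names `BudgetAction` and `budget_action` are hypothetical, and treating each threshold as inclusive (at or above 50%, 80%, 100%) is an assumption the source does not confirm.

```python
from enum import Enum


class BudgetAction(Enum):
    """Hypothetical action labels for the tiers described in the article."""
    ALLOW = "allow"
    SOFT_ALERT = "soft_alert"                    # 50% of quota
    SWITCH_TO_CHEAPER = "switch_to_cheaper"      # 80% of quota
    HARD_CAP_OR_FREE_ROUTE = "hard_cap_or_free"  # 100% of quota


def budget_action(spend: float, quota: float) -> BudgetAction:
    """Map current spend against a quota to one of the budget tiers.

    Assumption: thresholds are inclusive, so hitting exactly 80% of the
    quota already triggers the switch to a cheaper provider.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    used = spend / quota
    if used >= 1.0:
        return BudgetAction.HARD_CAP_OR_FREE_ROUTE
    if used >= 0.8:
        return BudgetAction.SWITCH_TO_CHEAPER
    if used >= 0.5:
        return BudgetAction.SOFT_ALERT
    return BudgetAction.ALLOW


if __name__ == "__main__":
    # Walk a tenant with a 100-unit quota through each tier.
    for spend in (10, 50, 80, 100):
        print(spend, budget_action(spend, 100).value)
```

Checking the highest tier first keeps the function to a single pass of comparisons and guarantees that a tenant over quota never falls through to a softer action.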
Key capabilities listed by FerryAPI include: OpenAI-compatible base URL for drop-in SDK integration, per-key prepaid balances and hard quotas, provider-aware routing, exportable logs that can be filtered by tenant, feature, or thread, and region and model availability visibility for pricing forecasts [FerryAPI].
The gateway's per-request metadata lets finance teams answer “Which tenant generated the largest AI bill this week?” without manual reconciliation. By switching to a cheaper provider at 80% of quota and hard-capping spend at 100%, operators can contain runaway agent loops that would otherwise erode margins [Dev.to]. The gateway can switch between providers on the fly, allowing SaaS products to exploit price differentials without rewriting client code: for example, using Gemini for high-throughput embeddings while reserving GPT-4o for high-quality generation [FerryAPI].
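To illustrate how per-request logs could answer the "largest AI bill" question, here is a minimal sketch that sums cost by an attribution key. The record layout is an assumption: `tenant_id` and `thread_id` appear in the article, while `feature`, `input_tokens`, `output_tokens`, and `cost` are hypothetical field names, and the sample rows are invented for demonstration. FerryAPI's actual export format may differ.

```python
from collections import defaultdict

# Invented sample rows; field names other than tenant_id and thread_id
# are hypothetical stand-ins for the logged attributes.
SAMPLE_LOGS = [
    {"tenant_id": "acme", "feature": "chat", "thread_id": "t1",
     "model": "gpt-4o", "input_tokens": 1200, "output_tokens": 300, "cost": 1.5},
    {"tenant_id": "globex", "feature": "search", "thread_id": "t2",
     "model": "gemini", "input_tokens": 5000, "output_tokens": 0, "cost": 0.5},
    {"tenant_id": "acme", "feature": "search", "thread_id": "t3",
     "model": "gemini", "input_tokens": 2500, "output_tokens": 0, "cost": 0.25},
]


def cost_by(records, key):
    """Sum the 'cost' field of each record, grouped by the given key."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(totals)


def top_spender(records, key="tenant_id"):
    """Return the (group, total_cost) pair with the highest total cost."""
    totals = cost_by(records, key)
    return max(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(cost_by(SAMPLE_LOGS, "tenant_id"))
    print(cost_by(SAMPLE_LOGS, "feature"))
    print(top_spender(SAMPLE_LOGS))
```

The same `cost_by` helper groups by `feature` or `model` simply by changing the key, which mirrors the tenant, feature, and model attribution the article describes.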


