FerryAPI's LLM cost attribution gateway

FerryAPI's OpenAI-compatible gateway attributes LLM spend to tenant, feature, and model, enforcing budgets and routing traffic to cheaper providers [Dev.to][FerryAPI].

Sources: [Dev.to] [FerryAPI]

FerryAPI released an OpenAI-compatible gateway that adds multi-provider cost attribution and budget enforcement for SaaS applications [Dev.to]. The gateway provides a single /v1 endpoint that proxies calls to OpenAI, Anthropic, Gemini, Bedrock, or any OpenAI-compatible host [FerryAPI]. It supports scoped API keys so each customer, app, or workflow can have an isolated token. Every request is logged with tenant_id, feature name, thread_id, model, input and output token counts, and raw cost. Built-in budget rules let operators set soft alerts at 50% of a quota, auto-switch to a cheaper provider at 80%, and hard-cap or fall back to a free route at 100% [Dev.to].
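The three budget thresholds can be expressed as a pure function of spend and quota. This is a minimal sketch, assuming spend and quota are tracked in the same currency. `BudgetAction` and `budget_action` are hypothetical names chosen for illustration, not FerryAPI's API, and the gateway's real rule engine may be configured differently.

```python
from enum import Enum


class BudgetAction(Enum):
    """Hypothetical actions mirroring the thresholds described in the article."""
    ALLOW = "allow"
    SOFT_ALERT = "soft_alert"                # 50%: notify, keep serving
    SWITCH_TO_CHEAPER = "switch_to_cheaper"  # 80%: reroute to a cheaper provider
    HARD_CAP = "hard_cap"                    # 100%: block, or fall back to a free route


# Threshold fractions taken from the article.
ALERT_AT = 0.50
SWITCH_AT = 0.80
CAP_AT = 1.00


def budget_action(spent: float, quota: float) -> BudgetAction:
    """Map current spend against a quota to the action a gateway would take."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    used = spent / quota
    # Check the highest threshold first so each request gets exactly one action.
    if used >= CAP_AT:
        return BudgetAction.HARD_CAP
    if used >= SWITCH_AT:
        return BudgetAction.SWITCH_TO_CHEAPER
    if used >= ALERT_AT:
        return BudgetAction.SOFT_ALERT
    return BudgetAction.ALLOW


if __name__ == "__main__":
    for spent in (10.0, 55.0, 85.0, 100.0):
        print(spent, budget_action(spent, quota=100.0).value)
```

Checking the thresholds from highest to lowest keeps the mapping unambiguous: a tenant at 85% is rerouted, not merely alerted.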

Key capabilities listed by FerryAPI include: an OpenAI-compatible base URL for drop-in SDK integration, per-key prepaid balances and hard quotas, provider-aware routing, exportable logs that can be filtered by tenant, feature, or thread, and visibility into region and model availability for pricing forecasts [FerryAPI].
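A per-key prepaid balance with a hard quota can be sketched as a small ledger attached to each scoped key. This is a hypothetical model, not FerryAPI's implementation: `ScopedKey`, `charge`, and `QuotaExceeded` are illustrative names, and the sketch assumes the request's cost is known or estimated before the call is forwarded.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a request would overdraw a key's prepaid balance."""


@dataclass
class ScopedKey:
    """Hypothetical model of one scoped key (per customer, app, or workflow)."""

    key_id: str
    balance: float  # remaining prepaid credit, assumed to be in USD

    def charge(self, cost: float) -> float:
        """Debit one request's cost and return the new balance.

        A request that would overdraw the balance is refused outright,
        which is what makes the quota "hard" rather than an alert.
        """
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if cost > self.balance:
            raise QuotaExceeded(
                f"{self.key_id}: cost {cost:.4f} exceeds balance {self.balance:.4f}"
            )
        self.balance -= cost
        return self.balance


if __name__ == "__main__":
    key = ScopedKey("tenant-acme/support-bot", balance=1.00)
    key.charge(0.25)
    try:
        key.charge(5.00)
    except QuotaExceeded as exc:
        print("rejected:", exc)
```

Because each key carries its own balance, one customer exhausting credit leaves every other key unaffected, which is the isolation that scoped keys are meant to provide.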

The gateway's per-request metadata lets finance teams answer “Which tenant generated the largest AI bill this week?” without manual reconciliation. By switching to cheaper models at 80% of a quota and hard-capping spend at 100%, operators can stop runaway agent loops before they erode margins [Dev.to]. Because the gateway can switch providers on the fly, SaaS products can exploit price differences without rewriting client code, for example by using Gemini for high-throughput embeddings while reserving GPT-4o for high-quality generation [FerryAPI].
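Answering the "largest bill this week" question amounts to a filter and a group-by over the exported request logs. This minimal sketch assumes the export exposes the fields the article lists (tenant_id, feature, thread_id, model, token counts, raw cost) plus a timestamp; the `RequestLog` record and `top_tenant_by_spend` function are hypothetical, and the real export schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class RequestLog:
    # Field names follow the metadata the article lists; the timestamp is assumed.
    timestamp: datetime
    tenant_id: str
    feature: str
    thread_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost: float  # raw cost, assumed to be in USD


def top_tenant_by_spend(
    logs: Iterable[RequestLog],
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> Optional[Tuple[str, float]]:
    """Return (tenant_id, total_cost) for the biggest spender in the window."""
    cutoff = now - window
    totals: defaultdict = defaultdict(float)
    for rec in logs:
        # Only requests inside the reporting window count toward this week's bill.
        if rec.timestamp >= cutoff:
            totals[rec.tenant_id] += rec.cost
    if not totals:
        return None
    return max(totals.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    now = datetime(2024, 1, 8)
    logs = [
        RequestLog(datetime(2024, 1, 7), "acme", "chat", "t1", "gpt-4o", 100, 50, 0.50),
        RequestLog(datetime(2024, 1, 6), "acme", "search", "t2", "gemini", 200, 0, 0.25),
        RequestLog(datetime(2024, 1, 5), "globex", "chat", "t3", "gpt-4o", 100, 50, 0.50),
    ]
    print(top_tenant_by_spend(logs, now))
```

Swapping `tenant_id` for `feature` or `thread_id` in the grouping key gives the other breakdowns the logs are described as supporting.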
