OpenAI-compatible · self-hostable

One endpoint.
Every model.

A smart LLM proxy that routes each request to the best or cheapest model, fails over automatically when a provider rate-limits, and tracks every dollar, all behind a single OpenAI-compatible API.

🔀 Policy routing

Three policies: cheapest · highest-quality · best-value (quality per dollar). You pick one per request with the policy field.
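To make the three policies concrete, here is a minimal bash sketch of how each one could rank a candidate list. It is not the proxy's actual selection code. The catalog, the model names, the prices (assumed to be USD per 1M tokens) and the quality scores are all hypothetical placeholders, and "best-value" is read literally as quality score divided by price.

```bash
#!/usr/bin/env bash
export LC_ALL=C  # make sort and awk treat '.' as the decimal point

# Hypothetical catalog: model name, USD per 1M tokens, quality score (0-100).
# All names and numbers are illustrative placeholders, not real prices or benchmarks.
CATALOG='model-a 0.50 15
model-b 2.00 80
model-c 15.00 95'

# pick_model POLICY: print the model a given routing policy would choose.
pick_model() {
  case "$1" in
    cheapest)
      # Lowest price wins.
      printf '%s\n' "$CATALOG" | sort -k2,2g | awk 'NR == 1 { print $1 }'
      ;;
    highest-quality)
      # Highest quality score wins, regardless of price.
      printf '%s\n' "$CATALOG" | sort -k3,3gr | awk 'NR == 1 { print $1 }'
      ;;
    best-value)
      # Quality per dollar: quality score divided by price, highest wins.
      printf '%s\n' "$CATALOG" | awk '{ print $1, $3 / $2 }' \
        | sort -k2,2gr | awk 'NR == 1 { print $1 }'
      ;;
    *)
      echo "unknown policy: $1" >&2
      return 1
      ;;
  esac
}

pick_model best-value   # prints model-b (80 / 2.00 = 40 beats 30 and about 6.3)
```

With these placeholder numbers each policy picks a different model: cheapest picks model-a, highest-quality picks model-c, and best-value picks model-b.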

🛟 Auto-failover

Provider down or rate-limited? The request falls through to the next candidate, and the response shows you the full failover path.
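The failover loop can be sketched in bash as follows. This is an illustrative sketch, not the proxy's implementation. `call_provider` is a hypothetical stand-in that returns simulated HTTP status codes, and the retry rule (fall through on 429 and 5xx, stop on any other error) is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for an upstream call: prints the HTTP status a provider
# returned. A real proxy would make the request here; these codes are simulated.
call_provider() {
  case "$1" in
    provider-1/model-x) echo 429 ;;  # rate-limited
    provider-2/model-y) echo 503 ;;  # provider down
    *)                  echo 200 ;;  # success
  esac
}

# route_with_failover CANDIDATE...: try candidates in order and stop at the
# first success. Prints the path as space-separated "candidate:status" entries.
route_with_failover() {
  local -a path=()
  local candidate status
  for candidate in "$@"; do
    status=$(call_provider "$candidate")
    path+=("$candidate:$status")
    case "$status" in
      2??)     echo "${path[*]}"; return 0 ;;  # success: stop here
      429|5??) continue ;;                     # retryable: fall to next candidate
      *)       echo "${path[*]}"; return 1 ;;  # other errors are not retried
    esac
  done
  echo "${path[*]}"
  return 1  # every candidate failed
}

route_with_failover provider-1/model-x provider-2/model-y provider-3/model-z
# prints: provider-1/model-x:429 provider-2/model-y:503 provider-3/model-z:200
```

Recording every attempt, not just the winner, is what makes it possible to report the full path back to the caller. Not retrying other 4xx errors is a deliberate choice in this sketch: a malformed request would fail the same way on every provider.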

๐Ÿ’ธ Cost & budget

Per-request cost, live per-model totals, and a hard spend cap: once the cap is reached, requests return HTTP 402 (Payment Required).
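Here is a minimal bash sketch of the cost arithmetic and the cap check. It assumes that prices are quoted in USD per 1M tokens, with input and output billed separately. It also assumes that the cap trips once cumulative spend reaches it. Both are illustrative assumptions, and `request_cost` and `budget_status` are hypothetical helper names.

```bash
#!/usr/bin/env bash
export LC_ALL=C  # '.' as the decimal point for awk

# request_cost PROMPT_TOKENS COMPLETION_TOKENS INPUT_PRICE OUTPUT_PRICE
# Assumes prices are USD per 1M tokens, billed separately for input and output.
request_cost() {
  awk -v p="$1" -v c="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.6f\n", (p * ip + c * op) / 1000000 }'
}

# budget_status SPENT CAP: print the HTTP status the next request would get.
# Assumed rule: once cumulative spend reaches the cap, reject with 402.
budget_status() {
  awk -v s="$1" -v cap="$2" 'BEGIN { if (s + 0 >= cap + 0) print 402; else print 200 }'
}

request_cost 1200 800 3.00 15.00   # prints 0.015600
budget_status 9.99 10.00           # prints 200
budget_status 10.00 10.00          # prints 402
```

The worked example: 1,200 prompt tokens at $3.00/1M plus 800 completion tokens at $15.00/1M gives (3,600 + 12,000) / 1,000,000 = $0.0156.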

๐Ÿ”Œ Drop-in

OpenAI-compatible /v1/chat/completions endpoint. Add OpenAI, Anthropic, and Google keys, or add none and run in mock mode.
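One way to point existing OpenAI client code at the proxy is through environment variables. The official OpenAI Python and Node.js SDKs read `OPENAI_BASE_URL` and `OPENAI_API_KEY`. This fragment assumes the proxy mounts its OpenAI-compatible routes under `/api/v1`, as the curl example under "Use it" does. The key value is a placeholder, because whether the proxy checks client keys is not stated. Note that `policy` is not part of the OpenAI request schema, so SDK callers would need to send it as an extra body field, for example `extra_body` in the Python SDK.

```bash
# Assumption: OpenAI-compatible routes live under /api/v1 (as in the curl example).
# The SDK appends /chat/completions to this base URL.
export OPENAI_BASE_URL="http://localhost:8787/api/v1"
# Placeholder only: whether the proxy validates client API keys is not stated.
export OPENAI_API_KEY="placeholder-key"
```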

Use it

curl http://localhost:8787/api/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"hello"}],"policy":"cheapest"}'

The response includes an x_router block with the chosen model, provider, cost, latency, and the full failover path.
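To read that block from the shell, a small jq filter is enough. This sketch assumes jq is installed. The reply is taken from the standard OpenAI response shape (`choices[0].message.content`). `x_router` is printed whole because its inner field names are not listed here. `summarize_response` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Assumes jq is installed. Pulls the reply text from the standard OpenAI
# response shape and keeps the x_router block intact (inner fields unspecified).
summarize_response() {
  jq -c '{reply: .choices[0].message.content, router: .x_router}'
}

# Usage against a running proxy, with the same request as above:
# curl -s http://localhost:8787/api/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d '{"messages":[{"role":"user","content":"hello"}],"policy":"cheapest"}' \
#   | summarize_response
```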