OpenAI-compatible · self-hostable

One endpoint.
Every model.

An LLM proxy that routes each request to the cheapest, highest-quality, or best-value model, fails over automatically when a provider is down or rate-limited, and tracks the cost of every request, all behind a single OpenAI-compatible API.

Run it in 30 seconds · View source

🔀 Policy routing

Three policies: cheapest, highest-quality, and best-value (quality per dollar). You choose one per request.
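
As a rough sketch of how the three policies could rank candidates: the `Model` type, the `rank` function, the quality scores, and the prices below are all illustrative assumptions, not the proxy's actual catalog or code. Only the policy name `cheapest` appears in the usage example; the other two strings are assumed to follow the same naming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str          # hypothetical model identifier
    quality: float     # assumed quality score, higher is better
    usd_per_1k: float  # assumed blended price per 1K tokens


# Illustrative catalog; real scores and prices would come from configuration.
CATALOG = [
    Model("small", quality=0.30, usd_per_1k=0.0010),
    Model("medium", quality=0.80, usd_per_1k=0.0020),
    Model("large", quality=0.95, usd_per_1k=0.0150),
]


def rank(models, policy):
    """Return the candidates ordered best-first for the given policy."""
    if policy == "cheapest":
        return sorted(models, key=lambda m: m.usd_per_1k)
    if policy == "highest-quality":
        return sorted(models, key=lambda m: m.quality, reverse=True)
    if policy == "best-value":
        # Quality per dollar: a mid-priced model can beat both extremes.
        return sorted(models, key=lambda m: m.quality / m.usd_per_1k, reverse=True)
    raise ValueError(f"unknown policy: {policy!r}")
```

Returning the full ranking rather than a single winner leaves the remaining candidates available as fallbacks.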

🛟 Auto-failover

Provider down or rate-limited? The proxy falls back to the next candidate and shows you the full path it took.
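
One way to picture the failover loop, as a minimal sketch: the exception classes, `complete_with_failover`, and the `call` parameter are hypothetical stand-ins, and the real proxy may detect failures and report the path differently.

```python
class RateLimited(Exception):
    """Stand-in for a provider answering with HTTP 429."""


class ProviderDown(Exception):
    """Stand-in for a connection error or a 5xx response."""


def complete_with_failover(candidates, call):
    """Try each candidate in order and return (reply, path).

    `candidates` is a ranked list of model names. `call(name)` is a
    hypothetical function that sends the request to that model.
    `path` records every attempt, so the caller can see what was tried.
    """
    path = []
    for name in candidates:
        try:
            reply = call(name)
        except (RateLimited, ProviderDown) as exc:
            # Note the failure and move on to the next candidate.
            path.append((name, type(exc).__name__))
            continue
        path.append((name, "ok"))
        return reply, path
    raise RuntimeError(f"all candidates failed: {path}")
```

Only rate limits and outages trigger a fallback here; any other exception propagates to the caller unchanged.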

💸 Cost & budget

Per-request cost, live per-model totals, and a hard spend cap that returns HTTP 402 when hit.
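
The bookkeeping behind this could look roughly like the following. The `Budget` class, its method names, and the prices are assumptions made for illustration, as is the choice to treat the cap as hit once the total reaches it (`>=`).

```python
class Budget:
    """Tracks spend per model and enforces a hard cap (illustrative only)."""

    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent_by_model = {}  # model name -> running total in USD

    @property
    def total(self):
        return sum(self.spent_by_model.values())

    def status(self):
        """HTTP status to answer with before forwarding a new request."""
        return 402 if self.total >= self.cap_usd else 200

    def record(self, model, prompt_tokens, completion_tokens,
               usd_per_1k_in, usd_per_1k_out):
        """Add one request's cost to the model's total and return that cost."""
        cost = (prompt_tokens / 1000 * usd_per_1k_in
                + completion_tokens / 1000 * usd_per_1k_out)
        self.spent_by_model[model] = self.spent_by_model.get(model, 0.0) + cost
        return cost
```

Checking `status()` before forwarding means a request that would cross the cap still completes; only later requests are refused.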

🔌 Drop-in

OpenAI-compatible /v1/chat/completions. Add OpenAI, Anthropic, and Google API keys, or add none and run in mock mode.

Use it

curl http://localhost:8787/api/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"hello"}],"policy":"cheapest"}'
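
The same request from Python, using only the standard library. This is a sketch: the URL and body are copied from the curl call above, while `build_request`, `chat`, and the choice to return `None` on a 402 are illustrative. The response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.error
import urllib.request

# URL copied from the curl example; adjust host and port to your deployment.
ENDPOINT = "http://localhost:8787/api/v1/chat/completions"


def build_request(prompt, policy="cheapest", endpoint=ENDPOINT):
    """Build the POST request without sending it."""
    body = {"messages": [{"role": "user", "content": prompt}], "policy": policy}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, policy="cheapest"):
    """Send the request; return parsed JSON, or None if the spend cap was hit."""
    try:
        with urllib.request.urlopen(build_request(prompt, policy), timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 402:  # hard spend cap reached
            return None
        raise
```

Calling `chat("hello")` requires the proxy to be running locally on port 8787.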