A smart LLM proxy that routes each request to the best or cheapest model, fails over automatically when a provider rate-limits, and tracks every dollar, all behind a single OpenAI-compatible API.
cheapest · highest-quality · best-value (quality per dollar). You choose per request.
Provider down or rate-limited? The router falls back to the next candidate and shows you the full failover path.
Per-request cost, live per-model totals, and a hard spend cap that returns 402 when hit.
OpenAI-compatible /v1/chat/completions. Add OpenAI, Anthropic & Google keys, or skip the keys and use mock mode.
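The policy, failover, and spend-cap behavior described above can be pictured with a small sketch. This is not the proxy's actual implementation: the `Candidate` fields, the flat per-call prices, the quality scores, and the `RateLimited` and `SpendCapReached` exceptions are all hypothetical names chosen for illustration. Only the three policy names and the idea of a cap that maps to HTTP 402 come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical model identifier
    cost_per_call: float  # illustrative flat price in dollars
    quality: float        # illustrative quality score, higher is better


class RateLimited(Exception):
    """Raised by a provider call that should trigger failover."""


class SpendCapReached(Exception):
    """Signals that the proxy would answer with HTTP 402."""


def rank(candidates, policy):
    """Order candidates so the preferred one for the policy comes first."""
    keys = {
        "cheapest": lambda c: c.cost_per_call,
        "highest-quality": lambda c: -c.quality,
        "best-value": lambda c: -(c.quality / c.cost_per_call),
    }
    return sorted(candidates, key=keys[policy])


def route(candidates, policy, call, spent, cap):
    """Try candidates in policy order; return (reply, cost, path)."""
    if spent >= cap:
        raise SpendCapReached(f"spent {spent:.2f} of {cap:.2f}")
    path = []  # every attempt is recorded so the caller sees the full path
    for cand in rank(candidates, policy):
        try:
            reply = call(cand)
        except RateLimited:
            path.append((cand.name, "rate_limited"))
            continue
        path.append((cand.name, "ok"))
        return reply, cand.cost_per_call, path
    raise RuntimeError(f"all candidates failed: {path}")


# Demo with made-up models; the cheapest one is rate-limited.
models = [
    Candidate("small", 0.001, 0.6),
    Candidate("large", 0.010, 0.9),
    Candidate("medium", 0.003, 0.8),
]


def fake_call(cand):
    if cand.name == "small":
        raise RateLimited()
    return f"hello from {cand.name}"


reply, cost, path = route(models, "cheapest", fake_call, spent=0.0, cap=1.0)
print(reply, cost, path)
```

In this demo the "cheapest" policy tries `small` first, records the rate limit, and then succeeds on `medium`, so the recorded path has two entries. A real router would likely price by token rather than per call.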
curl http://localhost:8787/api/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"user","content":"hello"}],"policy":"cheapest"}'
The response includes an x_router block with the chosen model, provider, cost, latency, and the full failover path.
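The same request can be sent from Python using only the standard library. The sketch below reuses the URL and payload from the curl example; the helper names (`build_request`, `routing_info`, `ask`) are made up for illustration, and because the exact key names inside `x_router` are not documented here, the block is returned as-is rather than unpacked.

```python
import json
import urllib.request

# URL taken from the curl example; adjust host and port to your deployment.
ROUTER_URL = "http://localhost:8787/api/v1/chat/completions"


def build_request(prompt, policy="cheapest"):
    """Build the POST request with a single user message and a routing policy."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "policy": policy,
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def routing_info(body):
    """Return the x_router block from a decoded response, or {} if absent."""
    return body.get("x_router", {})


def ask(prompt, policy="cheapest"):
    """Send the request and return (full response body, x_router block)."""
    with urllib.request.urlopen(build_request(prompt, policy), timeout=30) as resp:
        body = json.load(resp)
    return body, routing_info(body)
```

Calling `ask("hello")` requires the proxy to be running locally. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a 402 from the spend cap would surface as an exception with `code == 402` rather than as a return value.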