Model Fallbacks
Try backup models automatically in /v1/chat/completions.
Model fallbacks let a single /v1/chat/completions request carry an ordered list of candidate models.
GonkaGate tries the first usable candidate, then moves to the next one only when the failure is safe to retry.
Use this when one exact model is preferred, but your application would rather receive a response from a backup model than fail on a temporary model or runtime problem.
How it works
Send either model, models, or both:
```json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "messages": [
    {
      "role": "user",
      "content": "Write a two sentence release note for a fallback model feature."
    }
  ]
}
```
Candidate order is:
- `model`, when present.
- Each entry in `models`, in order.
- Duplicate IDs are collapsed after normalization.
Internally, each upstream attempt still uses one model. The models array is request-level routing input; it is not forwarded as a multi-model payload.
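The ordering rule can be sketched in Python. This is a client-side model of the documented behavior, not GonkaGate's implementation. The documentation does not define "normalization", so the sketch assumes it means trimming surrounding whitespace; `candidate_order` is a hypothetical helper name.

```python
def candidate_order(request: dict) -> list[str]:
    """Return the ordered, de-duplicated candidate list for a request body."""
    raw: list[str] = []
    if request.get("model"):
        raw.append(request["model"])  # `model` always comes first when present
    raw.extend(request.get("models", []))  # then each `models` entry, in order

    seen: set[str] = set()
    ordered: list[str] = []
    for model_id in raw:
        key = model_id.strip()  # hypothetical normalization: trim whitespace
        if key not in seen:  # collapse duplicates, keeping the first position
            seen.add(key)
            ordered.append(key)
    return ordered


print(candidate_order({
    "model": "moonshotai/kimi-k2.6",
    "models": ["minimaxai/minimax-m2.7", "moonshotai/kimi-k2.6"],
}))
# ['moonshotai/kimi-k2.6', 'minimaxai/minimax-m2.7']
```

Listing the primary model again in `models` therefore does not cause a second attempt on it.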
Request shapes
Primary model with backups
Use this for most production traffic. The request is still readable, and the first model remains obvious.
```json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "messages": [{ "role": "user", "content": "Summarize this incident." }]
}
```
Models-only fallback list
When model is omitted, the first models entry becomes the primary candidate.
```json
{
  "models": ["moonshotai/kimi-k2.6", "minimaxai/minimax-m2.7"],
  "messages": [{ "role": "user", "content": "Draft a short customer update." }]
}
```
Single-model request
Existing requests keep working. No fallback is configured unless models is present or a preset supplies model order.
```json
{
  "model": "moonshotai/kimi-k2.6",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```
Fallback behavior
GonkaGate can try the next candidate only before it has committed a client-visible response.
| Situation | Fallback behavior |
|---|---|
| Non-streaming retry-eligible failure | GonkaGate may try the next candidate and return the first successful response. |
| Streaming before the first chunk | GonkaGate may still switch candidates. |
| Streaming after any visible chunk | GonkaGate stays on the active model because the response has already started. |
| Invalid request, auth, quota, or input | GonkaGate returns the error instead of trying another model. Fix the request or account state. |
| Plugin or preset validation error | GonkaGate returns the configuration error. Fallback does not repair an invalid request-level setup. |
Retry-eligible failures are temporary model or runtime failures, such as an unavailable model backend or a transient upstream or runtime error. Fallback is not a general error-recovery mechanism.
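To make the table concrete, here is a minimal Python sketch of the documented decision rules. It is not GonkaGate's server code: `RetryableError`, `FatalError`, `attempt`, and `open_stream` are hypothetical stand-ins for a temporary model or runtime failure, a non-retryable error, one upstream call, and one upstream stream.

```python
class RetryableError(Exception):
    """Hypothetical: a temporary model or runtime failure, such as an unavailable backend."""


class FatalError(Exception):
    """Hypothetical: an invalid request, auth, quota, input, plugin, or preset error."""


def complete_with_fallback(candidates, attempt):
    """Non-streaming: return the first successful response, in candidate order.

    `attempt(model_id)` stands in for one upstream call. It returns a response
    or raises RetryableError / FatalError.
    """
    last_error = None
    for model_id in candidates:
        try:
            return attempt(model_id)
        except RetryableError as err:
            last_error = err  # safe to retry: move on to the next candidate
        # FatalError is deliberately not caught, so it reaches the caller at once.
    if last_error is None:
        raise ValueError("at least one candidate is required")
    raise last_error


def stream_with_fallback(candidates, open_stream):
    """Streaming: switch candidates only before the first chunk is yielded.

    `open_stream(model_id)` stands in for one upstream stream and returns an
    iterator of chunks that may raise RetryableError before or during iteration.
    """
    last_error = None
    for model_id in candidates:
        sent_any = False
        try:
            for chunk in open_stream(model_id):
                sent_any = True
                yield chunk
            return  # the active model finished the stream
        except RetryableError as err:
            if sent_any:
                raise  # output is already visible: stay on this model
            last_error = err  # nothing sent yet: try the next candidate
    if last_error is None:
        raise ValueError("at least one candidate is required")
    raise last_error


def flaky_primary(model_id):
    # Demo stub: the primary backend is temporarily unavailable.
    if model_id == "moonshotai/kimi-k2.6":
        raise RetryableError("backend unavailable")
    return {"model": model_id}


print(complete_with_fallback(
    ["moonshotai/kimi-k2.6", "minimaxai/minimax-m2.7"], flaky_primary
))
# {'model': 'minimaxai/minimax-m2.7'}
```

The two design points mirror the table: `FatalError` is never caught, so invalid setups fail fast, and the streaming version stops switching as soon as `sent_any` is true.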
Pricing
Pricing follows the model that actually completes the request.
- The response model and usage are attributed to the selected candidate.
- Read dashboard usage and billing per selected model, not by the first `model` value in the request.
- If every candidate fails before a completion is produced, no later fallback model is billed as a successful generation.
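Because billing follows the served model, usage reports should group by the response's `model` field rather than by the requested list. A minimal sketch, assuming OpenAI-compatible response dicts in which `model` holds the selected candidate ID and `usage.total_tokens` holds token usage; `usage_by_served_model` is a hypothetical helper:

```python
from collections import defaultdict


def usage_by_served_model(responses):
    """Sum total_tokens per model that actually produced each completion.

    The requested candidate list is deliberately ignored; only the response's
    `model` field decides attribution.
    """
    totals = defaultdict(int)
    for response in responses:
        usage = response.get("usage") or {}
        totals[response["model"]] += usage.get("total_tokens", 0)
    return dict(totals)


responses = [
    {"model": "moonshotai/kimi-k2.6", "usage": {"total_tokens": 120}},
    {"model": "minimaxai/minimax-m2.7", "usage": {"total_tokens": 80}},
    {"model": "moonshotai/kimi-k2.6", "usage": {"total_tokens": 30}},
]
print(usage_by_served_model(responses))
# {'moonshotai/kimi-k2.6': 150, 'minimaxai/minimax-m2.7': 80}
```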
Use Get Models or the live model catalog before rollout to confirm that each candidate ID exists for your key.
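A pre-flight check can compare each candidate against the catalog. This sketch assumes `GET /v1/models` returns the OpenAI-compatible list shape `{"data": [{"id": ...}]}`; `missing_candidates` is a hypothetical helper name, and the catalog below is illustrative, not real output.

```python
def missing_candidates(candidates, models_response):
    """Return the candidate IDs that are absent from a GET /v1/models response.

    Assumes the OpenAI-compatible list shape: {"data": [{"id": "..."}, ...]}.
    """
    available = {item["id"] for item in models_response.get("data", [])}
    return [model_id for model_id in candidates if model_id not in available]


catalog = {"data": [{"id": "moonshotai/kimi-k2.6"}]}  # illustrative response
print(missing_candidates(
    ["moonshotai/kimi-k2.6", "minimaxai/minimax-m2.7"], catalog
))
# ['minimaxai/minimax-m2.7']
```

An empty result means every candidate was listed for your key at the time of the check. With the OpenAI Python SDK, `{m.id for m in client.models.list()}` should yield the same ID set, provided the endpoint is OpenAI-compatible.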
Using with plugins
Plugins are resolved once for the logical request; the selected model attempt then uses that prepared request.
```json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "plugins": [
    {
      "id": "web",
      "max_results": 5
    }
  ],
  "messages": [{ "role": "user", "content": "What changed in today's release notes?" }]
}
```
Keep these rules in mind:
- `web`, `response-healing`, and `privacy-sanitization` apply to the whole request.
- PDF Inputs checks model capability across the fallback candidates when native PDF forwarding is requested.
- A plugin configuration error is not retried on the next model. Fix the plugin payload first.
Using with presets
Presets can either supply the model order or only supply shared defaults.
```json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "preset": "support-agent",
  "messages": [{ "role": "user", "content": "Reply to this support ticket." }]
}
```
With `model` plus `models` plus `preset`, the request owns the fallback order, and the preset supplies supported defaults such as prompt, parameters, and reasoning.
To let the preset own model order too, use the preset as the model:
```json
{
  "model": "@preset/support-agent",
  "messages": [{ "role": "user", "content": "Reply to this support ticket." }]
}
```
See Chat Completion Presets for merge rules, slug validation, and preset-managed model lists.
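The two preset shapes can be produced by one helper. `build_preset_request` is hypothetical: it only builds the JSON body and does not validate the preset slug, which Chat Completion Presets covers.

```python
def build_preset_request(preset, messages, order=None):
    """Build a /v1/chat/completions body that uses a preset.

    With `order`, the request owns fallback order and the preset supplies
    shared defaults. Without it, "@preset/<slug>" lets the preset own the
    model order too.
    """
    if order:
        body = {"model": order[0], "preset": preset, "messages": messages}
        if len(order) > 1:
            # `models` must be non-empty when present, so omit it for one model.
            body["models"] = list(order[1:])
        return body
    return {"model": f"@preset/{preset}", "messages": messages}


messages = [{"role": "user", "content": "Reply to this support ticket."}]
print(build_preset_request(
    "support-agent", messages, ["moonshotai/kimi-k2.6", "minimaxai/minimax-m2.7"]
))
print(build_preset_request("support-agent", messages))
```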
Using with the OpenAI SDK
The raw HTTP body is the source of truth. Some SDKs do not type models as a first-class field, so pass it as an extra body field when needed.
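```python
from openai import OpenAI

# Point the OpenAI SDK at GonkaGate's OpenAI-compatible base URL.
client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="gp-your-api-key",  # replace with your GonkaGate API key
)

completion = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",  # primary candidate
    messages=[
        {"role": "user", "content": "Write a two sentence release note."}
    ],
    # The SDK does not type `models`, so send it as an extra body field.
    extra_body={
        "models": [
            "minimaxai/minimax-m2.7",
        ]
    },
)

# `completion.model` identifies the candidate that actually produced the response.
print(completion.model)
print(completion.choices[0].message.content)
```
For TypeScript, a plain `fetch` request is the smallest option, because the body is plain JSON and needs no SDK typing workarounds: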
```typescript
// Send the fallback list directly in the JSON body.
const response = await fetch("https://api.gonkagate.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GONKAGATE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "moonshotai/kimi-k2.6",
    models: ["minimaxai/minimax-m2.7"],
    messages: [{ role: "user", content: "Write a two sentence release note." }]
  })
});

if (!response.ok) {
  // Validation, auth, quota, plugin, or preset errors are returned, not retried.
  throw new Error(await response.text());
}

const completion = await response.json();
// `completion.model` is the candidate that produced the response.
console.log(completion.model);
console.log(completion.choices[0]?.message?.content);
```
Limits and unsupported fields
- `models` must be a non-empty array when present.
- Each `models` item must be a non-empty string.
- `models` accepts up to 64 entries.
- Requests must include either `model` or `models`.
- Use model IDs from `GET /v1/models`; do not guess IDs from display names.
- GonkaGate does not support `provider`, `route`, `allow_fallbacks`, provider ordering, or provider filters in this contract.
- This page covers direct `/v1/chat/completions` requests. Chat history routes have their own `model | models` behavior.
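These limits can be checked client-side before sending a request. The sketch below mirrors only the rules listed above; `validate_routing` is a hypothetical helper, and GonkaGate's own validation remains authoritative (it may reject requests this check accepts).

```python
MAX_MODELS = 64  # documented upper bound for the `models` array
UNSUPPORTED_FIELDS = ("provider", "route", "allow_fallbacks")


def validate_routing(body):
    """Return a list of problems with the routing fields of a request body."""
    problems = []
    has_model = bool(body.get("model"))
    models = body.get("models")

    if models is not None:
        if not isinstance(models, list) or not models:
            problems.append("models must be a non-empty array when present")
        else:
            if len(models) > MAX_MODELS:
                problems.append(f"models accepts up to {MAX_MODELS} entries")
            if any(not isinstance(m, str) or not m for m in models):
                problems.append("each models item must be a non-empty string")

    if not has_model and not models:
        problems.append("request must include either model or models")

    for field in UNSUPPORTED_FIELDS:
        if field in body:
            problems.append(f"{field} is not supported")
    return problems


print(validate_routing({"models": []}))
# ['models must be a non-empty array when present', 'request must include either model or models']
```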
Troubleshooting
| Problem | What to check |
|---|---|
| `400 invalid_request` | The request is missing both `model` and `models`, or `models` is empty. |
| `404 model_not_found` | Check every candidate ID against Get Models; at least one does not exist for your key. |
| Fallback never reaches a later model | The first failure may be a validation, auth, quota, context, plugin, or preset error rather than a retryable one. |
| Streaming stops after output has begun | Once visible chunks are sent, GonkaGate cannot swap to another model for the same response. |
| Different price than the first model ID | Check which candidate returned the completion; billing follows the selected model. |
See also
- Chat Completions API reference for the exact request schema.
- Model Selection Guide for choosing current model IDs.
- Chat Completion Presets for saved model order and shared defaults.
- Choose a Plugin for request-level runtime extensions.