GonkaGate Docs

Model Fallbacks

Try backup models automatically in /v1/chat/completions.

Model fallbacks let a single /v1/chat/completions request carry an ordered list of candidate models. GonkaGate tries the first usable candidate, then moves to the next one only when the failure is safe to retry.

Use this when one exact model is preferred, but your application would rather receive a response from a backup model than fail on a temporary model or runtime problem.

How it works

Send either model, models, or both:

request.json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "messages": [
    {
      "role": "user",
      "content": "Write a two sentence release note for a fallback model feature."
    }
  ]
}

Candidate order is:

  1. model, when present.
  2. Each entry in models, in order.

Duplicate IDs are collapsed after normalization, so each distinct model appears only once in the candidate list.

Internally, each upstream attempt still uses one model. The models array is request-level routing input; it is not forwarded as a multi-model payload.
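The ordering rules above can be sketched in Python. This is an illustration, not GonkaGate's implementation. The exact normalization is not documented on this page, so the hypothetical `normalize` helper only trims whitespace. Keeping the first occurrence of a duplicate is also an assumption.

```python
from typing import Any


def normalize(model_id: str) -> str:
    # Hypothetical stand-in: GonkaGate's real normalization is not documented here.
    return model_id.strip()


def candidate_order(body: dict[str, Any]) -> list[str]:
    """Return candidate IDs: `model` first, then `models` in order, without duplicates."""
    raw: list[str] = []
    if body.get("model"):
        raw.append(body["model"])
    raw.extend(body.get("models", []))

    seen: set[str] = set()
    ordered: list[str] = []
    for model_id in raw:
        key = normalize(model_id)
        if key not in seen:  # assumption: the first occurrence wins
            seen.add(key)
            ordered.append(key)
    return ordered


def attempt_payloads(body: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand one logical request into single-model payloads, one per candidate."""
    # `models` is routing input only, so it is dropped from every attempt.
    shared = {k: v for k, v in body.items() if k not in ("model", "models")}
    return [{**shared, "model": model_id} for model_id in candidate_order(body)]


request = {
    "model": "moonshotai/kimi-k2.6",
    "models": ["minimaxai/minimax-m2.7", "moonshotai/kimi-k2.6"],
    "messages": [{"role": "user", "content": "Hello!"}],
}
for payload in attempt_payloads(request):
    print(payload["model"])
```

This prints `moonshotai/kimi-k2.6` and then `minimaxai/minimax-m2.7`. The repeated `moonshotai/kimi-k2.6` in models is collapsed, and no attempt payload carries the models array.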

Request shapes

Primary model with backups

Use this for most production traffic. The preferred model stays in model, so the primary choice is obvious when reading the request.

Request Example
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "messages": [{ "role": "user", "content": "Summarize this incident." }]
}

Models-only fallback list

When model is omitted, the first models entry becomes the primary candidate.

Models-only fallback list
{
  "models": ["moonshotai/kimi-k2.6", "minimaxai/minimax-m2.7"],
  "messages": [{ "role": "user", "content": "Draft a short customer update." }]
}

Single-model request

Existing requests keep working. No fallback is configured unless models is present or a preset supplies model order.

Request Example
{
  "model": "moonshotai/kimi-k2.6",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

Fallback behavior

GonkaGate can try the next candidate only before it has committed a client-visible response.

How GonkaGate handles each situation:

  • Non-streaming, retry-eligible failure: GonkaGate may try the next candidate and return the first successful response.
  • Streaming, before the first chunk: GonkaGate may still switch candidates.
  • Streaming, after any visible chunk: GonkaGate stays on the active model because the response has already started.
  • Invalid request, auth, quota, or input error: GonkaGate returns the error instead of trying another model. Fix the request or account state.
  • Plugin or preset validation error: GonkaGate returns the configuration error. Fallback does not repair an invalid request-level setup.

Retry-eligible failures are temporary model or runtime failures, such as an unavailable model backend or a transient upstream or runtime error. Fallback is not a general error-recovery mechanism: all other errors are returned to the client.
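The situations listed above reduce to a small decision function. This is a minimal sketch. The `Failure` categories are hypothetical labels chosen for this example, not GonkaGate error codes.

```python
from enum import Enum


class Failure(Enum):
    # Hypothetical labels for the situations described above, not API error codes.
    RETRY_ELIGIBLE = "retry_eligible"    # unavailable backend, transient upstream/runtime error
    INVALID_REQUEST = "invalid_request"  # invalid request, auth, quota, or input
    CONFIGURATION = "configuration"      # plugin or preset validation error


def should_fallback(failure: Failure, streaming: bool, visible_chunks_sent: int) -> bool:
    """Return True when trying the next candidate is allowed."""
    if failure is not Failure.RETRY_ELIGIBLE:
        # Request, account, plugin, and preset errors go straight back to the client.
        return False
    if streaming and visible_chunks_sent > 0:
        # The client has already seen output from the active model.
        return False
    return True
```

The streaming check only matters once a chunk has reached the client. Before that point, a streaming request behaves like a non-streaming one.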

Pricing

Pricing follows the model that actually completes the request.

  • The response model and usage are attributed to the selected candidate.
  • Read dashboard usage and billing by the selected model, not by the first model value in the request.
  • If every candidate fails before a completion is produced, no later fallback model is billed as a successful generation.
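Billing follows the selected candidate, so any client-side usage tracking should key on the model field of each response rather than on the model you requested. This is a minimal sketch. It assumes responses follow the OpenAI chat completion shape, with `model` and `usage.total_tokens` fields. The sample data is made up for illustration.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Any


def tokens_by_selected_model(responses: Iterable[dict[str, Any]]) -> Counter[str]:
    """Sum total_tokens per model that actually produced each completion."""
    totals: Counter[str] = Counter()
    for response in responses:
        usage = response.get("usage") or {}
        # Key on the response's model, i.e. the selected candidate.
        totals[response["model"]] += usage.get("total_tokens", 0)
    return totals


# Illustrative data only: the second request was answered by the backup model.
responses = [
    {"model": "moonshotai/kimi-k2.6", "usage": {"total_tokens": 120}},
    {"model": "minimaxai/minimax-m2.7", "usage": {"total_tokens": 95}},
    {"model": "moonshotai/kimi-k2.6", "usage": {"total_tokens": 80}},
]
print(tokens_by_selected_model(responses))
```

With the OpenAI Python SDK, the same value is available as `completion.model` on the returned object.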

Before rollout, confirm with Get Models or the live model catalog that each candidate ID exists for your key.
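A pre-rollout check can compare your candidate list against the catalog. This minimal sketch uses only the Python standard library. It assumes that GET /v1/models returns an OpenAI-style list (`{"data": [{"id": ...}, ...]}`) and accepts the same Bearer token as chat completions. Both assumptions should be checked against the Get Models reference.

```python
import json
import urllib.request

BASE_URL = "https://api.gonkagate.com/v1"


def fetch_model_ids(api_key: str) -> set[str]:
    """Return the model IDs visible to this key (assumes an OpenAI-style list response)."""
    request = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return {item["id"] for item in payload.get("data", [])}


def missing_candidates(candidates: list[str], available: set[str]) -> list[str]:
    """Return candidate IDs the catalog does not list, preserving request order."""
    return [model_id for model_id in candidates if model_id not in available]
```

In a deploy script, call `missing_candidates(candidates, fetch_model_ids(key))` and stop the rollout if the result is non-empty.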

Using with plugins

Plugins are resolved once for the logical request, then the selected model attempt uses that prepared request.

web-search-with-fallbacks.json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "plugins": [
    {
      "id": "web",
      "max_results": 5
    }
  ],
  "messages": [{ "role": "user", "content": "What changed in today's release notes?" }]
}

Keep these rules in mind:

  • web, response-healing, and privacy-sanitization apply to the whole request.
  • PDF Inputs checks model capability across the fallback candidates when native PDF forwarding is requested.
  • A plugin configuration error is not retried on the next model. Fix the plugin payload first.
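The plugin rules above can be pictured as a fallback loop that prepares the request once and then tries candidates. This is a conceptual sketch, not GonkaGate's implementation. `prepare`, `send`, and both exception classes are hypothetical stand-ins for plugin resolution, one upstream attempt, and the two failure kinds.

```python
from typing import Any, Callable, Optional


class PluginConfigError(Exception):
    """Hypothetical: the plugin payload failed validation."""


class RetryableUpstreamError(Exception):
    """Hypothetical: a temporary model or runtime failure."""


def run_with_fallbacks(
    body: dict[str, Any],
    candidates: list[str],
    prepare: Callable[[dict[str, Any]], dict[str, Any]],
    send: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Conceptual fallback loop: resolve plugins once, then try candidates in order."""
    # Plugins run once for the logical request. A PluginConfigError raised here
    # propagates immediately, so no model is tried with a broken plugin payload.
    prepared = prepare(body)
    shared = {k: v for k, v in prepared.items() if k not in ("model", "models")}

    last_error: Optional[Exception] = None
    for model_id in candidates:
        try:
            # Every attempt reuses the same prepared request with exactly one model.
            return send({**shared, "model": model_id})
        except RetryableUpstreamError as error:
            # Only temporary failures move on to the next candidate; any other
            # exception raised by send() propagates to the caller unchanged.
            last_error = error
    raise RuntimeError("all candidates failed") from last_error
```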

Using with presets

Presets can either supply the model order or only supply shared defaults.

request-models-plus-preset.json
{
  "model": "moonshotai/kimi-k2.6",
  "models": ["minimaxai/minimax-m2.7"],
  "preset": "support-agent",
  "messages": [{ "role": "user", "content": "Reply to this support ticket." }]
}

When a request sets model, models, and preset together, the request owns the fallback order and the preset supplies supported defaults such as prompt, parameters, and reasoning.

To let the preset own model order too, use the preset as the model:

JSON Example
{
  "model": "@preset/support-agent",
  "messages": [{ "role": "user", "content": "Reply to this support ticket." }]
}
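The two preset shapes can be built from one helper. This is a minimal sketch. `preset_request` is a hypothetical name, and the rules for combining preset fields live in Chat Completion Presets, not here.

```python
from typing import Any, Optional


def preset_request(
    preset: str,
    messages: list[dict[str, Any]],
    model: Optional[str] = None,
    fallbacks: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper for the two preset request shapes described above."""
    if model is None:
        # The preset owns model order: pass it as the model.
        return {"model": f"@preset/{preset}", "messages": messages}
    # The request owns fallback order; the preset only supplies shared defaults.
    body: dict[str, Any] = {"model": model, "preset": preset, "messages": messages}
    if fallbacks:
        body["models"] = fallbacks
    return body
```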

See Chat Completion Presets for merge rules, slug validation, and preset-managed model lists.

Using with the OpenAI SDK

The raw HTTP body is the source of truth. Some SDKs do not type models as a first-class field, so pass it as an extra body field when needed.

openai-sdk.py
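from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="gp-your-api-key",
)

completion = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",  # primary candidate
    messages=[
        {"role": "user", "content": "Write a two sentence release note."}
    ],
    # The SDK does not type `models`, so send it as an extra body field.
    extra_body={
        "models": [
            "minimaxai/minimax-m2.7",  # backup, tried only after a retry-eligible failure
        ]
    },
)

# completion.model is the candidate that actually answered, and the one that is billed.
print(completion.model)
print(completion.choices[0].message.content)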

For TypeScript, a plain fetch request sends models exactly as written, with no SDK workaround:

model-fallbacks.ts
const response = await fetch("https://api.gonkagate.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GONKAGATE_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "moonshotai/kimi-k2.6",
    models: ["minimaxai/minimax-m2.7"],
    messages: [{ role: "user", content: "Write a two sentence release note." }]
  })
});

if (!response.ok) {
  throw new Error(await response.text());
}

const completion = await response.json();
console.log(completion.choices[0]?.message?.content);

Limits and unsupported fields

  • models must be a non-empty array when present.
  • Each models item must be a non-empty string.
  • models accepts up to 64 entries.
  • Requests must include either model or models.
  • Use model IDs from GET /v1/models; do not guess IDs from display names.
  • GonkaGate does not support provider, route, allow_fallbacks, provider ordering, or provider filters in this contract.
  • This page covers direct /v1/chat/completions requests. Chat history routes have their own rules for model and models.
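The limits above can be checked client-side before sending. This is a minimal sketch that mirrors only the rules listed here. The server remains authoritative, and `validate_fallback_fields` is a hypothetical helper, not part of any SDK.

```python
from typing import Any

MAX_MODELS = 64
UNSUPPORTED_FIELDS = ("provider", "route", "allow_fallbacks")


def validate_fallback_fields(body: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the fallback fields look valid."""
    problems: list[str] = []
    if "model" not in body and "models" not in body:
        problems.append("request must include model or models")
    if "models" in body:
        models = body["models"]
        if not isinstance(models, list) or not models:
            problems.append("models must be a non-empty array")
        else:
            if len(models) > MAX_MODELS:
                problems.append(f"models accepts at most {MAX_MODELS} entries")
            if any(not isinstance(m, str) or not m for m in models):
                problems.append("each models item must be a non-empty string")
    for field in UNSUPPORTED_FIELDS:
        if field in body:
            problems.append(f"{field} is not supported")
    return problems
```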

Troubleshooting

Problem and what to check:

  • 400 invalid_request: The request is missing both model and models, or models is empty.
  • 404 model_not_found: Confirm every candidate ID against Get Models.
  • Fallback never reaches a later model: The first failure may be a validation, auth, quota, context, plugin, or preset error rather than a retry-eligible one.
  • Streaming stops after output has begun: Once visible chunks are sent, GonkaGate cannot swap to another model for the same response.
  • Price differs from the first model ID: Check which candidate returned the completion; billing follows the selected model.

