Rate Limit Handling
How to handle GonkaGate 429 responses, throttling, and retry limits safely.
Read error.code before you retry a 429.
In GonkaGate, insufficient_quota means the prepaid USD balance is too low for this request. rate_limit_exceeded and transfer_agent_capacity_reached are usually temporary. Keep that branch logic in one shared helper so every caller follows the same retry policy.
Decide whether to retry
| If you get | What it usually means | What to do |
|---|---|---|
| 429 + insufficient_quota | The prepaid USD balance is too low for this request | Stop retrying. Surface balance or top-up state, then retry only after funds are available. |
| 429 + rate_limit_exceeded | Your traffic hit a request limit | Honor Retry-After when present and retry with a small backoff budget. |
| 429 + transfer_agent_capacity_reached | Temporary capacity pressure | Wait, retry carefully, and keep the retry budget small. |
| 429 without a known code | Usually still temporary throttling or capacity pressure | Treat it as temporary throttling first, but log the full response details if it repeats. |
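The table above can be expressed as a small lookup so the decision lives in one place. This is a minimal sketch: `RetryDecision` and `classify429` are hypothetical names chosen for illustration, not part of any GonkaGate SDK.

```typescript
// Hypothetical helper: maps the error.code of a 429 response to a decision.
// The names RetryDecision and classify429 are illustrative assumptions.
interface RetryDecision {
  retry: boolean; // whether a bounded retry is reasonable
  logIfRepeated: boolean; // whether to log full response details on repeats
}

function classify429(errorCode: string | undefined): RetryDecision {
  switch (errorCode) {
    case "insufficient_quota":
      // Billing state: retrying cannot succeed until funds are available.
      return { retry: false, logIfRepeated: false };
    case "rate_limit_exceeded":
    case "transfer_agent_capacity_reached":
      // Temporary conditions: retry within a small budget.
      return { retry: true, logIfRepeated: false };
    default:
      // Unknown or missing code: treat as temporary, but keep evidence.
      return { retry: true, logIfRepeated: true };
  }
}
```

Returning a plain object rather than throwing keeps the classification step separate from whatever the caller does with it.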
Use one shared retry helper
```typescript
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function requestWithRateLimitHandling(
  makeRequest: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const response = await makeRequest();

    // Anything other than 429 is returned to the caller unchanged.
    if (response.status !== 429) {
      return response;
    }

    // Read error.code from a clone so the original body stays readable.
    const body = await response.clone().json().catch(() => null);
    const errorCode = body?.error?.code;

    // A billing state, not throttling: stop immediately.
    if (errorCode === "insufficient_quota") {
      throw new Error("insufficient_quota");
    }

    if (attempt === maxRetries) {
      throw new Error("retry_budget_exhausted");
    }

    // Non-numeric Retry-After values parse to NaN and fall through to backoff.
    const retryAfterSeconds = Number(
      response.headers.get("Retry-After") ?? "0",
    );
    const waitMs =
      retryAfterSeconds > 0
        ? retryAfterSeconds * 1000
        : Math.min(1000 * 2 ** attempt, 8000); // exponential backoff, capped at 8 s

    await sleep(waitMs);
  }

  throw new Error("retry_budget_exhausted");
}
```

This baseline does four things: branch on error.code, stop on insufficient_quota, honor Retry-After, and cap retries.
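The helper above signals its two terminal outcomes by throwing an Error whose message is insufficient_quota or retry_budget_exhausted. One way a caller might turn those into states it can display is sketched below. `CallerState` and `toCallerState` are hypothetical names, and the state labels are assumptions rather than anything GonkaGate defines.

```typescript
// Hypothetical caller-side mapping from the helper's thrown errors to UI states.
type CallerState =
  | { kind: "top_up_required" } // show balance or top-up state, do not retry
  | { kind: "try_again_later" } // retry budget spent on temporary throttling
  | { kind: "unexpected"; detail: string };

function toCallerState(err: unknown): CallerState {
  const message = err instanceof Error ? err.message : String(err);
  if (message === "insufficient_quota") {
    return { kind: "top_up_required" };
  }
  if (message === "retry_budget_exhausted") {
    return { kind: "try_again_later" };
  }
  // Anything else (network failure, programming error) is passed through.
  return { kind: "unexpected", detail: message };
}
```

A caller would wrap the helper call in try/catch and pass the caught value to `toCallerState`, which keeps billing failures visible instead of hiding them behind a generic error.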
Read these fields first
- HTTP status 429
- error.code in the JSON body
- Retry-After when the server tells you exactly how long to wait
- x-ratelimit-* headers when your client exposes them, so you can log the current limit, remaining allowance, and reset window
- x-request-id for repeated failures or support escalation
Treat error.message as human-readable context only. Do not build retry logic from the message text.
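The fields listed above can be gathered into one object for logging. The sketch below assumes a standard Fetch `Response` and an already-parsed JSON body. `RateLimitDiagnostics` and `collectRateLimitDiagnostics` are hypothetical names, and the code copies every header starting with x-ratelimit- rather than assuming specific header names.

```typescript
// Hypothetical diagnostics collector for a 429 response.
interface RateLimitDiagnostics {
  status: number;
  errorCode: string | null;
  retryAfter: string | null;
  rateLimitHeaders: Record<string, string>;
  requestId: string | null;
}

function collectRateLimitDiagnostics(
  response: Response,
  body: unknown, // parsed JSON body, or null if parsing failed
): RateLimitDiagnostics {
  // Copy whatever x-ratelimit-* headers the client exposes; exact names
  // are not assumed here.
  const rateLimitHeaders: Record<string, string> = {};
  response.headers.forEach((value, name) => {
    if (name.toLowerCase().startsWith("x-ratelimit-")) {
      rateLimitHeaders[name.toLowerCase()] = value;
    }
  });

  // Only error.code is read; error.message is deliberately ignored.
  const code = (body as { error?: { code?: unknown } } | null)?.error?.code;

  return {
    status: response.status,
    errorCode: typeof code === "string" ? code : null,
    retryAfter: response.headers.get("Retry-After"),
    rateLimitHeaders,
    requestId: response.headers.get("x-request-id"),
  };
}
```

Logging this object on repeated 429s gives support the request ID and limit state without relying on message text.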
Common mistakes
- Treating every 429 as retryable throttling. In GonkaGate, insufficient_quota is a billing state, not a backoff case.
- Ignoring Retry-After when it is present. That usually creates synchronized retries and more throttling.
- Hiding insufficient_quota behind automatic retries. Stop and show a billing or top-up state instead.
- Letting workers, cron jobs, or batch traffic retry forever. Keep the retry budget small and make interactive traffic the priority.
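To illustrate the Retry-After point, here is a sketch of a wait-time calculation. HTTP allows Retry-After to be either a number of seconds or an HTTP date, so the sketch handles both. The added jitter is an assumption of this example and not something the page prescribes, and `retryAfterToMs` and `nextWaitMs` are hypothetical names.

```typescript
// Hypothetical helpers: convert Retry-After to milliseconds and add jitter.
function retryAfterToMs(
  headerValue: string | null,
  nowMs: number = Date.now(),
): number | null {
  if (headerValue === null) return null;
  const trimmed = headerValue.trim();
  // Form 1: delay in whole seconds, e.g. "2".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

function nextWaitMs(
  headerValue: string | null,
  attempt: number,
  random: () => number = Math.random,
): number {
  const fromHeader = retryAfterToMs(headerValue);
  // Prefer the server's hint; otherwise use capped exponential backoff.
  const base =
    fromHeader !== null && fromHeader > 0
      ? fromHeader
      : Math.min(1000 * 2 ** attempt, 8000);
  // Assumption: add up to 25% extra so clients do not retry in lockstep.
  // The wait never drops below what the server asked for.
  return Math.round(base + base * 0.25 * random());
}
```

Jitter is only ever added on top of the base wait, so a server-provided Retry-After is still honored as a minimum.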
See also
- GonkaGate API Error Handling for the same retry-or-stop logic across 401, 403, 5xx, and other non-429 failures.
- Pricing for prepaid USD balance rules behind insufficient_quota.
- Create a chat completion for the exact POST /v1/chat/completions request and response contract.