Rate Limit Handling
Best practices for handling 429s and throttling without disrupting users.
API Reference
The latest limits and headers are documented in the Rate limits section of the API reference.
Handling Rate Limits
Implement retry logic with exponential backoff to handle rate limits gracefully:
Use Exponential Backoff
Start with a small delay (1s) and double it on each retry. If the response includes a Retry-After header, wait for the time it specifies instead of the computed delay.
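The Retry-After header can carry either a whole number of seconds or an HTTP date (RFC 9110). As a minimal sketch, a hypothetical helper `parseRetryAfterMs` (not part of any SDK) could normalize both forms to milliseconds and return null when the value is missing or unparseable, so the caller can fall back to exponential backoff. Which form this API actually sends is not stated here, so handling both is an assumption.

```typescript
// Hypothetical helper: convert a Retry-After header value to milliseconds.
// Returns null when the value is missing or cannot be parsed.
function parseRetryAfterMs(
  value: string | null | undefined,
  now: number = Date.now()
): number | null {
  if (!value) return null;
  const trimmed = value.trim();

  // Form 1: delay-seconds, a non-negative integer such as "120"
  if (/^\d+$/.test(trimmed)) {
    return parseInt(trimmed, 10) * 1000;
  }

  // Form 2: HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return null;

  // A date in the past means there is no need to wait
  return Math.max(0, date - now);
}
```

A caller could then write `parseRetryAfterMs(header) ?? Math.pow(2, attempt) * 1000` to choose between the server's hint and the backoff delay.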
retry-with-backoff.ts
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      // Retry only on 429 responses, and only while attempts remain
      if (error.status === 429 && i < maxRetries - 1) {
        // Prefer the Retry-After value (in seconds). If the header is missing
        // or not a plain number, fall back to exponential backoff: 1s, 2s, 4s, ...
        const retryAfter = error.headers?.get("Retry-After");
        const seconds = Number(retryAfter);
        const delay =
          retryAfter && Number.isFinite(seconds) && seconds >= 0
            ? seconds * 1000
            : Math.pow(2, i) * 1000;
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      // Any other error, or the final failed attempt, goes to the caller
      throw error;
    }
  }
  throw new Error("Max retries exceeded");
}
// Usage
const response = await retryWithBackoff(() =>
  client.chat.completions.create({
    model: "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    messages: [{ role: "user", content: "Hello!" }]
  })
);
Best Practices
Follow these practices to avoid hitting rate limits:
- Implement client-side throttling — Add delays between requests to stay under the limit.
- Batch requests where possible — Combine multiple operations into fewer API calls.
- Cache responses — Store and reuse responses for identical requests.
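The throttling and caching practices above can be sketched as follows. Both `createThrottle` and `createCache` are hypothetical helpers written for illustration, and the interval and TTL passed to them are placeholders, not documented limits. The throttle spaces calls a minimum interval apart; the cache stores the in-flight promise per key, so identical concurrent requests also share one API call.

```typescript
// Hypothetical helper: ensures calls start at least `minIntervalMs` apart.
function createThrottle(minIntervalMs: number) {
  let nextSlot = 0; // earliest timestamp at which the next call may start

  return async function throttled<T>(fn: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const wait = Math.max(0, nextSlot - now);
    // Reserve the following slot before waiting so concurrent callers queue up
    nextSlot = Math.max(now, nextSlot) + minIntervalMs;
    if (wait > 0) {
      await new Promise(resolve => setTimeout(resolve, wait));
    }
    return fn();
  };
}

// Hypothetical helper: reuses results for identical keys until `ttlMs` elapses.
function createCache<T>(ttlMs: number) {
  const entries = new Map<string, { value: Promise<T>; expires: number }>();

  return function cached(key: string, fn: () => Promise<T>): Promise<T> {
    const hit = entries.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;

    const value = fn();
    entries.set(key, { value, expires: Date.now() + ttlMs });
    // Do not keep failed requests in the cache
    value.catch(() => {
      if (entries.get(key)?.value === value) entries.delete(key);
    });
    return value;
  };
}
```

One way to combine them, assuming `params` holds the request body, is `cached(JSON.stringify(params), () => throttled(() => client.chat.completions.create(params)))`. Placing the cache on the outside means cache hits do not consume a throttle slot.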