Best Practices

Optimize your API usage for cost, performance, and reliability.

Cost Optimization

Keep your API costs under control with these strategies:

Real-time monitoring: Track costs via usage.total_cost_usd in the API response. Every response includes the exact request cost.

Low balance warnings: Set up alerts when balance drops below a threshold (e.g., $10) to avoid service interruption.

cost-tracking.ts
// Track costs after each request
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.gonkagate.com/v1",
  apiKey: process.env.GONKAGATE_API_KEY, // key stored in an environment variable (name is an example)
});

// Starting balance, fetched from your account before the first request
let runningBalance = parseFloat(initialBalance);

async function sendMessage(content: string) {
  const response = await client.chat.completions.create({
    model: "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    messages: [{ role: "user", content }],
  });

  // total_cost_usd is a gateway extension field, so it is not in the SDK's types
  const usage = response.usage as any;
  runningBalance -= usage.total_cost_usd;

  if (runningBalance < 10) {
    showLowBalanceWarning();
  }

  return response;
}

Performance

Optimize response times and throughput:

Use streaming for long responses. Streaming delivers tokens as they are generated, turning a 5-10 second wait for a complete response into immediate feedback.

streaming.ts
const stream = await client.chat.completions.create({
  model: "Qwen/Qwen3-8B",
  messages: [{ role: "user", content: "Explain quantum computing" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Cache repeated queries on the application side. At temperature: 0, identical prompts yield identical responses, so a cache hit saves both latency and cost.
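A minimal in-memory sketch of such a cache, keyed by prompt. Here callModel is a stand-in for whatever wrapper around client.chat.completions.create your application uses:

caching.ts

```typescript
// Application-side cache for deterministic (temperature: 0) completions.
const responseCache = new Map<string, string>();

async function cachedCompletion(
  prompt: string,
  callModel: (prompt: string) => Promise<string>
): Promise<string> {
  const hit = responseCache.get(prompt);
  if (hit !== undefined) {
    return hit; // cache hit: no API call, no cost
  }
  const answer = await callModel(prompt);
  responseCache.set(prompt, answer);
  return answer;
}
```

For long-running services, consider bounding the cache size or adding an expiry so it does not grow without limit.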

Tip

Batch related requests: send multiple messages in a single conversation instead of separate API calls.
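One way to apply this tip is to fold several pending questions into a single user turn. The buildMessages helper and the ChatMessage shape below are illustrative; the message format matches the Chat Completions API:

batching.ts

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Fold several pending questions into one user turn, so a single API call
// replaces N separate calls (one set of request and prompt overhead).
function buildMessages(history: ChatMessage[], questions: string[]): ChatMessage[] {
  return [...history, { role: "user", content: questions.join("\n") }];
}
```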

Reliability

Build robust applications that handle failures gracefully:

Implement retries with exponential backoff. On 429 (rate limit) or 503 (network unavailable) errors, retry with an increasing delay.

retry.ts
async function requestWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const status = error?.status;
      if (status === 429 || status === 503) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        await new Promise(r => setTimeout(r, delay));
        continue;
      }
      throw error;
    }
  }
  throw new Error("Max retries exceeded");
}

Set reasonable timeouts: we recommend 30-60 seconds for chat completions and 10-15 seconds for embeddings.
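If your HTTP client exposes a timeout option, prefer it. Otherwise, a generic Promise.race wrapper works with any promise-returning call; withTimeout is a name made up for this sketch:

timeout.ts

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer to avoid a leak.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// e.g. 45s for a chat completion, 15s for an embedding request:
// const response = await withTimeout(client.chat.completions.create({ ... }), 45_000);
```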

Security

Keep your API keys and data secure:

  • Never expose API keys in frontend code or public repositories
  • Store keys in environment variables, not in code
  • Rotate keys regularly, especially if you suspect a leak
  • Use separate keys for development and production
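A small helper makes the environment-variable rule easy to enforce at startup. The requireEnv name and the GONKAGATE_API_KEY variable are illustrative:

env.ts

```typescript
// Fail fast at startup if a required secret is missing, instead of
// discovering it on the first API call.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (variable name is an example):
// const client = new OpenAI({
//   baseURL: "https://api.gonkagate.com/v1",
//   apiKey: requireEnv("GONKAGATE_API_KEY"),
// });
```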

Error Handling

Handle errors gracefully for a better user experience:

Check the HTTP status before parsing the response. 4xx codes indicate client errors; 5xx codes indicate server-side problems that are usually temporary.

error_handling.py
from openai import OpenAI, APIError

client = OpenAI(base_url="https://api.gonkagate.com/v1", api_key="gp-...")

try:
    response = client.chat.completions.create(
        model="Qwen/Qwen3-8B",
        messages=[{"role": "user", "content": "Hello"}]
    )
except APIError as e:
    if e.status_code == 401:
        print("Invalid API key")
    elif e.status_code == 402:
        print("Insufficient balance")
    elif e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 503:
        print("Network temporarily unavailable")
    else:
        print(f"Unexpected error: {e}")