
LlamaIndex Setup

Route LlamaIndex OpenAI LLM calls through GonkaGate.

Point the LlamaIndex OpenAI LLM wrapper at GonkaGate by setting api_base, a gp-... API key, and a current GonkaGate model ID; the rest of your LlamaIndex pipeline stays in place. This guide covers only the OpenAI LLM wrapper, so keep embeddings and the broader RAG wiring on your current provider or local model.

Configure Settings.llm

Use Settings.llm when the same GonkaGate-backed LLM should be reused across multiple query or index flows.

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(
    model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
    api_key="gp-your-api-key",
    api_base="https://api.gonkagate.com/v1",
)

response = Settings.llm.complete("Return exactly: LlamaIndex connected")
print(response)

Expected result: Settings.llm.complete(...) returns LlamaIndex connected.

Use a fresh model ID from GET /v1/models before you send real traffic. If your code already instantiates OpenAI(...) locally, apply the same api_base, API key, and model ID there.
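To avoid hardcoding a stale model ID, you can select one from the GET /v1/models response at startup. The sketch below assumes GonkaGate returns the OpenAI-compatible model-list shape ({"object": "list", "data": [{"id": ...}, ...]}); verify against the API reference before relying on it.

```python
def pick_model_id(models_payload: dict, preferred_prefix: str = "qwen/") -> str:
    """Return the first model ID matching preferred_prefix, else the first model."""
    ids = [m["id"] for m in models_payload.get("data", [])]
    if not ids:
        raise ValueError("no models returned; check the API key and base URL")
    for model_id in ids:
        if model_id.startswith(preferred_prefix):
            return model_id
    return ids[0]

# Stubbed payload for illustration; replace with a real GET /v1/models response.
payload = {
    "object": "list",
    "data": [
        {"id": "qwen/qwen3-235b-a22b-instruct-2507-fp8"},
        {"id": "another/model"},
    ],
}
print(pick_model_id(payload))
```

Feed the returned ID into the model argument of OpenAI(...) so the selection stays current without code changes.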

Pass the same LLM into your query flow

If your app already has an index, pass the configured LLM into as_query_engine() instead of rebuilding the pipeline.

query_engine = index.as_query_engine(llm=Settings.llm)
response = query_engine.query("Summarize this document in one sentence.")
print(response)

Change only the LLM connection values

  • Keep the LlamaIndex OpenAI wrapper in place.
  • Change only api_base, the API key, and the model ID.
  • Use Settings.llm when several flows should share the same LLM configuration.
  • Revisit retrieval or broader RAG design only after the base LLM connection works.
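One way to keep the change confined to connection values is to read them from environment variables. This is a minimal sketch; the variable names (GONKAGATE_MODEL, GONKAGATE_API_KEY, GONKAGATE_API_BASE) are illustrative, not an official convention.

```python
import os

def gonkagate_llm_kwargs() -> dict:
    """Build the three connection values for OpenAI(...) from the environment.

    Falls back to the documented defaults when a variable is unset.
    """
    return {
        "model": os.environ.get(
            "GONKAGATE_MODEL", "qwen/qwen3-235b-a22b-instruct-2507-fp8"
        ),
        "api_key": os.environ.get("GONKAGATE_API_KEY", "gp-your-api-key"),
        "api_base": os.environ.get(
            "GONKAGATE_API_BASE", "https://api.gonkagate.com/v1"
        ),
    }
```

With this in place, Settings.llm = OpenAI(**gonkagate_llm_kwargs()) keeps the LlamaIndex wiring untouched while the connection values live in configuration.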

Common failures

  • 401 invalid_api_key: the API key is missing, invalid, or loaded from the wrong place. Recheck Authentication and API Keys.
  • 404 model_not_found: the model ID is stale or unsupported. Refresh it from GET /v1/models.
  • 429 insufficient_quota: the prepaid USD balance is too low for the request. Top up the balance, then retry after funds are available.
  • 429 rate_limit_exceeded: you hit a runtime limit. Honor Retry-After and add bounded backoff.
  • LLM calls work but embeddings fail: this guide covers only the OpenAI LLM wrapper path. Keep embeddings on your current provider or local model.
