Chat Completion Parameters

Tune /v1/chat/completions in GonkaGate: what to change first, what the common fields control, and when to use JSON output or tool calling.

Tune /v1/chat/completions one field at a time. Start with temperature for output style or max_tokens for response size. Move to response_format, tools, tool_choice, or token-level fields only when you need structured output, function calls, or debugging.

Minimum working example

Request example
export GONKAGATE_API_KEY="gp-your-api-key"

curl https://api.gonkagate.com/v1/chat/completions \
  -H "Authorization: Bearer $GONKAGATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [
      {
        "role": "user",
        "content": "Explain idempotency keys in 3 short bullets."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 180
  }'

Expected result: the request shape stays the same as a default call, but the answer should be steadier and less likely to run long.

  • A low temperature (here 0.2) makes the answer steadier.
  • max_tokens caps response size.
  • Replace the example model with a fresh ID from GET /v1/models before rollout.
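To fetch current model IDs before rollout, a GET /v1/models call uses the same base URL and key as the example above (shown here as a sketch; the response shape follows the OpenAI-compatible models list):

```shell
curl https://api.gonkagate.com/v1/models \
  -H "Authorization: Bearer $GONKAGATE_API_KEY"
```

Pick an id from the returned list and substitute it for the example model in your request body.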

Pick the right parameter family

  • More stable or more varied text: start with temperature or top_p. These change sampling behavior.
  • Shorter or bounded responses: start with max_tokens or stop. These control response length and stop conditions.
  • Less repetition: start with frequency_penalty or presence_penalty. These reduce repeated tokens or push the model toward new ones.
  • Machine-readable output: start with response_format. This changes the response contract, not just the style.
  • Function execution from your app: start with tools and tool_choice. This lets the model request a named function.
  • Repeatability or token-level inspection: start with seed, logprobs, or top_logprobs. These help with debugging, evaluation, and reproducibility.

What the common fields change

Sampling and repetition

| Field | Type | What it changes | Notes |
| --- | --- | --- | --- |
| temperature | number | Randomness and variety | 0 to 2 |
| top_p | number | Nucleus sampling | 0 to 1 |
| frequency_penalty | number | Penalizes tokens that repeat often | -2 to 2 |
| presence_penalty | number | Penalizes tokens that already appeared | -2 to 2 |
| stop | string or array | Ends generation on one or more stop sequences | Use when your app already knows the delimiter or boundary. |

Change one sampling family at a time. If you are already adjusting temperature, avoid piling on top_p, penalties, and stop in the same test unless a specific output contract requires them.
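For example, when a response should end at a known boundary, a request body can add only stop on top of the baseline settings (the model ID and END delimiter are illustrative):

```json
{
  "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
  "messages": [
    { "role": "user", "content": "List three HTTP verbs, one per line, then write END." }
  ],
  "temperature": 0.2,
  "stop": ["END"]
}
```

Generation halts when the model emits END, and the stop sequence itself is not included in the returned content.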

Response length, repeatability, and debugging

| Field | Type | What it changes | Notes |
| --- | --- | --- | --- |
| max_tokens | integer | Upper bound on generated tokens | Set this early if response size matters. Must be at least 1. |
| seed | integer | Best-effort repeatability | Useful for tests and comparisons, but not every model guarantees identical output. |
| logprobs | boolean | Returns token log probabilities | Useful for inspection and evaluation. |
| top_logprobs | integer | Returns the top alternative tokens at each position | 0 to 20; only meaningful with logprobs: true. |
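A debugging-oriented request combines these fields like this (a sketch; seed and top_logprobs values are arbitrary, and the model ID is illustrative):

```json
{
  "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
  "messages": [
    { "role": "user", "content": "Name one HTTP status code." }
  ],
  "seed": 42,
  "max_tokens": 20,
  "logprobs": true,
  "top_logprobs": 5
}
```

Note that logprobs: true must accompany top_logprobs, and the same seed only makes repeated runs comparable on a best-effort basis.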

Response format and tool calling

| Field | Type | What it changes | Notes |
| --- | --- | --- | --- |
| response_format | object | Asks the model for JSON or another structured response format | Support depends on the selected model/provider pair. |
| tools | array | Declares functions the model may call | Use the standard OpenAI-compatible tool shape. |
| tool_choice | string or object | Controls whether the model may call a tool and which one | Use required or a specific function when tool use is mandatory. |
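A minimal tool-calling request using the standard OpenAI-compatible tool shape might look like this (get_weather is a hypothetical function your application would implement; the model ID is illustrative):

```json
{
  "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
  "messages": [
    { "role": "user", "content": "What is the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "required"
}
```

With tool_choice set to required, the response should contain a tool call rather than free-form text; your application then executes the named function and can send the result back in a follow-up message.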

GonkaGate stays OpenAI-compatible for /v1/chat/completions, but capabilities such as structured outputs and tool calling still depend on the selected model/provider pair.
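When a model/provider pair supports structured output, a minimal JSON-mode request can look like this (a sketch; the system prompt naming the expected keys is a common practice, not a GonkaGate requirement):

```json
{
  "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
  "messages": [
    { "role": "system", "content": "Reply with a JSON object containing the keys \"term\" and \"definition\"." },
    { "role": "user", "content": "Define idempotency." }
  ],
  "response_format": { "type": "json_object" }
}
```

Because response_format changes the response contract, the consuming code should parse the content as JSON and handle parse failures explicitly.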

For the exact request-body schema, validation rules, response payloads, and GonkaGate-specific request fields, see the Create a chat completion reference.

Common mistakes

  • Changing sampling, penalties, output format, and tool settings in the same test, then not knowing which field changed the result.
  • Treating response_format, tools, or tool_choice as cosmetic. They change how your application consumes the response.
  • Setting top_logprobs without logprobs: true.
  • Assuming structured outputs or tool calling behave the same across every model/provider pair.
  • Leaving max_tokens unset when response size matters operationally.
