# Chat Completion Parameters
Tune /v1/chat/completions in GonkaGate: what to change first, what the common fields control, and when to use JSON output or tool calling.
Tune /v1/chat/completions one field at a time. Start with temperature for output style or max_tokens for response size. Move to response_format, tools, tool_choice, or token-level fields only when you need structured output, function calls, or debugging.
## Minimum working example
```bash
export GONKAGATE_API_KEY="gp-your-api-key"

curl https://api.gonkagate.com/v1/chat/completions \
  -H "Authorization: Bearer $GONKAGATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [
      {
        "role": "user",
        "content": "Explain idempotency keys in 3 short bullets."
      }
    ],
    "temperature": 0.2,
    "max_tokens": 180
  }'
```

Expected result: the request shape stays the same, but the answer should be steadier and less likely to run long.
- `temperature` makes the answer steadier; `max_tokens` caps response size.
- Replace the example `model` with a fresh ID from `GET /v1/models` before rollout.
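If you call the endpoint from a language like Python rather than curl, the same body is just a dict serialized to JSON; a minimal sketch of the example request above, using only the standard library (any HTTP client can send `body` to the same URL):

```python
import json
import os

# Same request body as the curl example. Replace the model ID with a
# fresh one from GET /v1/models before rollout.
payload = {
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [
        {
            "role": "user",
            "content": "Explain idempotency keys in 3 short bullets.",
        }
    ],
    "temperature": 0.2,  # low temperature: steadier wording
    "max_tokens": 180,   # cap on generated tokens
}

headers = {
    "Authorization": f"Bearer {os.environ.get('GONKAGATE_API_KEY', '')}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
```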
## Pick the right parameter family
- More stable or more varied text: start with `temperature` or `top_p`. These change sampling behavior.
- Shorter or bounded responses: start with `max_tokens` or `stop`. These control response length and stop conditions.
- Less repetition: start with `frequency_penalty` or `presence_penalty`. These reduce repeated tokens or push the model toward new ones.
- Machine-readable output: start with `response_format`. This changes the response contract, not just the style.
- Function execution from your app: start with `tools` and `tool_choice`. This lets the model request a named function.
- Repeatability or token-level inspection: start with `seed`, `logprobs`, or `top_logprobs`. These help with debugging, evaluation, and reproducibility.
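A practical way to apply the one-field-at-a-time advice is to keep a fixed baseline payload and derive single-field variants from it; a small sketch (the prompt text is illustrative, the model ID is the one from the example above):

```python
# Fixed baseline: every test run shares this payload.
baseline = {
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
}

def variant(**overrides):
    """Return a copy of the baseline with the given fields overridden."""
    return {**baseline, **overrides}

# One variant per parameter family, one changed field each.
steadier = variant(temperature=0.2)        # sampling family
bounded = variant(max_tokens=120)          # length family
less_repeat = variant(frequency_penalty=0.5)  # repetition family

# Each variant differs from the baseline by exactly one field.
print(sorted(set(steadier) - set(baseline)))  # → ['temperature']
```

Because each variant touches a single field, any change in output quality can be attributed to that field.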
## What the common fields change

### Sampling and repetition
| Field | Type | What it changes | Notes |
|---|---|---|---|
| `temperature` | number | Randomness and variety | 0-2 |
| `top_p` | number | Nucleus sampling | 0-1 |
| `frequency_penalty` | number | Penalizes tokens that repeat often | -2 to 2 |
| `presence_penalty` | number | Penalizes tokens that already appeared | -2 to 2 |
| `stop` | string or array | Ends generation on one or more stop sequences | Use when your app already knows the delimiter or boundary. |
Change one sampling family at a time. If you are already adjusting `temperature`, avoid piling on `top_p`, penalties, and `stop` in the same test unless you need them for one very specific contract.
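One example of such a specific contract is delimiter-bounded extraction, where pairing low `temperature` with a `stop` sequence is deliberate; a sketch (the prompt and delimiter are illustrative):

```python
# Extraction contract: the app already strips everything after the
# delimiter, so `stop` ends generation there and a near-zero
# temperature keeps the extracted text stable across runs.
payload = {
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [
        {
            "role": "user",
            "content": "Write the SQL only, then the line ---END---.",
        }
    ],
    "temperature": 0,      # steadier output for machine consumption
    "stop": ["---END---"],  # string or array: generation halts here
}
```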
### Response length, repeatability, and debugging
| Field | Type | What it changes | Notes |
|---|---|---|---|
| `max_tokens` | integer | Upper bound on generated tokens | Set this early if response size matters. Must be at least 1. |
| `seed` | integer | Best-effort repeatability | Useful for tests and comparisons, but not every model guarantees identical output. |
| `logprobs` | boolean | Returns token log probabilities | Useful for inspection and evaluation. |
| `top_logprobs` | integer | Returns the top alternative tokens at each position | 0-20; only useful with `logprobs: true`. |
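These four fields are typically combined in evaluation runs rather than production traffic; a sketch of a debug-oriented payload (the prompt is illustrative):

```python
# Evaluation request: best-effort repeatability plus token-level
# inspection. `seed` does not guarantee identical output on every model.
payload = {
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [{"role": "user", "content": "Is 17 prime? Answer yes or no."}],
    "seed": 42,         # best-effort repeatability across runs
    "logprobs": True,   # required for top_logprobs to return anything useful
    "top_logprobs": 5,  # 0-20 alternative tokens per position
    "max_tokens": 5,    # short bound: we only want the verdict
}
```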
### Response format and tool calling
| Field | Type | What it changes | Notes |
|---|---|---|---|
| `response_format` | object | Asks the model for JSON or another structured response format | Support depends on the selected model/provider pair. |
| `tools` | array | Declares functions the model may call | Use the standard OpenAI-compatible tool shape. |
| `tool_choice` | string or object | Controls whether the model may call a tool and which one | Use `required` or a specific function when tool use is mandatory. |
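A sketch of the standard OpenAI-compatible tool shape; `get_order_status` is a hypothetical function in your app, not part of GonkaGate:

```python
# The model may request a call to the declared function;
# tool_choice: "required" forces it to pick some tool.
payload = {
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [{"role": "user", "content": "Where is order 8841?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_order_status",  # hypothetical app function
                "description": "Look up the status of an order by ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        }
    ],
    # "required" forces a tool call; to force this specific function, use
    # {"type": "function", "function": {"name": "get_order_status"}} instead.
    "tool_choice": "required",
}
```

The model responds with a tool-call request (function name plus JSON arguments); your application executes the function and sends the result back in a follow-up message.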
GonkaGate stays OpenAI-compatible for /v1/chat/completions, but capabilities such as structured outputs and tool calling still depend on the selected model/provider pair.
For the exact request-body schema, validation rules, response payloads, and GonkaGate-specific request fields, use Create a chat completion.
## Common mistakes
- Changing sampling, penalties, output format, and tool settings in the same test, then not knowing which field changed the result.
- Treating `response_format`, `tools`, or `tool_choice` as cosmetic. They change how your application consumes the response.
- Setting `top_logprobs` without `logprobs: true`.
- Assuming structured outputs or tool calling behave the same across every model/provider pair.
- Leaving `max_tokens` unset when response size matters operationally.
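Several of these mistakes can be caught before a request is sent; a sketch of an illustrative pre-flight check (not part of any GonkaGate SDK), using the constraints from the tables above:

```python
def validate(payload: dict) -> list:
    """Return a list of problems with a chat-completion payload."""
    problems = []
    top_lp = payload.get("top_logprobs")
    if top_lp is not None and not payload.get("logprobs"):
        problems.append("top_logprobs set without logprobs: true")
    if top_lp is not None and not 0 <= top_lp <= 20:
        problems.append("top_logprobs must be 0-20")
    max_tok = payload.get("max_tokens")
    if max_tok is not None and max_tok < 1:
        problems.append("max_tokens must be at least 1")
    return problems

print(validate({"top_logprobs": 3}))
# → ['top_logprobs set without logprobs: true']
```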
## See also
- Create a chat completion for the full request schema, response payloads, and validation rules.
- Structured Outputs for JSON response workflows.
- Tool Calling for function execution loops.
- Streaming Responses for `stream: true` and SSE handling.