# Sampler parameters
Pluma ships several built-in sampler profiles you can swap between mid-chat (Settings → Sampler). Each profile carries a complete set of generation parameters; the table below lists the baseline values the built-in profiles start from.
## Built-in profiles
| Profile | Best for |
|---|---|
| Default | All-round use; sensible middle-of-the-road values. |
| Creative | Higher temperature + top-p; longer, looser outputs. |
| Precise | Lower temperature with a raised repetition penalty; more deterministic answers. |
| Qwen Instruct | Tuned for Qwen 2.5 / 3 series. |
| Llama 3 Instruct | Tuned for Llama-3 family. |
| Mistral Instruct | Tuned for Mistral / Mixtral family. |
Sampler family auto-pick: when you change the active model, Pluma tries to match the model id against a known family pattern (qwen2.5-* → Qwen Instruct, llama-3* → Llama 3 Instruct, etc.) and swaps the sampler unless you've pinned one manually.
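A minimal sketch of that auto-pick in Python; the pattern table, profile names, and fallback here are illustrative, not Pluma's actual internals:

```python
import re

# Illustrative family-pattern table; Pluma's real list may differ.
FAMILY_PATTERNS = [
    (re.compile(r"qwen-?[23](\.5)?", re.IGNORECASE), "Qwen Instruct"),
    (re.compile(r"llama-?3", re.IGNORECASE), "Llama 3 Instruct"),
    (re.compile(r"mistral|mixtral", re.IGNORECASE), "Mistral Instruct"),
]

def pick_sampler(model_id: str, pinned: str | None = None) -> str:
    """Pick a sampler profile for a model id; a manual pin always wins."""
    if pinned:
        return pinned
    for pattern, profile in FAMILY_PATTERNS:
        if pattern.search(model_id):
            return profile
    return "Default"
```

Under these assumed patterns, `qwen2.5-7b-instruct` would land on Qwen Instruct and anything unrecognised would fall back to Default.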
## Parameters
| Parameter | Range | Built-in default | Effect |
|---|---|---|---|
| temperature | 0.0 – 2.0 | 0.7 | Output randomness. Lower = more deterministic, higher = looser. |
| top_p | 0.0 – 1.0 | 0.95 | Nucleus sampling cutoff. Lower trims the long tail of unlikely tokens. |
| top_k | 0+ | 40 | Hard cap on the number of candidate tokens per step. 0 disables. |
| min_p | 0.0 – 1.0 | 0.0 | Token probability floor relative to the most likely token. 0.0 disables. |
| typical_p | 0.0 – 1.0 | 1.0 | Locally typical sampling. 1.0 disables. |
| repetition_penalty | 1.0 – 2.0 | 1.0 | Penalises tokens that already appeared. >1.0 discourages loops. |
| frequency_penalty | -2.0 – 2.0 | 0.0 | OpenAI-style frequency penalty. |
| presence_penalty | -2.0 – 2.0 | 0.0 | OpenAI-style presence penalty. |
| max_tokens | int | unset (provider default) | Hard cap on generation length. |
| seed | int | unset (random) | Pin for reproducibility. |
| context_size | int | unset (provider default) | Token budget for the prompt; assembled messages are trimmed from the front to fit. |
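Of these, min_p is the least self-explanatory: it keeps only tokens whose probability is at least min_p times that of the single most likely token. A minimal sketch of the filter, for intuition only (not Pluma's actual sampling code):

```python
import numpy as np

def apply_min_p(probs: np.ndarray, min_p: float) -> np.ndarray:
    """Zero out tokens below min_p * max(probs), then renormalise."""
    if min_p <= 0.0:
        return probs                     # 0.0 disables the filter
    floor = min_p * probs.max()
    kept = np.where(probs >= floor, probs, 0.0)
    return kept / kept.sum()
```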
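The context_size trimming can likewise be sketched, assuming a hypothetical count_tokens callable (real trimming would go through whatever tokenizer the active backend exposes):

```python
from typing import Callable

def trim_to_budget(messages: list[str], budget: int,
                   count_tokens: Callable[[str], int]) -> list[str]:
    """Drop the oldest messages until the assembled prompt fits `budget`."""
    kept = list(messages)
    total = sum(count_tokens(m) for m in kept)
    while kept and total > budget:
        total -= count_tokens(kept.pop(0))   # trim from the front
    return kept
```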
## SillyTavern preset import
If you've got a SillyTavern preset JSON, drop it into the sampler editor. Pluma parses prompts[] and prompt_order[] and uses them for system-message assembly, including marker substitution from the active card ({{charDescription}}, {{scenario}}, {{dialogueExamples}}, {{personaDescription}}, etc.).
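As a rough sketch of the substitution step, assuming an illustrative marker-to-card-field mapping (the field names follow SillyTavern's card format and may not match Pluma's internals):

```python
import re

def substitute_markers(template: str, card: dict) -> str:
    """Replace {{marker}} tokens with values from the active card."""
    values = {                                  # illustrative mapping
        "charDescription": card.get("description", ""),
        "scenario": card.get("scenario", ""),
        "dialogueExamples": card.get("mes_example", ""),
        "personaDescription": card.get("persona", ""),
    }
    # Unknown markers are left in place rather than erased.
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)
```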
## Extra fields
Anything in the sampler profile's extra object passes through to the upstream request untouched. Use this for backend-specific knobs (mlx_lm's num_steps, llama.cpp's mirostat, etc.) without polluting the well-known fields.
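As an illustration, merging a profile into an upstream request body might look like the sketch below; the function and key names are assumptions, not Pluma's actual code:

```python
WELL_KNOWN = ("temperature", "top_p", "top_k", "min_p", "typical_p",
              "repetition_penalty", "frequency_penalty", "presence_penalty",
              "max_tokens", "seed")

def build_request_body(profile: dict, body: dict) -> dict:
    """Merge well-known sampler fields, then splat `extra` through untouched."""
    out = dict(body)
    for key in WELL_KNOWN:
        if profile.get(key) is not None:
            out[key] = profile[key]
    out.update(profile.get("extra", {}))   # pass-through, no validation
    return out

# e.g. a profile with extra={"mirostat": 2} sends mirostat: 2 upstream.
```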