What is DRY?
DRY is a sequence-aware sampler built into llama.cpp's default sampler chain. Unlike basic repetition penalties — which penalize individual tokens that have appeared before — DRY detects and penalizes extended repetitive sequences. It works by matching the current generation against prior context and applying an exponential penalty based on the length of the repeated sequence.
Short common phrases pass through normally, but the longer a repetition gets, the harder it's penalized. DRY also uses "sequence breakers" (newlines, colons, etc.) to reset detection at natural boundaries. This makes it much more effective at catching a loop as it forms, rather than bluntly penalizing individual token reuse.
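The matching step can be sketched as a longest-repeated-suffix search. This is an illustrative simplification, not llama.cpp's actual implementation (which works incrementally over token IDs), but it captures the idea: find the longest run of recent tokens that has already occurred earlier in the context.

```python
def longest_repeated_suffix(tokens):
    """Length of the longest suffix of `tokens` that also occurs earlier in `tokens`."""
    n = len(tokens)
    for length in range(n - 1, 0, -1):           # try the longest match first
        suffix = tokens[n - length:]
        for start in range(n - length):          # any earlier occurrence?
            if tokens[start:start + length] == suffix:
                return length
    return 0

tokens = "the cat sat on the mat and the cat sat".split()
print(longest_repeated_suffix(tokens))  # "the cat sat" repeats -> 3
```

The longer this match, the larger the exponent in the penalty formula below, which is what makes DRY escalate against loops while ignoring incidental reuse.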
The penalty formula is:
penalty = multiplier × base ^ (match_length - allowed_length)
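A minimal sketch of this formula, using the default values of the flags documented below:

```python
def dry_penalty(match_length, multiplier=0.8, base=1.75, allowed_length=2):
    """Penalty factor for a token that would extend a repeated sequence of
    `match_length` tokens. No penalty until the match reaches allowed_length."""
    if match_length < allowed_length:
        return 0.0
    return multiplier * base ** (match_length - allowed_length)

# Raw escalation factors (multiplier = 1.0), as tabulated under --dry-base:
for over in (1, 2, 4, 6):
    print(f"{over} tokens over allowed length: {1.75 ** over:.2f}x")
```

Because the exponent grows with match length, the penalty stays negligible for short incidental repeats and climbs steeply once a genuine loop forms.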
Parameters
--dry-multiplier N
The main on/off switch and strength control.
Default: 0.0 (disabled).
DRY is completely off by default. You must set this to a positive value to
activate it. This value scales the entire penalty. Start around
0.8 and adjust from there.
--dry-base N
Controls how aggressively the penalty escalates as repeated sequences get
longer. Default: 1.75.
At 1.75, some example penalty multipliers (before scaling by dry-multiplier):
| Tokens over allowed length | Penalty factor |
|---|---|
| 1 | 1.75× |
| 2 | 3.06× |
| 4 | 9.38× |
| 6 | 28.7× |
If long loops survive your current settings, increase this. If short natural repetitions are getting suppressed, lower it.
--dry-allowed-length N
Repeated sequences of this length or shorter are ignored entirely; the penalty
applies only once a repetition grows past it.
Default: 2.
Common bigrams like "of the" or "in a" won't be penalized. If you're working
with code or structured output where short repeated patterns are normal, raise
this to 3 or 4.
--dry-penalty-last-n N
How far back in the context to scan for matching sequences.
Default: -1 (entire context).
- -1 = scan the full context (recommended for loop detection)
- 0 = disable
- Positive integer = limit the lookback window
Full context scanning is usually what you want. Narrow this only if you have very long contexts and need better performance.
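The three cases reduce to a simple slice of the context. A sketch of the assumed semantics (the real sampler operates on the server's token stream, not a Python list):

```python
def lookback_window(context_tokens, penalty_last_n=-1):
    """Tokens DRY scans for repeats, per --dry-penalty-last-n."""
    if penalty_last_n < 0:
        return context_tokens                    # -1: scan the entire context
    if penalty_last_n == 0:
        return []                                # 0: no lookback, DRY matches nothing
    return context_tokens[-penalty_last_n:]      # N: only the last N tokens

ctx = list(range(100))
print(len(lookback_window(ctx)))        # 100
print(len(lookback_window(ctx, 32)))    # 32
```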
--dry-sequence-breaker
Strings that interrupt sequence matching.
Defaults: \n, :, ", *.
A newline, for example, acts as a boundary — a phrase repeated across two paragraphs isn't treated the same as a phrase looping within one block. You can:
- Add custom breakers by passing the flag multiple times
- Pass "none" to disable all breakers

For code generation, consider adding ;, {, or } as breakers.
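Conceptually, breakers cap matching at boundaries. A simplified sketch that splits the context into independent segments (the actual implementation limits match length at breakers rather than literally splitting the stream):

```python
DEFAULT_BREAKERS = {"\n", ":", "\"", "*"}  # llama.cpp defaults

def split_at_breakers(tokens, breakers=DEFAULT_BREAKERS):
    """Split a token stream at breaker tokens; under this model, repeats
    are matched only within a single segment, never across one."""
    segments, current = [], []
    for tok in tokens:
        if tok in breakers:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments

print(split_at_breakers(["a", "b", "\n", "a", "b"]))  # [['a', 'b'], ['a', 'b']]
```

This is why a phrase repeated across two paragraphs (separated by a newline breaker) is treated differently from the same phrase looping inside one block.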
Sampler Ordering
This is critical and easy to miss. The order in which samplers are applied matters, especially for quantized models.
Daniel Han (Unsloth) found that with models like QwQ-32B, naively adding repetition penalties actually caused looping, and that changing the sampler order fixed it. His recommended order:
--samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"
Note that DRY comes after temperature here, not before. The default llama.cpp order places penalties first, which can interact badly with some quantized models.
Server API
When using the llama.cpp server (HTTP API), the same parameters are available as JSON fields in your request body:
{
"dry_multiplier": 0.8,
"dry_base": 1.75,
"dry_allowed_length": 2,
"dry_penalty_last_n": -1,
"dry_sequence_breakers": ["\n", ":", "\"", "*"]
}
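For example, a completion request with DRY enabled can be assembled like this. The prompt and n_predict value are illustrative; the DRY field names are as shown above, and the server's completion endpoint is /completion:

```python
import json

payload = {
    "prompt": "Write a short story about a lighthouse keeper.",
    "n_predict": 256,
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    "dry_penalty_last_n": -1,
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],
}

body = json.dumps(payload)
# POST `body` to http://localhost:8080/completion with
# Content-Type: application/json to run the request.
print(json.loads(body)["dry_multiplier"])  # 0.8
```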
Recommended Starting Points
General text (quantized model prone to looping)
--dry-multiplier 0.8 --dry-base 1.75 --dry-allowed-length 2 --dry-penalty-last-n -1
If loops still occur, increase the multiplier to 1.0–1.5.
If output becomes incoherent, back off the multiplier or raise the allowed length.
Reasoning models (QwQ-style)
--dry-multiplier 0.5 \
--repeat-penalty 1.1 \
--samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"
The lower multiplier combined with a mild classic repeat penalty and the reordered sampler chain was reported to work well for QwQ-32B.
Code generation
--dry-multiplier 0.8 --dry-allowed-length 3 \
--dry-sequence-breaker ";" --dry-sequence-breaker "{" --dry-sequence-breaker "}"
Raise allowed-length to tolerate short repeated patterns common in
code (e.g., repeated variable declarations), and add language-specific breakers.
Limitations
DRY is effective at preventing verbatim sequence repetition within the current context. However, it cannot:
- Detect semantic repetition (same idea, different words)
- Catch patterns that emerge statistically across many independent generations (that's what the Antislop framework targets)
- Break out of a loop already in progress — it prevents the loop from continuing, but if the model is already deep in a degenerate state, DRY may not be enough on its own
For reactive loop-breaking, consider combining DRY with monitoring/truncation logic or the Antislop backtracking sampler. See the Antislop Sampler repo and the Antislop paper (ICLR 2026) for complementary approaches.