x264 Adaptive Quantization (AQ mode)

In x264, the --aq-mode option controls macroblock-level adaptive quantization (AQ). In mode 0 (disabled), all blocks use the frame’s base QP. In mode 1 (variance AQ) and mode 2 (auto-variance AQ), x264 measures each block’s activity (roughly its AC energy, i.e. its variance around the block mean) and adjusts the block’s QP accordingly: complex blocks get a higher QP (fewer bits), while flat blocks, which often include dark regions, get a lower QP (more bits). The goal is to keep the overall bitrate roughly unchanged while improving perceptual quality, for example by reducing banding in flat areas. The constants in x264’s code were tuned so that the AQ modes use roughly the same total bits as encoding without AQ (see the comment in the code). The sections below cover each mode’s formula, code logic, and effects.
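
To make “AC energy” concrete, here is a minimal sketch of how such a measure could be computed for a single 16x16 block of 8-bit luma samples. The function name block_ac_energy_16x16 is hypothetical and luma-only; x264’s own routine is more involved and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper (not x264's function): AC energy of one 16x16 block
 * of 8-bit samples. The AC energy is the sum of squared samples minus the
 * DC contribution, sum^2 / 256, so a perfectly flat block yields 0. */
static uint32_t block_ac_energy_16x16(const uint8_t *pix, int stride)
{
    uint32_t sum = 0;  /* at most 256 * 255 */
    uint32_t ssd = 0;  /* at most 256 * 255 * 255 */

    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            uint32_t p = pix[y * stride + x];
            sum += p;
            ssd += p * p;
        }
    }
    /* 64-bit intermediate keeps sum * sum from overflowing. */
    return ssd - (uint32_t)(((uint64_t)sum * sum) >> 8);
}
```

A flat block returns 0, while a block alternating between 0 and 255 returns a large value; the AQ modes below turn this number into a QP offset.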

Mode 0: Disabled

AQ mode 0 turns off adaptive quantization: x264 simply sets all MB-level offsets to zero. In code, if i_aq_mode == X264_AQ_NONE or f_aq_strength == 0, x264 does:

```
memset(frame->f_qp_offset,    0, mb_count*sizeof(float));
memset(frame->f_qp_offset_aq, 0, mb_count*sizeof(float));
```

As a result, there is no per-block QP adjustment. (This also implies that x264 uses only the frame-level QP, and that it internally raises the qcomp parameter to compensate for the absence of AQ.) In practice, mode 0 is not recommended except for testing, since it forfeits the quality gains of AQ.

Mode 1: Variance AQ (Default)

Mode 1 (x264’s “variance AQ”) computes each block’s activity $E$ (sum of squared AC pixels via x264_ac_energy_mb) and applies a fixed log-domain bias. In code the logic is:

Compute block activity $E = \texttt{x264\_ac\_energy\_mb}(h, x, y)$ (a non-negative integer representing AC energy).
Compute a QP adjustment $\Delta QP = strength \times \bigl(\log_2(\max(E,1)) - C\bigr)$, where $strength = f_{aq}\times 1.0397$, $C = 14.427 + 2\,(B-8)$, and $B$ is the bit depth (for 8-bit, $C\approx 14.427$). Concretely, the code does:
```
// mode1: variance AQ
strength = f_aq_strength * 1.0397f;
uint32_t E = x264_ac_energy_mb(h, mb_x, mb_y, frame);
float qp_adj = strength * (log2(max(E,1)) - (14.427f + 2*(BIT_DEPTH-8)));
frame->f_qp_offset[mb_xy] = frame->f_qp_offset_aq[mb_xy] = qp_adj;
```
The constant $14.427$ was chosen so that an “average” block ($E\approx2^{14.427}$ at 8-bit) gets $\Delta QP\approx0$, keeping the mean QP roughly unchanged. For $E$ below this threshold, $\Delta QP$ is negative (a lower QP, i.e. more bits for flat blocks); for high-$E$ blocks, $\Delta QP$ is positive (coarser quantization in busy areas). Because the H.264 quantizer step size doubles every 6 QP, this is equivalent to multiplying each block’s quantizer scale by $2^{\Delta QP/6} = (E/2^{C})^{strength/6}$, a power law in the block energy.
Impact: Mode 1 redistributes bits to smooth areas (flat/dark regions) at the expense of highly textured regions. This reduces blocking and banding in flat areas, while the coarser quantization in busy regions is less visible because the texture masks it. Because the offset grows only with the logarithm of the energy, the per-block adjustment stays moderate even for very busy blocks. Mode 1 is the default and is generally recommended for film-like content. (x264 also raises qcomp internally to partially offset AQ’s effect, keeping the overall bitrate stable.) The computational overhead is negligible beyond computing the block variances.

Mode 2: Auto-Variance AQ (Experimental)

Mode 2 (“auto-variance AQ”) is more dynamic: it adjusts offsets relative to the frame’s average activity. The code does:

Measure each block: Compute $E_{xy} = \texttt{x264\_ac\_energy\_mb}(h, x, y, \mathit{frame})$ for each block. Then compute a contrast metric $Q_{xy} = (c\,E_{xy} + 1)^{1/8}$, where $c = 2^{-2(B-8)}$ is a bit-depth correction for bit depth $B$ (for 8-bit, $c=1$). This is implemented as:
```
float bit_depth_correction = 1.f/(1<<(2*(BIT_DEPTH-8)));
for each MB:
    uint32_t E = x264_ac_energy_mb(...);
    float Q = powf(E * bit_depth_correction + 1, 0.125f);
    frame->f_qp_offset[mb_xy] = Q;
    avg += Q; avg2 += Q*Q;
```
Here $Q$ grows sublinearly with $E$ (as its 8th root).
Normalize: Compute the frame average $\bar Q = \frac{1}{N}\sum Q_{xy}$ and the mean of squares $\overline{Q^2} = \frac{1}{N}\sum Q_{xy}^2$. Then set $strength = f_{aq} \times \bar Q$ and replace $\bar Q$ with an adjusted reference level, $\bar Q \leftarrow \bar Q - \frac{\overline{Q^2} - 14}{2\,\bar Q}$:
```
avg_adj = avg/N;
avg_adj2 = avg2/N;
strength = f_aq_strength * avg_adj;
avg_adj = avg_adj - 0.5f*(avg_adj2 - 14.f)/avg_adj;
```
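(The 14.f acts as a target for the mean of $Q^2$: when that mean equals 14 the correction vanishes and $\bar Q$ is left unchanged. Frames with less activity get a higher reference level, and frames with more activity a lower one.)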

Apply offsets: For each block, offset its QP by $\Delta QP = strength \times (Q_{xy} - \bar Q)$. In code:

```
// After computing strength and avg_adj:
for each MB:
    float Q = frame->f_qp_offset[mb_xy];  // from step 1
    float qp_adj = strength * (Q - avg_adj);
    frame->f_qp_offset[mb_xy] = frame->f_qp_offset_aq[mb_xy] = qp_adj;
```

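The offsets are centered on the adjusted reference level rather than on the plain mean of $Q$, so they sum to zero only when the frame’s mean of $Q^2$ equals 14. Frames with less activity than that get a net negative offset (more bits), and busier frames get a net positive one.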

Interpretation: Mode 2 effectively ranks blocks relative to the frame’s overall contrast. A block with above-average $Q$ (i.e. higher activity) gets a positive $\Delta QP$, raising its QP (fewer bits); below-average blocks get a negative $\Delta QP$, lowering their QP. Unlike mode 1’s fixed threshold, mode 2 adapts to each frame: bright, well-lit, or high-contrast scenes yield a different $\bar Q$ than dark or low-contrast scenes, so the bias shifts per frame. Anecdotally, mode 2 tends to concentrate bits even more into static or flat areas (often the scene background) at the expense of busy or moving parts. As one user noted, mode 2 “pulls bits from the fast moving parts to use them in slower or still parts” for visual effect. (This is a heuristic consequence of favoring blocks with below-average variance.)
Implementation notes: Mode 2 is labeled “experimental”; later “biased” tweaks (mode 3) build on it, but the core is as described above. It requires two passes over all MBs (one to compute $E$ and $Q$, one to apply the offsets), which adds some encoding cost. Also, because the reference level depends on the frame’s overall activity, the net QP shift varies from frame to frame: it can increase the instantaneous bitrate in flat frames and decrease it in noisy frames, potentially causing more bitrate fluctuation.
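
The size of that frame-level shift follows directly from the Normalize and Apply steps. As a short derivation, writing $\overline{Q^2}$ for the mean of $Q_{xy}^2$ and $\bar Q'$ for the adjusted reference level (notation introduced here for illustration only), the average offset over a frame of $N$ blocks is:

$$
\begin{aligned}
\frac{1}{N}\sum_{xy}\Delta QP_{xy}
  &= strength \times (\bar Q - \bar Q') \\
  &= f_{aq}\,\bar Q \times \frac{\overline{Q^2} - 14}{2\,\bar Q} \\
  &= \frac{f_{aq}}{2}\,\bigl(\overline{Q^2} - 14\bigr)
\end{aligned}
$$

For example, a completely flat frame has $E_{xy}=0$ and therefore $Q_{xy}=1$ everywhere, so by these formulas the mean offset at $f_{aq}=1$ is $(1-14)/2 = -6.5$ QP. A frame with $\overline{Q^2}=14$ sees no net shift.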

Effects on Quality and Bitrate

With AQ enabled, visual quality usually improves at the same bitrate. In particular, flat or dark regions (e.g. skies, walls) get more bits (a smaller quantizer), so blocking and banding are reduced, while dynamic or highly textured areas get slightly coarser quantization. In practice this trade-off, shifting bits from “less noticeable” complex parts into sensitive flat parts, yields higher subjective quality. The x264 developers note a “significant gain in overall image quality” with AQ on.

Importantly, x264 calibrates the AQ formulas so that the overall bit usage stays nearly constant. As the code comment says, the constants were “chosen to result in approximately the same overall bitrate as without AQ”. Internally, x264 will also raise the qcomp (quantizer-curve compression) factor to offset AQ’s bias. Enabling AQ (mode 1 or 2) at the default strength (~1.0) therefore typically does not change the file size much, but redistributes the bits for better quality. Users often observe that with mode 1, a given CRF yields a similar bitrate but cleaner flat areas; with mode 2, the CRF might have to be increased slightly to hit a target bitrate (since mode 2 can be more aggressive).

Limitations and Recommendations

Mode 0 (Disabled): Yields no AQ benefit and is not recommended. It may require compensating with very high qcomp to avoid banding. Use only if you explicitly want uniform quantizers (e.g. QP mode).
Mode 1 (Variance): This is the default and generally recommended AQ mode. It works well on most natural (film-like) content. Strength ≈1.0 is a good default. (For animation or very low-detail video, the Avidemux guide suggests lowering the strength to about 0.6, since AQ is “less efficient with animation”.) Because there is no per-frame adaptation, results are predictable.
Mode 2 (Auto-variance): This experimental mode adapts per frame. It can produce even better-looking backgrounds in flat scenes, but may introduce frame-to-frame variations or “over-corrections” in highly dynamic content. Some report that mode 2 was tuned for anime/static backgrounds and may require careful adjustment of the AQ strength or CRF to avoid overshoot. It is supported in CRF and ABR modes (not QP mode), but as an “experimental” mode its behavior can sometimes increase bitrate for very low-contrast footage. If you use it, experiment with --aq-strength and check the results subjectively.

In summary, x264’s AQ modes compute per-block offsets (via either a log-energy or a root-energy model) to trade bits between detailed and flat areas. Mode 1 uses a fixed log2 variance heuristic, while mode 2 centers the offsets on a reference level derived from each frame’s average activity. Both aim to improve visible quality (especially in flat regions) without significantly changing the overall bitrate.

References: x264 source code (ratecontrol.c) and documentation.