Rectified-CFG++: Fixing Guidance for Flow Models

Classifier-free guidance was built for stochastic diffusion SDEs, where renoising corrects errors at every step. Flow models use deterministic ODEs — there is no such safety net. This post explains why CFG breaks on flow models and how a geometry-aware predictor-corrector fixes it, with striking improvements in text rendering.

Mar 2026 · Flow Models · NeurIPS 2025 · Guidance

Thesis. Classifier-free guidance (CFG) was designed for stochastic diffusion SDEs, where renoising at each step corrects extrapolation errors. Flow models solve deterministic ODEs — no renoising, no error correction. Rectified-CFG++ fixes this with a geometry-aware predictor-corrector that stays on the data manifold.

Main technical point. The key insight is replacing extrapolation with interpolation: anchor to the conditional flow velocity, then correct with a time-decaying guidance term that vanishes near the data endpoint.

Practical implication. A drop-in replacement for CFG that requires no retraining. Works across Flux, SD3, SD3.5, and Lumina with consistent improvements in text rendering, prompt adherence, and perceptual quality.

1) Why CFG Breaks on Flow Models

Classifier-free guidance [3] computes a guided velocity by linearly combining the unconditional and conditional predictions:

\[ \hat{v} = (1 - \omega)\, v^u + \omega \, v^c = v^u + \omega\,(v^c - v^u) \]

When \(\omega > 1\), this is extrapolation: we overshoot past the conditional direction. In diffusion SDEs, this works because each step adds noise back to the sample (the stochastic term in the reverse SDE). This renoising acts as a natural error corrector — it pulls the trajectory back toward the data manifold even when guidance pushes it off.
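A minimal numerical sketch of this point, using NumPy with toy velocity vectors (not outputs of any real model): for \(\omega \leq 1\) the combination stays on the segment between the two predictions, while \(\omega > 1\) overshoots past the conditional one.

```python
import numpy as np

# Toy stand-ins for the unconditional and conditional velocity predictions.
v_u = np.array([1.0, 0.0])
v_c = np.array([0.0, 1.0])

def cfg(v_u, v_c, omega):
    """Standard CFG combination: v_u + omega * (v_c - v_u)."""
    return v_u + omega * (v_c - v_u)

print(cfg(v_u, v_c, 0.5))  # [0.5 0.5] -> interpolation, inside the segment
print(cfg(v_u, v_c, 3.0))  # [-2.  3.] -> extrapolation, far past v_c
```

In a diffusion SDE the renoising step would absorb that overshoot; in a deterministic ODE it is integrated as-is.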

Flow models are different. They define a deterministic ODE from noise \(x_1 \sim \mathcal{N}(0,I)\) to data \(x_0\):

\[ \frac{dx_t}{dt} = v_\theta(x_t, t), \qquad t: 1 \to 0. \]

There is no stochastic correction. Every error in the velocity field accumulates. When CFG extrapolates the velocity, the ODE trajectory drifts off the learned manifold, and there is nothing to bring it back. The result: oversaturated colors, distorted geometry, and garbled text — artifacts that worsen with stronger guidance.

[Figure: left panel, diffusion SDE + CFG (renoising corrects each overshoot back toward the data manifold); right panel, flow ODE + CFG (no correction, drift compounds and the trajectory leaves the manifold).]
Left: In diffusion SDEs, renoising (blue dashed arrows) pulls the trajectory back toward the manifold after each CFG overshoot. Right: In flow ODEs, there is no correction mechanism — extrapolation errors compound and the trajectory drifts off-manifold.

Here is what this looks like in practice on Flux (a state-of-the-art flow model). Without guidance, the model produces a plausible but undersaturated image. Standard CFG introduces color banding, oversaturation, and text artifacts. Rectified-CFG++ preserves the benefits of guidance while staying on-manifold.

[Figure: Flux outputs for the same prompt and seed under three settings: no guidance (w=0), standard CFG (w=3.5), and Rectified-CFG++ (w=3.5).]

2) The Fix: Predictor-Corrector Guidance

The core idea of Rectified-CFG++ [5] is to replace the naive CFG extrapolation with a two-stage predictor-corrector scheme that anchors to the conditional flow and applies guidance as a bounded perturbation.

The key shift is conceptual: instead of mixing unconditional and conditional velocities at the current point (which creates an extrapolated direction that may point off-manifold), we:

  1. Follow the conditional flow to a predicted midpoint
  2. Evaluate the guidance signal (conditional minus unconditional) at that midpoint
  3. Apply a scheduled correction that decays toward zero near the data endpoint

Algorithm 1: Rectified-CFG++ Sampling Step

  1. Predictor (half-step along conditional flow):
    \(\tilde{x}_{t-\Delta t/2} = x_t + \frac{\Delta t}{2} \cdot v^c_\theta(x_t, t)\)
    Move halfway using only the conditional velocity. There is no extrapolation here; this stays on the learned conditional trajectory.
  2. Evaluate at midpoint:
    \(v^c_{\text{mid}} = v^c_\theta(\tilde{x}_{t-\Delta t/2},\; t - \Delta t/2), \qquad v^u_{\text{mid}} = v^u_\theta(\tilde{x}_{t-\Delta t/2},\; t - \Delta t/2)\)
    Compute both the conditional and unconditional velocities at the predicted midpoint.
  3. Corrector (anchored guidance):
    \(\hat{v} = v^c_\theta(x_t, t) + \alpha(t) \cdot \bigl(v^c_{\text{mid}} - v^u_{\text{mid}}\bigr)\)
    The anchor is the conditional velocity at the current point; the correction term is the guidance direction evaluated at the midpoint, scaled by \(\alpha(t)\).
  4. ODE update:
    \(x_{t-\Delta t} = \text{ODEUpdate}(x_t, \hat{v}, t, \Delta t)\)
    A standard Euler or midpoint update with the corrected velocity.
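The four steps above can be sketched as a self-contained single-step function. The velocity fields here are toy closed-form stand-ins for a trained model (all names are illustrative, not from the paper), and the update uses plain Euler:

```python
import numpy as np

# Toy velocity fields standing in for the conditional/unconditional model.
def v_cond(x, t):
    return -x            # pulls toward the origin, playing the role of "data"

def v_uncond(x, t):
    return -0.5 * x

def rectified_cfgpp_step(x, t, dt, alpha):
    """One predictor-corrector step of Rectified-CFG++ (Algorithm 1)."""
    v_c = v_cond(x, t)
    # 1. Predictor: half-step along the conditional flow only.
    x_mid = x + (dt / 2) * v_c
    t_mid = t - dt / 2
    # 2. Evaluate the guidance direction at the predicted midpoint.
    g = v_cond(x_mid, t_mid) - v_uncond(x_mid, t_mid)
    # 3. Corrector: conditional velocity is the anchor, guidance a perturbation.
    v_hat = v_c + alpha * g
    # 4. Euler ODE update with the corrected velocity.
    return x + dt * v_hat

x_next = rectified_cfgpp_step(np.array([1.0]), t=1.0, dt=0.1, alpha=0.0)
print(x_next)  # [0.9] -> with alpha=0 this is a plain conditional Euler step
```

Setting `alpha=0` recovers unguided conditional sampling, which is a useful sanity check when wiring this into a real sampler.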

The critical difference from standard CFG: the conditional velocity is the anchor, not the extrapolated mixture. Guidance enters only as a perturbation, and the schedule \(\alpha(t)\) controls how much perturbation is allowed at each timestep.

[Figure: geometric view of the Rectified-CFG++ predictor-corrector on the flow manifold.]
Geometric view of the predictor-corrector scheme. The predictor follows the conditional flow (blue), the corrector adds a bounded guidance perturbation (green). The trajectory stays in a tube around the ideal conditional path.

3) The Adaptive Schedule \(\alpha(t)\)

The guidance strength is not a constant. It follows a time-dependent schedule that vanishes at both endpoints:

\[ \alpha(t) = \lambda_{\max} \cdot \bigl(4\,t(1 - t)\bigr)^{\gamma} \]

where \(\lambda_{\max}\) is the peak guidance strength, reached at \(t = 1/2\), and \(\gamma \geq 1\) controls how sharply the schedule decays toward the endpoints. The behavior is intuitive:

  • At \(t = 1\) (pure noise): \(\alpha(1) = 0\). No guidance at the very start — there is no meaningful conditional signal to amplify yet.
  • At intermediate \(t\): \(\alpha(t)\) is large. This is where guidance matters most: the model is committing to global structure (composition, object layout, color palette) and needs the text conditioning signal amplified.
  • At \(t \to 0\) (near data): \(\alpha(t) \to 0\). The conditional model alone handles fine details — letter strokes, texture microstructure, edge sharpness. Guidance perturbation would only corrupt these.
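A minimal sketch of the schedule behavior described by these three regimes. The specific functional form here is one simple realization with the required properties (zero at both endpoints, peak \(\lambda_{\max}\) at \(t = 1/2\)); treat it as an assumption, since the paper's exact parameterization may differ:

```python
def alpha(t, lambda_max=3.5, gamma=1.5):
    """Illustrative guidance schedule: vanishes at t=1 (noise) and t=0 (data)."""
    return lambda_max * (4.0 * t * (1.0 - t)) ** gamma

print(alpha(1.0))  # 0.0 -> no guidance at pure noise
print(alpha(0.5))  # 3.5 -> peak guidance while global structure forms
print(alpha(0.0))  # 0.0 -> conditional model finishes fine details alone
```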
[Figure: the adaptive schedule \(\alpha(t)\) versus constant CFG (red dashed), plotted from \(t=1\) (pure noise) to \(t=0\) (data).]
The adaptive schedule vanishes at pure noise (\(t=1\)), where there is no signal to guide; peaks at intermediate timesteps, where global structure is decided; and decays to zero again near the data endpoint (\(t=0\)), where fine details are resolved by the conditional model alone. Note: \(t=0\) is the data endpoint in flow convention.

4) Why Text Rendering Improves

This is the section that matters most for practitioners. Text rendering is one of the most visible failure modes of current flow-based generators, and Rectified-CFG++ provides a clean explanation and fix.

The argument is straightforward:

  1. Text requires pixel-precise strokes in the final denoising steps. The global layout (where the text goes, its approximate size) is determined early. But the exact letterforms — the difference between a legible "STOP" and garbled "STCP" — are resolved in the final steps near \(t = 0\).
  2. Standard CFG perturbs these strokes. With constant guidance \(\omega > 1\), the extrapolated velocity distorts the fine details that the conditional model had correctly predicted. The model "knows" the right letters but CFG pushes the trajectory off.
  3. Rectified-CFG++ with \(\alpha(t) \to 0\) near data lets the conditional model finish undisturbed. In the final steps, guidance strength vanishes. The conditional velocity alone determines the fine structure. The result: clean, legible text.

The following comparisons are from SD3. Each pair shows the same prompt and seed; only the guidance method differs. Look at the text regions carefully.

Stop Sign

Prompt: "A realistic photo of a stop sign on a suburban street corner."

Standard CFG (w=3.0): text on the stop sign is distorted.
Rect-CFG++ (w=2.5): clean, legible "STOP".

Neon Sign

Prompt: "A vibrant neon sign glowing in a dark alley at night."

Standard CFG (w=2.5): neon text garbled and bleeding.
Rect-CFG++ (w=2.5): sharp, readable neon lettering.

Carved Text

Prompt: "Ancient carved text on a weathered stone tablet."

Standard CFG (w=3.5): carved letters are smeared and illegible.
Rect-CFG++ (w=2.5): crisp carved letterforms.

Newspaper Headline

Prompt: "A newspaper front page with a bold headline about a scientific breakthrough."

Standard CFG (w=3.5): headline text is blurred and scrambled.
Rect-CFG++ (w=3.0): coherent, readable headline.

Postage Stamp

Prompt: "A vintage postage stamp with decorative text and border."

Standard CFG (w=3.5): stamp text is muddled.
Rect-CFG++ (w=3.5): clean stamp lettering and borders.

The pattern is consistent across all examples: Rectified-CFG++ produces sharper, more legible text. The effect is most dramatic on fine text (stamps, carved inscriptions) where CFG's constant perturbation is most destructive.

5) Theoretical Guarantees

The theoretical analysis [5] provides two key results that explain why the method works. The proofs are in the paper; here we give the intuition.

Lemma (Midpoint Guidance Consistency)

If the velocity fields \(v^c, v^u\) are \(L\)-Lipschitz and bounded by \(V_{\max}\), then the guidance signal at the predicted midpoint stays close to the guidance signal at the current point:

\[ \bigl\|(v^c_{\text{mid}} - v^u_{\text{mid}}) - (v^c_t - v^u_t)\bigr\| \leq L \cdot V_{\max} \cdot \Delta t \]

In plain terms: evaluating guidance at the predicted midpoint versus the current point introduces an error proportional to step size. For reasonable step counts (20-50), this is small.
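To see the bound concretely, here is a toy check with linear (hence globally Lipschitz) velocity fields. The matrices, dimensions, and step size are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
A_c = 0.3 * rng.normal(size=(4, 4))    # conditional field: v_c(x) = A_c @ x
A_u = 0.3 * rng.normal(size=(4, 4))    # unconditional field: v_u(x) = A_u @ x

x = rng.normal(size=4)
dt = 1.0 / 28                          # a typical step size at 28 steps

# For a linear field, the Lipschitz constant is the spectral norm.
L = max(np.linalg.norm(A_c, 2), np.linalg.norm(A_u, 2))
V_max = np.linalg.norm(A_c @ x)        # bound on the conditional velocity here

x_mid = x + (dt / 2) * (A_c @ x)       # predictor half-step
lhs = np.linalg.norm((A_c @ x_mid - A_u @ x_mid) - (A_c @ x - A_u @ x))
rhs = L * V_max * dt

print(lhs <= rhs)  # True: midpoint guidance stays within the lemma's bound
```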

Proposition (Trajectory Deviation Bound)

The single-step deviation from the ideal conditional path is bounded by:

\[ \|x_{t-\Delta t}^{\text{guided}} - x_{t-\Delta t}^{\text{cond}}\| \leq \alpha(t) \cdot B \cdot \Delta t \]

where \(B\) depends on the Lipschitz constants and velocity bounds. The key factor is \(\alpha(t)\): because the schedule decays to zero near the data endpoint, the cumulative deviation over the final critical steps is bounded by a quantity that goes to zero.

Geometrically, the guided trajectory stays inside a tube around the ideal conditional path. The tube width is \(\alpha(t) \cdot B \cdot \Delta t\) at time \(t\), which shrinks as \(t \to 0\). Near the data manifold, the tube collapses and the guided path nearly coincides with the conditional path.
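The shrinking tube can be sketched numerically from the per-step bound \(\alpha(t) \cdot B \cdot \Delta t\). The constant \(B\) and the schedule below are illustrative assumptions (a schedule that vanishes toward both endpoints, as described in Section 3); the contrast is with a constant-\(\omega\) CFG baseline:

```python
import numpy as np

B, lambda_max, gamma, omega = 1.0, 3.5, 1.5, 3.5
num_steps = 28
dt = 1.0 / num_steps
ts = np.linspace(1.0, dt, num_steps)   # step start-times from t=1 down to t=dt

def alpha(t):
    # Illustrative schedule vanishing at both endpoints (see Section 3).
    return lambda_max * (4.0 * t * (1.0 - t)) ** gamma

scheduled = alpha(ts) * B * dt                  # per-step tube width, Rect-CFG++
constant = np.full_like(ts, omega * B * dt)     # constant-omega CFG for contrast

# Deviation budget spent in the final quarter of the trajectory (smallest t):
last = num_steps * 3 // 4
print(scheduled[last:].sum(), constant[last:].sum())
```

The scheduled budget over the final steps is a small fraction of the constant-CFG budget, and the very last step's tube width is an order of magnitude narrower, which is exactly where letterforms and other fine details are resolved.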

6) Results

We evaluate on MS-COCO 10K across four flow-based architectures. The table below reports FID (lower is better), CLIP score (higher is better), PickScore (higher is better), and HPSv2 (higher is better).

MS-COCO 10K: Across Architectures

Model         Method       FID ↓    CLIP ↑   PickScore ↑   HPSv2 ↑
Lumina-Next   CFG          28.74    0.268    21.35         25.80
Lumina-Next   Rect-CFG++   25.91    0.273    21.58         26.12
SD3           CFG          30.12    0.281    22.01         27.45
SD3           Rect-CFG++   27.38    0.286    22.24         27.89
SD3.5         CFG          26.85    0.289    22.18         28.01
SD3.5         Rect-CFG++   24.12    0.293    22.41         28.37
Flux          CFG          24.53    0.295    22.56         28.72
Flux          Rect-CFG++   22.17    0.299    22.78         29.05

Guidance Method Comparison on SD3.5

Method        FID ↓    CLIP ↑   PickScore ↑   HPSv2 ↑
CFG           26.85    0.289    22.18         28.01
CFG-Zero*     25.94    0.290    22.25         28.15
APG           25.47    0.291    22.30         28.22
Rect-CFG++    24.12    0.293    22.41         28.37

User Study

In a paired preference study, human evaluators preferred Rectified-CFG++ over standard CFG 72.8% of the time, with particularly strong preference on prompts involving text, fine detail, and complex compositions.

[Figure: user study results.] Human evaluators preferred Rectified-CFG++ over standard CFG in 72.8% of paired comparisons.
[Figure: qualitative comparison across guidance methods.] Rectified-CFG++ produces consistently sharper text and more natural colors.

7) Implementation

The implementation is a drop-in replacement for the standard CFG sampling loop. Below is the core logic in Python-style pseudocode.

def rectified_cfgpp_sample(model, prompt, num_steps=28, lambda_max=3.5, gamma=1.5):
    """Rectified-CFG++ sampling loop for flow models."""
    # Initialize from noise
    x = torch.randn(1, C, H, W)
    dt = 1.0 / num_steps

    # Timesteps from t=1 (noise) to t=0 (data)
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)

    for i in range(num_steps):
        t = timesteps[i]

        # Adaptive guidance schedule: vanishes at t=1 (pure noise) and t=0
        # (data), peaking at intermediate timesteps (see Section 3).
        alpha = lambda_max * (4.0 * t * (1.0 - t)) ** gamma

        # --- PREDICTOR: half-step along conditional flow ---
        v_cond = model(x, t, prompt=prompt)
        x_mid = x + (dt / 2) * v_cond
        t_mid = t - dt / 2

        # --- EVALUATE at midpoint ---
        v_cond_mid = model(x_mid, t_mid, prompt=prompt)
        v_uncond_mid = model(x_mid, t_mid, prompt=None)

        # --- CORRECTOR: anchored guidance ---
        guidance = v_cond_mid - v_uncond_mid
        v_hat = v_cond + alpha * guidance

        # --- ODE UPDATE ---
        x = x + dt * v_hat

    return x

A few implementation notes:

  • NFE cost: Each step requires 3 model evaluations (one conditional at current, one conditional at midpoint, one unconditional at midpoint). This is 1.5x the cost of standard CFG (2 evals per step). In practice, you can often reduce the step count by 30-40% at matched quality, so total cost is comparable.
  • Hyperparameters: \(\lambda_{\max} \in [2.5, 4.0]\) and \(\gamma \in [1.0, 2.0]\) cover the useful range. Start with \(\lambda_{\max} = 3.5, \gamma = 1.5\) as defaults.
  • Compatibility: Works with any flow model that supports conditional and unconditional forward passes. Tested on Flux, SD3, SD3.5, and Lumina-Next without modification.
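The NFE accounting in the first note can be made explicit. The 19-step setting below is an assumption picked from the quoted 30-40% reduction range, not a value from the paper:

```python
def nfe(num_steps, evals_per_step):
    """Total number of model function evaluations for a sampling run."""
    return num_steps * evals_per_step

cfg_cost = nfe(28, 2)        # standard CFG: cond + uncond per step
ours_full = nfe(28, 3)       # Rect-CFG++ at the same step count: 1.5x cost
ours_reduced = nfe(19, 3)    # ~32% fewer steps at matched quality

print(cfg_cost, ours_full, ours_reduced)  # 56 84 57
```

At the reduced step count, the total budget (57 evaluations) is essentially the same as standard CFG at 28 steps (56 evaluations).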

References

  1. Liu et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow, 2022.
  2. Lipman et al., Flow Matching for Generative Modeling, 2022.
  3. Ho and Salimans, Classifier-Free Diffusion Guidance, 2022.
  4. Chung et al., CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models, 2024.
  5. Saini et al., Rectified-CFG++ for Flow-Based Models, NeurIPS 2025.
  6. Esser et al., Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (Stable Diffusion 3), 2024.
  7. Black Forest Labs, Flux.1, 2024.