arXiv preprint · 2026

LumaGuide: Distribution Shaping for Training-Free
HDR Generation in Diffusion Models

Bowen Chen¹   Shreshth Saini¹   Balu Adsumilli²   Alan C. Bovik¹
¹The University of Texas at Austin    ²Google
Figure 1. LumaGuide steers a pretrained diffusion model toward HDR-consistent luminance distributions at sampling time, without modifying any model weights. Generated outputs preserve semantic content while producing structured highlights and faithful shadow detail under exposure adjustment, matching target HDR statistics in PQ space.

Abstract

Pretrained diffusion models generate realistic images, but their outputs remain constrained by the statistical biases of their training data, limiting their ability to produce high dynamic range (HDR) content. In this work, we introduce LumaGuide, a training-free framework for distribution shaping in diffusion models. Instead of modifying model parameters, LumaGuide steers the sampling process to match target feature distributions via differentiable energy-based guidance. We instantiate this framework for HDR generation by controlling luminance distributions in perceptually uniform PQ space. Our results show that aligning luminance histograms can induce HDR-consistent behavior, including coherent highlights and preserved shadow detail, while maintaining semantic fidelity. Beyond HDR, LumaGuide enables flexible specification of target distributions through data-driven presets, reference images, or text-driven predictors, and extends naturally to video generation with temporal consistency constraints.

Method

At every denoising step, LumaGuide computes a differentiable soft histogram of the predicted clean image in perceptually uniform PQ space and minimizes a Wasserstein-1 distance to a target histogram. The resulting gradient is back-propagated through the VAE decoder to shape the sampling velocity. Because the histogram is permutation-invariant, the diffusion prior is free to handle semantics and spatial structure — LumaGuide only constrains the global luminance statistics.
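
As a concrete illustration, the sketch below implements the three pieces this paragraph describes: the PQ (SMPTE ST 2084) luminance transform, a differentiable Gaussian soft histogram, and the closed-form Wasserstein-1 distance between two histograms on a shared grid. The BT.2020 luma weights, the 1000-nit display peak, the bin count, and the kernel width are illustrative assumptions, not the paper's exact settings.

```python
import torch

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(nits: torch.Tensor, peak: float = 10000.0) -> torch.Tensor:
    """Map absolute luminance (cd/m^2) to perceptually uniform PQ in [0, 1]."""
    y = (nits / peak).clamp(min=0.0)
    y_m1 = y.pow(M1)
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)).pow(M2)

def soft_histogram(x: torch.Tensor, n_bins: int = 64, sigma: float = 0.02) -> torch.Tensor:
    """Differentiable histogram of values in [0, 1] via Gaussian soft-binning."""
    centers = torch.linspace(0.0, 1.0, n_bins, device=x.device)
    weights = torch.exp(-0.5 * ((x.reshape(-1, 1) - centers) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / hist.sum()

def wasserstein1(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """W1 between histograms on the same uniform grid: L1 distance of the CDFs."""
    bin_width = 1.0 / (p.numel() - 1)  # bin centers span [0, 1]
    return torch.cumsum(p - q, dim=0).abs().sum() * bin_width

def luminance_energy(rgb_linear: torch.Tensor, target_hist: torch.Tensor,
                     peak_nits: float = 1000.0) -> torch.Tensor:
    """W1 between an image's PQ-luminance soft histogram and a target histogram.

    `rgb_linear` is (..., 3) linear-light RGB in [0, 1]. The BT.2020 luma
    weights and the display peak are assumptions, not the paper's choices.
    """
    y = (0.2627 * rgb_linear[..., 0]
         + 0.6780 * rgb_linear[..., 1]
         + 0.0593 * rgb_linear[..., 2])
    pq = pq_encode(y * peak_nits)
    return wasserstein1(soft_histogram(pq), target_hist)
```

On a shared grid, W1 reduces to the L1 distance between CDFs, so its gradient transports luminance mass along the tone axis rather than only reweighting the bins where the two histograms already overlap.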

Figure 2. Method overview. We form a soft PQ-luminance histogram of the Tweedie estimate at each step, compute a Wasserstein-1 distance to the target distribution, and use its gradient as an additive guidance term on the velocity/score.
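
Plugging this energy into a sampler then costs one extra gradient per step. The sketch below assumes a velocity-prediction (rectified-flow) model with the common convention x0 = z_t − t·v; `velocity_fn`, `vae_decode`, and the guidance scale are placeholder names rather than a specific API, and the sign and schedule of the guidance term depend on the sampler's integration convention.

```python
def guided_velocity(z_t: torch.Tensor, t: float, velocity_fn, vae_decode,
                    target_hist: torch.Tensor, scale: float = 0.5) -> torch.Tensor:
    """One guided velocity evaluation (illustrative names, not a specific API).

    Assumes the rectified-flow Tweedie map x0_hat = z_t - t * v; adapt the
    map and the guidance sign/scale to your sampler's conventions.
    """
    with torch.no_grad():
        v = velocity_fn(z_t, t)          # pretrained prior, kept frozen
    z = z_t.detach().requires_grad_(True)
    x0_hat = z - t * v                   # grad flows through the linear map...
    img = vae_decode(x0_hat)             # ...and through the VAE decoder
    energy = luminance_energy(img, target_hist)
    grad = torch.autograd.grad(energy, z)[0]
    return v + scale * grad              # additive guidance on the velocity
```

Only the Tweedie map and the VAE decoder enter the autograd graph here; differentiating through the denoiser as well is possible but far more memory-hungry, and the paper's exact choice should be taken from the text.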

Results

LumaGuide reshapes the generated PQ-luminance histograms toward the target HDR distribution while preserving semantic structure, across Flux.1, SD3, SDXL, and CogVideoX. No fine-tuning is required.

Figure 3. Per-pixel PQ-luminance surface for the same seed and prompt. The Flux.1 baseline (top) overshoots into clipped highlights; LumaGuide (bottom) redistributes mass into mid-tones and preserves the highlight gradient, matching the target HDR distribution.

Figure 4. Luminance histogram alignment. Generated PQ histograms before and after guidance, alongside the HDR target distribution.

Figure 5. Subjective study. Human-preference results against HDR baselines, reported in JOD units.

Figure 6. Ablations on guidance scale, schedule, and the number of histogram bins.

Quantitative comparison

Arrows indicate whether higher (↑) or lower (↓) values are better.

Table 1. Comparison with HDR generation baselines.

LumaGuide achieves the best alignment and the largest dynamic range, with competitive quality, moderate runtime, and no training.

Method             Q-quality ↑   Q-alignment ↑   DR (stops) ↑   JOD ↑    Time ↓
LEDiff             0.425         0.612            4.71          −0.88    ~8.6 s
BracketDiffusion   0.448         0.648           12.25          −0.30    ~389 s
X2HDR              0.579         0.773           11.41          +0.43    ~6 s
LumaGuide          0.568         0.814           14.99          +0.75    7.8 s

Table 2. Ablation of feature domain and distance.

PQ-space guidance with W1 dominates linear-domain and ℓ2/KL variants.

Domain   Distance   uW1 ↓   p50 dist ↓   p99 dist ↓   DR (stops) ↑
Linear   W1         3.73    0.115        0.199        16.37
Linear   ℓ2         3.79    0.116        0.207        16.33
PQ       ℓ2         3.40    0.100        0.206        16.18
PQ       KL         2.06    0.065        0.143        16.51
PQ       W1         0.58    0.024        0.053        14.99

Table 3. Cross-backbone results.

Drop-in across Flux.1, SD3, and SDXL, with no retraining.

Backbone   Q-quality ↑   Q-alignment ↑   DR (stops) ↑
Flux.1     0.568         0.814           14.99
SD3        0.512         0.795           15.42
SDXL       0.431         0.655           15.90

HDR video generation

Applying the same distribution shaping objective to a pretrained video diffusion model (CogVideoX) yields zero-shot HDR video synthesis. A Temporal Luminance Coherence (TLC) term penalizes highlight flicker across frames while preserving motion and semantics.
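
A minimal version of the TLC term is sketched below, under the assumption that it compares per-frame highlight histograms across time; the PQ threshold of 0.75, the L1 temporal difference, and the bin count are guesses for illustration, not the paper's definition.

```python
def tlc_penalty(frames_pq: torch.Tensor, n_bins: int = 64,
                highlight_from: float = 0.75) -> torch.Tensor:
    """Temporal Luminance Coherence sketch: penalize frame-to-frame changes
    in the highlight tail of each frame's PQ histogram to suppress flicker.

    `frames_pq` is (T, H, W) PQ luminance; the 0.75 PQ threshold and the
    L1 temporal difference are illustrative assumptions.
    """
    hists = torch.stack([soft_histogram(f, n_bins) for f in frames_pq])  # (T, B)
    hi = hists[:, int(highlight_from * n_bins):]    # highlight bins only
    return (hi[1:] - hi[:-1]).abs().sum(dim=1).mean()
```

In this sketch the TLC term would simply be added, with its own weight, to the per-frame W1 energy before taking the guidance gradient.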

Figure 7. Selected frames from an HDR video clip generated under TLC guidance. Video reel coming soon.

BibTeX

@article{chen2026lumaguide,
  title   = {LumaGuide: Distribution Shaping for Training-Free HDR Generation in Diffusion Models},
  author  = {Chen, Bowen and Saini, Shreshth and Adsumilli, Balu and Bovik, Alan C.},
  journal = {arXiv preprint},
  year    = {2026}
}