SDRTV → HDRTV · Inverse Tone Mapping

LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers

Shreshth Saini1 · Hakan Gedik1 · Neil Birkbeck2 · Yilin Wang2 · Balu Adsumilli2 · Alan C. Bovik1

1 The University of Texas at Austin  ·  2 Google, Inc.

SDR inputs and LumaFlux HDR outputs from the Luma-Eval benchmark

SDR → HDR with LumaFlux. Examples from the Luma-Eval benchmark: each pair shows a real-world low-quality SDR input and the LumaFlux reconstruction, which restores a substantially broader dynamic range and the color saturation lost during tone-mapping compression — recovering both diffuse texture and specular highlight detail.

Abstract

The rapid adoption of HDR-capable devices has created a pressing need to convert 8-bit Standard Dynamic Range content into perceptually and physically accurate 10-bit HDR. Existing inverse tone-mapping (ITM) methods often rely on fixed tone-mapping operators that struggle to generalize to real-world degradations, stylistic variations, and camera pipelines — producing clipped highlights, desaturated colors, or unstable tone reproduction.

LumaFlux is the first physically and perceptually guided diffusion transformer for SDR→HDR reconstruction, built by adapting a large pretrained DiT without retraining its weights. It introduces (1) a Physically-Guided Adaptation module injecting luminance, spatial, and frequency cues into attention through gated low-rank residuals; (2) a Perceptual Cross-Modulation layer stabilizing chroma and texture via FiLM conditioning on SigLIP features; (3) an HDR Residual Coupler fusing both signals under a timestep- and layer-adaptive schedule; and (4) a Rational-Quadratic Spline decoder reconstructing smooth, interpretable tone fields for highlight and exposure expansion. We further curate the first large-scale SDR–HDR training corpus and establish Luma-Eval, a new evaluation benchmark with HDR references and expert-graded SDR.

+1.6 dB PSNR over strongest baseline  ·  318k curated SDR–HDR pairs  ·  ≈2 days to train on 4× H200  ·  prompt-free, no text encoders

Method

SDR formation is a lossy chain — tone mapping, BT.2020→709 gamut compression, quantization, and codec noise. Inverting it needs both physical priors (luminance and color constraints) and perceptual priors (semantics and realism). LumaFlux gets the latter for free from a frozen Flux backbone and injects the former through four lightweight, zero-initialized modules scheduled by a shared timestep–layer conditioner Ψ(t, ℓ): early layers and timesteps receive strong global tone gains, late ones refine highlight detail.
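The lossy chain described above can be made concrete with a toy forward model. The sketch below is an illustration only: it assumes a Reinhard tone curve, the linear BT.2020→BT.709 matrix from ITU-R BT.2087, a plain 2.2 gamma, and 8-bit rounding. The function name `hdr_to_sdr` and the `sdr_white` parameter are hypothetical, codec noise is omitted, and this is not the degradation pipeline used to build the training corpus.

```python
import numpy as np

# Linear-light BT.2020 -> BT.709 change of primaries (ITU-R BT.2087).
BT2020_TO_BT709 = np.array([
    [ 1.6605, -0.5876, -0.0728],
    [-0.1246,  1.1329, -0.0083],
    [-0.0182, -0.1006,  1.1187],
])

def hdr_to_sdr(hdr_nits, sdr_white=100.0):
    """Toy SDR formation: tone map -> gamut compress -> gamma -> 8-bit quantize.

    hdr_nits: (..., 3) linear-light BT.2020 RGB in cd/m^2.
    Returns uint8 BT.709 RGB. Every operator is a simple stand-in (assumption).
    """
    x = np.asarray(hdr_nits, dtype=np.float64) / sdr_white
    tone_mapped = x / (1.0 + x)                # Reinhard curve squeezes highlights into [0, 1)
    rgb709 = tone_mapped @ BT2020_TO_BT709.T   # row-vector form of M @ rgb
    rgb709 = np.clip(rgb709, 0.0, 1.0)         # out-of-gamut colors are clipped (lossy)
    encoded = rgb709 ** (1.0 / 2.2)            # plain display gamma, not the exact BT.709 OETF
    return np.round(encoded * 255.0).astype(np.uint8)
```

Under these assumptions, neutral highlights at 4,000 and 4,100 nits land on the same 8-bit code, and a saturated BT.2020 green clips to the BT.709 gamut boundary. That lost information is what an inverse model has to re-infer from priors.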

PGA · Physically-Guided Adaptation

Gated low-rank residuals on attention value projections, conditioned per-token and per-head on luminance, gradient, and saturation maps with FFT spectral gating — expanding highlights only where the scene demands it.
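A gated low-rank residual on a value projection can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's module: it uses a single head, a per-token scalar gate computed from three cue channels (luminance, gradient, saturation), and leaves out the FFT spectral gating. The class name `GatedLowRankValue` and all weight names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GatedLowRankValue:
    """V' = X W_v + g(c) * (X A) B, with W_v frozen and B zero-initialized."""

    def __init__(self, dim, rank, cue_dim):
        self.W_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # frozen pretrained projection
        self.A = rng.standard_normal((dim, rank)) / np.sqrt(dim)   # trainable down-projection
        self.B = np.zeros((rank, dim))                             # trainable, zero init
        self.W_g = rng.standard_normal((cue_dim, 1)) * 0.1         # trainable gate weights

    def __call__(self, tokens, cues):
        # tokens: (n_tokens, dim); cues: (n_tokens, cue_dim) physical maps per token.
        base = tokens @ self.W_v
        gate = sigmoid(cues @ self.W_g)                  # (n_tokens, 1), values in (0, 1)
        return base + gate * (tokens @ self.A @ self.B)  # low-rank update, scaled per token
```

Because `B` starts at zero, the adapted layer reproduces the frozen projection exactly at initialization, which is the usual motivation for zero-initialized adapters.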

PCM · Perceptual Cross-Modulation

FiLM modulation of hidden states from frozen SigLIP embeddings, enforcing color constancy and semantic coherence across illumination and content variations.
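FiLM conditioning itself is a small operation: a conditioning vector is projected to a per-channel scale and shift, which are applied to normalized hidden states. The sketch below assumes one global embedding per image and plain linear projections; `film`, `W_gamma`, and `W_beta` are hypothetical names, and the embedding stands in for a frozen SigLIP feature.

```python
import numpy as np

def layer_norm(h, eps=1e-6):
    """Normalize each token over its channel dimension (no learned affine)."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def film(hidden, embedding, W_gamma, W_beta):
    """Feature-wise linear modulation of normalized hidden states.

    hidden: (n_tokens, dim); embedding: (emb_dim,)
    W_gamma, W_beta: (emb_dim, dim) trainable projections.
    """
    gamma = embedding @ W_gamma   # per-channel scale offset
    beta = embedding @ W_beta     # per-channel shift
    return layer_norm(hidden) * (1.0 + gamma) + beta
```

With both projections initialized to zero the layer reduces to plain normalization, so it can be inserted without disturbing the pretrained features.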

Coupler · HDR Residual Coupler

Fuses physical and perceptual token residuals with a λ(t, ℓ) gate that decays as t→0: global tone first, fine highlight roll-off last — a guidance flow inside the latent manifold.
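One way to picture a gate that decays as t→0 is the toy schedule below. Everything about its shape is an assumption made for illustration: the page does not give the functional form of λ(t, ℓ), so this sketch uses a gate that is linear in the timestep, shrinks mildly with depth, and blends the two residuals convexly. The names `coupling_gate` and `couple` are hypothetical.

```python
def coupling_gate(t, layer, num_layers):
    """Toy lambda(t, l): 1 at (t=1, first layer), 0 at t=0, smaller in deeper layers."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t is expected in [0, 1]")
    depth = layer / max(num_layers - 1, 1)   # 0 for the first layer, 1 for the last
    return t * (1.0 - 0.5 * depth)

def couple(r_phys, r_perc, t, layer, num_layers):
    """Blend physical and perceptual residuals (assumed convex combination)."""
    lam = coupling_gate(t, layer, num_layers)
    return [lam * p + (1.0 - lam) * q for p, q in zip(r_phys, r_perc)]
```

In this toy version the physical residual dominates at large t in early layers, and the perceptual residual takes over as sampling approaches t = 0.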

RQS · Rational-Quadratic Spline Decoder

A monotone, invertible, differentiable tone field that expands VAE-decoded luma into HDR with smooth highlight knees — no banding, no saturation collapse.
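The monotonicity of a rational-quadratic spline follows directly from its parameterization by knots and positive knot derivatives. The sketch below evaluates the standard rational-quadratic interpolant used in neural spline flows for a scalar input; inputs outside the knot range are clamped. The function name `rq_spline` and the example knots are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

def rq_spline(x, xs, ys, ds):
    """Evaluate a monotone rational-quadratic spline at scalar x.

    xs, ys: strictly increasing knot coordinates; ds: positive slopes at the knots.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x) - 1              # index of the bin containing x
    width = xs[k + 1] - xs[k]
    height = ys[k + 1] - ys[k]
    s = height / width                       # average slope of the bin
    xi = (x - xs[k]) / width                 # position inside the bin, in [0, 1)
    num = height * (s * xi * xi + ds[k] * xi * (1.0 - xi))
    den = s + (ds[k + 1] + ds[k] - 2.0 * s) * xi * (1.0 - xi)
    return ys[k] + num / den

# Hypothetical tone field: gentle in the shadows, steeper toward the highlights.
KNOTS_X = [0.0, 0.5, 0.8, 1.0]
KNOTS_Y = [0.0, 0.3, 0.6, 1.0]
SLOPES = [0.6, 0.8, 1.5, 2.5]
```

The curve passes through every knot and is strictly increasing between them, so it can be inverted bin by bin, and the knot positions stay directly readable as a tone curve.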

LumaFlux architecture overview

Architecture. The SDR input splits into physical (T_phys) and perceptual (T_perc) streams. In each Luma-MMDiT block, PGA injects luminance- and spectrum-aware low-rank updates into attention, PCM applies FiLM to normalized features, and the HDR Residual Coupler fuses both cues. A frozen VAE decoder with the RQS tone-field head reconstructs HDR in PQ/BT.2020.
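For readers unfamiliar with the PQ output encoding, the SMPTE ST 2084 transfer function maps absolute luminance up to 10,000 cd/m² to a signal in [0, 1]. The sketch below implements the standard constants and both directions; the function names `pq_encode` and `pq_decode` are illustrative, and nothing here is specific to LumaFlux.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0

def pq_encode(nits):
    """Inverse EOTF: absolute luminance in cd/m^2 -> PQ signal in [0, 1]."""
    y = max(nits, 0.0) / PEAK_NITS
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2

def pq_decode(signal):
    """EOTF: PQ signal in [0, 1] -> absolute luminance in cd/m^2."""
    ep = signal ** (1.0 / M2)
    y = (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)
    return PEAK_NITS * y
```

A 1,000-nit highlight encodes to roughly three quarters of the PQ signal range, which leaves the top quarter of code values for content above that level.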

Baseline DiT fine-tuning vs LumaFlux adaptation

Architectural paradigms. Left: direct LoRA/full fine-tuning of a DiT overfits small HDR datasets and hallucinates texture. Right: LumaFlux inserts lightweight, physically interpretable modules into the frozen MM-DiT, preserving the pretrained generative prior while enabling accurate ITM with few trainable parameters.

Dataset & Benchmark

We unify HIDROVQA (411 PGC videos), CHUG (428 UGC videos), and LIVE-TMHDR (40 studio HDR videos with expert-graded SDR) under PQ-encoded BT.2020 at a 1,000-nit mastering peak. Each HDR frame is paired with SDR variants from a composite degradation chain: 8 tone-mapping operators × CRF {23, 31, 39} compression — ≈318k pairs including 54k expert-graded references. Luma-Eval adds 20 held-out HDR sources (10 PGC + 10 UGC) evaluated under both perceptually optimized and degradation-heavy SDR conditions.
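The composite degradation chain is a Cartesian product of tone-mapping operators and compression levels. A minimal sketch of that grid follows; the operator identifiers are placeholders, since the page does not name the eight operators, and `sdr_variants` is a hypothetical helper.

```python
from itertools import product

TMO_IDS = [f"tmo_{i}" for i in range(8)]   # placeholder names for the 8 operators
CRF_LEVELS = (23, 31, 39)                  # compression levels listed above

def sdr_variants():
    """Enumerate every (tone-mapping operator, CRF) combination for one HDR frame."""
    return [{"tmo": tmo, "crf": crf} for tmo, crf in product(TMO_IDS, CRF_LEVELS)]
```

Each HDR frame therefore yields 8 × 3 = 24 synthetic SDR variants under this reading of the grid.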

Results

Metrics are computed in PU21 space for luminance (PSNR, SSIM), in the HDR domain (HDR-VDP3), and for perceptual color (ΔE_ITP, lower is better). LumaFlux attains the best score in every column.

Method           | HDRTV1K                    | HDRTV4K                    | Luma-Eval
                 | PSNR↑  SSIM↑  VDP3↑  ΔE↓   | PSNR↑  SSIM↑  VDP3↑  ΔE↓   | PSNR↑  SSIM↑  VDP3↑  ΔE↓
HDRTVNet++       | 38.36  0.973  8.75   8.28  | 30.82  0.881  8.12   7.85  | 36.54  0.901  8.22   7.35
ICTCPNet         | 36.59  0.922  8.57   7.79  | 33.12  0.977  8.90   6.75  | 34.45  0.919  8.74   6.93
HDRTVDM          | 36.98  0.971  8.55   10.84 | 30.15  0.886  7.90   9.95  | 35.10  0.903  8.24   9.44
HDCFM            | 38.42  0.973  8.52   7.83  | 33.25  0.908  8.20   7.42  | 36.78  0.915  8.29   7.20
Deep SR-ITM      | 37.10  0.969  8.23   9.24  | 26.59  0.812  6.92   8.88  | 33.21  0.875  7.41   8.41
FlashVSR         | 35.34  0.883  6.31   8.79  | 33.51  0.846  5.72   7.51  | 34.80  0.857  5.84   6.23
LEDiff           | 36.52  0.872  5.71   9.13  | 32.25  0.863  5.32   9.66  | 31.73  0.859  5.12   9.85
PromptIR         | 32.14  0.954  9.17   9.59  | 28.48  0.898  9.17   7.00  | 34.12  0.913  8.88   6.82
LumaFlux (ours)  | 39.27  0.982  9.83   6.12  | 35.86  0.978  9.72   5.86  | 36.92  0.938  8.91   5.67

Quantitative comparison across benchmarks (Table 1 of the paper).
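The ΔE column reports ΔE_ITP, the color-difference metric defined in ITU-R BT.2124 on top of the ICtCp representation. A minimal sketch of the final distance step is given below, assuming both colors have already been converted to ICtCp; the function name `delta_e_itp` is illustrative.

```python
import math

def delta_e_itp(ictcp_a, ictcp_b):
    """Delta E_ITP between two colors given as (I, Ct, Cp) triples.

    ITP uses T = 0.5 * Ct and P = Cp; the factor 720 scales the distance so that
    a value of 1 is roughly one just-noticeable difference.
    """
    i1, ct1, cp1 = ictcp_a
    i2, ct2, cp2 = ictcp_b
    d_i = i1 - i2
    d_t = 0.5 * (ct1 - ct2)
    d_p = cp1 - cp2
    return 720.0 * math.sqrt(d_i * d_i + d_t * d_t + d_p * d_p)
```

A per-image score is typically the mean of this distance over all pixels, though the exact pooling used for the table is not stated on this page.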

Variant (ablation, 100k iters)  PSNR↑  PSNR(Y)↑  ΔE_ITP↓  HDR-VDP3↑  HDR-LPIPS↓  FR-HIDROVQA↑
Flux + LoRA only                33.42  34.28     8.58     7.82       0.136       72.9
+ PGA (no spectral)             34.94  35.81     7.62     8.18       0.122       75.2
+ PGA (spectral gating)         35.18  36.02     7.31     8.29       0.116       76.3
+ PCM (SigLIP FiLM)             35.89  36.73     6.78     8.46       0.107       78.6
+ RQS (linear)                  35.72  36.54     6.85     8.41       0.108       78.0
+ RQS (monotone spline)         35.98  36.84     6.09     8.61       0.087       80.8

Component ablations on Luma-Eval (Table 3 of the paper): PGA delivers the largest gain; the monotone RQS restores perceptual smoothness and broadens effective dynamic range.
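The size of each ablation step is easier to read as a gain over the first row. The short sketch below recomputes those PSNR gains from the numbers reported in the table; `gains_over_baseline` is a hypothetical helper, and only the PSNR column is included.

```python
# PSNR values copied from the ablation table (Luma-Eval, 100k iterations).
ABLATION_PSNR = [
    ("Flux + LoRA only", 33.42),
    ("+ PGA (no spectral)", 34.94),
    ("+ PGA (spectral gating)", 35.18),
    ("+ PCM (SigLIP FiLM)", 35.89),
    ("+ RQS (linear)", 35.72),
    ("+ RQS (monotone spline)", 35.98),
]

def gains_over_baseline(rows):
    """Return (variant, PSNR gain in dB) relative to the first row."""
    base = rows[0][1]
    return [(name, round(value - base, 2)) for name, value in rows[1:]]
```

Read this way, adding PGA accounts for about 1.5 dB of the roughly 2.6 dB total improvement, and the linear RQS variant falls slightly below the PCM row while the monotone spline ends above it.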

BibTeX

@article{saini2026lumaflux,
  title   = {LumaFlux: Lifting 8-Bit Worlds to HDR Reality with
             Physically-Guided Diffusion Transformers},
  author  = {Saini, Shreshth and Gedik, Hakan and Birkbeck, Neil and
             Wang, Yilin and Adsumilli, Balu and Bovik, Alan C.},
  journal = {arXiv preprint arXiv:2604.02787},
  year    = {2026}
}