Dataset & Method  ·  CVPR 2026

Seeing Beyond8Bits:
Subjective and Objective Quality Assessment of HDR-UGC Videos

Shreshth Saini1, Bowen Chen1, Neil Birkbeck2, Yilin Wang2, Balu Adsumilli2, Alan C. Bovik1,3
1Laboratory for Image and Video Engineering (LIVE), UT Austin    2Google / YouTube    3University of Colorado Boulder
Dataset: Beyond8Bits  ·  Method: HDR-Q (MLLM) with HAPO (HDR-Aware Policy Optimization)

HDR-UGC source videos: 6,861 (2,253 Crowd + 4,608 Vimeo)
Transcoded / distorted videos: ~44K (bit-ladder 0.2–5 Mbps)
Crowd subjective ratings: >1.5M (~35 ratings / video on AMT)
HDR-Q: the first MLLM for HDR-UGC VQA (Ovis2.5 + LoRA, trained with HAPO)

Publish-ready release: 5,917 sources (2,153 Crowd + 3,764 Vimeo) and 41,419 total clips (references + transcodes) in data/Beyond8Bits_publish.csv.

Overview of our dataset and performance evaluation. Top: Example comparisons between HDR and SDR frames, illustrating differences in brightness range, color depth, and visual detail across diverse scenes. Bottom-left: The distribution of video categories in Beyond8Bits, covering human-centered content, nature & outdoor scenes, and various other real-world scenarios. Bottom-right: Performance comparison between our proposed HDR-Q model and baseline methods on three datasets, where HDR-Q achieves significant improvements in PLCC.

Abstract

High Dynamic Range (HDR) user-generated content (UGC) videos are rapidly proliferating across social platforms, yet most perceptual video quality assessment (VQA) systems remain tailored to Standard Dynamic Range (SDR). HDR's higher bit depth, wider color gamut, and elevated luminance range expose distortions such as near-black crushing, highlight clipping, banding, and exposure flicker that amplify UGC artifacts and challenge SDR models. To catalyze progress, we curate Beyond8Bits, a large-scale subjective dataset of ~44K videos from 6,861 sources with >1.5M crowd ratings, spanning diverse scenes, capture conditions, and compression settings. We further introduce HDR-Q, the first Multimodal Large Language Model (MLLM) for HDR-UGC VQA. We propose (i) a novel HDR-aware vision encoder (a SigLIP-2 adapter fine-tuned with an HDR–SDR dual-domain contrastive objective) to produce HDR-sensitive embeddings, and (ii) HAPO (HDR-Aware Policy Optimization), an RL fine-tuning framework that anchors reasoning to HDR cues. HAPO augments GRPO with an HDR–SDR contrastive KL that encourages token reliance on HDR inputs, dual-entropy regularization to prevent modality neglect, and a Gaussian-weighted regression reward for fine-grained MOS calibration. Across Beyond8Bits and public HDR-VQA benchmarks (LIVE-HDR, SFV+HDR), HDR-Q delivers state-of-the-art performance. Beyond8Bits subsumes our earlier CHUG (ICIP '25) and BrightVQ (WACV '26 Oral) releases.

What's in Beyond8Bits

Beyond8Bits unifies two prior subjective studies (CHUG, BrightVQ) and adds a substantially larger Vimeo-sourced HDR-UGC partition, plus HDR-Q — the first MLLM tailored for HDR-UGC quality assessment. Each source video is transcoded across a multi-rung bit-ladder to expose realistic compression, up/down-scaling, and re-encoding distortions.

Largest HDR-UGC study

>1.5 M opinion scores across ~44K videos from 6,861 sources — an order of magnitude beyond prior HDR VQA datasets.

Two complementary sources

Consumer-device HDR captures (Crowd: iPhone / Pixel / Galaxy, 2,253 sources) plus CC-licensed Vimeo HDR uploads (Vimeo: 4,608 sources). Both share the same bit-ladder and AMT instrument.

Full HDR signaling preserved

Every clip retains 10-bit HEVC encoding, the PQ transfer function, and the BT.2020 gamut through transcoding; clips are trimmed to ≤10 seconds.

HDR-Q (first MLLM for HDR-UGC)

An Ovis2.5-based MLLM with a SigLIP-2 HDR-aware vision encoder, trained with HAPO (extends GRPO with HDR–SDR contrastive KL, dual-entropy regularization, and high-entropy weighting).

Rich metadata

Per-video MOS / SOS (SUREAL-aggregated), type, ref, resolution, bitrate, orientation, framerate, split, and frame dimensions.

Open & reusable

Metadata released under CC BY-SA 4.0; videos retain their original licenses for non-commercial research.

Dataset Composition

Beyond8Bits is partitioned into a Crowd split (consumer-device HDR-UGC from a dedicated crowdsourcing campaign) and a Vimeo split (CC-licensed public HDR uploads). Both partitions share the same transcoding bit-ladder, AMT rating instrument, and subject pool, enabling apples-to-apples cross-domain analysis. The tables below report the publish-ready release; paper-reported totals are slightly larger and track ongoing license clearing.

Crowd partition (includes CHUG)

Source videos (released): 2,153
Transcoded videos (released): 15,071
Devices: iPhone / Pixel / Galaxy (10-bit HEVC)
Orientations: Portrait 12,075 · Landscape 2,996
Consent: Non-exclusive research redistribution

Vimeo partition

Source videos (released): 3,764
Transcoded videos (released): 26,348
License: Creative Commons (original Vimeo licenses retained)
Orientations: Landscape 22,974 · Portrait 3,374
Common framerates: 24 / 25 / 30 / 60 fps

Subjective study

Platform: Amazon Mechanical Turk (HDR-capable devices only)
Rating instrument: Continuous 0–100 rating scale, per ITU-R BT.500-14
Ratings collected: >1.5M (valid, post-QC)
Avg. ratings / video: ~35
Screening: HDR-display qualification quiz, training + calibration phase, golden-set + repeat videos, bit-depth / bandwidth checks
MOS aggregation: SUREAL MLE (median inter-subject SRCC 0.90)
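
For context, SUREAL's maximum-likelihood estimator (Li & Bampis) jointly recovers a per-video quality score together with per-subject bias and inconsistency terms; a sketch of the subject model it fits, in LaTeX (notation ours, not from the release):

% u_{ij}: raw rating of subject i on video j
% \psi_j: recovered quality (the reported MOS)
% \Delta_i: subject bias, \nu_i: subject inconsistency
u_{ij} = \psi_j + \Delta_i + \nu_i \, \epsilon_{ij},
\qquad \epsilon_{ij} \sim \mathcal{N}(0, 1)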

Transcoding grid (bit-ladder)

Target resolutions: 360p · 720p · 1080p (+ source/ref)
Bitrate rungs: 0.2 / 0.5 / 1 / 2 / 3 Mbps (paper up to 5 Mbps)
Clip length: ≤ 10 seconds
HDR signaling: 10-bit HEVC · PQ transfer · BT.2020 gamut
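
The released ladder is not a full resolution × bitrate grid (the counts below imply one 360p, two 720p, and three 1080p rungs per source), and the paper's exact encoder settings are not reproduced here. As a minimal sketch, one HDR-preserving rung could be produced with ffmpeg via Python; all flags below are illustrative assumptions, not the authors' pipeline:

import subprocess

def transcode_rung(src, height, mbps, dst):
    # Illustrative HDR-preserving transcode (assumed settings): keeps
    # 10-bit HEVC, the PQ (SMPTE ST 2084) transfer, and BT.2020 signaling
    # through the re-encode; height and bitrate select one ladder rung.
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",            # 360 / 720 / 1080
        "-c:v", "libx265", "-b:v", f"{mbps}M",  # 0.2 ... 5 Mbps rungs
        "-pix_fmt", "yuv420p10le",              # 10-bit
        "-x265-params",
        "colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc",
        "-c:a", "copy", dst,
    ]
    subprocess.run(cmd, check=True)

transcode_rung("source.mp4", 1080, 2, "source_1080p_2Mbps.mp4")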

Train / Val / Test split (release)

Split policy: 70 / 10 / 20 by source identity
Train: 28,987
Validation: 4,151
Test: 8,281
Split column: split in Beyond8Bits_publish.csv

Source vs. transcoded

Reference sources (ref=1): 5,917
Transcoded clips (ref=0): 35,502
360p transcodes: 5,917
720p transcodes: 11,834
1080p transcodes: 17,751
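
These counts can be sanity-checked directly against the released CSV; a short snippet, assuming ref is stored as 1 for sources and 0 for transcodes and that the resolution labels match those above:

import pandas as pd

df = pd.read_csv("data/Beyond8Bits_publish.csv")

print(df["ref"].value_counts())    # expect 35,502 transcodes, 5,917 refs
print(df["split"].value_counts())  # expect 28,987 / 4,151 / 8,281

# Per-resolution transcode counts among non-reference clips
print(df.loc[df["ref"] == 0, "resolution"].value_counts())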

Content & Score Distribution

Beyond8Bits covers a wide swath of perceptual space along spatial and temporal complexity, luminance/chrominance, and overall quality. The grid below samples representative frames across the dataset's content categories — portraits, group & event footage, and indoor/outdoor scenes — captured under diverse lighting and device conditions.

Overview of our video dataset illustrated through sampled frames across portraits, group & events, and indoor / outdoor scenes.

Sample Frames with Crowd-Sourced MOS

A sample of Beyond8Bits source frames spanning the full MOS range, with the corresponding crowd-sourced MOS. Frames cover both Crowd and Vimeo partitions and both orientations.

Subjective Study Interface

Ratings were collected on Amazon Mechanical Turk with a continuous single-stimulus rating bar, HDR-capable display attestation, and gold / self-consistency screening HITs interleaved throughout the session.

The web-based AMT rating interface used across the CHUG, BrightVQ, and Vimeo sub-studies that together form Beyond8Bits.

How to Access the Dataset

We release per-video metadata (video_id, mos, sos, type, ref, resolution, bitrate, orientation, framerate, split, height, width) for the full publish-ready release (41,419 clips across Crowd + Vimeo) and the corresponding video payloads via an S3-hosted mirror. The complete dataset package, including raw per-rating CSV files, is also available on UT-Box.

Clone the GitHub repository

All metadata files, ID lists, rating CSVs, and licensing info live in the Beyond8Bits GitHub repository. The paper's supplementary material contains detailed per-field documentation.

git clone https://github.com/shreshthsaini/Beyond8Bits.git
cd Beyond8Bits

Grab the video-ID manifest

The full Beyond8Bits manifest (41,419 videos, references plus transcodes, across the Crowd + Vimeo partitions) is at data/Beyond8Bits_publish.txt (one hashed ID per line), with matched MOS / SOS and per-video metadata in data/Beyond8Bits_publish.csv. The CHUG-compatible crowd subset remains available at data/Beyond8Bits_publish_crowd.csv / .txt.

Download a single video with the AWS CLI

Replace VIDEO_ID with any hashed ID from the manifest:

aws s3 cp s3://ugchdrmturk/videos/VIDEO_ID.mp4 ./Beyond8Bits_Videos/

Bulk download all videos

To mirror the full published partition in one shot:

while read -r video; do
  aws s3 cp "s3://ugchdrmturk/videos/${video}.mp4" ./Beyond8Bits_Videos/
done < data/Beyond8Bits_publish.txt

Stream a video directly in the browser

Replace VIDEO_ID below to play any video without the AWS CLI:

https://ugchdrmturk.s3.us-east-2.amazonaws.com/videos/VIDEO_ID.mp4

Example: 9ae245a27cc5ea9d2f3fae9692250281.mp4

Load scores & metadata in Python

import pandas as pd
df = pd.read_csv("data/Beyond8Bits_publish.csv")
print(df.columns.tolist())
# ['video_id', 'mos', 'sos', 'type', 'ref', 'resolution',
#  'bitrate', 'orientation', 'framerate', 'split',
#  'height', 'width']
print(df["mos"].describe())
print(df["type"].value_counts())   # Crowd vs Vimeo
print(df["split"].value_counts())  # train / test

Full dataset package on UT-Box

The complete Beyond8Bits release — including per-rating raw CSVs and the full metadata bundle — is mirrored on UT-Box:

https://utexas.box.com/s/pvz8zpmpogvpy62pqpar2e54c6ovyd5z

Note: Beyond8Bits_publish.csv / .txt now covers all 41,419 videos (references plus transcodes) across the Crowd + Vimeo partitions.

Sample Videos

Representative clips drawn from the CHUG and BrightVQ sub-studies (all part of Beyond8Bits). Best viewed on an HDR10/HLG-capable display.

Portraits

Crowd · Portrait

1080p · 30 fps · MOS 86.0

Crowd · Portrait

1080p · 30 fps · MOS 79.8

Crowd · Portrait

1080p · 29.97 fps · MOS 78.8

Vimeo · Portrait

1080p · 30 fps · MOS 80.1

Vimeo · Portrait

1080p · 30 fps · MOS 79.4

Vimeo · Portrait

1080p · 30 fps · MOS 79.3

Landscapes

Crowd · Landscape

1080p · 29.97 fps · MOS 78.4

Crowd · Landscape

1080p · 30 fps · MOS 77.1

Crowd · Landscape

1080p · 59.94 fps · MOS 76.9

Vimeo · Landscape

1080p · 30 fps · MOS 81.1

Vimeo · Landscape

1080p · 58.69 fps · MOS 80.9

Vimeo · Landscape

1080p · 58.1 fps · MOS 81.8

Part of a Family of HDR-UGC Releases

Beyond8Bits consolidates and extends two earlier releases from our lab. If you use Beyond8Bits, please also consider citing its predecessors.

Method: HDR-Q + HAPO

HDR-Q is the first multimodal large language model for HDR-UGC quality assessment. It integrates (i) an HDR-aware vision encoder that produces HDR-sensitive embeddings while retaining semantic alignment, and (ii) HDR-Aware Policy Optimization (HAPO), a reinforcement-learning objective that extends GRPO with HDR-specific grounding and stability terms.

Overview of HDR-Q with HAPO. Left: HAPO compares rollouts under HDR inputs (text + SDR + HDR tokens) versus an HDR-deprived pathway (text + SDR only), maximizing their KL divergence to enforce HDR grounding and applying dual-entropy regularization to prevent reward hacking. Group-wise rewards include MOS/attribute accuracy, reasoning quality, and self-rewarding. Right: a LoRA-tuned LLM decodes the HDR-aware reasoning; visual inputs originate from both a standard encoder and our HDR-aware adapter.

HDR-aware vision encoder

A SigLIP-2 encoder adapted with HDR–SDR dual-domain contrastive supervision. Captions are generated by Qwen2.5-VL-72B; HDR embeddings are pushed closer to their caption than the tone-mapped SDR counterpart, preserving 10-bit PQ / BT.2020 cues without collapsing onto SDR representations.
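
The paper's exact loss is not reproduced here; below is a minimal PyTorch sketch of one plausible form of the dual-domain contrastive objective (function name, temperature, and batching are our assumptions):

import torch
import torch.nn.functional as F

def dual_domain_contrastive(hdr_emb, sdr_emb, cap_emb, tau=0.07):
    # Sketch only (assumed form): each HDR clip embedding is pulled
    # toward its caption embedding and pushed away from its tone-mapped
    # SDR counterpart, so the adapter retains PQ / BT.2020 cues instead
    # of collapsing onto SDR representations.
    hdr = F.normalize(hdr_emb, dim=-1)  # (B, D)
    sdr = F.normalize(sdr_emb, dim=-1)  # (B, D)
    cap = F.normalize(cap_emb, dim=-1)  # (B, D)

    pos = (hdr * cap).sum(-1) / tau     # HDR <-> caption similarity
    neg = (hdr * sdr).sum(-1) / tau     # HDR <-> SDR similarity
    logits = torch.stack([pos, neg], dim=-1)  # (B, 2); caption = class 0
    target = torch.zeros(logits.size(0), dtype=torch.long,
                         device=logits.device)
    return F.cross_entropy(logits, target)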

HDR–SDR contrastive KL

Maximizes D_KL(π_HDR ‖ π_SDR) between rollouts with and without HDR tokens, preventing modality neglect (policies that read textual priors while ignoring the HDR signal); a combined sketch of this term and the dual-entropy regularizer follows the next item.

Dual-entropy regularization

Per-pathway entropy penalties on both HDR and SDR rollouts prevent the trivial "entropy inflation" solution to the contrastive KL while keeping HDR-grounded distributions sharp.
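
As noted above, here is a combined PyTorch sketch of the contrastive KL and the dual-entropy regularizer, under our own assumptions about their exact form (two forward passes of the same policy; beta_kl and beta_ent are hypothetical weights):

import torch
import torch.nn.functional as F

def hapo_grounding_terms(logits_hdr, logits_sdr, beta_kl=1.0, beta_ent=0.01):
    # Assumed form of HAPO's grounding terms. logits_hdr: (B, T, V)
    # next-token logits with text + SDR + HDR tokens; logits_sdr: the
    # HDR-deprived pathway (text + SDR only).
    log_p_hdr = F.log_softmax(logits_hdr, dim=-1)
    log_p_sdr = F.log_softmax(logits_sdr, dim=-1)
    p_hdr, p_sdr = log_p_hdr.exp(), log_p_sdr.exp()

    # KL(pi_HDR || pi_SDR), averaged over tokens; maximized during
    # training, so it enters the loss with a negative sign.
    kl = (p_hdr * (log_p_hdr - log_p_sdr)).sum(-1).mean()

    # Dual-entropy penalty on both pathways blocks the trivial
    # "inflate entropy" route to a large KL.
    ent_hdr = -(p_hdr * log_p_hdr).sum(-1).mean()
    ent_sdr = -(p_sdr * log_p_sdr).sum(-1).mean()

    return -beta_kl * kl + beta_ent * (ent_hdr + ent_sdr)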

High-entropy weighting (HEW)

Rescales GRPO's group-normalized advantage with per-token entropy, concentrating the learning signal on informative reasoning tokens (e.g., banding, highlight clipping, near-black crushing).
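
The precise rescaling is not specified here; a one-line sketch of the idea (assumed form, for per-token tensors):

def entropy_weighted_advantage(advantage, token_entropy, eps=1e-8):
    # Assumed form of HEW: scale GRPO's group-normalized advantage by
    # normalized per-token entropy, so high-entropy reasoning tokens
    # (e.g. "banding", "highlight clipping") dominate the update.
    return advantage * (token_entropy / (token_entropy.mean() + eps))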

Gaussian MOS reward + self-reward

A Gaussian-weighted regression reward R_sc calibrates fine-grained MOS; a group-level self-reward R_self consolidates within-group consensus for reasoning stability.
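
One natural instantiation of the regression reward, as an illustration only (the bandwidth sigma is hypothetical):

import math

def gaussian_mos_reward(pred_mos, true_mos, sigma=5.0):
    # Assumed form of R_sc: peaks at 1 for an exact MOS match and decays
    # smoothly with prediction error, giving a dense calibration signal.
    return math.exp(-((pred_mos - true_mos) ** 2) / (2 * sigma ** 2))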

Two-stage RL training

Stage 1 — Modality Alignment: short HAPO runs align HDR tokens and projection layers. Stage 2 — Full-RFT: complete HAPO on the HDR-UGC corpus. Base: Ovis2.5 + rank-4 LoRA; 8 uniformly-sampled frames at native 10-bit PQ; trained on 4× NVIDIA H200 GPUs.

License & Terms of Use

Beyond8Bits metadata (CSV / TXT manifests, rating aggregates) is released under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). Video payloads retain their original licenses (CC-licensed Vimeo clips; user-contributed crowd videos for which we have non-exclusive research redistribution rights). The dataset is intended strictly for non-commercial research in HDR video quality assessment, tone-mapping, inverse tone-mapping, generative modeling evaluation, and related perceptual studies.

By downloading any portion of the dataset you agree not to (a) redistribute the raw video payloads outside the approved S3 mirror without written consent, (b) attempt to de-anonymize contributing workers, or (c) use the data to train models for deployment in a commercial product without a separate license from UT Austin / YouTube. Please refer to LICENSE in the GitHub repo for the full terms.

BibTeX

If you find Beyond8Bits useful in your research, please cite:

@article{Saini_2026_Beyond8Bits,
    author  = {Saini, Shreshth and Chen, Bowen and Birkbeck, Neil and Wang, Yilin and Adsumilli, Balu and Bovik, Alan C.},
    title   = {Seeing Beyond8Bits: Subjective and Objective Quality Assessment of HDR-UGC Videos},
    journal = {arXiv preprint arXiv:2603.00938},
    year    = {2026}
}

We will update this entry with the official CVPR 2026 BibTeX once published; for now please cite the arXiv version.

Please also consider citing the two sub-studies that Beyond8Bits extends:

@InProceedings{Saini_2025_ICIP_CHUG,
    author    = {Saini, Shreshth and Bovik, Alan C. and Birkbeck, Neil and Wang, Yilin and Adsumilli, Balu},
    title     = {CHUG: Crowdsourced User-Generated HDR Video Quality Dataset},
    booktitle = {2025 IEEE International Conference on Image Processing (ICIP)},
    year      = {2025},
    pages     = {2504-2509},
    doi       = {10.1109/ICIP55913.2025.11084488}
}

@InProceedings{Saini_2026_WACV_BrightRate,
    author    = {Saini, Shreshth and Chen, Bowen and Wang, Yilin and Birkbeck, Neil and Adsumilli, Balu and Bovik, Alan C.},
    title     = {BrightRate: Quality Assessment for User-Generated HDR Videos},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {1522-1532}
}