I am a final-year PhD student at the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin, advised by Prof. Alan C. Bovik. My research focuses on the theoretical foundations of generative models (e.g., flows, diffusion models, and MLLMs) and their applications in efficient sampling, image/video quality assessment (QA), editing, and inverse problems (e.g., inverse tone mapping).
During my PhD, I have collaborated with the YouTube / Google Media Algorithms team. I was also a Student Researcher on the LUMA team at Google Research (Jun–Oct 2025).
Before starting my PhD at UT Austin, I worked as a Research Engineer (AI) at Arkray, Inc. and as a Machine Learning Engineer at BioMind AI. In both roles, I developed novel, scalable AI solutions for medical image analysis.
📣 Open to Opportunities.
I am actively seeking full-time Research Scientist positions starting in 2026. My expertise spans generative AI, multimodal learning, post-training, and perceptual quality assessment. Let's connect →
Scratch Pad: essays, posts, and technical notes & insights live on the blog. Explore →
Updates
Jan 2026: Open to full-time Research Scientist opportunities.
Jan 2026: Serving as a reviewer for ICML 2026 and ECCV 2026.
Dec 2025: BrightRate accepted to WACV 2026 for an oral presentation.
Dec 2025: Completed my progress review.
Sep 2025: Rectified CFG++ accepted to NeurIPS 2025!
Jun 2025: Excited to join Google Research as a Student Researcher on the LUMA team!
May 2025: Paper accepted at the IEEE International Conference on Image Processing (ICIP 2025)!
May 2025: Paper accepted at the 42nd International Conference on Machine Learning (ICML 2025)!
May 2025: Paper accepted at the 3rd Workshop on Generative Models for Computer Vision, CVPR 2025!
Mar 2025: Honored to be appointed as Assistant Director of LIVE at UT Austin!
Jun 2024: Joined Amazon as an Applied Scientist-II Intern on the Perception team!
Jan 2024: Joined Alibaba US as a Research Intern to work on diffusion models!
Nov 2023: Paper accepted at the 3rd Workshop on Image/Video/Audio Quality in CV and Gen AI, WACV 2024!
Jun 2023: Completed the first-ever large-scale subjective study of HDR perceptual quality on Amazon Mechanical Turk!
Aug 2022: Joined LIVE at UT Austin as a PhD student under Prof. Alan C. Bovik!
Aug 2022: Awarded the prestigious Engineering Graduate Fellowship (through 2027)!
Feb 2022: Joined BioMind, Singapore, as a Research Engineer / Machine Learning Engineer.
Aug 2020: Started as a Research Engineer (AI) at Arkray, Inc.
May 2019: Research Assistant at the National University of Singapore (NUS) under Prof. Mengling "Mornin" Feng.
Aug 2018: Undergraduate Researcher at the Image Processing and Computer Vision Lab, IIT Jodhpur, under Prof. Anil Kumar Tiwari.
May 2018: Research Intern at The Multimedia Analytics, Networks and Systems Lab, IIT Mandi, under Prof. Aditya Nigam.
Applied Scientist Intern | Amazon – Perception Team, Seattle, Washington | Jun 2024 – Aug 2024
Worked with the Perception team on large-scale synthetic data generation
Developed a novel editing benchmark and a T2I-based diffusion model for consistent image/video editing and generation
Worked toward organizing an image+video editing challenge and workshop
Research Intern | Alibaba Group, Sunnyvale, California | Jan 2024 – May 2024
Developed generalizable and robust vision-model-based Video Quality Assessment (VQA) methods
Used diffusion model priors as a perceptual-consistency signal for IQA (paper under review)
Co-Founder | Short-X, Austin, Texas | Jan 2023 – Jan 2024
Short-X aims to automate the arduous task of turning traditional long-form content into short-form content
Built Short-X's core AI models and pipelines, covering transcription, extraction of semantically meaningful and unique highlights, pause removal, speaker identification, and smart vertical cropping
Graduate Research Assistant | Laboratory for Image and Video Engineering, UT Austin, Austin, Texas | Aug 2022 – Present
Developing scalable vision models for HDR video tasks such as inverse tone mapping (ITM), tone mapping (TM), gamut expansion, and quality assessment
Created the largest HDR-SDR dataset for short-form videos (publicly available)
Developing video quality assessment methods for HDR videos that use a nonlinear expansion of the extremes of the luminance signal (see the sketch below)
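For intuition, here is a minimal sketch of the kind of expansive nonlinearity involved, assuming luminance has already been locally normalized to [-1, 1]; the functional form and the `delta` constant are illustrative stand-ins, not the published transform.

```python
import numpy as np

def expand_extremes(luma: np.ndarray, delta: float = 4.0) -> np.ndarray:
    """Expansive nonlinearity emphasizing the bright/dark extremes of
    locally normalized luminance in [-1, 1]. Illustrative only; the exact
    transform and constants in the published method may differ."""
    bright = np.exp(delta * (luma - 1.0))   # grows fastest as luma -> +1
    dark = -np.exp(-delta * (luma + 1.0))   # grows in magnitude as luma -> -1
    return bright + dark
```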
Machine Learning Engineer | BioMind (Products), Singapore, Singapore | Feb 2022 – Jun 2022
Developed SOTA multimodal DL models for segmentation and classification of 25+ tumor/non-tumor classes
Exploited TFRecords for memory-intensive 4D datasets and proposed a multi-task model for tumor prediction
Research Engineer – AI | Arkray, Inc., Kyoto, Japan (Remote) | Aug 2020 – Dec 2021
Proposed semi-supervised DL models to learn from large volumes of private, unlabelled, and noisy 2D data
Deployed models in products: the UrineSediment Analyzer and the automated BodyFluid Analyzer (Aution EYE)
Research Assistant | National University of Singapore, Singapore | May 2019 – Jul 2019 | Supervisor: Dr. Mengling 'Mornin' Feng
Developed a novel deep learning architecture for large-scale public health datasets
Published state-of-the-art results at low computational cost for skin lesion analysis
Undergraduate Researcher | Image Processing and Computer Vision Lab, IIT Jodhpur, Jodhpur, India | Aug 2018 – Aug 2020 | Supervisor: Dr. Anil Kumar Tiwari
Developed ML methods for AI-based diagnosis and treatment support
Developed DL models for retinal vessel and skin lesion segmentation, and for diagnosis of the left atrium in 3D GE-MRIs
Research Intern | The Multimedia Analytics, Networks and Systems Lab, IIT Mandi, Mandi, India | May 2018 – Jul 2018 | Supervisor: Dr. Aditya Nigam
Developed a novel CNN model for iris segmentation that uses cascaded hourglass modules at the bottleneck of an encoder-decoder design
We introduce TABES, a novel trajectory-aware entropy steering mechanism for masked diffusion models that improves token prediction through adaptive backward sampling guided by information-theoretic principles.
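As background for the entropy-steering idea, here is a minimal PyTorch sketch of using per-token predictive entropy as a confidence signal when choosing which positions of a masked-diffusion draft to unmask; the function, its greedy commit rule, and the top-k schedule are simplifications for illustration, not the TABES algorithm itself.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def entropy_guided_unmask(logits: torch.Tensor, mask: torch.Tensor, k: int):
    """Pick the k lowest-entropy (most confident) masked positions and commit
    greedy tokens there. logits: (L, V); mask: (L,) bool, True where masked.
    Illustrative simplification, not the trajectory-aware TABES update."""
    probs = F.softmax(logits, dim=-1)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # per-position entropy
    ent[~mask] = float("inf")                              # ignore already-decoded slots
    idx = ent.topk(k, largest=False).indices               # lowest-entropy positions
    return idx, probs[idx].argmax(-1)                      # positions and tokens to commit
```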
LumaFlux introduces physically-guided diffusion transformers for inverse tone mapping, lifting standard 8-bit content to HDR with physically accurate luminance expansion and color reproduction.
Rectified CFG++ for Flow Based Models. S. Saini, S. Gupta, A. C. Bovik.
Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025) (also at 3rd CVPR Workshop on Generative Models)
PDF / ArXiv / Page / Code
Rectified CFG++ enhances conditional image generation with Rectified Flow models by adaptively correcting the latent trajectory. This method improves visual coherence and alignment with text prompts, outperforming existing samplers in generation quality and efficiency.
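For context, a plain Euler sampler for a rectified-flow model with vanilla classifier-free guidance looks like the sketch below; `v_theta`, the embeddings, and the step count are placeholders, and the paper's adaptive trajectory correction is deliberately omitted.

```python
import torch

@torch.no_grad()
def cfg_rectified_flow_sample(v_theta, z, text_emb, null_emb, steps=28, w=5.0):
    """Euler integration of a rectified-flow ODE from noise (t=1) to data (t=0)
    with standard classifier-free guidance. Baseline sketch only; Rectified
    CFG++'s adaptive trajectory correction is not shown here."""
    ts = torch.linspace(1.0, 0.0, steps + 1)
    x = z                                         # initial Gaussian latent
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        v_cond = v_theta(x, t, text_emb)          # conditional velocity
        v_unc = v_theta(x, t, null_emb)           # unconditional velocity
        v = v_unc + w * (v_cond - v_unc)          # CFG combination
        x = x + (t_next - t) * v                  # Euler step along the guided ODE
    return x
```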
This study investigates how diffusion model priors can be exploited to achieve perceptual consistency in image quality assessment. By leveraging the priors learned by diffusion models, quality predictions become better aligned with human perception, yielding more accurate and reliable evaluations.
Contrastive HDR-VQA introduces a deep contrastive representation learning approach for high dynamic range video quality assessment. By learning robust representations through contrastive learning, the method achieves state-of-the-art performance in predicting the quality of HDR videos.
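A generic contrastive objective of the kind such methods build on is the InfoNCE loss, sketched below in PyTorch over paired embeddings of two views; the pairing scheme and temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07):
    """InfoNCE over a batch of paired embeddings (B, D): row i of z1 and z2
    are two views of the same clip (positives); all other rows in the batch
    are negatives. Generic contrastive loss, not the paper's objective."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                              # cosine similarities
    labels = torch.arange(z1.shape[0], device=z1.device)    # positives on the diagonal
    return F.cross_entropy(logits, labels)
```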
CHUG is a crowdsourced dataset for HDR video quality, addressing the need for diverse, real-world content. It aids in developing more accurate and robust quality assessment models.
Prime-EditBench is a real-world benchmark designed to evaluate image and video editing with diffusion models, enabling standardized assessment of editing performance.
M2SLAe-Net introduces a multi-scale multi-level attention embedded network for improved retinal vessel segmentation. By integrating attention mechanisms at multiple scales and levels, the network achieves enhanced accuracy and robustness in segmenting retinal vessels, aiding in the diagnosis of various eye diseases.
B-SegNet introduces a branched SegMentor network for accurate skin lesion segmentation. By employing a branched architecture, the network effectively captures both local and global features of skin lesions, leading to improved segmentation performance and aiding in the diagnosis of skin cancer.
This paper presents a detector and SegMentor network for simultaneous skin lesion localization and segmentation. The network combines detection and segmentation tasks to provide a comprehensive solution for skin lesion analysis, enabling accurate localization and precise segmentation of lesions for improved diagnostic accuracy.
PixISegNet introduces a pixel-level iris segmentation network that utilizes a convolutional encoder-decoder architecture with a stacked hourglass bottleneck. This network achieves precise iris segmentation by effectively capturing both local and global features, making it suitable for various biometric applications.
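A minimal PyTorch sketch of the stacked-hourglass-bottleneck idea follows; the channel count, depth, and residual skip are illustrative choices, not the published PixISegNet architecture.

```python
import torch.nn as nn

class Hourglass(nn.Module):
    """One hourglass block: downsample, process, upsample, plus a 1x1 skip.
    Assumes even spatial dimensions so pooling and upsampling round-trip."""
    def __init__(self, ch: int):
        super().__init__()
        self.down = nn.Sequential(
            nn.MaxPool2d(2), nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.skip = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return self.up(self.down(x)) + self.skip(x)

# A few hourglasses cascaded at the encoder-decoder bottleneck
# (the channel count 256 is a placeholder, not the paper's value):
bottleneck = nn.Sequential(*[Hourglass(256) for _ in range(3)])
```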
This book chapter explores the use of encoder-decoder based deep learning techniques for iris segmentation in unconstrained environments. The proposed methods effectively handle challenges such as variations in lighting, occlusion, and off-angle images, making them suitable for real-world biometric applications.
This repository contains problems and solutions related to general inverse problems, as part of the CSE 393P course. It includes implementations and analyses of various inverse problem-solving techniques.
Implementation of an efficient SR3 diffusion model for super-resolution. This project explores the potential of pre-trained diffusion models to improve generalization and reduce computational cost in image super-resolution.
Zero-shot Diffusion Model for Video Animation (Zero-DA) adapts image generation models to video production. This framework tackles the challenge of maintaining temporal uniformity across video frames using hierarchical cross-frame constraints.
This project aims to mitigate the inherent bias in recidivism score predictions by leveraging machine learning techniques to rectify and minimize biases towards gender and racial/ethnic groups.
This project proposes the use of transformers to learn long-range interactions with mutual self-attention between frames as a surrogate for motion estimation in video frame interpolation.
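A minimal sketch of the mutual-attention idea: tokens from the two input frames are concatenated into one sequence and passed through standard multi-head self-attention, so each frame attends to the other without explicit motion estimation; the module name and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MutualFrameAttention(nn.Module):
    """Joint self-attention over the tokens of two frames so that each frame
    can attend to the other, standing in for explicit motion estimation.
    Hypothetical sketch; dimensions and head count are placeholders."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tok_a: torch.Tensor, tok_b: torch.Tensor):
        tokens = torch.cat([tok_a, tok_b], dim=1)   # (B, 2N, dim) joint sequence
        out, _ = self.attn(tokens, tokens, tokens)  # cross-frame interactions included
        n = tok_a.shape[1]
        return out[:, :n], out[:, n:]               # split back per frame
```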
Professional Service
Reviewer and Program Committee Member
ICLR 2025, 2026
ICML 2025, 2026
CVPR 2025, 2026
ICCV/ECCV 2025, 2026
WACV 2024, 2025, 2026
IEEE TIP 2025, 2026
IEEE Trans. on Multimedia 2024, 2025, 2026
TMLR 2025, 2026
Other Service
Assistant Director, LIVE at UT Austin 2025–Present
Volunteer, Internal Workshop on Deep Learning (IWDL), India 2018
Established and ran LAMBDA Lab at IITJ 2018–2020
Overall Head, Entrepreneurship and Innovation Cell at IITJ 2018–2019
Assistant Head, Counselling Services at IITJ 2018–2019