We present Human Preference Dataset v3 (HPDv3), a wide-spectrum dataset of 1.17 million binary preference annotations over 1.08 million images, organized as prompt-matched image pairs.
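Conceptually, each annotation pairs two images produced for the same prompt with a binary human choice. The sketch below illustrates one such record; the field names and layout are our assumption for illustration, not the released schema.

```python
from dataclasses import dataclass

@dataclass
class HPDv3Record:
    """One pairwise preference annotation (assumed schema, for illustration only)."""
    prompt: str   # shared text prompt for both images
    image_a: str  # path or URL of the first image
    image_b: str  # path or URL of the second image
    choice: int   # 0 if annotators preferred image_a, 1 if image_b

# Example record (hypothetical values):
record = HPDv3Record(
    prompt="a red fox standing in fresh snow",
    image_a="images/fox_model1.png",
    image_b="images/fox_model2.png",
    choice=0,
)
```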
Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset, integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons drawn from state-of-the-art generative models and from real-world images spanning low to high quality. (2) We introduce a VLM-based preference model trained with an uncertainty-aware ranking loss for fine-grained preference ranking. (3) We propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that improves quality without extra data by using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 serves as a robust metric for wide-spectrum image evaluation, and that CoHP offers an efficient, human-aligned approach to improving image generation quality.
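To make the training objective concrete, below is a minimal PyTorch sketch of an uncertainty-aware ranking loss. It assumes the model predicts a Gaussian (mean and log-variance) over each image's score and that preferred/rejected scores are compared with a logistic ranking loss on reparameterized samples; this is our reading of the idea, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def uncertainty_aware_ranking_loss(mu_w, logvar_w, mu_l, logvar_l):
    """Logistic ranking loss on scores sampled from predicted Gaussians.

    mu_w / logvar_w: predicted mean and log-variance for the preferred image.
    mu_l / logvar_l: same for the rejected image.
    The reparameterized-sampling formulation here is an illustrative
    assumption, not necessarily the paper's exact objective.
    """
    std_w = torch.exp(0.5 * logvar_w)
    std_l = torch.exp(0.5 * logvar_l)
    # Reparameterization trick: sample a score from each predicted Gaussian.
    s_w = mu_w + std_w * torch.randn_like(std_w)
    s_l = mu_l + std_l * torch.randn_like(std_l)
    # Bradley-Terry style loss: the preferred sample should outscore the rejected one.
    return F.softplus(-(s_w - s_l)).mean()

# Toy usage with a batch of 4 pairs:
mu_w, lv_w = torch.randn(4), torch.zeros(4)
mu_l, lv_l = torch.randn(4), torch.zeros(4)
loss = uncertainty_aware_ranking_loss(mu_w, lv_w, mu_l, lv_l)
```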
HPSv3 scores of text-to-image models across prompt categories (higher is better):

Models | All | Characters | Arts | Design | Architecture | Animals | Natural Scenery | Transportation | Products | Plants | Food | Science | Others |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kolors | 10.55 | 11.79 | 10.47 | 9.87 | 10.82 | 10.60 | 9.89 | 10.68 | 10.93 | 10.50 | 10.63 | 11.06 | 9.51 |
Flux-dev | 10.43 | 11.70 | 10.32 | 9.39 | 10.93 | 10.38 | 10.01 | 10.84 | 11.24 | 10.21 | 10.38 | 11.24 | 9.16 |
Playground-v2.5 | 10.27 | 11.07 | 9.84 | 9.64 | 10.45 | 10.38 | 9.94 | 10.51 | 10.62 | 10.15 | 10.62 | 10.84 | 9.39 |
Infinity | 10.26 | 11.17 | 9.95 | 9.43 | 10.36 | 9.27 | 10.11 | 10.36 | 10.59 | 10.08 | 10.30 | 10.59 | 9.62 |
CogView4 | 9.61 | 10.72 | 9.86 | 9.33 | 9.88 | 9.16 | 9.45 | 9.69 | 9.86 | 9.45 | 9.49 | 10.16 | 8.97 |
PixArt-Σ | 9.37 | 10.08 | 9.07 | 8.41 | 9.83 | 8.86 | 8.87 | 9.44 | 9.47 | 9.52 | 9.73 | 10.35 | 8.58 |
Gemini 2.0 Flash | 9.21 | 9.98 | 8.44 | 7.64 | 10.11 | 9.42 | 9.01 | 9.74 | 9.64 | 9.55 | 10.16 | 7.61 | 9.23 |
SDXL | 8.20 | 8.67 | 7.63 | 7.53 | 8.57 | 8.18 | 7.76 | 8.65 | 8.85 | 8.32 | 8.43 | 8.78 | 7.29 |
HunyuanDiT | 8.19 | 7.96 | 8.11 | 8.28 | 8.71 | 7.24 | 7.86 | 8.33 | 8.55 | 8.28 | 8.31 | 8.48 | 8.20 |
SD3-Medium | 5.31 | 6.70 | 5.98 | 5.15 | 5.25 | 4.09 | 5.24 | 4.25 | 5.71 | 5.84 | 6.01 | 5.71 | 4.58 |
SD2 | -0.24 | -0.34 | -0.56 | -1.35 | -0.24 | -0.54 | -0.32 | 1.00 | 1.11 | -0.01 | -0.38 | -0.38 | -0.84 |
Chain-of-Human-Preference (CoHP) is an iterative image refinement method that improves quality without extra data: at each step, HPSv3 scores the candidate images and the highest-scoring one is carried into the next round.
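A minimal sketch of that selection loop follows, assuming hypothetical `generate_candidates` (any image generator) and `hpsv3_score` helpers; the actual CoHP procedure may differ in how candidates are produced and conditioned.

```python
def cohp_refine(prompt, generate_candidates, hpsv3_score,
                num_steps=3, num_candidates=4):
    """Iteratively refine an image by keeping the HPSv3-preferred candidate.

    generate_candidates(prompt, base_image, n) -> list of images (hypothetical)
    hpsv3_score(prompt, image) -> float, higher = more preferred (hypothetical)
    """
    best = None
    for _ in range(num_steps):
        # Generate candidates, optionally conditioned on the current best image.
        candidates = generate_candidates(prompt, best, num_candidates)
        # Keep whichever image HPSv3 prefers, including the current best.
        pool = candidates + ([best] if best is not None else [])
        best = max(pool, key=lambda img: hpsv3_score(prompt, img))
    return best
```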
We employ DanceGRPO as the reinforcement learning algorithm for image generation and use HPSv3 as the reward model to refine SD1.4.
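As an illustration, a reward hook for an RL trainer might look like the following; `hpsv3_score` is a stand-in for whichever scoring interface the released model exposes, and the per-sample batching is our assumption rather than DanceGRPO's actual API.

```python
import torch

def hpsv3_reward_fn(prompts, images, hpsv3_score):
    """Batch reward hook for an RL trainer (interface assumed, not DanceGRPO's API).

    prompts: list[str]; images: decoded images sampled from the policy model.
    hpsv3_score(prompt, image) -> float (hypothetical scoring helper).
    Returns a tensor of per-sample rewards for the policy-gradient update.
    """
    rewards = [hpsv3_score(p, img) for p, img in zip(prompts, images)]
    return torch.tensor(rewards, dtype=torch.float32)
```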
@misc{ma2025hpsv3widespectrumhumanpreference,
  title={HPSv3: Towards Wide-Spectrum Human Preference Score},
  author={Yuhang Ma and Xiaoshi Wu and Keqiang Sun and Hongsheng Li},
  year={2025},
  eprint={2508.03789},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.03789},
}