HPSv3: Towards Wide-Spectrum Human Preference Score

1Mizzen AI   2CUHK MMLab   3King's College London   4Shanghai Jiao Tong University  
5Shanghai AI Laboratory   6CPII, InnoHK
ICCV 2025

*Equal Contribution   Equal Advising
Teaser Image

We introduce HPSv3, a model designed to align with human preferences, and HPDv3, a wide-spectrum human preference dataset containing both real and generated images.

HPDv3 Dataset Visualization

We present Human Preference Dataset v3 (HPDv3), a comprehensive dataset containing 1.17 million binary preference choices across 1.08 million images, grouped in pairs by prompt.
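For a concrete picture, one HPDv3 annotation can be thought of as a record that pairs two images for the same prompt with a binary human choice. The sketch below is illustrative only; the field names are hypothetical and do not reflect the dataset's actual schema.

    # Hypothetical sketch of a single HPDv3-style pairwise preference record.
    # Field names are illustrative, not the dataset's actual schema.
    preference_record = {
        "prompt": "a watercolor painting of a lighthouse at dusk",
        "image_a": "images/lighthouse_generated.png",  # e.g. output of a generative model
        "image_b": "images/lighthouse_real.jpg",       # e.g. a real-world photograph
        "choice": "image_a",                           # binary human preference
    }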

Abstract

Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset, integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons drawn from state-of-the-art generative models and from low- to high-quality real-world images. (2) We introduce a VLM-based preference model trained with an uncertainty-aware ranking loss for fine-grained ranking. In addition, we propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that improves quality without extra data by using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 is a robust metric for wide-spectrum image evaluation and that CoHP offers an efficient, scalable, and human-aligned approach to improving image generation quality.

Contributions

  • We propose HPDv3, a wide-spectrum human preference dataset that integrates high-quality real-world images with state-of-the-art generative model outputs, comprising 1.08M text-image pairs and 1.17M annotated pairwise comparisons. It also serves as a nuanced benchmark for evaluating generative models.
  • We introduce HPSv3, a human preference model trained on HPDv3 that leverages the features of a VLM backbone and an uncertainty-aware ranking loss to discern subtle differences between training samples.
  • We introduce CoHP, a novel reasoning approach to enhance image generation quality by iteratively refining outputs using HPSv3.

Model Architecture & CoHP Method

Left: HPSv3 employs a VLM backbone to extract rich semantic representations from images and captions, then uses an uncertainty-aware ranking loss to learn human preferences from paired comparison data.
Right: CoHP incorporates both model preference and sample preference, each selected through HPSv3, to build a thinking-and-choosing image generation process.
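To make the uncertainty-aware ranking idea concrete, the following is a generic PyTorch sketch of how such a loss can be written. It is an assumption-laden illustration, not the exact loss used by HPSv3: the model is assumed to predict a mean score and an uncertainty for each image, and the score difference of a preference pair is scaled by the combined uncertainty so that noisy pairs contribute softer gradients.

    import torch

    def uncertainty_aware_ranking_loss(mu_win, sigma_win, mu_lose, sigma_lose):
        # Generic sketch: mu_* are predicted preference scores and sigma_* are
        # predicted (positive) uncertainties for the preferred / rejected images.
        diff = mu_win - mu_lose
        scale = torch.sqrt(sigma_win ** 2 + sigma_lose ** 2 + 1e-8)
        # Probability that the preferred image outranks the rejected one,
        # with low-confidence pairs producing a flatter objective.
        prob_win = torch.sigmoid(diff / scale)
        return -torch.log(prob_win + 1e-8).mean()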

HPSv3 Benchmark

HPSv3 scores of text-to-image models across prompt categories (higher is better).

Models | All | Characters | Arts | Design | Architecture | Animals | Natural Scenery | Transportation | Products | Plants | Food | Science | Others
Kolors | 10.55 | 11.79 | 10.47 | 9.87 | 10.82 | 10.60 | 9.89 | 10.68 | 10.93 | 10.50 | 10.63 | 11.06 | 9.51
Flux-dev | 10.43 | 11.70 | 10.32 | 9.39 | 10.93 | 10.38 | 10.01 | 10.84 | 11.24 | 10.21 | 10.38 | 11.24 | 9.16
Playground-v2.5 | 10.27 | 11.07 | 9.84 | 9.64 | 10.45 | 10.38 | 9.94 | 10.51 | 10.62 | 10.15 | 10.62 | 10.84 | 9.39
Infinity | 10.26 | 11.17 | 9.95 | 9.43 | 10.36 | 9.27 | 10.11 | 10.36 | 10.59 | 10.08 | 10.30 | 10.59 | 9.62
CogView4 | 9.61 | 10.72 | 9.86 | 9.33 | 9.88 | 9.16 | 9.45 | 9.69 | 9.86 | 9.45 | 9.49 | 10.16 | 8.97
PixArt-Σ | 9.37 | 10.08 | 9.07 | 8.41 | 9.83 | 8.86 | 8.87 | 9.44 | 9.47 | 9.52 | 9.73 | 10.35 | 8.58
Gemini 2.0 Flash | 9.21 | 9.98 | 8.44 | 7.64 | 10.11 | 9.42 | 9.01 | 9.74 | 9.64 | 9.55 | 10.16 | 7.61 | 9.23
SDXL | 8.20 | 8.67 | 7.63 | 7.53 | 8.57 | 8.18 | 7.76 | 8.65 | 8.85 | 8.32 | 8.43 | 8.78 | 7.29
HunyuanDiT | 8.19 | 7.96 | 8.11 | 8.28 | 8.71 | 7.24 | 7.86 | 8.33 | 8.55 | 8.28 | 8.31 | 8.48 | 8.20
SD3-Medium | 5.31 | 6.70 | 5.98 | 5.15 | 5.25 | 4.09 | 5.24 | 4.25 | 5.71 | 5.84 | 6.01 | 5.71 | 4.58
SD2 | -0.24 | -0.34 | -0.56 | -1.35 | -0.24 | -0.54 | -0.32 | 1.00 | 1.11 | -0.01 | -0.38 | -0.38 | -0.84

Chain-of-Human-Preference (CoHP) Visualization

Chain-of-Human-Preference (CoHP) is an iterative image refinement method that enhances quality without extra data, using HPSv3 to select the best image at each step.
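A minimal sketch of the select-and-refine loop behind CoHP is shown below. Here `generate`, `refine`, and `hpsv3_score` are placeholder callables standing in for the text-to-image model, an image-conditioned refinement step, and the HPSv3 scorer; the loop only approximates the method described above.

    def cohp(prompt, generate, refine, hpsv3_score, n_candidates=4, n_rounds=3):
        # Illustrative chain-of-human-preference loop with placeholder functions.
        # Each round produces several candidates, scores them with HPSv3, and
        # keeps the highest-scoring image as the seed for the next round.
        best_image, best_score = None, float("-inf")
        for _ in range(n_rounds):
            if best_image is None:
                candidates = [generate(prompt) for _ in range(n_candidates)]
            else:
                candidates = [refine(prompt, best_image) for _ in range(n_candidates)]
            for image in candidates:
                score = hpsv3_score(prompt, image)
                if score > best_score:
                    best_image, best_score = image, score
        return best_image, best_score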

Comparison of CoHP Using Different Human Preference Models


HPSv3 As Reward Model

We employ DanceGRPO as the reinforcement learning algorithm for image generation and use HPSv3 as the reward model to fine-tune SD1.4.
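As a rough illustration of how a preference model can drive a GRPO-style update (the real DanceGRPO pipeline involves considerably more machinery), the sketch below normalizes HPSv3 rewards within a group of images sampled for the same prompt to obtain per-sample advantages; `hpsv3_score` is again a placeholder for the HPSv3 scorer.

    import torch

    def group_relative_advantages(prompt, images, hpsv3_score):
        # GRPO-style advantage sketch: all images were sampled for the same
        # prompt, each reward is its HPSv3 score, and rewards are normalized
        # within the group so better-than-average samples get positive advantages.
        rewards = torch.tensor([hpsv3_score(prompt, img) for img in images])
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)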

Comparison of HPSv3 with Other Human Preference Models as the Reward Model


BibTeX

@misc{ma2025hpsv3widespectrumhumanpreference,
      title={HPSv3: Towards Wide-Spectrum Human Preference Score},
      author={Yuhang Ma and Xiaoshi Wu and Keqiang Sun and Hongsheng Li},
      year={2025},
      eprint={2508.03789},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.03789},
}