Approximating Human Preferences Using a Multi-Judge Learned System (AI updates on arXiv.org)

October 31, 2025, Tech Jacks Solutions

arXiv:2510.25884v1 Announce Type: new
Abstract: Aligning LLM-based judges with human preferences is a significant challenge, as they are difficult to calibrate and often suffer from rubric sensitivity, bias, and instability. Overcoming this challenge advances key applications, such as creating reliable reward models for Reinforcement Learning from Human Feedback (RLHF) and building effective routing systems that select the best-suited model for a given user query. In this work, we propose a framework for modeling diverse, persona-based preferences by learning to aggregate outputs from multiple rubric-conditioned judges. We investigate the performance of this approach against naive baselines and assess its robustness through case studies on both human and LLM-judge biases. Our primary contributions include a persona-based method for synthesizing preference labels at scale and two distinct implementations of our aggregator: a Generalized Additive Model (GAM) and a Multi-Layer Perceptron (MLP).
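
To make the idea of a learned aggregator concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the rubric names, the persona weights, the synthetic data, and the `TinyMLP` class are all assumptions made for illustration. It trains a small one-hidden-layer MLP to map per-rubric judge scores to a persona's preference label, then compares it against a naive baseline that simply averages the judge scores.

```python
import math
import random

random.seed(0)

# Hypothetical rubrics: each judge scores a response in [0, 1] under one rubric.
RUBRICS = ["helpfulness", "accuracy", "verbosity"]
# Assumed persona: weights helpfulness heavily, accuracy lightly, ignores verbosity.
PERSONA_WEIGHTS = [0.8, 0.2, 0.0]


def make_dataset(n):
    """Build synthetic (judge_scores, preference_label) pairs; not real data."""
    data = []
    for _ in range(n):
        scores = [random.random() for _ in RUBRICS]
        label = sum(w * s for w, s in zip(PERSONA_WEIGHTS, scores))
        data.append((scores, label))
    return data


def naive_mean(scores):
    """Naive baseline: unweighted average of all judge scores."""
    return sum(scores) / len(scores)


class TinyMLP:
    """One hidden tanh layer and a linear output, trained with plain SGD."""

    def __init__(self, n_in, n_hidden=4):
        self.w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def sgd_step(self, x, target, lr):
        """One gradient step on the squared error for a single example."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the pre-update output weight.
            dz = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * dz
            for k, xk in enumerate(x):
                self.w1[j][k] -= lr * dz * xk
        self.b2 -= lr * err


def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


train, test = make_dataset(300), make_dataset(100)
model = TinyMLP(n_in=len(RUBRICS))
for _ in range(200):  # epochs
    random.shuffle(train)
    for x, y in train:
        model.sgd_step(x, y, lr=0.05)

baseline_mse = mse(naive_mean, test)
learned_mse = mse(model.predict, test)
print(f"naive mean MSE:  {baseline_mse:.4f}")
print(f"learned MLP MSE: {learned_mse:.4f}")
```

In this toy setup the averaging baseline treats every rubric equally, while the learned aggregator can discover that the assumed persona ignores one rubric. A GAM-style aggregator would instead fit one smooth function per judge and sum them, which trades some flexibility for easier inspection of each judge's contribution.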
