Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers | AI updates on arXiv.org

December 26, 2025 | Tech Jacks Solutions

arXiv:2510.00915v3 Announce Type: replace-cross
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) replaces costly human labeling with automated verifiers. To reduce verifier hacking, many RLVR systems binarize rewards to $\{0,1\}$, but imperfect verifiers inevitably introduce false negatives (rejecting correct answers) and false positives (accepting incorrect ones). We formalize verifier unreliability as a stochastic reward channel with asymmetric noise rates $\rho_0$ and $\rho_1$: the false-positive (FP) rate and the false-negative (FN) rate, respectively. From this abstraction we derive two lightweight corrections: (i) a backward correction that yields an unbiased surrogate reward and thus an unbiased policy-gradient estimator in expectation, and (ii) a forward correction that reweights score-function terms so that the expected update aligns with the clean gradient direction and requires only the FN rate. We implement both as lightweight hooks in a group relative policy optimization (GRPO) pipeline. Both corrections improve RLVR for math reasoning under synthetic and real verifier noise, with the forward variant being more stable under heavier noise. Finally, an appeals mechanism with a lightweight LLM verifier estimates the FN rate online and further improves performance.
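For binary rewards the backward correction can be made concrete. Because the noisy reward $\tilde r$ takes only two values, requiring $\mathbb{E}[\hat r \mid r] = r$ for both $r = 0$ and $r = 1$ pins down a unique surrogate, $\hat r = (\tilde r - \rho_0)/(1 - \rho_0 - \rho_1)$, defined whenever $\rho_0 + \rho_1 < 1$. The Python sketch below assumes the channel exactly as defined in the abstract, with the FP rate $\rho_0$ and FN rate $\rho_1$ known in advance; the function names are hypothetical, and this illustrates the idea rather than reproducing the authors' implementation. It simulates the channel and checks unbiasedness both exactly and by Monte Carlo.

```python
import random
import statistics


def noisy_verifier(clean_reward: int, rho0: float, rho1: float, rng: random.Random) -> int:
    """Pass a clean binary reward through the (hypothetical) noisy channel.

    rho0 is the false-positive rate, P(noisy = 1 | clean = 0);
    rho1 is the false-negative rate, P(noisy = 0 | clean = 1).
    """
    if clean_reward == 1:
        return 0 if rng.random() < rho1 else 1
    return 1 if rng.random() < rho0 else 0


def backward_corrected_reward(noisy_reward: int, rho0: float, rho1: float) -> float:
    """Unbiased surrogate reward: E[r_hat | clean = r] == r for r in {0, 1}."""
    denom = 1.0 - rho0 - rho1
    if denom <= 0.0:
        raise ValueError("backward correction requires rho0 + rho1 < 1")
    return (noisy_reward - rho0) / denom


def expected_corrected_reward(clean_reward: int, rho0: float, rho1: float) -> float:
    """Exact expectation of the surrogate over the channel, for a given clean reward."""
    p_accept = (1.0 - rho1) if clean_reward == 1 else rho0  # P(noisy = 1 | clean)
    return (p_accept * backward_corrected_reward(1, rho0, rho1)
            + (1.0 - p_accept) * backward_corrected_reward(0, rho0, rho1))


def monte_carlo_mean(clean_reward: int, rho0: float, rho1: float, n: int, seed: int = 0) -> float:
    """Average surrogate reward over n simulated verifier calls."""
    rng = random.Random(seed)
    return statistics.fmean(
        backward_corrected_reward(noisy_verifier(clean_reward, rho0, rho1, rng), rho0, rho1)
        for _ in range(n)
    )


if __name__ == "__main__":
    rho0, rho1 = 0.1, 0.3  # illustrative noise rates, not values from the paper
    for r in (0, 1):
        exact = expected_corrected_reward(r, rho0, rho1)
        estimate = monte_carlo_mean(r, rho0, rho1, n=100_000)
        print(f"clean={r}  exact={exact:.4f}  monte_carlo={estimate:.4f}")
```

Note that the surrogate itself can be negative or exceed 1 (with $\rho_0 = 0.1$ and $\rho_1 = 0.3$ it takes the values $1.5$ and $-1/6$); only its conditional expectation matches the clean reward, and its per-sample variance grows as $\rho_0 + \rho_1$ approaches 1.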
