Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space

AI updates on arXiv.org

February 13, 2026 | Tech Jacks Solutions

arXiv:2510.26219v2 Announce Type: replace-cross
Abstract: Test-time alignment of large language models (LLMs) has attracted attention because fine-tuning LLMs is computationally expensive. In this paper, we propose a new test-time alignment method, adaptive importance sampling on pre-logits (AISP), based on sampling-based model predictive control with a stochastic control input. AISP applies a Gaussian perturbation to the pre-logits, which are the outputs of the penultimate layer, and optimizes the mean of the perturbation so as to maximize the expected reward. We demonstrate that the optimal mean is obtained by importance sampling with sampled rewards. AISP outperforms best-of-n sampling in terms of reward obtained per number of samples used and achieves higher rewards than other reward-based test-time alignment methods.
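
To make the idea in the abstract concrete, here is a minimal toy sketch of an importance-sampling update on the mean of a Gaussian perturbation applied to a pre-logit vector. It is not the authors' implementation. The exponentiated-reward weighting (in the style of sampling-based model predictive control), the `temperature` and `sigma` parameters, the random output matrix `W`, and the reward (log-probability of a chosen target token) are all assumptions made for illustration; the abstract only states that the optimal mean is obtained by importance sampling with sampled rewards.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def toy_reward(pre_logits, W, target_token):
    """Hypothetical reward: log-probability of target_token.

    W stands in for the final (pre-logit -> logit) projection.
    A real setup would score generated text with a reward model.
    """
    probs = softmax(W @ pre_logits)
    return float(np.log(probs[target_token]))


def aisp_style_update(h, W, target_token, n_samples=32, n_iters=5,
                      sigma=0.5, temperature=0.5, seed=0):
    """Iteratively refine the perturbation mean by importance sampling.

    Assumed update rule (illustrative, not taken from the paper):
        v_i ~ N(mu, sigma^2 I)
        w_i = softmax(r_i / temperature)
        mu  <- sum_i w_i * v_i
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros_like(h)
    history = []
    for _ in range(n_iters):
        # Sample Gaussian perturbations around the current mean.
        noise = rng.normal(0.0, sigma, size=(n_samples, h.size))
        perturbations = mu + noise
        # Score each perturbed pre-logit vector.
        rewards = np.array(
            [toy_reward(h + v, W, target_token) for v in perturbations]
        )
        # Higher-reward samples get exponentially larger weights.
        weights = softmax(rewards / temperature)
        # The new mean is the weighted average of the sampled perturbations.
        mu = weights @ perturbations
        history.append(toy_reward(h + mu, W, target_token))
    return mu, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(16, 8))   # toy vocabulary of 16, hidden size 8
    h = rng.normal(size=8)         # toy pre-logit vector
    target = int(np.argmin(W @ h))  # pick an initially unlikely token
    mu, history = aisp_style_update(h, W, target)
    print("baseline reward:", toy_reward(h, W, target))
    print("reward per iteration:", history)
```

Best-of-n sampling would simply keep the single highest-reward sample. In this sketch, each round instead reuses all samples through their weights and moves the sampling distribution toward higher-reward regions before drawing again, which is the intuition behind comparing reward against the number of samples used.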
