Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL | AI updates on arXiv.org

February 5, 2026 | Tech Jacks Solutions

arXiv:2602.04089v1 Announce Type: new
Abstract: Large language models (LLMs) achieve strong performance when all task-relevant information is available upfront, as in static prediction and instruction-following problems. However, many real-world decision-making tasks are inherently online: crucial information must be acquired through interaction, feedback is delayed, and effective behavior requires balancing information collection and exploitation over time. While in-context learning enables adaptation without weight updates, existing LLMs often struggle to reliably leverage in-context interaction experience in such settings. In this work, we show that this limitation can be addressed through training. We introduce ORBIT, a multi-task, multi-episode meta-reinforcement learning framework that trains LLMs to learn from interaction in context. After meta-training, a relatively small open-source model (Qwen3-14B) demonstrates substantially improved in-context online learning on entirely unseen environments, matching the performance of GPT-5.2 and outperforming standard RL fine-tuning by a large margin. Scaling experiments further reveal consistent gains with model size, suggesting significant headroom for learn-at-inference-time decision-making agents. Code reproducing the results in the paper can be found at https://github.com/XiaofengLin7/ORBIT.
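
To make the "multi-episode, in-context" setting concrete, here is a minimal toy sketch in Python. It is not ORBIT's implementation and uses no LLM: the names `BernoulliBandit`, `in_context_policy`, and `run_trial` are hypothetical, the environment is a simple two-armed bandit chosen for illustration, and the policy is a hand-written epsilon-greedy rule standing in for a model that reads its interaction history as context. The only point it illustrates is that the history persists across episodes, so later episodes can exploit what earlier ones explored, with no weight updates.

```python
import random
from typing import List, Tuple

# (action, reward) pairs; this list plays the role of the in-context history.
History = List[Tuple[int, float]]


class BernoulliBandit:
    """Toy environment (an assumption): arm payout probabilities are hidden."""

    def __init__(self, probs: List[float], seed: int) -> None:
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, action: int) -> float:
        # Reward is 1.0 with the chosen arm's hidden probability, else 0.0.
        return 1.0 if self.rng.random() < self.probs[action] else 0.0


def in_context_policy(history: History, n_arms: int,
                      rng: random.Random, epsilon: float = 0.1) -> int:
    """Stand-in for an LLM: chooses an arm from the history alone."""
    tried = {action for action, _ in history}
    untried = [a for a in range(n_arms) if a not in tried]
    if untried:
        return untried[0]              # gather information first
    if rng.random() < epsilon:
        return rng.randrange(n_arms)   # occasional exploration

    def mean_reward(arm: int) -> float:
        rewards = [r for a, r in history if a == arm]
        return sum(rewards) / len(rewards)

    return max(range(n_arms), key=mean_reward)  # exploit the best estimate


def run_trial(probs: List[float], n_episodes: int = 5,
              steps_per_episode: int = 20, seed: int = 0):
    """Run several episodes on one task, carrying the history across them."""
    env = BernoulliBandit(probs, seed)
    rng = random.Random(seed + 1)
    history: History = []              # NOT reset between episodes
    returns: List[float] = []
    for _ in range(n_episodes):
        episode_return = 0.0
        for _ in range(steps_per_episode):
            action = in_context_policy(history, len(probs), rng)
            reward = env.step(action)
            history.append((action, reward))
            episode_return += reward
        returns.append(episode_return)
    return returns, history


if __name__ == "__main__":
    episode_returns, _ = run_trial([0.2, 0.8])
    print(episode_returns)
```

In a meta-RL setup of the kind the abstract describes, the hand-written rule would be replaced by a trained model, and training would presumably reward performance over the whole sequence of episodes rather than a single one; that reading is an assumption based on the abstract, not a detail confirmed by it.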
