Google DeepMind Researchers Apply Semantic Evolution to Create Non Intuitive VAD-CFR and SHOR-PSRO Variants for Superior Algorithmic ConvergenceMarkTechPost In the competitive arena of Multi-Agent Reinforcement Learning (MARL), progress has long been bottlenecked by human intuition. For years, researchers have manually refined algorithms like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO), navigating a vast combinatorial space of update rules via trial-and-error. Google DeepMind research team has now shifted this paradigm with
The post Google DeepMind Researchers Apply Semantic Evolution to Create Non Intuitive VAD-CFR and SHOR-PSRO Variants for Superior Algorithmic Convergence appeared first on MarkTechPost.
In the competitive arena of Multi-Agent Reinforcement Learning (MARL), progress has long been bottlenecked by human intuition. For years, researchers have manually refined algorithms like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO), navigating a vast combinatorial space of update rules via trial-and-error. Google DeepMind research team has now shifted this paradigm with
The post Google DeepMind Researchers Apply Semantic Evolution to Create Non Intuitive VAD-CFR and SHOR-PSRO Variants for Superior Algorithmic Convergence appeared first on MarkTechPost. Read More
RAG vs. Context Stuffing: Why selective retrieval is more efficient and reliable than dumping all data into the promptMarkTechPost Large context windows have dramatically increased how much information modern language models can process in a single prompt. With models capable of handling hundreds of thousands—or even millions—of tokens, it’s easy to assume that Retrieval-Augmented Generation (RAG) is no longer necessary. If you can fit an entire codebase or documentation library into the context window,
The post RAG vs. Context Stuffing: Why selective retrieval is more efficient and reliable than dumping all data into the prompt appeared first on MarkTechPost.
Large context windows have dramatically increased how much information modern language models can process in a single prompt. With models capable of handling hundreds of thousands—or even millions—of tokens, it’s easy to assume that Retrieval-Augmented Generation (RAG) is no longer necessary. If you can fit an entire codebase or documentation library into the context window,
The post RAG vs. Context Stuffing: Why selective retrieval is more efficient and reliable than dumping all data into the prompt appeared first on MarkTechPost. Read More
Making Conformal Predictors Robust in Healthcare Settings: a Case Study on EEG Classificationcs.AI updates on arXiv.org arXiv:2602.19483v1 Announce Type: cross
Abstract: Quantifying uncertainty in clinical predictions is critical for high-stakes diagnosis tasks. Conformal prediction offers a principled approach by providing prediction sets with theoretical coverage guarantees. However, in practice, patient distribution shifts violate the i.i.d. assumptions underlying standard conformal methods, leading to poor coverage in healthcare settings. In this work, we evaluate several conformal prediction approaches on EEG seizure classification, a task with known distribution shift challenges and label uncertainty. We demonstrate that personalized calibration strategies can improve coverage by over 20 percentage points while maintaining comparable prediction set sizes. Our implementation is available via PyHealth, an open-source healthcare AI framework: https://github.com/sunlabuiuc/PyHealth.
arXiv:2602.19483v1 Announce Type: cross
Abstract: Quantifying uncertainty in clinical predictions is critical for high-stakes diagnosis tasks. Conformal prediction offers a principled approach by providing prediction sets with theoretical coverage guarantees. However, in practice, patient distribution shifts violate the i.i.d. assumptions underlying standard conformal methods, leading to poor coverage in healthcare settings. In this work, we evaluate several conformal prediction approaches on EEG seizure classification, a task with known distribution shift challenges and label uncertainty. We demonstrate that personalized calibration strategies can improve coverage by over 20 percentage points while maintaining comparable prediction set sizes. Our implementation is available via PyHealth, an open-source healthcare AI framework: https://github.com/sunlabuiuc/PyHealth. Read More
Soft Sequence Policy Optimization: Bridging GMPO and SAPOcs.AI updates on arXiv.org arXiv:2602.19327v1 Announce Type: cross
Abstract: A significant portion of recent research on Large Language Model (LLM) alignment focuses on developing new policy optimization methods based on Group Relative Policy Optimization (GRPO). Two prominent directions have emerged: (i) a shift toward sequence-level importance sampling weights that better align with the sequence-level rewards used in many tasks, and (ii) alternatives to PPO-style clipping that aim to avoid the associated loss of training signal and entropy collapse. Recent work, such as Soft Adaptive Policy Optimization (SAPO), reformulates the Scopic objective within the GRPO framework and achieves both sequence coherence and token adaptivity. Geometric-Mean Policy Optimization (GMPO) leverages token-wise ratio clipping within sequence importance sampling weights. Building on these ideas, this work proposes a new objective that promotes effective policy exploration while maintaining training stability. Specifically, we introduce Soft Sequence Policy Optimization, an off-policy reinforcement learning objective that incorporates soft gating functions over token-level probability ratios within sequence-level importance weights.
arXiv:2602.19327v1 Announce Type: cross
Abstract: A significant portion of recent research on Large Language Model (LLM) alignment focuses on developing new policy optimization methods based on Group Relative Policy Optimization (GRPO). Two prominent directions have emerged: (i) a shift toward sequence-level importance sampling weights that better align with the sequence-level rewards used in many tasks, and (ii) alternatives to PPO-style clipping that aim to avoid the associated loss of training signal and entropy collapse. Recent work, such as Soft Adaptive Policy Optimization (SAPO), reformulates the Scopic objective within the GRPO framework and achieves both sequence coherence and token adaptivity. Geometric-Mean Policy Optimization (GMPO) leverages token-wise ratio clipping within sequence importance sampling weights. Building on these ideas, this work proposes a new objective that promotes effective policy exploration while maintaining training stability. Specifically, we introduce Soft Sequence Policy Optimization, an off-policy reinforcement learning objective that incorporates soft gating functions over token-level probability ratios within sequence-level importance weights. Read More
Evaluating SAP RPT-1 for Enterprise Business Process Prediction: In-Context Learning vs. Traditional Machine Learning on Structured SAP Datacs.AI updates on arXiv.org arXiv:2602.19237v1 Announce Type: cross
Abstract: Tabular foundation models aim to make machine learning accessible for enterprise data without task-specific training. This paper presents the first independent evaluation of SAP’s Retrieval Pretrained Transformer (RPT-1) from a practitioner perspective. RPT-1 is a compact 64.6 MB model pretrained on 1.34 TB of structured data across 3.1 million tables. We benchmark it against tuned gradient-boosted decision trees (XGBoost, LightGBM, CatBoost) on three SAP business scenarios: demand forecasting across SD/MM/PP modules, predictive data integrity in BC/MM/QM, and financial risk classification in FI/CO/AR. Across five-fold cross-validation on datasets ranging from 2,500 to 3,200 rows, RPT-1 reaches 91-96% of tuned GBDT accuracy without any training examples. The classification gap is modest at 3.6-4.1 percentage points on AUC-ROC, though regression tasks show wider gaps of 8.9-11.1 percentage points on R-squared. An interesting finding is a crossover at roughly 75-100 context rows where RPT-1 actually outperforms XGBoost under limited data. Based on these results, we propose a practical hybrid workflow: use RPT-1 for rapid screening, then train GBDT selectively where prediction accuracy justifies the effort. All experiments are reproducible through publicly available Hugging Face Spaces.
arXiv:2602.19237v1 Announce Type: cross
Abstract: Tabular foundation models aim to make machine learning accessible for enterprise data without task-specific training. This paper presents the first independent evaluation of SAP’s Retrieval Pretrained Transformer (RPT-1) from a practitioner perspective. RPT-1 is a compact 64.6 MB model pretrained on 1.34 TB of structured data across 3.1 million tables. We benchmark it against tuned gradient-boosted decision trees (XGBoost, LightGBM, CatBoost) on three SAP business scenarios: demand forecasting across SD/MM/PP modules, predictive data integrity in BC/MM/QM, and financial risk classification in FI/CO/AR. Across five-fold cross-validation on datasets ranging from 2,500 to 3,200 rows, RPT-1 reaches 91-96% of tuned GBDT accuracy without any training examples. The classification gap is modest at 3.6-4.1 percentage points on AUC-ROC, though regression tasks show wider gaps of 8.9-11.1 percentage points on R-squared. An interesting finding is a crossover at roughly 75-100 context rows where RPT-1 actually outperforms XGBoost under limited data. Based on these results, we propose a practical hybrid workflow: use RPT-1 for rapid screening, then train GBDT selectively where prediction accuracy justifies the effort. All experiments are reproducible through publicly available Hugging Face Spaces. Read More
Kaiwu-PyTorch-Plugin: Bridging Deep Learning and Photonic Quantum Computing for Energy-Based Models and Active Sample Selectioncs.AI updates on arXiv.org arXiv:2602.19114v1 Announce Type: cross
Abstract: This paper introduces the Kaiwu-PyTorch-Plugin (KPP) to bridge Deep Learning and Photonic Quantum Computing across multiple dimensions. KPP integrates the Coherent Ising Machine into the PyTorch ecosystem, addressing classical inefficiencies in Energy-Based Models. The framework facilitates quantum integration in three key aspects: accelerating Boltzmann sampling, optimizing training data via Active Sampling, and constructing hybrid architectures like QBM-VAE and Q-Diffusion. Empirical results on single-cell and OpenWebText datasets demonstrate KPPs ability to achieve SOTA performance, validating a comprehensive quantum-classical paradigm.
arXiv:2602.19114v1 Announce Type: cross
Abstract: This paper introduces the Kaiwu-PyTorch-Plugin (KPP) to bridge Deep Learning and Photonic Quantum Computing across multiple dimensions. KPP integrates the Coherent Ising Machine into the PyTorch ecosystem, addressing classical inefficiencies in Energy-Based Models. The framework facilitates quantum integration in three key aspects: accelerating Boltzmann sampling, optimizing training data via Active Sampling, and constructing hybrid architectures like QBM-VAE and Q-Diffusion. Empirical results on single-cell and OpenWebText datasets demonstrate KPPs ability to achieve SOTA performance, validating a comprehensive quantum-classical paradigm. Read More
Adaptive Collaboration of Arena-Based Argumentative LLMs for Explainable and Contestable Legal Reasoningcs.AI updates on arXiv.org arXiv:2602.18916v1 Announce Type: cross
Abstract: Legal reasoning requires not only high accuracy but also the ability to justify decisions through verifiable and contestable arguments. However, existing Large Language Model (LLM) approaches, such as Chain-of-Thought (CoT) and Retrieval-Augmented Generation (RAG), often produce unstructured explanations that lack a formal mechanism for verification or user intervention. To address this limitation, we propose Adaptive Collaboration of Argumentative LLMs (ACAL), a neuro-symbolic framework that integrates adaptive multi-agent collaboration with an Arena-based Quantitative Bipolar Argumentation Framework (A-QBAF). ACAL dynamically deploys expert agent teams to construct arguments, employs a clash resolution mechanism to adjudicate conflicting claims, and utilizes uncertainty-aware escalation for borderline cases. Crucially, our framework supports a Human-in-the-Loop (HITL) contestability workflow, enabling users to directly audit and modify the underlying reasoning graph to influence the final judgment. Empirical evaluations on the LegalBench benchmark demonstrate that ACAL outperforms strong baselines across Gemini-2.5-Flash-Lite and Gemini-2.5-Flash architectures, effectively balancing efficient predictive performance with structured transparency and contestability. Our implementation is available at: https://github.com/loc110504/ACAL.
arXiv:2602.18916v1 Announce Type: cross
Abstract: Legal reasoning requires not only high accuracy but also the ability to justify decisions through verifiable and contestable arguments. However, existing Large Language Model (LLM) approaches, such as Chain-of-Thought (CoT) and Retrieval-Augmented Generation (RAG), often produce unstructured explanations that lack a formal mechanism for verification or user intervention. To address this limitation, we propose Adaptive Collaboration of Argumentative LLMs (ACAL), a neuro-symbolic framework that integrates adaptive multi-agent collaboration with an Arena-based Quantitative Bipolar Argumentation Framework (A-QBAF). ACAL dynamically deploys expert agent teams to construct arguments, employs a clash resolution mechanism to adjudicate conflicting claims, and utilizes uncertainty-aware escalation for borderline cases. Crucially, our framework supports a Human-in-the-Loop (HITL) contestability workflow, enabling users to directly audit and modify the underlying reasoning graph to influence the final judgment. Empirical evaluations on the LegalBench benchmark demonstrate that ACAL outperforms strong baselines across Gemini-2.5-Flash-Lite and Gemini-2.5-Flash architectures, effectively balancing efficient predictive performance with structured transparency and contestability. Our implementation is available at: https://github.com/loc110504/ACAL. Read More
Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logiccs.AI updates on arXiv.org arXiv:2602.18607v1 Announce Type: new
Abstract: In CAS adaptation, a challenge is to define the dynamic architecture of the system and changes in its behavior. Implementation-wise, this is projected into an adaptation mechanism, typically realized as an Adaptation Manager (AM). With the advances of generative LLMs, generating AM code based on system specification and desired AM behavior (partially in natural language) is a tempting opportunity. The recent introduction of vibe coding suggests a way to target the problem of the correctness of generated code by iterative testing and vibe coding feedback loops instead of direct code inspection.
In this paper, we show that generating an AM via vibe coding feedback loops is a viable option when the verification of the generated AM is based on a very precise formulation of the functional requirements. We specify these as constraints in a novel temporal logic FCL that allows us to express the behavior of traces with much finer granularity than classical LTL enables.
Furthermore, we show that by combining the adaptation and vibe coding feedback loops where the FCL constraints are evaluated for the current system state, we achieved good results in the experiments with generating AMs for two example systems from the CAS domain. Typically, just a few feedback loop iterations were necessary, each feeding the LLM with reports describing detailed violations of the constraints. This AM testing was combined with high run path coverage achieved by different initial settings.
arXiv:2602.18607v1 Announce Type: new
Abstract: In CAS adaptation, a challenge is to define the dynamic architecture of the system and changes in its behavior. Implementation-wise, this is projected into an adaptation mechanism, typically realized as an Adaptation Manager (AM). With the advances of generative LLMs, generating AM code based on system specification and desired AM behavior (partially in natural language) is a tempting opportunity. The recent introduction of vibe coding suggests a way to target the problem of the correctness of generated code by iterative testing and vibe coding feedback loops instead of direct code inspection.
In this paper, we show that generating an AM via vibe coding feedback loops is a viable option when the verification of the generated AM is based on a very precise formulation of the functional requirements. We specify these as constraints in a novel temporal logic FCL that allows us to express the behavior of traces with much finer granularity than classical LTL enables.
Furthermore, we show that by combining the adaptation and vibe coding feedback loops where the FCL constraints are evaluated for the current system state, we achieved good results in the experiments with generating AMs for two example systems from the CAS domain. Typically, just a few feedback loop iterations were necessary, each feeding the LLM with reports describing detailed violations of the constraints. This AM testing was combined with high run path coverage achieved by different initial settings. Read More
Beyond Description: A Multimodal Agent Framework for Insightful Chart Summarizationcs.AI updates on arXiv.org arXiv:2602.18731v1 Announce Type: new
Abstract: Chart summarization is crucial for enhancing data accessibility and the efficient consumption of information. However, existing methods, including those with Multimodal Large Language Models (MLLMs), primarily focus on low-level data descriptions and often fail to capture the deeper insights which are the fundamental purpose of data visualization. To address this challenge, we propose Chart Insight Agent Flow, a plan-and-execute multi-agent framework effectively leveraging the perceptual and reasoning capabilities of MLLMs to uncover profound insights directly from chart images. Furthermore, to overcome the lack of suitable benchmarks, we introduce ChartSummInsights, a new dataset featuring a diverse collection of real-world charts paired with high-quality, insightful summaries authored by human data analysis experts. Experimental results demonstrate that our method significantly improves the performance of MLLMs on the chart summarization task, producing summaries with deep and diverse insights.
arXiv:2602.18731v1 Announce Type: new
Abstract: Chart summarization is crucial for enhancing data accessibility and the efficient consumption of information. However, existing methods, including those with Multimodal Large Language Models (MLLMs), primarily focus on low-level data descriptions and often fail to capture the deeper insights which are the fundamental purpose of data visualization. To address this challenge, we propose Chart Insight Agent Flow, a plan-and-execute multi-agent framework effectively leveraging the perceptual and reasoning capabilities of MLLMs to uncover profound insights directly from chart images. Furthermore, to overcome the lack of suitable benchmarks, we introduce ChartSummInsights, a new dataset featuring a diverse collection of real-world charts paired with high-quality, insightful summaries authored by human data analysis experts. Experimental results demonstrate that our method significantly improves the performance of MLLMs on the chart summarization task, producing summaries with deep and diverse insights. Read More
HEHRGNN: A Unified Embedding Model for Knowledge Graphs with Hyperedges and Hyper-Relational Edgescs.AI updates on arXiv.org arXiv:2602.18897v1 Announce Type: cross
Abstract: Knowledge Graph(KG) has gained traction as a machine-readable organization of real-world knowledge for analytics using artificial intelligence systems. Graph Neural Network(GNN), is proven to be an effective KG embedding technique that enables various downstream tasks like link prediction, node classification, and graph classification. The focus of research in both KG embedding and GNNs has been mostly oriented towards simple graphs with binary relations. However, real-world knowledge bases have a significant share of complex and n-ary facts that cannot be represented by binary edges. More specifically, real-world knowledge bases are often a mix of two types of n-ary facts – (i) that require hyperedges and (ii) that require hyper-relational edges. Though there are research efforts catering to these n-ary fact types, they are pursued independently for each type. We propose $H$yper$E$dge $H$yper-$R$elational edge $GNN$(HEHRGNN), a unified embedding model for n-ary relational KGs with both hyperedges and hyper-relational edges. The two main components of the model are i)HEHR unified fact representation format, and ii)HEHRGNN encoder, a GNN-based encoder with a novel message propagation model capable of capturing complex graph structures comprising both hyperedges and hyper-relational edges. The experimental results of HEHRGNN on link prediction tasks show its effectiveness as a unified embedding model, with inductive prediction capability, for link prediction across real-world datasets having different types of n-ary facts. The model also shows improved link prediction performance over baseline models for hyperedge and hyper-relational datasets.
arXiv:2602.18897v1 Announce Type: cross
Abstract: Knowledge Graph(KG) has gained traction as a machine-readable organization of real-world knowledge for analytics using artificial intelligence systems. Graph Neural Network(GNN), is proven to be an effective KG embedding technique that enables various downstream tasks like link prediction, node classification, and graph classification. The focus of research in both KG embedding and GNNs has been mostly oriented towards simple graphs with binary relations. However, real-world knowledge bases have a significant share of complex and n-ary facts that cannot be represented by binary edges. More specifically, real-world knowledge bases are often a mix of two types of n-ary facts – (i) that require hyperedges and (ii) that require hyper-relational edges. Though there are research efforts catering to these n-ary fact types, they are pursued independently for each type. We propose $H$yper$E$dge $H$yper-$R$elational edge $GNN$(HEHRGNN), a unified embedding model for n-ary relational KGs with both hyperedges and hyper-relational edges. The two main components of the model are i)HEHR unified fact representation format, and ii)HEHRGNN encoder, a GNN-based encoder with a novel message propagation model capable of capturing complex graph structures comprising both hyperedges and hyper-relational edges. The experimental results of HEHRGNN on link prediction tasks show its effectiveness as a unified embedding model, with inductive prediction capability, for link prediction across real-world datasets having different types of n-ary facts. The model also shows improved link prediction performance over baseline models for hyperedge and hyper-relational datasets. Read More