Retailers like Kroger and Lowe’s test AI agents without handing control to Google
AI News
Retailers are starting to confront a problem that sits behind much of the hype around AI shopping: as customers turn to chatbots and automated assistants to decide what to buy, retailers risk losing control over how their products are shown, sold, and bundled. That concern is pushing some large chains to build or support their
The post Retailers like Kroger and Lowe’s test AI agents without handing control to Google appeared first on AI News.
The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk
AI News
Meta’s US$2 billion acquisition of AI agent startup Manus has become every enterprise CTO’s cross-border compliance risk lesson. China’s Ministry of Commerce announced on January 9 that it would assess whether the deal violated export controls, technology transfer rules, and overseas investment regulations, despite Manus relocating from Beijing to Singapore in 2025. The investigation exposes
The post The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk appeared first on AI News.
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents
cs.AI updates on arXiv.org
arXiv:2509.01022v2 Announce Type: replace
Abstract: We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Despite the exponential relation between search space size and block number, our methods demonstrate efficiency in creating rearrangement plans for deeply buried blocks in up to 80×80 grids.
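To make the "graph search over joint configurations" framing concrete, here is a minimal sketch in the spirit of a sliding puzzle. It is not one of the paper's five algorithms: the function name `min_moves`, the move model (any block may slide one cell into an adjacent empty cell), and the goal test (one target block reaches one goal cell) are simplifying assumptions made for illustration only.

```python
from collections import deque

def min_moves(rows, cols, blocks, target, goal):
    """Breadth-first search over joint block configurations.

    Hypothetical simplification: a state is (target position, set of other
    block positions), and a move slides any single block one cell into an
    adjacent empty cell. Returns the fewest moves that bring the target
    block to `goal`, or None if no sequence of moves does so.
    """
    start = (target, frozenset(blocks))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (tgt, others), depth = queue.popleft()
        if tgt == goal:
            return depth
        occupied = others | {tgt}
        for pos in occupied:
            r, c = pos
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                if not inside or nxt in occupied:
                    continue
                if pos == tgt:
                    state = (nxt, others)
                else:
                    state = (tgt, (others - {pos}) | {nxt})
                if state not in seen:
                    seen.add(state)
                    queue.append((state, depth + 1))
    return None

# 2x2 grid: the target can walk around the single obstacle.
open_case = min_moves(2, 2, blocks={(0, 1)}, target=(0, 0), goal=(1, 1))
# 1x3 corridor: the obstacle can never get out of the target's way.
blocked_case = min_moves(1, 3, blocks={(0, 1)}, target=(0, 0), goal=(0, 2))
print(open_case, blocked_case)
```

The `seen` set grows with the number of joint configurations, which is the exponential growth the abstract refers to; exhaustive search of this kind is only practical on very small grids.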
SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models
cs.AI updates on arXiv.org
arXiv:2511.09993v2 Announce Type: replace
Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across six calendars. To enable time-variant and contamination-free evaluation, we propose a template-driven protocol for dynamic instance generation that enables assessment on a user-specified Gregorian date. We conduct extensive experiments on both open- and closed-source state-of-the-art (SOTA) LLMs over a range of dates spanning 100 years from 1960 to 2060. Our evaluations show that these LLMs achieve an average accuracy of only 34.5%, with none exceeding 80%, indicating that this task remains challenging. Through in-depth analysis of reasoning types, question formats, and temporal reasoning directions, we identify two key obstacles for LLMs: Future-Date Degradation and Calendar Asymmetry Bias. To strengthen LLMs’ cross-calendar temporal reasoning capability, we further develop an LLM-powered Time Agent that leverages tool-augmented code generation. Empirical results show that Time Agent achieves an average accuracy of 95.31%, outperforming several competitive baselines, highlighting the potential of tool-augmented code generation to advance cross-calendar temporal reasoning. We hope this work will inspire further efforts toward more temporally and culturally adaptive LLMs.
A Coding Guide to Demonstrate Targeted Data Poisoning Attacks in Deep Learning by Label Flipping on CIFAR-10 with PyTorch
MarkTechPost
In this tutorial, we demonstrate a realistic data poisoning attack by manipulating labels in the CIFAR-10 dataset and observing its impact on model behavior. We construct a clean and a poisoned training pipeline side by side, using a ResNet-style convolutional network to ensure stable, comparable learning dynamics. By selectively flipping a fraction of samples from
The post A Coding Guide to Demonstrate Targeted Data Poisoning Attacks in Deep Learning by Label Flipping on CIFAR-10 with PyTorch appeared first on MarkTechPost.
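As a rough illustration of targeted label flipping, the sketch below relabels a fraction of one class as another. It works on a plain list of integer labels rather than the tutorial's PyTorch pipeline, and the function name `flip_labels`, the class pair (3 to 5, which are cat and dog in CIFAR-10's standard class order), and the 50% flip rate are all assumptions chosen for the example; the excerpt above does not say which classes or fraction the tutorial uses.

```python
import random

def flip_labels(labels, source_class, target_class, flip_fraction, seed=0):
    """Return a poisoned copy of `labels` and the indices that were changed.

    A randomly chosen `flip_fraction` of the samples labelled `source_class`
    is relabelled as `target_class`; every other label is left untouched.
    """
    rng = random.Random(seed)
    source_idx = [i for i, y in enumerate(labels) if y == source_class]
    n_flip = int(len(source_idx) * flip_fraction)
    flipped = sorted(rng.sample(source_idx, n_flip))
    poisoned = list(labels)
    for i in flipped:
        poisoned[i] = target_class
    return poisoned, flipped

# Stand-in for a label vector: 1000 samples, 100 per class (0..9).
labels = [i % 10 for i in range(1000)]
# Hypothetical attack settings: relabel half of class 3 as class 5.
poisoned, flipped = flip_labels(labels, source_class=3, target_class=5,
                                flip_fraction=0.5)
print(f"flipped {len(flipped)} of {labels.count(3)} source-class labels")
```

Training one model on `labels` and another on `poisoned`, with everything else held fixed, is what makes the clean and poisoned pipelines comparable.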
Meet SETA: Open Source Training Reinforcement Learning Environments for Terminal Agents with 400 Tasks and CAMEL Toolkit
MarkTechPost
What does an end-to-end stack for terminal agents look like when you combine structured toolkits, synthetic RL environments, and benchmark-aligned evaluation? A team of researchers from CAMEL AI, Eigent AI, and other collaborators has released SETA, a toolkit and environment stack that focuses on reinforcement learning for terminal agents. The project targets
The post Meet SETA: Open Source Training Reinforcement Learning Environments for Terminal Agents with 400 Tasks and CAMEL Toolkit appeared first on MarkTechPost.
Automatic Prompt Optimization for Multimodal Vision Agents: A Self-Driving Car Example
Towards Data Science
A walkthrough of using open-source prompt optimization algorithms in Python to improve the accuracy of an autonomous-vehicle safety agent running on OpenAI’s GPT 5.2
The post Automatic Prompt Optimization for Multimodal Vision Agents: A Self-Driving Car Example appeared first on Towards Data Science.
Beyond the Flat Table: Building an Enterprise-Grade Financial Model in Power BI
Towards Data Science
A step-by-step journey through data transformation, star schema modeling, and DAX variance analysis, with lessons learned along the way.
The post Beyond the Flat Table: Building an Enterprise-Grade Financial Model in Power BI appeared first on Towards Data Science.
How to Leverage Slash Commands to Code Effectively
Towards Data Science
Learn how I use slash commands to be a more efficient engineer.
The post How to Leverage Slash Commands to Code Effectively appeared first on Towards Data Science.
Federated Learning, Part 1: The Basics of Training Models Where the Data Lives
Towards Data Science
Understanding the foundations of federated learning
The post Federated Learning, Part 1: The Basics of Training Models Where the Data Lives appeared first on Towards Data Science.