The Download: what’s next for AI agents, and how Trump protects US tech companies overseas (MIT Technology Review, July 23, 2025, 12:10 pm)
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Navigating the rise of AI agents: “AI agents” is a buzzy term that essentially refers to AI models and algorithms that can not only provide you with information but also take actions on your…
Sam Altman: AI will cause job losses and national security threats (AI News, July 23, 2025, 10:57 am)
In the halls of power in Washington, OpenAI’s chief, Sam Altman, warned of total job losses from AI and of how national security is being rewritten. Altman positioned OpenAI not just as a participant but as the essential architect of our destiny. Holding court at the Federal Reserve’s conference for large banks, Altman clearly stated how…
Beyond Algorethics: Addressing the Ethical and Anthropological Challenges of AI Recommender Systems (cs.AI updates on arXiv.org, July 23, 2025, 4:00 am)
arXiv:2507.16430v1 Announce Type: cross
Abstract: In this paper, I examine the ethical and anthropological challenges posed by AI-driven recommender systems (RSs), which have become central to shaping digital environments and social interactions. By curating personalized content, RSs do not merely reflect user preferences but actively construct individual experiences across social media, entertainment platforms, and e-commerce. Despite their ubiquity, the ethical implications of RSs remain insufficiently explored, even as concerns over privacy, autonomy, and mental well-being intensify. I argue that existing ethical approaches, including algorethics (the effort to embed ethical principles into algorithmic design), are necessary but ultimately inadequate. RSs inherently reduce human complexity to quantifiable dimensions, exploit user vulnerabilities, and prioritize engagement over well-being. Addressing these concerns requires moving beyond purely technical solutions. I propose a comprehensive framework for human-centered RS design, integrating interdisciplinary perspectives, regulatory strategies, and educational initiatives to ensure AI systems foster rather than undermine human autonomy and societal flourishing.
A Well-Designed Experiment Can Teach You More Than a Time Machine! (Towards Data Science, July 23, 2025, 2:50 am)
How experimentation is more powerful than knowing counterfactuals.
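The article itself is behind the link, but the headline claim is easy to demonstrate: a randomized experiment recovers the average treatment effect without ever observing an individual counterfactual. A minimal simulation, with an entirely made-up data-generating process:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: what each unit would do without (y0) and with (y1)
# the treatment. In reality only one of the two is ever observed.
baseline = rng.normal(10, 2, size=n)
effect = rng.normal(1.5, 1.0, size=n)   # heterogeneous true effect
y0 = baseline
y1 = baseline + effect

true_ate = (y1 - y0).mean()

# A randomized experiment: flip a fair coin per unit, observe one outcome.
treated = rng.random(n) < 0.5
observed = np.where(treated, y1, y0)

# Difference in means is an unbiased estimate of the ATE,
# despite never observing any individual counterfactual.
estimated_ate = observed[treated].mean() - observed[~treated].mean()
print(f"true ATE = {true_ate:.3f}, experiment estimate = {estimated_ate:.3f}")
```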
When LLMs Try to Reason: Experiments in Text and Vision-Based Abstraction (Towards Data Science, July 22, 2025, 7:35 pm)
Can large language models learn to reason abstractly from just a few examples? In this piece, I explore this question by testing both text-based (o3-mini) and image-capable (gpt-4.1) models on abstract grid transformation tasks. These experiments reveal the extent to which current models rely on pattern matching, procedural heuristics, and symbolic shortcuts rather than robust generalization. Even with multimodal inputs, reasoning often breaks down in the face of subtle abstraction. The results offer a window into the current capabilities and limitations of in-context meta-learning with LLMs.
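As a rough illustration of the kind of task described (the article's actual prompts and grids are not reproduced here), a few-shot grid-transformation query against a text model might look like the following sketch, using the OpenAI Python SDK and a toy horizontal-flip rule:

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

# A toy ARC-style task: grids as nested lists; the hidden rule here is a
# horizontal flip. The article's actual tasks and prompt format may differ.
examples = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 5], [6, 7]], [[5, 4], [7, 6]]),
]
test_input = [[8, 9], [0, 1]]

prompt = "Infer the transformation from the examples, then apply it.\n"
for inp, out in examples:
    prompt += f"Input: {inp}\nOutput: {out}\n"
prompt += f"Input: {test_input}\nOutput:"

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```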
The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering (cs.AI updates on arXiv.org, July 22, 2025, 4:00 am)
arXiv:2507.15003v1 Announce Type: cross
Abstract: The future of software engineering, SE 3.0, is unfolding with the rise of AI teammates: autonomous, goal-driven systems collaborating with human developers. Among these, autonomous coding agents are especially transformative, now actively initiating, reviewing, and evolving code at scale. This paper introduces AIDev, the first large-scale dataset capturing how such agents operate in the wild. Spanning over 456,000 pull requests by five leading agents (OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code) across 61,000 repositories and 47,000 developers, AIDev provides an unprecedented empirical foundation for studying autonomous teammates in software development.
Unlike prior work that has largely theorized the rise of AI-native software engineering, AIDev offers structured, open data to support research in benchmarking, agent readiness, optimization, collaboration modeling, and AI governance. The dataset includes rich metadata on PRs, authorship, review timelines, code changes, and integration outcomes, enabling exploration beyond synthetic benchmarks like SWE-bench. For instance, although agents often outperform humans in speed, their PRs are accepted less frequently, revealing a trust and utility gap. Furthermore, while agents accelerate code submission (one developer submitted as many PRs in three days as they had in the previous three years), these PRs are structurally simpler, as measured by code complexity metrics.
We envision AIDev as a living resource: extensible, analyzable, and ready for the SE and AI communities. Grounding SE 3.0 in real-world evidence, AIDev enables a new generation of research into AI-native workflows and supports building the next wave of symbiotic human-AI collaboration. The dataset is publicly available at https://github.com/SAILResearch/AI_Teammates_in_SE3.
Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Software Engineering Agent
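The abstract does not spell out the dataset schema, so the sketch below only assumes a PR-level table with an agent name and a merged flag; the column names (`agent`, `merged`) and file name are hypothetical, and the published release documents its own format. It shows the kind of per-agent acceptance-rate comparison the trust-gap finding rests on:

```python
import pandas as pd

# Hypothetical columns and file: the real AIDev release at
# https://github.com/SAILResearch/AI_Teammates_in_SE3 defines its own schema.
prs = pd.read_csv("aidev_pull_requests.csv")  # e.g. columns: agent, merged, ...

# Acceptance (merge) rate per coding agent: the "trust and utility gap"
# the abstract mentions when comparing agent PRs to human-authored ones.
acceptance = (
    prs.groupby("agent")["merged"]
       .agg(n_prs="size", merge_rate="mean")
       .sort_values("merge_rate", ascending=False)
)
print(acceptance)
```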
Benchmarking Foundation Models with Multimodal Public Electronic Health Records (cs.AI updates on arXiv.org, July 22, 2025, 4:00 am)
arXiv:2507.14824v1 Announce Type: cross
Abstract: Foundation models have emerged as a powerful approach for processing electronic health records (EHRs), offering flexibility to handle diverse medical data modalities. In this study, we present a comprehensive benchmark that evaluates the performance, fairness, and interpretability of foundation models, both as unimodal encoders and as multimodal learners, using the publicly available MIMIC-IV database. To support consistent and reproducible evaluation, we developed a standardized data processing pipeline that harmonizes heterogeneous clinical records into an analysis-ready format. We systematically compared eight foundation models, encompassing both unimodal and multimodal models, as well as domain-specific and general-purpose variants. Our findings demonstrate that incorporating multiple data modalities leads to consistent improvements in predictive performance without introducing additional bias. Through this benchmark, we aim to support the development of effective and trustworthy multimodal artificial intelligence (AI) systems for real-world clinical applications. Our code is available at https://github.com/nliulab/MIMIC-Multimodal.
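As a generic illustration of why multimodal inputs can improve predictive performance (this is not the paper's pipeline; the embeddings and labels below are synthetic stand-ins), a simple late-fusion baseline concatenates frozen unimodal embeddings before fitting a classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2_000

# Stand-ins for frozen foundation-model embeddings of two modalities,
# e.g. clinical notes and structured labs. Purely synthetic here.
notes_emb = rng.normal(size=(n, 64))
labs_emb = rng.normal(size=(n, 32))
y = (notes_emb[:, 0] + labs_emb[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

def auc_for(features):
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print("notes only:", round(auc_for(notes_emb), 3))
print("labs only: ", round(auc_for(labs_emb), 3))
print("fused:     ", round(auc_for(np.hstack([notes_emb, labs_emb])), 3))
```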
A Reproducibility Study of Product-side Fairness in Bundle Recommendation (cs.AI updates on arXiv.org, July 22, 2025, 4:00 am)
arXiv:2507.14352v1 Announce Type: cross
Abstract: Recommender systems are known to exhibit fairness issues, particularly on the product side, where products and their associated suppliers receive unequal exposure in recommended results. While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored. This emerging task introduces additional complexity: recommendations are generated at the bundle level, yet user satisfaction and product (or supplier) exposure depend on both the bundle and the individual items it contains. Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting. In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods. We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns. Our results show that exposure patterns differ notably between bundles and items, revealing the need for fairness interventions that go beyond bundle-level assumptions. We also find that fairness assessments vary considerably depending on the metric used, reinforcing the need for multi-faceted evaluation. Furthermore, user behavior plays a critical role: when users interact more frequently with bundles than with individual items, BR systems tend to yield fairer exposure distributions across both levels. Overall, our findings offer actionable insights for building fairer bundle recommender systems and establish a vital foundation for future research in this emerging domain.
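To make the bundle-level versus item-level distinction concrete, here is a minimal sketch of position-discounted exposure and a Gini coefficient computed at both levels; the data layout and metric choices are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np
from collections import Counter

def gini(values):
    """Gini coefficient of an exposure distribution (0 = perfectly equal)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    return (2 * np.arange(1, n + 1) - n - 1) @ v / (n * v.sum())

# Toy recommendation lists: each user gets ranked bundles; each bundle
# is a tuple of item ids. Real BR output would come from a trained model.
recs = {
    "u1": [("b1", ("i1", "i2")), ("b2", ("i2", "i3"))],
    "u2": [("b1", ("i1", "i2")), ("b3", ("i4",))],
}

bundle_exp, item_exp = Counter(), Counter()
for ranked in recs.values():
    for rank, (bundle, items) in enumerate(ranked):
        w = 1.0 / np.log2(rank + 2)          # position-discounted exposure
        bundle_exp[bundle] += w
        for item in items:                   # items inherit bundle exposure
            item_exp[item] += w

print("bundle-level Gini:", round(gini(list(bundle_exp.values())), 3))
print("item-level Gini:  ", round(gini(list(item_exp.values())), 3))
```

Exposure can look balanced across bundles while still concentrating on a few items that appear in many bundles, which is why the paper measures both levels.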
New to LLMs? Start Here (Towards Data Science, May 23, 2025, 7:51 pm)
A guide to agents, LLMs, RAG, fine-tuning, and LangChain, with practical examples to start building.
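As a taste of one of the topics the guide covers, retrieval-augmented generation can be sketched in a few lines of dependency-light Python; the hash-based embedder below is a toy stand-in for the real embedding models a guide like this would use:

```python
import numpy as np

# Toy embedder: a hash-based bag-of-words vector keeps the sketch
# self-contained. Real RAG systems use trained embedding models.
def embed(text, dim=256):
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

docs = [
    "RAG retrieves relevant documents and feeds them to the model as context.",
    "Fine-tuning updates model weights on task-specific examples.",
    "Agents let an LLM call tools and take multi-step actions.",
]
doc_vecs = np.stack([embed(d) for d in docs])

query = "How do I ground an LLM in my own documents?"
scores = doc_vecs @ embed(query)          # cosine similarity (unit vectors)
context = docs[int(scores.argmax())]

# The retrieved context would be prepended to the LLM prompt:
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```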
Estimating Product-Level Price Elasticities Using Hierarchical Bayesian (Towards Data Science, May 23, 2025, 11:58 pm)
Using one model to personalize ML results.
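The article's exact model is not shown in this digest, but a hierarchical Bayesian elasticity model typically partially pools per-product price coefficients toward a shared prior. A minimal PyMC sketch on synthetic data, where all priors and the data-generating process are illustrative assumptions:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_products, n_obs = 20, 50

# Synthetic sales data: each product has its own true elasticity drawn
# from a shared distribution. The article's data and model may differ.
true_beta = rng.normal(-1.5, 0.4, size=n_products)
idx = np.repeat(np.arange(n_products), n_obs)
log_price = rng.normal(0, 0.5, size=idx.size)
log_qty = 3.0 + true_beta[idx] * log_price + rng.normal(0, 0.3, size=idx.size)

with pm.Model() as model:
    # Hyperpriors: products share statistical strength through these.
    mu_beta = pm.Normal("mu_beta", -1.0, 1.0)
    sigma_beta = pm.HalfNormal("sigma_beta", 1.0)

    # Partially pooled product-level elasticities.
    beta = pm.Normal("beta", mu_beta, sigma_beta, shape=n_products)
    alpha = pm.Normal("alpha", 0.0, 5.0)
    sigma = pm.HalfNormal("sigma", 1.0)

    pm.Normal("obs", alpha + beta[idx] * log_price, sigma, observed=log_qty)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

# Posterior mean elasticity per product, shrunk toward the group mean.
print(idata.posterior["beta"].mean(dim=("chain", "draw")).values)
```

Partial pooling is what lets one model produce per-product estimates even for products with few observations, which is the sense in which a single model "personalizes" the result.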