Trump jokes about AI while US and UK sign new tech dealAI Newson September 19, 2025 at 8:18 am US President Donald Trump said on Thursday that AI was “taking over the world,” and joked that he hoped tech executives understood it better than he did. The comment came as Trump and UK Prime Minister Keir Starmer hosted a gathering of business and technology leaders in London during the president’s second state visit to
The post Trump jokes about AI while US and UK sign new tech deal appeared first on AI News.
US President Donald Trump said on Thursday that AI was “taking over the world,” and joked that he hoped tech executives understood it better than he did. The comment came as Trump and UK Prime Minister Keir Starmer hosted a gathering of business and technology leaders in London during the president’s second state visit to
The post Trump jokes about AI while US and UK sign new tech deal appeared first on AI News. Read More
Can I Trust This Chatbot? Assessing User Privacy in AI-Healthcare Chatbot Applicationscs.AI updates on arXiv.orgon September 19, 2025 at 4:00 am arXiv:2509.14581v1 Announce Type: cross
Abstract: As Conversational Artificial Intelligence (AI) becomes more integrated into everyday life, AI-powered chatbot mobile applications are increasingly adopted across industries, particularly in the healthcare domain. These chatbots offer accessible and 24/7 support, yet their collection and processing of sensitive health data present critical privacy concerns. While prior research has examined chatbot security, privacy issues specific to AI healthcare chatbots have received limited attention. Our study evaluates the privacy practices of 12 widely downloaded AI healthcare chatbot apps available on the App Store and Google Play in the United States. We conducted a three-step assessment analyzing: (1) privacy settings during sign-up, (2) in-app privacy controls, and (3) the content of privacy policies. The analysis identified significant gaps in user data protection. Our findings reveal that half of the examined apps did not present a privacy policy during sign up, and only two provided an option to disable data sharing at that stage. The majority of apps’ privacy policies failed to address data protection measures. Moreover, users had minimal control over their personal data. The study provides key insights for information science researchers, developers, and policymakers to improve privacy protections in AI healthcare chatbot apps.
arXiv:2509.14581v1 Announce Type: cross
Abstract: As Conversational Artificial Intelligence (AI) becomes more integrated into everyday life, AI-powered chatbot mobile applications are increasingly adopted across industries, particularly in the healthcare domain. These chatbots offer accessible and 24/7 support, yet their collection and processing of sensitive health data present critical privacy concerns. While prior research has examined chatbot security, privacy issues specific to AI healthcare chatbots have received limited attention. Our study evaluates the privacy practices of 12 widely downloaded AI healthcare chatbot apps available on the App Store and Google Play in the United States. We conducted a three-step assessment analyzing: (1) privacy settings during sign-up, (2) in-app privacy controls, and (3) the content of privacy policies. The analysis identified significant gaps in user data protection. Our findings reveal that half of the examined apps did not present a privacy policy during sign up, and only two provided an option to disable data sharing at that stage. The majority of apps’ privacy policies failed to address data protection measures. Moreover, users had minimal control over their personal data. The study provides key insights for information science researchers, developers, and policymakers to improve privacy protections in AI healthcare chatbot apps. Read More
Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authorscs.AI updates on arXiv.orgon September 19, 2025 at 4:00 am arXiv:2509.14543v1 Announce Type: cross
Abstract: As large language models (LLMs) become increasingly integrated into personal writing tools, a critical question arises: can LLMs faithfully imitate an individual’s writing style from just a few examples? Personal style is often subtle and implicit, making it difficult to specify through prompts yet essential for user-aligned generation. This work presents a comprehensive evaluation of state-of-the-art LLMs’ ability to mimic personal writing styles via in-context learning from a small number of user-authored samples. We introduce an ensemble of complementary metrics-including authorship attribution, authorship verification, style matching, and AI detection-to robustly assess style imitation. Our evaluation spans over 40000 generations per model across domains such as news, email, forums, and blogs, covering writing samples from more than 400 real-world authors. Results show that while LLMs can approximate user styles in structured formats like news and email, they struggle with nuanced, informal writing in blogs and forums. Further analysis on various prompting strategies such as number of demonstrations reveal key limitations in effective personalization. Our findings highlight a fundamental gap in personalized LLM adaptation and the need for improved techniques to support implicit, style-consistent generation. To aid future research and for reproducibility, we open-source our data and code.
arXiv:2509.14543v1 Announce Type: cross
Abstract: As large language models (LLMs) become increasingly integrated into personal writing tools, a critical question arises: can LLMs faithfully imitate an individual’s writing style from just a few examples? Personal style is often subtle and implicit, making it difficult to specify through prompts yet essential for user-aligned generation. This work presents a comprehensive evaluation of state-of-the-art LLMs’ ability to mimic personal writing styles via in-context learning from a small number of user-authored samples. We introduce an ensemble of complementary metrics-including authorship attribution, authorship verification, style matching, and AI detection-to robustly assess style imitation. Our evaluation spans over 40000 generations per model across domains such as news, email, forums, and blogs, covering writing samples from more than 400 real-world authors. Results show that while LLMs can approximate user styles in structured formats like news and email, they struggle with nuanced, informal writing in blogs and forums. Further analysis on various prompting strategies such as number of demonstrations reveal key limitations in effective personalization. Our findings highlight a fundamental gap in personalized LLM adaptation and the need for improved techniques to support implicit, style-consistent generation. To aid future research and for reproducibility, we open-source our data and code. Read More
BEACON: Behavioral Malware Classification with Large Language Model Embeddings and Deep Learningcs.AI updates on arXiv.orgon September 19, 2025 at 4:00 am arXiv:2509.14519v1 Announce Type: cross
Abstract: Malware is becoming increasingly complex and widespread, making it essential to develop more effective and timely detection methods. Traditional static analysis often fails to defend against modern threats that employ code obfuscation, polymorphism, and other evasion techniques. In contrast, behavioral malware detection, which monitors runtime activities, provides a more reliable and context-aware solution. In this work, we propose BEACON, a novel deep learning framework that leverages large language models (LLMs) to generate dense, contextual embeddings from raw sandbox-generated behavior reports. These embeddings capture semantic and structural patterns of each sample and are processed by a one-dimensional convolutional neural network (1D CNN) for multi-class malware classification. Evaluated on the Avast-CTU Public CAPE Dataset, our framework consistently outperforms existing methods, highlighting the effectiveness of LLM-based behavioral embeddings and the overall design of BEACON for robust malware classification.
arXiv:2509.14519v1 Announce Type: cross
Abstract: Malware is becoming increasingly complex and widespread, making it essential to develop more effective and timely detection methods. Traditional static analysis often fails to defend against modern threats that employ code obfuscation, polymorphism, and other evasion techniques. In contrast, behavioral malware detection, which monitors runtime activities, provides a more reliable and context-aware solution. In this work, we propose BEACON, a novel deep learning framework that leverages large language models (LLMs) to generate dense, contextual embeddings from raw sandbox-generated behavior reports. These embeddings capture semantic and structural patterns of each sample and are processed by a one-dimensional convolutional neural network (1D CNN) for multi-class malware classification. Evaluated on the Avast-CTU Public CAPE Dataset, our framework consistently outperforms existing methods, highlighting the effectiveness of LLM-based behavioral embeddings and the overall design of BEACON for robust malware classification. Read More
The Role of Touch: Towards Optimal Tactile Sensing Distribution in Anthropomorphic Hands for Dexterous In-Hand Manipulationcs.AI updates on arXiv.orgon September 19, 2025 at 4:00 am arXiv:2509.14984v1 Announce Type: cross
Abstract: In-hand manipulation tasks, particularly in human-inspired robotic systems, must rely on distributed tactile sensing to achieve precise control across a wide variety of tasks. However, the optimal configuration of this network of sensors is a complex problem, and while the fingertips are a common choice for placing sensors, the contribution of tactile information from other regions of the hand is often overlooked. This work investigates the impact of tactile feedback from various regions of the fingers and palm in performing in-hand object reorientation tasks. We analyze how sensory feedback from different parts of the hand influences the robustness of deep reinforcement learning control policies and investigate the relationship between object characteristics and optimal sensor placement. We identify which tactile sensing configurations contribute to improving the efficiency and accuracy of manipulation. Our results provide valuable insights for the design and use of anthropomorphic end-effectors with enhanced manipulation capabilities.
arXiv:2509.14984v1 Announce Type: cross
Abstract: In-hand manipulation tasks, particularly in human-inspired robotic systems, must rely on distributed tactile sensing to achieve precise control across a wide variety of tasks. However, the optimal configuration of this network of sensors is a complex problem, and while the fingertips are a common choice for placing sensors, the contribution of tactile information from other regions of the hand is often overlooked. This work investigates the impact of tactile feedback from various regions of the fingers and palm in performing in-hand object reorientation tasks. We analyze how sensory feedback from different parts of the hand influences the robustness of deep reinforcement learning control policies and investigate the relationship between object characteristics and optimal sensor placement. We identify which tactile sensing configurations contribute to improving the efficiency and accuracy of manipulation. Our results provide valuable insights for the design and use of anthropomorphic end-effectors with enhanced manipulation capabilities. Read More
Not All Degradations Are Equal: A Targeted Feature Denoising Framework for Generalizable Image Super-Resolutioncs.AI updates on arXiv.orgon September 19, 2025 at 4:00 am arXiv:2509.14841v1 Announce Type: cross
Abstract: Generalizable Image Super-Resolution aims to enhance model generalization capabilities under unknown degradations. To achieve this goal, the models are expected to focus only on image content-related features instead of overfitting degradations. Recently, numerous approaches such as Dropout and Feature Alignment have been proposed to suppress models’ natural tendency to overfit degradations and yield promising results. Nevertheless, these works have assumed that models overfit to all degradation types (e.g., blur, noise, JPEG), while through careful investigations in this paper, we discover that models predominantly overfit to noise, largely attributable to its distinct degradation pattern compared to other degradation types. In this paper, we propose a targeted feature denoising framework, comprising noise detection and denoising modules. Our approach presents a general solution that can be seamlessly integrated with existing super-resolution models without requiring architectural modifications. Our framework demonstrates superior performance compared to previous regularization-based methods across five traditional benchmarks and datasets, encompassing both synthetic and real-world scenarios.
arXiv:2509.14841v1 Announce Type: cross
Abstract: Generalizable Image Super-Resolution aims to enhance model generalization capabilities under unknown degradations. To achieve this goal, the models are expected to focus only on image content-related features instead of overfitting degradations. Recently, numerous approaches such as Dropout and Feature Alignment have been proposed to suppress models’ natural tendency to overfit degradations and yield promising results. Nevertheless, these works have assumed that models overfit to all degradation types (e.g., blur, noise, JPEG), while through careful investigations in this paper, we discover that models predominantly overfit to noise, largely attributable to its distinct degradation pattern compared to other degradation types. In this paper, we propose a targeted feature denoising framework, comprising noise detection and denoising modules. Our approach presents a general solution that can be seamlessly integrated with existing super-resolution models without requiring architectural modifications. Our framework demonstrates superior performance compared to previous regularization-based methods across five traditional benchmarks and datasets, encompassing both synthetic and real-world scenarios. Read More
A Multi-To-One Interview Paradigm for Efficient MLLM Evaluationcs.AI updates on arXiv.orgon September 19, 2025 at 4:00 am arXiv:2509.14886v1 Announce Type: cross
Abstract: The rapid progress of Multi-Modal Large Language Models (MLLMs) has spurred the creation of numerous benchmarks. However, conventional full-coverage Question-Answering evaluations suffer from high redundancy and low efficiency. Inspired by human interview processes, we propose a multi-to-one interview paradigm for efficient MLLM evaluation. Our framework consists of (i) a two-stage interview strategy with pre-interview and formal interview phases, (ii) dynamic adjustment of interviewer weights to ensure fairness, and (iii) an adaptive mechanism for question difficulty-level chosen. Experiments on different benchmarks show that the proposed paradigm achieves significantly higher correlation with full-coverage results than random sampling, with improvements of up to 17.6% in PLCC and 16.7% in SRCC, while reducing the number of required questions. These findings demonstrate that the proposed paradigm provides a reliable and efficient alternative for large-scale MLLM benchmarking.
arXiv:2509.14886v1 Announce Type: cross
Abstract: The rapid progress of Multi-Modal Large Language Models (MLLMs) has spurred the creation of numerous benchmarks. However, conventional full-coverage Question-Answering evaluations suffer from high redundancy and low efficiency. Inspired by human interview processes, we propose a multi-to-one interview paradigm for efficient MLLM evaluation. Our framework consists of (i) a two-stage interview strategy with pre-interview and formal interview phases, (ii) dynamic adjustment of interviewer weights to ensure fairness, and (iii) an adaptive mechanism for question difficulty-level chosen. Experiments on different benchmarks show that the proposed paradigm achieves significantly higher correlation with full-coverage results than random sampling, with improvements of up to 17.6% in PLCC and 16.7% in SRCC, while reducing the number of required questions. These findings demonstrate that the proposed paradigm provides a reliable and efficient alternative for large-scale MLLM benchmarking. Read More
TDS Newsletter: How to Make Smarter Business Decisions with AITowards Data Scienceon September 19, 2025 at 2:07 am Research agents, budget planners, and more
The post TDS Newsletter: How to Make Smarter Business Decisions with AI appeared first on Towards Data Science.
Research agents, budget planners, and more
The post TDS Newsletter: How to Make Smarter Business Decisions with AI appeared first on Towards Data Science. Read More
Large Language Models’ Reasoning Stalls: An Investigation into the Capabilities of Frontier Modelscs.AI updates on arXiv.orgon September 18, 2025 at 4:00 am arXiv:2505.19676v3 Announce Type: replace
Abstract: Empirical methods to examine the capability of Large Language Models (LLMs) to use Automated Theorem Prover (ATP) reasoning strategies are studied. We evaluate the performance of State of the Art models from December 2023 and August 2024 on PRONTOQA steamroller reasoning problems. For that, we develop methods for assessing LLM response accuracy and correct answer correlation.
Our results show that progress in improving LLM reasoning abilities has stalled over the nine month period. By tracking completion tokens, we show that almost all improvement in reasoning ability since GPT-4 was released can be attributed to either hidden system prompts or the training of models to automatically use generic Chain of Thought prompting strategies. Among the ATP reasoning strategies tried, we found that current frontier LLMs are best able to follow the bottom-up (also known as forward-chaining) strategy. A low positive correlation was found between an LLM response containing correct reasoning and arriving at the correct conclusion.
arXiv:2505.19676v3 Announce Type: replace
Abstract: Empirical methods to examine the capability of Large Language Models (LLMs) to use Automated Theorem Prover (ATP) reasoning strategies are studied. We evaluate the performance of State of the Art models from December 2023 and August 2024 on PRONTOQA steamroller reasoning problems. For that, we develop methods for assessing LLM response accuracy and correct answer correlation.
Our results show that progress in improving LLM reasoning abilities has stalled over the nine month period. By tracking completion tokens, we show that almost all improvement in reasoning ability since GPT-4 was released can be attributed to either hidden system prompts or the training of models to automatically use generic Chain of Thought prompting strategies. Among the ATP reasoning strategies tried, we found that current frontier LLMs are best able to follow the bottom-up (also known as forward-chaining) strategy. A low positive correlation was found between an LLM response containing correct reasoning and arriving at the correct conclusion. Read More
GWM: Towards Scalable Gaussian World Models for Robotic Manipulationcs.AI updates on arXiv.orgon September 18, 2025 at 4:00 am arXiv:2508.17600v2 Announce Type: replace-cross
Abstract: Training robot policies within a learned world model is trending due to the inefficiency of real-world interactions. The established image-based world models and policies have shown prior success, but lack robust geometric information that requires consistent spatial and physical understanding of the three-dimensional world, even pre-trained on internet-scale video sources. To this end, we propose a novel branch of world model named Gaussian World Model (GWM) for robotic manipulation, which reconstructs the future state by inferring the propagation of Gaussian primitives under the effect of robot actions. At its core is a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder, enabling fine-grained scene-level future state reconstruction with Gaussian Splatting. GWM can not only enhance the visual representation for imitation learning agent by self-supervised future prediction training, but can serve as a neural simulator that supports model-based reinforcement learning. Both simulated and real-world experiments depict that GWM can precisely predict future scenes conditioned on diverse robot actions, and can be further utilized to train policies that outperform the state-of-the-art by impressive margins, showcasing the initial data scaling potential of 3D world model.
arXiv:2508.17600v2 Announce Type: replace-cross
Abstract: Training robot policies within a learned world model is trending due to the inefficiency of real-world interactions. The established image-based world models and policies have shown prior success, but lack robust geometric information that requires consistent spatial and physical understanding of the three-dimensional world, even pre-trained on internet-scale video sources. To this end, we propose a novel branch of world model named Gaussian World Model (GWM) for robotic manipulation, which reconstructs the future state by inferring the propagation of Gaussian primitives under the effect of robot actions. At its core is a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder, enabling fine-grained scene-level future state reconstruction with Gaussian Splatting. GWM can not only enhance the visual representation for imitation learning agent by self-supervised future prediction training, but can serve as a neural simulator that supports model-based reinforcement learning. Both simulated and real-world experiments depict that GWM can precisely predict future scenes conditioned on diverse robot actions, and can be further utilized to train policies that outperform the state-of-the-art by impressive margins, showcasing the initial data scaling potential of 3D world model. Read More