The End-to-End Data Scientist’s Prompt Playbook
Towards Data Science, September 8, 2025 at 4:00 pm
Part 3: Prompts for docs, DevOps, and stakeholder communication
The post The End-to-End Data Scientist’s Prompt Playbook appeared first on Towards Data Science.
Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts
cs.AI updates on arXiv.org, September 8, 2025 at 4:00 am
arXiv:2509.04500v1 Announce Type: cross
Abstract: Incorporating external context can significantly enhance the response quality of Large Language Models (LLMs). However, real-world contexts often mix relevant information with disproportionate inappropriate content, posing reliability risks. How do LLMs process and prioritize mixed context? To study this, we introduce the Poisoned Context Testbed, pairing queries with real-world contexts containing relevant and inappropriate content. Inspired by associative learning in animals, we adapt the Rescorla-Wagner (RW) model from neuroscience to quantify how competing contextual signals influence LLM outputs. Our adapted model reveals a consistent behavioral pattern: LLMs exhibit a strong tendency to incorporate information that is less prevalent in the context. This susceptibility is harmful in real-world settings, where small amounts of inappropriate content can substantially degrade response quality. Empirical evaluations on our testbed further confirm this vulnerability. To tackle this, we introduce RW-Steering, a two-stage finetuning-based approach that enables the model to internally identify and ignore inappropriate signals. Unlike prior methods that rely on extensive supervision across diverse context mixtures, RW-Steering generalizes robustly across varying proportions of inappropriate content. Experiments show that our best fine-tuned model improves response quality by 39.8% and reverses the undesirable behavior curve, establishing RW-Steering as a robust, generalizable context engineering solution for improving LLM safety in real-world use.
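For readers unfamiliar with the Rescorla-Wagner model named in the abstract above, the following is a minimal sketch of the classic associative-learning update, in which every cue present on a trial shares a single prediction error. It is not the paper's adapted model, which the abstract does not specify. The cue names, parameter values, and the function name `rescorla_wagner_step` are hypothetical and chosen only for illustration.

```python
def rescorla_wagner_step(strengths, present, alpha, beta, lam):
    """Apply one trial of the classic Rescorla-Wagner update.

    strengths: dict mapping cue name -> current associative strength V
    present:   list of cues present on this trial
    alpha:     dict mapping cue name -> salience (per-cue learning rate)
    beta:      learning rate associated with the outcome
    lam:       maximum strength the outcome supports (0.0 if the outcome is absent)
    """
    # All present cues share one prediction error: lambda minus their summed strength.
    error = lam - sum(strengths[cue] for cue in present)
    updated = dict(strengths)
    for cue in present:
        updated[cue] = strengths[cue] + alpha[cue] * beta * error
    return updated


# Hypothetical example: two cues compete for the same outcome over 50 trials.
v = {"relevant": 0.0, "inappropriate": 0.0}
alpha = {"relevant": 0.2, "inappropriate": 0.1}
for _ in range(50):
    v = rescorla_wagner_step(v, ["relevant", "inappropriate"], alpha, beta=0.5, lam=1.0)

print(v)
```

In this classic form the summed strength of the competing cues approaches `lam`, and each cue's share is proportional to its salience. How the paper maps context proportions onto these quantities is not stated in the abstract.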
Toward Accessible Dermatology: Skin Lesion Classification Using Deep Learning Models on Mobile-Acquired Images
cs.AI updates on arXiv.org, September 8, 2025 at 4:00 am
arXiv:2509.04800v1 Announce Type: cross
Abstract: Skin diseases are among the most prevalent health concerns worldwide, yet conventional diagnostic methods are often costly, complex, and unavailable in low-resource settings. Automated classification using deep learning has emerged as a promising alternative, but existing studies are mostly limited to dermoscopic datasets and a narrow range of disease classes. In this work, we curate a large dataset of over 50 skin disease categories captured with mobile devices, making it more representative of real-world conditions. We evaluate multiple convolutional neural networks and Transformer-based architectures, demonstrating that Transformer models, particularly the Swin Transformer, achieve superior performance by effectively capturing global contextual features. To enhance interpretability, we incorporate Gradient-weighted Class Activation Mapping (Grad-CAM), which highlights clinically relevant regions and provides transparency in model predictions. Our results underscore the potential of Transformer-based approaches for mobile-acquired skin lesion classification, paving the way toward accessible AI-assisted dermatological screening and early diagnosis in resource-limited environments.
The Beauty of Space-Filling Curves: Understanding the Hilbert Curve
Towards Data Science, September 7, 2025 at 4:00 pm
A quick journey from theory to implementation and application
The post The Beauty of Space-Filling Curves: Understanding the Hilbert Curve appeared first on Towards Data Science.
Hands-On with Agents SDK: Safeguarding Input and Output with Guardrails
Towards Data Science, September 6, 2025 at 4:00 pm
A practical exploration of how guardrails safeguard multi-agent systems in Python using OpenAI Agents SDK, Streamlit, and Pydantic
The post Hands-On with Agents SDK: Safeguarding Input and Output with Guardrails appeared first on Towards Data Science.
UK AI sector growth hits record £2.9B investment
AI News, September 5, 2025 at 3:13 pm
A government report has found that surging investment has driven UK AI sector growth to outpace the wider economy by 150 times since 2022. The UK’s AI sector is clearly in the throes of a boom, with revenues shattering previous records to hit £23.9 billion in the last year. The engine room of this growth
The post UK AI sector growth hits record £2.9B investment appeared first on AI News.
Zero-Inflated Data: A Comparison of Regression Models
Towards Data Science, September 5, 2025 at 1:30 pm
How to detect it and which model to choose.
The post Zero-Inflated Data: A Comparison of Regression Models appeared first on Towards Data Science.
AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
cs.AI updates on arXiv.org, September 5, 2025 at 4:00 am
arXiv:2509.04345v1 Announce Type: cross
Abstract: Speech generation systems can produce remarkably realistic vocalisations that are often indistinguishable from human speech, posing significant authenticity challenges. Although numerous deepfake detection methods have been developed, their effectiveness in real-world environments remains unreliable due to the domain shift between training and test samples arising from diverse human speech and fast-evolving speech synthesis systems. This is not adequately addressed by current datasets, which lack real-world application challenges with diverse and up-to-date audio in both real and deepfake categories. To fill this gap, we introduce AUDETER (AUdio DEepfake TEst Range), a large-scale, highly diverse deepfake audio dataset for comprehensive evaluation and robust development of generalised models for deepfake audio detection. It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders with a broad range of TTS/vocoder patterns, totalling 3 million audio clips, making it the largest deepfake audio dataset by scale. Through extensive experiments with AUDETER, we reveal that i) state-of-the-art (SOTA) methods trained on existing datasets struggle to generalise to novel deepfake audio samples and suffer from high false positive rates on unseen human voice, underscoring the need for a comprehensive dataset; and ii) these methods trained on AUDETER achieve highly generalised detection performance and significantly reduce detection error rate by 44.1% to 51.6%, achieving an error rate of only 4.17% on diverse cross-domain samples in the popular In-the-Wild dataset, paving the way for training generalist deepfake audio detectors. AUDETER is available on GitHub.
Tool Masking: The Layer MCP Forgot
Towards Data Science, September 5, 2025 at 12:00 pm
Tool masking for AI improves AI agents: shape MCP tool surfaces to cut tokens and errors, boost speed and reliability. Start prompt engineering your tools
The post Tool Masking: The Layer MCP Forgot appeared first on Towards Data Science.
Keypoint-based Diffusion for Robotic Motion Planning on the NICOL Robot
cs.AI updates on arXiv.org, September 5, 2025 at 4:00 am
arXiv:2509.04076v1 Announce Type: cross
Abstract: We propose a novel diffusion-based action model for robotic motion planning. Commonly, established numerical planning approaches are used to solve general motion planning problems, but they have significant runtime requirements. By leveraging the power of deep learning, we are able to achieve good results in a much shorter runtime by learning from a dataset generated by these planners. While our initial model uses point cloud embeddings in the input to predict keypoint-based joint sequences in its output, we observed in our ablation study that it remained challenging to condition the network on the point cloud embeddings. We identified some biases in our dataset and refined it, which improved the model’s performance. Our model, even without the use of the point cloud encodings, outperforms numerical models by an order of magnitude in runtime, while reaching a success rate of up to 90% of collision-free solutions on the test set.