How to Analyze and Optimize Your LLMs in 3 Steps (Towards Data Science, September 11, 2025 at 2:30 pm)
Learn to enhance your LLMs with my three-step process: inspecting, improving, and iterating on your LLMs.
Yext Scout Guides Brands Through AI Search Challenges (AI News, September 11, 2025 at 2:19 pm)
Customers are discovering brands and learning about products and services in new ways. From traditional search to AI search to AI agents and more, the discovery journey has completely changed, and brands need to adapt to the new paradigm. Launched earlier this year, Yext Scout is an AI search and competitive intelligence agent that's designed …
VMware nods to AI but looks to long-term (AI News, September 11, 2025 at 3:44 pm)
Broadcom, the owner of VMware, announced at the VMware Explore conference a few weeks ago that its VMware Cloud Foundation platform is now AI native. It was the latest move by the company to keep pace with the rest of the technology industry's wide and rapid adoption of large language models, yet came as …
Fighting Back Against Attacks in Federated Learning (Towards Data Science, September 10, 2025 at 5:00 pm)
Lessons from a multi-node simulator
When A Difference Actually Makes A Difference (Towards Data Science, September 10, 2025 at 3:30 pm)
Bite-Sized Analytics for Business Decision-Makers (1)
SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning (cs.AI updates on arXiv.org, September 10, 2025 at 4:00 am)
arXiv:2505.22626v2 Announce Type: replace-cross
Abstract: Imitation learning advances robot capabilities by enabling the acquisition of diverse behaviors from human demonstrations. However, large-scale datasets used for policy training often introduce substantial variability in quality, which can negatively impact performance. As a result, automatically curating datasets by filtering low-quality samples to improve quality becomes essential. Existing robotic curation approaches rely on costly manual annotations and perform curation at a coarse granularity, such as the dataset or trajectory level, failing to account for the quality of individual state-action pairs. To address this, we introduce SCIZOR, a self-supervised data curation framework that filters out low-quality state-action pairs to improve the performance of imitation learning policies. SCIZOR targets two complementary sources of low-quality data: suboptimal data, which hinders learning with undesirable actions, and redundant data, which dilutes training with repetitive patterns. SCIZOR leverages a self-supervised task progress predictor for suboptimal data to remove samples lacking task progression, and a deduplication module operating on joint state-action representation for samples with redundant patterns. Empirically, we show that SCIZOR enables imitation learning policies to achieve higher performance with less data, yielding an average improvement of 15.4% across multiple benchmarks. More information is available at: https://ut-austin-rpl.github.io/SCIZOR/
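For intuition, the two filters described in the abstract can be sketched roughly as follows; the progress estimates, embeddings, and thresholds in this sketch are placeholder assumptions, not the authors' models or hyperparameters.

```python
# Illustrative sketch of the two filters described in the SCIZOR abstract:
# (i) drop state-action pairs whose predicted task progress does not advance,
# (ii) deduplicate near-identical pairs via their joint embeddings.
# All inputs and thresholds here are placeholder assumptions.
import numpy as np

def filter_suboptimal(progress: np.ndarray, min_delta: float = 0.0) -> np.ndarray:
    """Keep indices whose (hypothetical) predicted task-progress gain exceeds a threshold."""
    # progress[t] stands in for a self-supervised estimate of task completion at step t.
    deltas = np.diff(progress, prepend=progress[0])
    return np.flatnonzero(deltas > min_delta)

def deduplicate(embeddings: np.ndarray, sim_threshold: float = 0.98) -> np.ndarray:
    """Greedy deduplication on joint state-action embeddings via cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        if all(vec @ normed[j] < sim_threshold for j in kept):
            kept.append(i)
    return np.array(kept, dtype=int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    progress = np.clip(np.cumsum(rng.normal(0.02, 0.05, size=200)), 0.0, 1.0)
    embeddings = rng.normal(size=(200, 32))
    good = filter_suboptimal(progress)
    unique = deduplicate(embeddings[good])
    print(f"kept {len(unique)} of 200 state-action pairs")
```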
Why Task-Based Evaluations Matter (Towards Data Science, September 10, 2025 at 2:00 pm)
This article is adapted from a lecture series I gave at Deeplearn 2025: From Prototype to Production: Evaluation Strategies for Agentic Applications.
Task-based evaluations, which measure an AI system’s performance in use-case-specific, real-world settings, are underadopted and understudied. There is still an outsized focus in AI literature on foundation model benchmarks. Benchmarks are essential for advancing research and comparing broad, general capabilities, but they rarely translate cleanly into task-specific performance.
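To make the distinction concrete, a minimal task-based evaluation harness might look like the sketch below; the task cases, success criteria, and run_agent stub are hypothetical illustrations, not material from the article.

```python
# Minimal sketch of a task-based evaluation: score an AI system against
# use-case-specific checks rather than a generic benchmark. The task cases and
# the run_agent stub below are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskCase:
    prompt: str
    passes: Callable[[str], bool]  # use-case-specific success criterion

def run_agent(prompt: str) -> str:
    # Placeholder for the system under test (an API call in practice).
    return "Refund issued for order 1142; confirmation email sent."

CASES = [
    TaskCase("Process a refund for order 1142",
             passes=lambda out: "refund" in out.lower() and "1142" in out),
    TaskCase("Escalate a suspected fraud case",
             passes=lambda out: "escalat" in out.lower()),
]

def evaluate(cases: list) -> float:
    results = [case.passes(run_agent(case.prompt)) for case in cases]
    return sum(results) / len(results)

if __name__ == "__main__":
    print(f"task success rate: {evaluate(CASES):.0%}")
```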
Breaking the Conventional Forward-Backward Tie in Neural Networks: Activation Functions (cs.AI updates on arXiv.org, September 10, 2025 at 4:00 am)
arXiv:2509.07236v1 Announce Type: cross
Abstract: Gradient-based neural network training traditionally enforces symmetry between forward and backward propagation, requiring activation functions to be differentiable (or sub-differentiable) and strictly monotonic in certain regions to prevent flat gradient areas. This symmetry, linking forward activations closely to backward gradients, significantly restricts the selection of activation functions, particularly excluding those with substantial flat or non-differentiable regions. In this paper, we challenge this assumption through mathematical analysis, demonstrating that precise gradient magnitudes derived from activation functions are largely redundant, provided the gradient direction is preserved. Empirical experiments conducted on foundational architectures – such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Binary Neural Networks (BNNs) – confirm that relaxing forward-backward symmetry and substituting traditional gradients with simpler or stochastic alternatives does not impair learning and may even enhance training stability and efficiency. We explicitly demonstrate that neural networks with flat or non-differentiable activation functions, such as the Heaviside step function, can be effectively trained, thereby expanding design flexibility and computational efficiency. Further empirical validation with more complex architectures remains a valuable direction for future research.
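One way to read that claim is as a custom forward/backward split: a Heaviside forward pass paired with a simpler, direction-preserving surrogate gradient in the backward pass. The PyTorch sketch below is an illustrative reading of that idea, not the paper's code; the particular surrogate is an assumption.

```python
# Illustrative sketch (not the paper's code): a Heaviside step activation trained
# by replacing its true gradient (zero almost everywhere) with a simple
# direction-preserving surrogate in the backward pass.
import torch

class HeavisideSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(x)
        return (x > 0).float()  # flat, non-differentiable step activation

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        (x,) = ctx.saved_tensors
        # Surrogate gradient (assumption): pass the gradient through unchanged
        # near zero, preserving its direction rather than its exact magnitude.
        return grad_output * (x.abs() < 1.0).float()

class StepMLP(torch.nn.Module):
    def __init__(self, d_in: int = 20, d_hidden: int = 64):
        super().__init__()
        self.fc1 = torch.nn.Linear(d_in, d_hidden)
        self.fc2 = torch.nn.Linear(d_hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(HeavisideSTE.apply(self.fc1(x)))

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(256, 20)
    y = (x.sum(dim=1, keepdim=True) > 0).float()  # toy separable target
    model = StepMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(200):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"final loss: {loss.item():.3f}")
```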
Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models (cs.AI updates on arXiv.org, September 10, 2025 at 4:00 am)
arXiv:2509.07027v1 Announce Type: cross
Abstract: We propose a novel regularization loss that enforces standard Gaussianity, encouraging samples to align with a standard Gaussian distribution. This facilitates a range of downstream tasks involving optimization in the latent space of text-to-image models. We treat elements of a high-dimensional sample as one-dimensional standard Gaussian variables and define a composite loss that combines moment-based regularization in the spatial domain with power spectrum-based regularization in the spectral domain. Since the expected values of moments and power spectrum distributions are analytically known, the loss promotes conformity to these properties. To ensure permutation invariance, the losses are applied to randomly permuted inputs. Notably, existing Gaussianity-based regularizations fall within our unified framework: some correspond to moment losses of specific orders, while the previous covariance-matching loss is equivalent to our spectral loss but incurs higher time complexity due to its spatial-domain computation. We showcase the application of our regularization in generative modeling for test-time reward alignment with a text-to-image model, specifically to enhance aesthetics and text alignment. Our regularization outperforms previous Gaussianity regularization, effectively prevents reward hacking and accelerates convergence.
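A rough sketch of the two loss components described in the abstract: low-order moment matching against the analytical moments of a standard Gaussian, plus a power-spectrum term against its flat expected spectrum. The moment orders, weights, and normalization below are illustrative assumptions, not the paper's exact formulation.

```python
# Rough sketch of the two regularization terms described in the abstract:
# (i) match low-order moments of the flattened sample to those of a standard
# Gaussian, and (ii) match its normalized power spectrum to the flat spectrum
# expected of i.i.d. standard Gaussian noise. Orders, weights, and the random
# permutation are illustrative assumptions, not the paper's exact formulation.
import torch

# Analytical moments E[x^k] of a standard Gaussian for k = 1..4.
GAUSSIAN_MOMENTS = {1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}

def moment_loss(z: torch.Tensor) -> torch.Tensor:
    flat = z.flatten()
    return sum((flat.pow(k).mean() - m) ** 2 for k, m in GAUSSIAN_MOMENTS.items())

def spectral_loss(z: torch.Tensor) -> torch.Tensor:
    # Random permutation (as in the abstract) so the loss is permutation-invariant.
    flat = z.flatten()[torch.randperm(z.numel())]
    power = torch.fft.rfft(flat).abs().pow(2) / flat.numel()
    # For i.i.d. standard Gaussian input, E[|X_k|^2] / N = 1 at every frequency.
    return (power - 1.0).pow(2).mean()

def gaussianity_loss(z: torch.Tensor, spectral_weight: float = 1.0) -> torch.Tensor:
    return moment_loss(z) + spectral_weight * spectral_loss(z)

if __name__ == "__main__":
    torch.manual_seed(0)
    gaussian = torch.randn(4, 64, 64)           # already standard normal: small loss
    shifted = torch.randn(4, 64, 64) * 2 + 1.0   # wrong scale and mean: larger loss
    print(f"loss (standard normal): {gaussianity_loss(gaussian).item():.4f}")
    print(f"loss (shifted/scaled):  {gaussianity_loss(shifted).item():.4f}")
```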
How to Build Effective AI Agents to Process Millions of Requests (Towards Data Science, September 9, 2025 at 5:00 pm)
Learn how to build production-ready systems using AI agents.