Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling | cs.AI updates on arXiv.org
arXiv:2601.22636v2 Announce Type: replace
Abstract: Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting, which underestimates real-world risk. In practice, attackers can exploit large-scale parallel sampling to repeatedly probe a model until a harmful response is produced. While recent work shows that attack success increases with repeated sampling, principled methods for predicting large-scale adversarial risk remain limited. We propose a scaling-aware Best-of-N estimation of risk, SABER, for modeling jailbreak vulnerability under Best-of-N sampling. We model sample-level success probabilities using a Beta distribution, the conjugate prior of the Bernoulli distribution, and derive an analytic scaling law that enables reliable extrapolation of large-N attack success rates from small-budget measurements. Using only n=100 samples, our anchored estimator predicts ASR@1000 with a mean absolute error of 1.66, compared to 12.04 for the baseline, which is an 86.2% reduction in estimation error. Our results reveal heterogeneous risk scaling profiles and show that models appearing robust under standard evaluation can experience rapid nonlinear risk amplification under parallel adversarial pressure. This work provides a low-cost, scalable methodology for realistic LLM safety assessment. We will release our code and evaluation scripts upon publication to support future research.
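The Beta-Bernoulli setup mentioned in the abstract can be illustrated with a small sketch. This is not the paper's SABER estimator or its anchored variant, only the standard identity that, if a prompt's per-attempt success probability p is drawn from Beta(alpha, beta), the chance that all N attempts fail is B(alpha, beta + N) / B(alpha, beta). The function names and the parameter values below are hypothetical and chosen purely for illustration.

```python
import math


def asr_at_n(alpha: float, beta: float, n: int) -> float:
    """Expected probability of at least one success in n attempts when each
    prompt's success probability p is drawn once from Beta(alpha, beta)."""
    # E[(1 - p)^n] = B(alpha, beta + n) / B(alpha, beta), evaluated in log space
    log_all_fail = (
        math.lgamma(beta + n) - math.lgamma(beta)
        + math.lgamma(alpha + beta) - math.lgamma(alpha + beta + n)
    )
    return 1.0 - math.exp(log_all_fail)


def naive_asr_at_n(p: float, n: int) -> float:
    """Baseline that assumes every prompt shares the same success rate p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    alpha, beta = 0.2, 5.0  # hypothetical values, not taken from the paper
    mean_p = alpha / (alpha + beta)
    for n in (1, 100, 1000):
        print(n, round(asr_at_n(alpha, beta, n), 4), round(naive_asr_at_n(mean_p, n), 4))
```

Under this assumption the all-fail probability decays roughly like N^(-alpha) for large N, so a fixed-p baseline that uses only the mean success rate tends to overstate how quickly ASR@N saturates when success rates vary across prompts.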
Agentic AI in healthcare: How Life Sciences marketing could achieve US$450bn in value by 2028 | AI News
Agentic AI in healthcare is graduating from answering prompts to autonomously executing complex marketing tasks, and life sciences companies are betting their commercial strategies on it. According to a recent report cited by Capgemini Invent, AI agents could generate up to US$450 billion in economic value through revenue uplift and cost savings globally by 2028, with 69% of
The post Agentic AI in healthcare: How Life Sciences marketing could achieve US$450bn in value by 2028 appeared first on AI News.
Chinese hyperscalers and industry-specific agentic AI | AI News
Major Chinese technology companies Alibaba, Tencent, and Huawei are pursuing agentic AI (systems that can execute multi-step tasks autonomously and interact with software, data, and services without human instruction), and orienting the technology toward discrete industries and workflows.
Alibaba’s open-source strategy for agentic AI
Alibaba’s strategy centres on its Qwen AI model family, a set
The post Chinese hyperscalers and industry-specific agentic AI appeared first on AI News.
How to Personalize Claude Code | Towards Data Science
Learn how to get more out of Claude Code by giving it access to more information.
The post How to Personalize Claude Code appeared first on Towards Data Science.
The Dutch Data Protection Authority (AP) and the Council for the Judiciary (Rvdr) have disclosed that their systems were impacted by cyber attacks that exploited the recently disclosed security flaws in Ivanti Endpoint Manager Mobile (EPMM), according to a notice sent to the country’s parliament on Friday. “On January 29, the […]
The ransomware group breached SmarterTools through a vulnerability in the company’s own SmarterMail product.
Testing ads in ChatGPT | OpenAI News
OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.
The Death of the “Everything Prompt”: Google’s Move Toward Structured AI | Towards Data Science
How the new Interactions API enables deep-reasoning, stateful, agentic workflows.
The post The Death of the “Everything Prompt”: Google’s Move Toward Structured AI appeared first on Towards Data Science.
7 Python EDA Tricks to Find and Fix Data Issues | KDnuggets
Seven Python tricks for early exploratory data analysis (EDA) that help identify and deal with data quality issues.
Accelerate agentic application development with a full-stack starter template for Amazon Bedrock AgentCore | Artificial Intelligence
In this post, you will learn how to deploy Fullstack AgentCore Solution Template (FAST) to your Amazon Web Services (AWS) account, understand its architecture, and see how to extend it for your requirements. You will learn how to build your own agent while FAST handles authentication, infrastructure as code (IaC), deployment pipelines, and service integration.