We Tried 5 Missing Data Imputation Methods: The Simplest Method Won (Sort Of) (KDnuggets)
We tested five imputation methods with proper cross-validation and statistical testing. Mean imputation won for prediction but destroyed feature relationships.
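A minimal sketch of the kind of comparison the summary describes, using scikit-learn: impute with several strategies inside a cross-validated pipeline and compare downstream scores. The dataset, model, injected missingness, and the particular imputers shown are illustrative assumptions, not the article's exact setup.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy setup: take a complete dataset and knock out ~20% of values at random.
X, y = load_diabetes(return_X_y=True)
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.2] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),
    "iterative": IterativeImputer(random_state=0),
}

# Impute inside the pipeline so each CV fold fits the imputer on training data only.
for name, imputer in imputers.items():
    pipe = make_pipeline(imputer, RandomForestRegressor(random_state=0))
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    print(f"{name:>9}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Downstream accuracy is only half the article's point: a separate check of correlations between imputed features would be needed to see the relationship distortion that mean imputation causes.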
How AI Can Become Your Personal Language Tutor (Towards Data Science)
How I used n8n to build AI study partners for learning Mandarin: vocabulary, listening, and pronunciation correction.
CISA has ordered government agencies to secure their systems against a high-severity Gogs vulnerability that was exploited in zero-day attacks. […]
The Amsterdam Court of Appeal sentenced a 44-year-old Dutch national to seven years in prison for multiple crimes, including computer hacking and attempted extortion. […]
Massive data dump reveals real identities and details of administrators and members of the notorious hacker forum.
LLMs as verification oracles for Solidity (cs.AI updates on arXiv.org)
arXiv:2509.19153v2 Announce Type: replace-cross
Abstract: Ensuring the correctness of smart contracts is critical, as even subtle flaws can lead to severe financial losses. While bug detection tools able to spot common vulnerability patterns can serve as a first line of defense, most real-world exploits and losses stem from errors in the contract business logic. Formal verification tools such as SolCMC and the Certora Prover address this challenge, but their impact remains limited by steep learning curves and restricted specification languages. Recent works have begun to explore the use of large language models (LLMs) for security-related tasks such as vulnerability detection and test generation. Yet, a fundamental question remains open: can LLMs aid in assessing the validity of arbitrary contract-specific properties? In this paper, we provide the first systematic empirical evaluation of GPT-5, a state-of-the-art reasoning LLM, in this role. We benchmark its performance on a large dataset of verification tasks, compare its outputs against those of established formal verification tools, and assess its practical effectiveness in real-world auditing scenarios. Our study combines quantitative metrics with qualitative analysis, and shows that recent reasoning-oriented LLMs – although lacking soundness guarantees – can be surprisingly effective at predicting the (in)validity of complex properties, suggesting a new frontier in the convergence of AI and formal methods for secure smart contract development and auditing.
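To make the oracle idea concrete, here is a minimal sketch of prompting a chat LLM to judge a contract-specific property as valid or invalid. The OpenAI client usage, the model id, and the toy contract and property are assumptions for illustration; the paper's prompts and benchmark protocol are not reproduced here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy contract and property, not taken from the paper's benchmark.
CONTRACT = """
pragma solidity ^0.8.0;
contract Bank {
    mapping(address => uint256) public balance;
    function deposit() external payable { balance[msg.sender] += msg.value; }
    function withdraw(uint256 amt) external {
        require(balance[msg.sender] >= amt);
        balance[msg.sender] -= amt;
        payable(msg.sender).transfer(amt);
    }
}
"""
PROPERTY = "After any successful call to withdraw(amt), the caller's balance decreases by exactly amt."

resp = client.chat.completions.create(
    model="gpt-5",  # placeholder id; the paper benchmarks GPT-5
    messages=[
        {"role": "system",
         "content": "You are a smart-contract verification oracle. "
                    "Answer VALID or INVALID, then justify briefly."},
        {"role": "user",
         "content": f"Contract:\n{CONTRACT}\nProperty:\n{PROPERTY}"},
    ],
)
print(resp.choices[0].message.content)
```

Unlike SolCMC or the Certora Prover, nothing here is sound: the verdict is a prediction to be checked against a formal tool or an auditor, which is exactly the comparison the paper performs.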
Simulating Multi-Stakeholder Decision-Making with Generative Agents in Urban Planning (cs.AI updates on arXiv.org)
arXiv:2402.11314v2 Announce Type: replace-cross
Abstract: Reaching consensus in urban planning is a complex process often hindered by prolonged negotiations, trade-offs, power dynamics, and competing stakeholder interests, resulting in inefficiencies and inequities. Advances in large language models (LLMs), with their increasing capabilities in knowledge transfer, reasoning, and planning, have enabled the development of multi-generative agent systems, offering a promising approach to simulating discussions and interactions among diverse stakeholders on contentious topics. However, applying such systems also carries significant societal and ethical risks, including misrepresentation, privacy concerns, and biases stemming from opinion convergence among agents, hallucinations caused by insufficient or biased prompts, and the inherent limitations of foundation models. To evaluate the influence of these factors, we incorporate varying levels of real-world survey data and demographic detail to test agents’ performance under two decision-making value frameworks: altruism-driven and interest-driven, using a real-world urban rezoning challenge. This approach evaluates the influence of demographic factors such as race, gender, and age on collective decision-making in the design of multi-generative agent systems. Our experimental results reveal that integrating demographic and life-value data enhances the diversity and stability of agent outputs. In addition, communication among generated agents improves the quality of collective reasoning. These findings provide a predictive framework for decision-makers to anticipate stakeholder reactions, including concerns, objections, and support. By enabling iterative refinement of proposals before public release, the simulated approach fosters more equitable and cost-effective decisions in urban planning.
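As a rough illustration of the setup described above, the sketch below seeds a few generative agents with demographic and value profiles and lets them exchange positions on a rezoning proposal over two rounds. The client usage, model id, profile fields, and prompts are assumptions, not the paper's system.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model id

PROPOSAL = "Rezone the riverside industrial lot for mixed-income housing."

# Illustrative stakeholder profiles mixing demographics with a value framework.
agents = [
    {"name": "A1", "age": 34, "role": "renter", "values": "altruism-driven"},
    {"name": "A2", "age": 58, "role": "homeowner", "values": "interest-driven"},
    {"name": "A3", "age": 45, "role": "small-business owner", "values": "interest-driven"},
]

def speak(agent, transcript):
    """One agent states or updates its position, given the discussion so far."""
    persona = (f"You are {agent['name']}, age {agent['age']}, a {agent['role']}, "
               f"and your decision-making is {agent['values']}.")
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": persona},
            {"role": "user",
             "content": f"Proposal: {PROPOSAL}\nDiscussion so far:\n{transcript}\n"
                        "State your position (support/object) and one concern in two sentences."},
        ],
    )
    return f"{agent['name']}: {resp.choices[0].message.content}"

# Two rounds of communication so agents can react to each other's concerns.
transcript = ""
for _ in range(2):
    for agent in agents:
        transcript += speak(agent, transcript) + "\n"
print(transcript)
```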
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language (cs.AI updates on arXiv.org)
arXiv:2505.10055v2 Announce Type: replace-cross
Abstract: This paper evaluates the performance of Large Multimodal Models (LMMs) on Optical Character Recognition (OCR) in the low-resource Pashto language. Natural Language Processing (NLP) in Pashto faces several challenges due to the cursive nature of its script and a scarcity of structured datasets. To address this, we developed a synthetic Pashto OCR dataset, PsOCR, consisting of one million images annotated with bounding boxes at word, line, and document levels, suitable for training and evaluating models based on different architectures, including Convolutional Neural Networks (CNNs) and Transformers. PsOCR covers variations across 1,000 unique font families, colors, image sizes, and layouts. A benchmark subset of 10K images was selected to evaluate the performance of several LMMs, including seven open-source models: DeepSeek’s Janus, InternVL, MiniCPM, Florence, and Qwen (3B and 7B), and four closed-source models: GPT-4o, Gemini, Claude, and Grok. Experimental results demonstrate that Gemini achieves the best performance among all models, whereas among open-source models, Qwen-7B stands out. This work provides an insightful assessment of the capabilities and limitations of current LMMs for OCR tasks in Pashto and establishes a foundation for further research not only in Pashto OCR but also for other similar scripts such as Arabic, Persian, and Urdu. PsOCR is available at https://github.com/zirak-ai/PashtoOCR.
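A minimal sketch of how such a benchmark subset might be scored: transcribe each image with the model under test and compute character error rate against the ground truth. The `run_model` hook is a hypothetical stand-in for whichever LMM is being evaluated; PsOCR's own metrics and harness may differ.

```python
from pathlib import Path

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def run_model(image_path: Path) -> str:
    """Hypothetical hook: send the image to an LMM and return its transcription."""
    raise NotImplementedError

def evaluate(samples: list[tuple[Path, str]]) -> float:
    """Average CER over (image, ground-truth text) pairs from the benchmark subset."""
    scores = [cer(truth, run_model(img)) for img, truth in samples]
    return sum(scores) / len(scores)
```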
ART: Adaptive Reasoning Trees for Explainable Claim Verification (cs.AI updates on arXiv.org)
arXiv:2601.05455v1 Announce Type: new
Abstract: Large Language Models (LLMs) are powerful candidates for complex decision-making, leveraging vast encoded knowledge and remarkable zero-shot abilities. However, their adoption in high-stakes environments is hindered by their opacity; their outputs lack faithful explanations and cannot be effectively contested to correct errors, undermining trustworthiness. In this paper, we propose ART (Adaptive Reasoning Trees), a hierarchical method for claim verification. The process begins with a root claim, which branches into supporting and attacking child arguments. An argument’s strength is determined bottom-up via a pairwise tournament of its children, adjudicated by a judge LLM, allowing a final, transparent and contestable verdict to be systematically derived which is missing in methods like Chain-of-Thought (CoT). We empirically validate ART on multiple datasets, analyzing different argument generators and comparison strategies. Our findings show that ART’s structured reasoning outperforms strong baselines, establishing a new benchmark for explainable claim verification which is more reliable and ensures clarity in the overall decision making step.
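A minimal sketch of the bottom-up idea: each node holds supporting and attacking child arguments, and a judge model adjudicates pairwise match-ups whose outcomes are aggregated into a strength score. The `judge` hook, the weighting rule, and the 0.5 decision threshold are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    supports: list["Node"] = field(default_factory=list)
    attacks: list["Node"] = field(default_factory=list)

def judge(claim: str, argument_a: str, argument_b: str) -> str:
    """Hypothetical judge-LLM call: return 'a' or 'b' for the stronger argument."""
    raise NotImplementedError

def strength(node: Node, claim: str) -> float:
    """Score a node bottom-up: supporting and attacking children meet in pairwise
    match-ups, each weighted by the children's own recursive strength."""
    if not node.supports and not node.attacks:
        return 1.0  # leaf argument carries unit strength
    if not node.attacks:
        return 1.0  # unopposed support
    if not node.supports:
        return 0.0  # unopposed attack
    score, total = 0.0, 0.0
    for sup in node.supports:
        for att in node.attacks:
            weight = strength(sup, claim) + strength(att, claim)
            total += weight
            if judge(claim, sup.text, att.text) == "a":
                score += weight
    return score / total if total else 0.5

def verdict(root: Node) -> str:
    """Transparent, contestable top-level decision for the root claim."""
    return "supported" if strength(root, root.text) >= 0.5 else "refuted"
```

In a full system the judge hook would be backed by an LLM call and child strengths would be cached; the aggregation rule here is only one plausible reading of the paper's pairwise tournament.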
Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset (cs.AI updates on arXiv.org)
arXiv:2601.05918v1 Announce Type: cross
Abstract: On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, including 125 scientists, about their use of AI for research. Focusing on the scientist subset, I show that widely available LLMs with web search and agentic capabilities can link six out of twenty-four interviews to specific scientific works, recovering associated authors and, in some cases, uniquely identifying the interviewees. My contribution is to show that modern LLM-based agents make such re-identification attacks easy and low-effort: off-the-shelf tools can, with a few natural-language prompts, search the web, cross-reference details, and propose likely matches, effectively lowering the technical barrier. Existing safeguards can be bypassed by breaking down the re-identification into benign tasks. I outline the attack at a high level, discuss implications for releasing rich qualitative data in the age of LLM agents, and propose mitigation recommendations and open problems. I have notified Anthropic of my findings.