Differential syntactic and semantic encoding in LLMs (cs.AI updates on arXiv.org, arXiv:2601.04765v2, Announce Type: replace-cross)
Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging hidden-representation vectors of sentences sharing syntactic structure or meaning, we obtain vectors that capture a significant proportion of the syntactic and semantic information contained in the representations. In particular, subtracting these syntactic and semantic “centroids” from sentence vectors strongly affects their similarity with syntactically and semantically matched sentences, respectively, suggesting that syntax and semantics are, at least partially, linearly encoded. We also find that the cross-layer encoding profiles of syntax and semantics are different, and that the two signals can to some extent be decoupled, suggesting differential encoding of these two types of linguistic information in LLM representations.
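A minimal sketch of the centroid idea, not the authors' code: average one layer's hidden states over sentences sharing a syntactic template, then subtract that centroid before comparing sentence vectors. The stand-in model, layer index, mean pooling, and example sentences are all illustrative assumptions.

```python
# Minimal sketch of the "centroid subtraction" probe (illustrative only).
# Assumptions: a small HuggingFace model stands in for DeepSeek-V3; the layer
# index, sentence groups, and mean pooling are hypothetical choices.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "gpt2"  # stand-in; the paper studies DeepSeek-V3
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

def sentence_vector(text: str, layer: int = 6) -> torch.Tensor:
    """Mean-pool one layer's hidden states into a single sentence vector."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Sentences sharing one syntactic template (subject-verb-object).
same_syntax = ["The cat chased the mouse.", "A dog fetched the ball."]
syntax_centroid = torch.stack([sentence_vector(s) for s in same_syntax]).mean(dim=0)

v = sentence_vector("The bird caught the worm.")
v_minus = v - syntax_centroid  # remove the (putatively linear) syntactic component

cos = torch.nn.functional.cosine_similarity
probe = sentence_vector("A horse pulled the cart.")  # syntactically matched sentence
print("similarity before subtraction:", cos(v, probe, dim=0).item())
print("similarity after subtraction: ", cos(v_minus, probe, dim=0).item())
```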
Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features (cs.AI updates on arXiv.org, arXiv:2502.17749v3, Announce Type: replace)
Abstract: Recent progress in large language models (LLMs) for code generation has raised serious concerns about intellectual property protection. Malicious users can exploit LLMs to produce paraphrased versions of proprietary code that closely resemble the original. While the potential for LLM-assisted code paraphrasing continues to grow, research on detecting it remains limited, underscoring an urgent need for detection systems. We respond to this need by proposing two tasks. The first task is to detect whether code generated by an LLM is a paraphrased version of original human-written code. The second task is to identify which LLM was used to paraphrase the original code. For these tasks, we construct LPcode, a dataset consisting of pairs of human-written code and LLM-paraphrased code produced by various LLMs.
We statistically confirm significant differences in the coding styles of human-written and LLM-paraphrased code, particularly in terms of naming consistency, code structure, and readability. Based on these findings, we develop LPcodedec, a detection method that identifies paraphrase relationships between human-written and LLM-generated code and determines which LLM was used for the paraphrasing. LPcodedec outperforms the best baselines on both tasks, improving F1 scores by 2.64% and 15.17% while achieving speedups of 1,343x and 213x, respectively. Our code and data are available at https://github.com/Shinwoo-Park/detecting_llm_paraphrased_code_via_coding_style_features.
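A toy sketch of the coding-style-feature idea (LPcodedec's actual features and classifier are described in the paper, not reproduced here): extract simple naming, structure, and readability statistics from a code string and feed them to a lightweight classifier.

```python
# Toy sketch of style-feature extraction for paraphrase detection.
# The features below (naming, structure, readability proxies) are illustrative;
# they are not LPcodedec's actual feature set.
import re
from sklearn.linear_model import LogisticRegression

def style_features(code: str) -> list[float]:
    lines = code.splitlines() or [""]
    idents = re.findall(r"\b[a-zA-Z_]\w*\b", code)
    snake = sum(1 for i in idents if "_" in i)
    camel = sum(1 for i in idents if re.search(r"[a-z][A-Z]", i))
    return [
        snake / max(len(idents), 1),              # naming: snake_case ratio
        camel / max(len(idents), 1),              # naming: camelCase ratio
        sum(len(l) for l in lines) / len(lines),  # readability: avg line length
        sum(l.count("    ") for l in lines) / len(lines),  # structure: indent proxy
        code.count("#") / max(len(lines), 1),     # readability: comment density
    ]

# X: feature vectors for code samples; y: 1 = LLM-paraphrased (toy labels).
X = [style_features(c) for c in ["def add(a, b):\n    return a + b",
                                 "def addNumbers(x, y):\n    return x + y"]]
y = [0, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([style_features("def mul(a, b):\n    return a * b")]))
```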
Optimizing Data Transfer in Batched AI/ML Inference Workloads (Towards Data Science): A deep dive on data transfer bottlenecks, their identification, and their resolution with the help of NVIDIA Nsight™ Systems (part 2).
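As a flavor of the kind of fix such profiling typically points to (not the article's specific code), here is a common PyTorch pattern: pinned host memory plus non-blocking copies on a separate stream, so host-to-device transfer can overlap with GPU compute. The tensor shapes and the final reduction are placeholder choices.

```python
# Common data-transfer optimization pattern (illustrative; the article's own
# workload is not reproduced). Pinned (page-locked) host memory enables truly
# asynchronous host-to-device copies that overlap with compute.
import torch

assert torch.cuda.is_available()
device = torch.device("cuda")

batch = torch.randn(64, 3, 224, 224).pin_memory()  # page-locked host buffer
copy_stream = torch.cuda.Stream()

with torch.cuda.stream(copy_stream):
    gpu_batch = batch.to(device, non_blocking=True)  # async H2D copy

torch.cuda.current_stream().wait_stream(copy_stream)  # sync before compute
out = gpu_batch.mean()  # placeholder for the actual inference step
torch.cuda.synchronize()
print(out.item())

# Inspect the copy/compute overlap with Nsight Systems, e.g.:
#   nsys profile -o report python this_script.py
```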
Retailers like Kroger and Lowe’s test AI agents without handing control to Google (AI News): Retailers are starting to confront a problem that sits behind much of the hype around AI shopping: as customers turn to chatbots and automated assistants to decide what to buy, retailers risk losing control over how their products are shown, sold, and bundled. That concern is pushing some large chains to build or support their…
The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk (AI News): Meta’s US$2 billion acquisition of AI agent startup Manus has become every enterprise CTO’s cross-border compliance risk lesson. China’s Ministry of Commerce announced on January 9 that it would assess whether the deal violated export controls, technology transfer rules, and overseas investment regulations, despite Manus relocating from Beijing to Singapore in 2025. The investigation exposes…
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents (cs.AI updates on arXiv.org, arXiv:2509.01022v2, Announce Type: replace)
Abstract: We introduce the Block Rearrangement Problem (BRaP), a challenging component of large-scale warehouse management that involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding-puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration-space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Although the search space grows exponentially with the number of blocks, our methods efficiently produce rearrangement plans for deeply buried blocks in grids of up to 80×80.
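A minimal sketch of the graph-search framing (the paper's five algorithms are far more sophisticated): states are grid configurations, and a move slides a block into the single empty cell, sliding-puzzle style. The grid size, the single-empty-cell assumption, and the goal test are hypothetical simplifications.

```python
# Toy BRaP-as-graph-search sketch: BFS over grid configurations where one
# move slides a block into the empty cell (marked 0). Illustrative only.
from collections import deque

def neighbors(state: tuple, width: int):
    """Yield states reachable by sliding one adjacent block into the empty cell."""
    s = list(state)
    e = s.index(0)
    r, c = divmod(e, width)
    height = len(s) // width
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < height and 0 <= nc < width:
            n = nr * width + nc
            s[e], s[n] = s[n], s[e]
            yield tuple(s)
            s[e], s[n] = s[n], s[e]

def bfs(start: tuple, goal_block: int, goal_pos: int, width: int):
    """Shortest rearrangement moving goal_block to goal_pos (moves count)."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state[goal_pos] == goal_block:
            return depth
        for nxt in neighbors(state, width):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None

# 3x3 grid, blocks 1..8, empty cell 0; move block 8 to the top-left corner.
print(bfs((1, 2, 3, 4, 5, 6, 7, 8, 0), goal_block=8, goal_pos=0, width=3))
```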
SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models (cs.AI updates on arXiv.org, arXiv:2511.09993v2, Announce Type: replace)
Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across six calendars. To enable time-variant and contamination-free evaluation, we propose a template-driven protocol for dynamic instance generation that enables assessment on a user-specified Gregorian date. We conduct extensive experiments on both open- and closed-source state-of-the-art (SOTA) LLMs over a range of dates spanning 100 years from 1960 to 2060. Our evaluations show that these LLMs achieve an average accuracy of only 34.5%, with none exceeding 80%, indicating that this task remains challenging. Through in-depth analysis of reasoning types, question formats, and temporal reasoning directions, we identify two key obstacles for LLMs: Future-Date Degradation and Calendar Asymmetry Bias. To strengthen LLMs’ cross-calendar temporal reasoning capability, we further develop an LLM-powered Time Agent that leverages tool-augmented code generation. Empirical results show that Time Agent achieves an average accuracy of 95.31%, outperforming several competitive baselines, highlighting the potential of tool-augmented code generation to advance cross-calendar temporal reasoning. We hope this work will inspire further efforts toward more temporally and culturally adaptive LLMs.
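A minimal sketch of the tool-augmented idea (the paper's Time Agent is not reproduced): instead of reasoning about dates in text, the model emits a call to deterministic date code. The third-party convertdate package and the Hebrew/Islamic calendar pair are illustrative assumptions.

```python
# Sketch of tool-augmented cross-calendar conversion (not the paper's Time
# Agent). Assumption: the third-party `convertdate` package is installed; an
# agent would generate and execute a call like this rather than reason in text.
from convertdate import hebrew, islamic

def convert_gregorian(year: int, month: int, day: int, calendar: str):
    """Deterministic conversion tool the LLM can invoke via generated code."""
    table = {"hebrew": hebrew.from_gregorian, "islamic": islamic.from_gregorian}
    return table[calendar](year, month, day)  # -> (year, month, day)

# e.g. what is Gregorian 2026-01-15 in the Hebrew and Islamic calendars?
print(convert_gregorian(2026, 1, 15, "hebrew"))
print(convert_gregorian(2026, 1, 15, "islamic"))
```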
A Coding Guide to Demonstrate Targeted Data Poisoning Attacks in Deep Learning by Label Flipping on CIFAR-10 with PyTorch (MarkTechPost): In this tutorial, we demonstrate a realistic data poisoning attack by manipulating labels in the CIFAR-10 dataset and observing its impact on model behavior. We construct a clean and a poisoned training pipeline side by side, using a ResNet-style convolutional network to ensure stable, comparable learning dynamics. By selectively flipping a fraction of samples from…
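A minimal sketch of the label-flipping step itself (the tutorial's full pipeline, including the ResNet-style network and side-by-side training, is not reproduced): flip a fraction of one source class's labels to a target class before training. The source class, target class, and poison rate below are arbitrary choices.

```python
# Minimal targeted label-flipping poisoning sketch (illustrative only).
import random
import torchvision

train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)

SOURCE, TARGET, RATE = 3, 5, 0.1  # e.g. flip 10% of "cat" (3) labels to "dog" (5)
source_idx = [i for i, y in enumerate(train.targets) if y == SOURCE]
poisoned = random.sample(source_idx, int(RATE * len(source_idx)))
for i in poisoned:
    train.targets[i] = TARGET  # targeted flip: the model learns cat -> dog confusion

print(f"flipped {len(poisoned)} of {len(source_idx)} source-class labels")
```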
Meet SETA: Open Source Training Reinforcement Learning Environments for Terminal Agents with 400 Tasks and CAMEL Toolkit (MarkTechPost): What does an end-to-end stack for terminal agents look like when you combine structured toolkits, synthetic RL environments, and benchmark-aligned evaluation? A team of researchers from CAMEL AI, Eigent AI and other collaborators have released SETA, a toolkit and environment stack that focuses on reinforcement learning for terminal agents. The project targets…
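To show the general shape of such a stack (this is not SETA's actual API, task format, or sandboxing), here is a toy gym-style terminal environment: the agent emits shell commands, the environment executes them in a scratch directory, and reward comes from a task predicate.

```python
# Toy gym-style terminal environment (illustrative only; SETA's real API and
# sandboxing are defined by the project, not reproduced here).
import subprocess
import tempfile

class ToyTerminalEnv:
    """Agent emits shell commands; reward is 1.0 when the task predicate holds."""

    def __init__(self, instruction: str, check_cmd: str):
        self.workdir = tempfile.mkdtemp()  # scratch dir, not a real sandbox
        self.instruction, self.check_cmd = instruction, check_cmd

    def reset(self) -> str:
        return self.instruction  # initial observation: the task description

    def step(self, command: str):
        run = subprocess.run(command, shell=True, cwd=self.workdir,
                             capture_output=True, text=True, timeout=10)
        done = subprocess.run(self.check_cmd, shell=True, cwd=self.workdir,
                              capture_output=True).returncode == 0
        return run.stdout + run.stderr, float(done), done  # obs, reward, done

env = ToyTerminalEnv("Create a file named hello.txt", "test -f hello.txt")
print(env.reset())
obs, reward, done = env.step("touch hello.txt")
print(reward, done)
```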
Automatic Prompt Optimization for Multimodal Vision Agents: A Self-Driving Car Example (Towards Data Science): A walkthrough using open-source prompt optimization algorithms in Python to improve the accuracy of an autonomous-vehicle safety agent running on OpenAI’s GPT 5.2.
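A minimal sketch of the generic loop such walkthroughs follow (the article's specific optimizer library and agent are not reproduced): score candidate prompts against a small labeled evaluation set and keep the best. The call_model stub, evaluation set, and candidate prompts are all toy assumptions standing in for a real vision-language model call.

```python
# Generic prompt-optimization loop (illustrative; the article uses specific
# open-source optimizers). `call_model` is a stub for a real multimodal LLM.
def call_model(prompt: str, image_desc: str) -> str:
    """Stub: a real implementation would call a vision-language model API."""
    return "unsafe" if "hazard" in image_desc and "careful" in prompt else "safe"

# Tiny labeled eval set: (scene description, expected safety label).
eval_set = [("pedestrian hazard ahead", "unsafe"), ("empty highway", "safe")]

def score(prompt: str) -> float:
    hits = sum(call_model(prompt, img) == label for img, label in eval_set)
    return hits / len(eval_set)

candidates = ["Classify the scene.",
              "Be careful: classify the driving scene as safe or unsafe.",
              "You are a driving safety agent. Answer safe or unsafe."]

best = max(candidates, key=score)  # keep the highest-accuracy prompt
print(best, score(best))
```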