
Daily AI News

Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub (cs.AI updates on arXiv.org)

arXiv:2601.15195v1 Announce Type: cross
Abstract: AI coding agents are now submitting pull requests (PRs) to software projects, acting not just as assistants but as autonomous contributors. As these agentic contributions rapidly increase across real repositories, little is known about how they behave in practice and why many of them fail to be merged. In this paper, we conduct a large-scale study of 33k agent-authored PRs made by five coding agents across GitHub. (RQ1) We first quantitatively characterize merged and not-merged PRs along four broad dimensions: 1) merge outcomes across task types, 2) code changes, 3) CI build results, and 4) review dynamics. We observe that tasks related to documentation, CI, and build updates achieve the highest merge success, whereas performance and bug-fix tasks fare the worst. Not-merged PRs tend to involve larger code changes, touch more files, and often fail the project’s CI/CD pipeline validation. (RQ2) To further investigate why some agentic PRs are not merged, we qualitatively analyze 600 PRs to derive a hierarchical taxonomy of rejection patterns. This analysis complements the quantitative findings in RQ1 by uncovering rejection reasons not captured by quantitative metrics, including lack of meaningful reviewer engagement, duplicate PRs, unwanted feature implementations, and agent misalignment. Together, our findings highlight key socio-technical and human-AI collaboration factors that are critical to improving the success of future agentic workflows.



A Brain-inspired Embodied Intelligence for Fluid and Fast Reflexive Robotics Control (cs.AI updates on arXiv.org)

arXiv:2601.14628v1 Announce Type: cross
Abstract: Recent advances in embodied intelligence have leveraged massive scaling of data and model parameters to master natural-language command following and multi-task control. In contrast, biological systems demonstrate an innate ability to acquire skills rapidly from sparse experience. Crucially, current robotic policies struggle to replicate the dynamic stability, reflexive responsiveness, and temporal memory inherent in biological motion. Here we present Neuromorphic Vision-Language-Action (NeuroVLA), a framework that mimics the structural organization of the bio-nervous system across the cortex, cerebellum, and spinal cord. We adopt a system-level bio-inspired design: a high-level model plans goals, an adaptive cerebellum module stabilizes motion using high-frequency sensor feedback, and a bio-inspired spinal layer executes lightning-fast action generation. NeuroVLA represents the first deployment of a neuromorphic VLA on physical robotics, achieving state-of-the-art performance. We observe the emergence of biological motor characteristics without additional data or special guidance: it eliminates shaking in robotic arms, saves significant energy (only 0.4 W on a neuromorphic processor), shows temporal memory ability, and triggers safety reflexes in under 20 milliseconds.



DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection (cs.AI updates on arXiv.org)

arXiv:2601.08223v3 Announce Type: replace-cross
Abstract: The rapid growth of large language models raises pressing concerns about intellectual property protection under black-box deployment. Existing backdoor-based fingerprints either rely on rare tokens — leading to high-perplexity inputs susceptible to filtering — or use fixed trigger-response mappings that are brittle to leakage and post-hoc adaptation. We propose Dual-Layer Nested Fingerprinting (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, it uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and intellectual property protection.



MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs (cs.AI updates on arXiv.org)

arXiv:2601.15279v1 Announce Type: cross
Abstract: A molecule’s properties are fundamentally determined by its composition and structure encoded in its molecular graph. Thus, reasoning about molecular properties requires the ability to parse and understand the molecular graph. Large Language Models (LLMs) are increasingly applied to chemistry, tackling tasks such as molecular name conversion, captioning, text-guided generation, and property or reaction prediction. Most existing benchmarks emphasize general chemical knowledge, rely on literature or surrogate labels that risk leakage or bias, or reduce evaluation to multiple-choice questions. We introduce MolecularIQ, a molecular structure reasoning benchmark focused exclusively on symbolically verifiable tasks. MolecularIQ enables fine-grained evaluation of reasoning over molecular graphs and reveals capability patterns that localize model failures to specific tasks and molecular structures. This provides actionable insights into the strengths and limitations of current chemistry LLMs and guides the development of models that reason faithfully over molecular structure.



Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight (cs.AI updates on arXiv.org)

arXiv:2512.19691v2 Announce Type: replace
Abstract: We examine the reliability of a widely used clinical AI benchmark whose reference labels were partially generated by LLMs, and find that a substantial fraction are clinically misaligned. We introduce a phased stewardship procedure to amplify the positive impact of physician experts’ feedback and then demonstrate, via a controlled RL experiment, how uncaught label bias can materially affect downstream LLM evaluation and alignment. Our results demonstrate that partially LLM-generated labels can embed systemic errors that distort not only evaluation but also downstream model alignment. By adopting a hybrid oversight system, we can prioritize scarce expert feedback to maintain benchmarks as living, clinically grounded documents. Ensuring this alignment is a prerequisite for the safe deployment of LLMs in high-stakes medical decision support.



Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy (cs.AI updates on arXiv.org)

arXiv:2412.04426v3 Announce Type: replace-cross
Abstract: The high costs and risks involved in extensive environment interactions hinder the practical application of current online safe reinforcement learning (RL) methods. While offline safe RL addresses this by learning policies from static datasets, the performance therein is usually limited due to reliance on data quality and challenges with out-of-distribution (OOD) actions. Inspired by recent successes in offline-to-online (O2O) RL, it is crucial to explore whether offline safe RL can be leveraged to facilitate faster and safer online policy learning, a direction that has yet to be fully investigated. To fill this gap, we first demonstrate that naively applying existing O2O algorithms from standard RL would not work well in the safe RL setting, due to two unique challenges: erroneous Q-estimations, resulting from the offline-online objective mismatch and offline cost sparsity, and Lagrangian mismatch, resulting from difficulties in aligning Lagrange multipliers between offline and online policies. To address these challenges, we introduce Marvel, a novel framework for O2O safe RL, comprising two key components that work in concert: Value Pre-Alignment, to align the Q-functions with the underlying truth before online learning, and Adaptive PID Control, to effectively adjust the Lagrange multipliers during online finetuning. Extensive experiments demonstrate that Marvel significantly outperforms existing baselines in both reward maximization and safety constraint satisfaction. By introducing the first policy-finetuning-based framework for O2O safe RL, compatible with many offline and online safe RL methods, our work has great potential to advance the field towards more efficient and practical safe RL solutions.

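The abstract's mention of PID-controlled Lagrange multipliers refers to a standard device in constrained RL: treat the constraint violation as an error signal and let proportional, integral, and derivative terms set the penalty weight. Marvel's actual controller is not published in this summary, so the sketch below is only an illustration of the general technique; the class name, gains, and cost limit are all made-up values, not from the paper.

```python
# Illustrative PID update for a Lagrange multiplier in constrained RL.
# All gains and the cost_limit are invented for demonstration purposes.
class PIDLagrangian:
    def __init__(self, kp=0.05, ki=0.005, kd=0.1, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit   # allowed episode cost before penalizing
        self.integral = 0.0
        self.prev_cost = 0.0

    def update(self, episode_cost):
        # Error = constraint violation; positive when cost exceeds the limit.
        error = episode_cost - self.cost_limit
        # Integral with anti-windup: never accumulate below zero.
        self.integral = max(0.0, self.integral + error)
        derivative = episode_cost - self.prev_cost
        self.prev_cost = episode_cost
        # The multiplier scales a penalty term, so it must stay non-negative.
        lam = (self.kp * error
               + self.ki * self.integral
               + self.kd * max(0.0, derivative))
        return max(0.0, lam)

ctrl = PIDLagrangian()
lam = ctrl.update(episode_cost=40.0)  # cost above the limit -> positive multiplier
```

The returned multiplier would then weight the cost term in the policy loss; the derivative term is what gives PID approaches their quick reaction to sudden cost spikes compared with plain dual ascent.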


Stop Writing Messy Boolean Masks: 10 Elegant Ways to Filter Pandas DataFrames (Towards Data Science)

Master the art of readable, high-performance data selection using .query(), .isin(), and advanced vectorized logic.
The post Stop Writing Messy Boolean Masks: 10 Elegant Ways to Filter Pandas DataFrames appeared first on Towards Data Science.

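The teaser names `.query()` and `.isin()` as alternatives to chained boolean masks. A quick self-contained comparison (the DataFrame contents and column names here are illustrative, not from the article):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Seattle", "Austin", "Boston", "Denver"],
    "sales": [120, 340, 90, 210],
    "region": ["West", "South", "East", "West"],
})

# Plain boolean mask: works, but gets noisy as conditions pile up.
masked = df[(df["region"] == "West") & (df["sales"] > 100)]

# .query(): the same filter written as a readable expression string.
queried = df.query("region == 'West' and sales > 100")

# .isin(): membership test without chaining '==' comparisons with '|'.
subset = df[df["city"].isin(["Seattle", "Denver"])]

assert masked.equals(queried)  # both approaches select the same rows
```

`.query()` shines once a filter has three or more conditions, since the expression string avoids the repeated `df[...]` references and parenthesized `&`/`|` operators that make masks hard to scan.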

5 Breakthroughs in Graph Neural Networks to Watch in 2026 (KDnuggets)


This article outlines 5 recent breakthroughs in GNNs that are worth watching in the year ahead: from integration with LLMs to interdisciplinary scientific discoveries.



What Other Industries Can Learn from Healthcare’s Knowledge Graphs (Towards Data Science)

How shared meaning, evidence, and standards create durable semantic infrastructure.
The post What Other Industries Can Learn from Healthcare’s Knowledge Graphs appeared first on Towards Data Science.



Gates Foundation and OpenAI test AI in African healthcare (AI News)

Primary healthcare systems across parts of Africa are under growing strain, caught between rising demand, chronic staff shortages, and shrinking international aid budgets. In that context, AI is being tested in healthcare less as a breakthrough technology and more as a way to keep basic services running. According to reporting by Reuters, the Gates Foundation…
The post Gates Foundation and OpenAI test AI in African healthcare appeared first on AI News.
