
Daily AI News

MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
cs.AI updates on arXiv.org | arXiv:2601.15279v1 (Announce Type: cross)

Abstract: A molecule’s properties are fundamentally determined by its composition and structure encoded in its molecular graph. Thus, reasoning about molecular properties requires the ability to parse and understand the molecular graph. Large Language Models (LLMs) are increasingly applied to chemistry, tackling tasks such as molecular name conversion, captioning, text-guided generation, and property or reaction prediction. Most existing benchmarks emphasize general chemical knowledge, rely on literature or surrogate labels that risk leakage or bias, or reduce evaluation to multiple-choice questions. We introduce MolecularIQ, a molecular structure reasoning benchmark focused exclusively on symbolically verifiable tasks. MolecularIQ enables fine-grained evaluation of reasoning over molecular graphs and reveals capability patterns that localize model failures to specific tasks and molecular structures. This provides actionable insights into the strengths and limitations of current chemistry LLMs and guides the development of models that reason faithfully over molecular structure.

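The abstract doesn't spell out what a "symbolically verifiable" task looks like, but the core idea (checking a model's structural claims against exact computations on the molecular graph) is easy to sketch. Below is a minimal illustration using RDKit; the library choice, the SMILES input, and the ring/aromaticity checks are illustrative assumptions, not the benchmark's actual task suite.

```python
# Minimal sketch of symbolic verification on a molecular graph:
# parse SMILES into a graph with RDKit and compare an LLM's claimed
# structural counts against exact graph queries. The specific checks
# (ring count, aromatic atom count) are assumed for illustration.
from rdkit import Chem

def verify_structural_claims(smiles: str, claimed_rings: int,
                             claimed_aromatic_atoms: int) -> dict:
    """Check claimed counts against ground truth computed from the graph."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Unparseable SMILES: {smiles}")
    true_rings = mol.GetRingInfo().NumRings()
    true_aromatic = sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())
    return {
        "rings": (claimed_rings == true_rings, true_rings),
        "aromatic_atoms": (claimed_aromatic_atoms == true_aromatic, true_aromatic),
    }

# Example: phenol has 1 ring and 6 aromatic atoms.
print(verify_structural_claims("c1ccc(O)cc1", claimed_rings=1, claimed_aromatic_atoms=6))
```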


Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight
cs.AI updates on arXiv.org | arXiv:2512.19691v2 (Announce Type: replace)

Abstract: We examine the reliability of a widely used clinical AI benchmark whose reference labels were partially generated by LLMs, and find that a substantial fraction are clinically misaligned. We introduce a phased stewardship procedure to amplify the positive impact of physician experts’ feedback, and then demonstrate, via a controlled RL experiment, how uncaught label bias can materially affect downstream LLM evaluation and alignment. Our results demonstrate that partially LLM-generated labels can embed systemic errors that distort not only evaluation but also downstream model alignment. By adopting a hybrid oversight system, we can prioritize scarce expert feedback to maintain benchmarks as living, clinically grounded documents. Ensuring this alignment is a prerequisite for the safe deployment of LLMs in high-stakes medical decision support.

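The phased stewardship procedure itself is not described in the abstract. As a hedged illustration of the hybrid-oversight idea (directing scarce physician feedback where it matters most), the sketch below triages LLM-generated labels for expert review by how much independent automated graders disagree on them; the triage criterion and all names are assumptions, not the paper's method.

```python
# Illustrative triage (not the paper's actual procedure): rank benchmark
# items by disagreement among independent LLM graders and send the most
# contested items to physicians first, within a fixed review budget.
from collections import Counter
from typing import Sequence

def triage_for_expert_review(labels: dict[str, Sequence[str]], budget: int) -> list[str]:
    """labels maps item_id -> labels from independent LLM graders.
    Returns the `budget` item ids most in need of physician review."""
    def disagreement(votes: Sequence[str]) -> float:
        # Fraction of graders not voting with the majority label.
        majority = Counter(votes).most_common(1)[0][1]
        return 1.0 - majority / len(votes)

    ranked = sorted(labels, key=lambda item: disagreement(labels[item]), reverse=True)
    return ranked[:budget]

votes = {
    "case_01": ["sepsis", "sepsis", "sepsis"],     # unanimous: low priority
    "case_02": ["sepsis", "pneumonia", "sepsis"],  # split: review sooner
    "case_03": ["asthma", "copd", "pneumonia"],    # fully split: review first
}
print(triage_for_expert_review(votes, budget=2))   # ['case_03', 'case_02']
```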


Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy
cs.AI updates on arXiv.org | arXiv:2412.04426v3 (Announce Type: replace-cross)

Abstract: The high costs and risks involved in extensive environment interactions hinder the practical application of current online safe reinforcement learning (RL) methods. While offline safe RL addresses this by learning policies from static datasets, performance there is usually limited by reliance on data quality and by challenges with out-of-distribution (OOD) actions. Inspired by recent successes in offline-to-online (O2O) RL, it is crucial to explore whether offline safe RL can be leveraged to facilitate faster and safer online policy learning, a direction that has yet to be fully investigated. To fill this gap, we first demonstrate that naively applying existing O2O algorithms from standard RL does not work well in the safe RL setting due to two unique challenges: erroneous Q-estimations, resulting from the offline-online objective mismatch and offline cost sparsity, and Lagrangian mismatch, resulting from difficulties in aligning Lagrange multipliers between offline and online policies. To address these challenges, we introduce Marvel, a novel framework for O2O safe RL comprising two key components that work in concert: Value Pre-Alignment, which aligns the Q-functions with the underlying truth before online learning, and Adaptive PID Control, which adjusts the Lagrange multipliers during online finetuning. Extensive experiments demonstrate that Marvel significantly outperforms existing baselines in both reward maximization and safety constraint satisfaction. By introducing the first policy-finetuning-based framework for O2O safe RL, compatible with many offline and online safe RL methods, our work has great potential to advance the field toward more efficient and practical safe RL solutions.

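PID control of Lagrange multipliers is a known pattern in safe RL, though Marvel's exact update rule is not given in the abstract. The sketch below shows the generic form: treat the constraint violation (episodic cost minus the cost limit) as the error signal of a PID controller and keep the multiplier non-negative. The gains and cost values are made up.

```python
# Generic PID-style Lagrange multiplier update for safe RL (a sketch of
# the pattern the abstract calls "Adaptive PID Control", not Marvel's
# actual rule). The policy then optimizes reward - multiplier * cost.
class PIDLagrangeMultiplier:
    def __init__(self, kp: float, ki: float, kd: float, cost_limit: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0         # accumulated violation, clipped at 0
        self.prev_violation = 0.0
        self.value = 0.0            # the multiplier, kept non-negative

    def update(self, episode_cost: float) -> float:
        violation = episode_cost - self.cost_limit  # > 0 means constraint broken
        self.integral = max(0.0, self.integral + violation)
        derivative = violation - self.prev_violation
        self.prev_violation = violation
        self.value = max(0.0, self.kp * violation
                              + self.ki * self.integral
                              + self.kd * derivative)
        return self.value

# Hypothetical episodic costs against a cost limit of 25: the multiplier
# rises while the constraint is violated and relaxes as costs fall.
lagrange = PIDLagrangeMultiplier(kp=0.1, ki=0.01, kd=0.05, cost_limit=25.0)
for episode_cost in [40.0, 32.0, 24.0]:
    print(lagrange.update(episode_cost))
```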


What Other Industries Can Learn from Healthcare’s Knowledge Graphs
Towards Data Science

How shared meaning, evidence, and standards create durable semantic infrastructure.



Gates Foundation and OpenAI test AI in African healthcare
AI News

Primary healthcare systems across parts of Africa are under growing strain, caught between rising demand, chronic staff shortages, and shrinking international aid budgets. In that context, AI is being tested in healthcare less as a breakthrough technology and more as a way to keep basic services running. According to reporting by Reuters, the Gates Foundation…



Generative AI Purpose-built for Social and Mental Health: A Real-World Pilot
cs.AI updates on arXiv.org | arXiv:2511.11689v3 (Announce Type: replace-cross)

Abstract: Generative artificial intelligence (GAI) chatbots built for mental health could deliver safe, personalized, and scalable mental health support. We evaluate a foundation model designed for mental health. Adults engaged with the chatbot between May 15, 2025 and September 15, 2025, providing opt-in consent and completing measures of demographic information, mental health symptoms, social connection, and self-identified goals. Measures were repeated every two weeks for up to 6 weeks, with a final follow-up at 10 weeks. Analyses included effect sizes and growth mixture models to identify participant groups and their characteristic engagement, severity, and demographic factors. Users demonstrated significant reductions in PHQ-9 and GAD-7 scores that were sustained at follow-up. Significant improvements in Hope, Behavioral Activation, Social Interaction, Loneliness, and Perceived Social Support were observed throughout and maintained at the 10-week follow-up. Engagement was high and predicted outcomes. Working alliance was comparable to traditional care and also predicted outcomes. Automated safety guardrails functioned as designed, with 76 sessions flagged for risk and all handled according to escalation policies. This single-arm naturalistic observational study provides initial evidence that a GAI foundation model for mental health can deliver accessible, engaging, effective, and safe mental health support. These results lend support to findings from early randomized designs and offer promise for future study of mental health GAI in real-world settings.

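The abstract reports effect sizes without naming the estimator. For a paired pre/post design like this one, a common choice is Cohen's d on the change scores; the sketch below computes it on hypothetical PHQ-9-like data and is not drawn from the study.

```python
# Paired Cohen's d sketch: mean symptom change divided by the standard
# deviation of the change scores. All data here is simulated, purely to
# show the computation; it does not reproduce the study's results.
import numpy as np

def paired_cohens_d(pre: np.ndarray, post: np.ndarray) -> float:
    """Cohen's d for paired samples on pre-minus-post change scores."""
    change = pre - post  # positive = symptom reduction
    return change.mean() / change.std(ddof=1)

rng = np.random.default_rng(0)
pre = rng.normal(14.0, 4.0, size=200)        # hypothetical baseline PHQ-9
post = pre - rng.normal(4.0, 3.0, size=200)  # hypothetical improvement
print(f"d = {paired_cohens_d(pre, post):.2f}")
```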


Stop Writing Messy Boolean Masks: 10 Elegant Ways to Filter Pandas DataFrames
Towards Data Science

Master the art of readable, high-performance data selection using .query(), .isin(), and advanced vectorized logic.

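The article's ten patterns are behind the link, but the teaser names .query() and .isin(), both standard pandas APIs. A quick before/after on made-up data:

```python
# Three equivalent filters: a nested boolean mask versus the cleaner
# .isin() and .query() forms the teaser highlights. Data is invented.
import pandas as pd

df = pd.DataFrame({
    "city":  ["Seattle", "Austin", "Boston", "Austin"],
    "price": [720, 410, 650, 380],
    "beds":  [3, 2, 2, 1],
})

# Messy: chained masks with repeated df references and parentheses.
messy = df[(df["price"] < 700) & ((df["city"] == "Austin") | (df["city"] == "Boston"))]

# Cleaner: .isin() collapses the OR chain; .query() reads like a sentence.
clean = df[df["city"].isin(["Austin", "Boston"]) & (df["price"] < 700)]
cleaner = df.query("price < 700 and city in ['Austin', 'Boston']")

assert messy.equals(clean) and clean.equals(cleaner)
print(cleaner)
```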

5 Breakthroughs in Graph Neural Networks to Watch in 2026
KDnuggets

This article outlines 5 recent breakthroughs in GNNs that are worth watching in the year ahead, from integration with LLMs to interdisciplinary scientific discoveries.
