Gallery

Contacts

411 University St, Seattle, USA

engitech@oceanthemes.net

+1 -800-456-478-23

News
AI News & Insights Featured Image

Think Your Python Code Is Slow? Stop Guessing and Start Measuring Towards Data Science

Think Your Python Code Is Slow? Stop Guessing and Start MeasuringTowards Data Science A hands-on tour of using cProfile + SnakeViz to find (and fix) the “hot” paths in your code.
The post Think Your Python Code Is Slow? Stop Guessing and Start Measuring appeared first on Towards Data Science.

 A hands-on tour of using cProfile + SnakeViz to find (and fix) the “hot” paths in your code.
The post Think Your Python Code Is Slow? Stop Guessing and Start Measuring appeared first on Towards Data Science. Read More  

News
AI News & Insights Featured Image

Agentic AI for Scaling Diagnosis and Care in Neurodegenerative Disease AI updates on arXiv.org

Agentic AI for Scaling Diagnosis and Care in Neurodegenerative Diseasecs.AI updates on arXiv.org arXiv:2502.06842v4 Announce Type: replace-cross
Abstract: United States healthcare systems are struggling to meet the growing demand for neurological care, particularly in Alzheimer’s disease and related dementias (ADRD). Generative AI built on language models (LLMs) now enables agentic AI systems that can enhance clinician capabilities to approach specialist-level assessment and decision-making in ADRD care at scale. This article presents a comprehensive six-phase roadmap for responsible design and integration of such systems into ADRD care: (1) high-quality standardized data collection across modalities; (2) decision support; (3) clinical integration enhancing workflows; (4) rigorous validation and monitoring protocols; (5) continuous learning through clinical feedback; and (6) robust ethics and risk management frameworks. This human centered approach optimizes clinicians’ capabilities in comprehensive data collection, interpretation of complex clinical information, and timely application of relevant medical knowledge while prioritizing patient safety, healthcare equity, and transparency. Though focused on ADRD, these principles offer broad applicability across medical specialties facing similar systemic challenges.

 arXiv:2502.06842v4 Announce Type: replace-cross
Abstract: United States healthcare systems are struggling to meet the growing demand for neurological care, particularly in Alzheimer’s disease and related dementias (ADRD). Generative AI built on language models (LLMs) now enables agentic AI systems that can enhance clinician capabilities to approach specialist-level assessment and decision-making in ADRD care at scale. This article presents a comprehensive six-phase roadmap for responsible design and integration of such systems into ADRD care: (1) high-quality standardized data collection across modalities; (2) decision support; (3) clinical integration enhancing workflows; (4) rigorous validation and monitoring protocols; (5) continuous learning through clinical feedback; and (6) robust ethics and risk management frameworks. This human centered approach optimizes clinicians’ capabilities in comprehensive data collection, interpretation of complex clinical information, and timely application of relevant medical knowledge while prioritizing patient safety, healthcare equity, and transparency. Though focused on ADRD, these principles offer broad applicability across medical specialties facing similar systemic challenges. Read More  

News
AI News & Insights Featured Image

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning AI updates on arXiv.org

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learningcs.AI updates on arXiv.org arXiv:2512.20605v2 Announce Type: replace-cross
Abstract: Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally-abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied with a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term “internal RL”, enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.

 arXiv:2512.20605v2 Announce Type: replace-cross
Abstract: Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally-abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied with a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term “internal RL”, enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models. Read More  

News
AI News & Insights Featured Image

WGLE:Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networks AI updates on arXiv.org

WGLE:Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networkscs.AI updates on arXiv.org arXiv:2506.08602v2 Announce Type: replace-cross
Abstract: Graph Neural Networks (GNNs) are increasingly deployed in real-world applications, making ownership verification critical to protect their intellectual property against model theft. Fingerprinting and black-box watermarking are two main methods. However, the former relies on determining model similarity, which is computationally expensive and prone to ownership collisions after model post-processing. The latter embeds backdoors, exposing watermarked models to the risk of backdoor attacks. Moreover, both previous methods enable ownership verification but do not convey additional information about the copy model. If the owner has multiple models, each model requires a distinct trigger graph.
To address these challenges, this paper proposes WGLE, a novel black-box watermarking paradigm for GNNs that enables embedding the multi-bit string in GNN models without using backdoors. WGLE builds on a key insight we term Layer-wise Distance Difference on an Edge (LDDE), which quantifies the difference between the feature distance and the prediction distance of two connected nodes in a graph. By assigning unique LDDE values to the edges and employing the LDDE sequence as the watermark, WGLE supports multi-bit capacity without relying on backdoor mechanisms. We evaluate WGLE on six public datasets across six mainstream GNN architectures, and compare WGLE with state-of-the-art GNN watermarking and fingerprinting methods. WGLE achieves 100% ownership verification accuracy, with an average fidelity degradation of only 1.41%. Additionally, WGLE exhibits robust resilience against potential attacks. The code is available in the repository.

 arXiv:2506.08602v2 Announce Type: replace-cross
Abstract: Graph Neural Networks (GNNs) are increasingly deployed in real-world applications, making ownership verification critical to protect their intellectual property against model theft. Fingerprinting and black-box watermarking are two main methods. However, the former relies on determining model similarity, which is computationally expensive and prone to ownership collisions after model post-processing. The latter embeds backdoors, exposing watermarked models to the risk of backdoor attacks. Moreover, both previous methods enable ownership verification but do not convey additional information about the copy model. If the owner has multiple models, each model requires a distinct trigger graph.
To address these challenges, this paper proposes WGLE, a novel black-box watermarking paradigm for GNNs that enables embedding the multi-bit string in GNN models without using backdoors. WGLE builds on a key insight we term Layer-wise Distance Difference on an Edge (LDDE), which quantifies the difference between the feature distance and the prediction distance of two connected nodes in a graph. By assigning unique LDDE values to the edges and employing the LDDE sequence as the watermark, WGLE supports multi-bit capacity without relying on backdoor mechanisms. We evaluate WGLE on six public datasets across six mainstream GNN architectures, and compare WGLE with state-of-the-art GNN watermarking and fingerprinting methods. WGLE achieves 100% ownership verification accuracy, with an average fidelity degradation of only 1.41%. Additionally, WGLE exhibits robust resilience against potential attacks. The code is available in the repository. Read More  

News
AI News & Insights Featured Image

Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy AI updates on arXiv.org

Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracycs.AI updates on arXiv.org arXiv:2512.21017v1 Announce Type: cross
Abstract: With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model could allocate disproportionately more attention to CoT sequences with excessive length. This reduces focus on the much shorter but essential Key portion-the final answer, whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme. In the first stage, conventional SFT is applied to ensure proper output format, while in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT, while preserving the ability to generate correct formats. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens.

 arXiv:2512.21017v1 Announce Type: cross
Abstract: With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model could allocate disproportionately more attention to CoT sequences with excessive length. This reduces focus on the much shorter but essential Key portion-the final answer, whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme. In the first stage, conventional SFT is applied to ensure proper output format, while in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT, while preserving the ability to generate correct formats. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens. Read More  

News
AI News & Insights Featured Image

Compressed Causal Reasoning: Quantization and GraphRAG Effects on Interventional and Counterfactual Accuracy AI updates on arXiv.org

Compressed Causal Reasoning: Quantization and GraphRAG Effects on Interventional and Counterfactual Accuracycs.AI updates on arXiv.org arXiv:2512.13725v2 Announce Type: replace
Abstract: Causal reasoning in Large Language Models spanning association, intervention, and counterfactual inference is essential for reliable decision making in high stakes settings. As deployment shifts toward edge and resource constrained environments, quantized models such as INT8 and NF4 are becoming standard. Yet the impact of precision reduction on formal causal reasoning is poorly understood. To our knowledge, this is the first study to systematically evaluate quantization effects across all three levels of Pearls Causal Ladder. Using a 3000 sample stratified CLadder benchmark, we find that rung level accuracy in Llama 3 8B remains broadly stable under quantization, with NF4 showing less than one percent overall degradation. Interventional queries at rung 2 are the most sensitive to precision loss, whereas counterfactual reasoning at rung 3 is comparatively stable but exhibits heterogeneous weaknesses across query types such as collider bias and backdoor adjustment. Experiments on the CRASS benchmark show near identical performance across precisions, indicating that existing commonsense counterfactual datasets lack the structural sensitivity needed to reveal quantization induced reasoning drift. We further evaluate Graph Retrieval Augmented Generation using ground truth causal graphs and observe a consistent improvement in NF4 interventional accuracy of plus 1.7 percent, partially offsetting compression related degradation. These results suggest that causal reasoning is unexpectedly robust to four bit quantization, graph structured augmentation can selectively reinforce interventional reasoning, and current counterfactual benchmarks fail to capture deeper causal brittleness. This work provides an initial empirical map of compressed causal reasoning and practical guidance for deploying efficient and structurally supported causal AI systems.

 arXiv:2512.13725v2 Announce Type: replace
Abstract: Causal reasoning in Large Language Models spanning association, intervention, and counterfactual inference is essential for reliable decision making in high stakes settings. As deployment shifts toward edge and resource constrained environments, quantized models such as INT8 and NF4 are becoming standard. Yet the impact of precision reduction on formal causal reasoning is poorly understood. To our knowledge, this is the first study to systematically evaluate quantization effects across all three levels of Pearls Causal Ladder. Using a 3000 sample stratified CLadder benchmark, we find that rung level accuracy in Llama 3 8B remains broadly stable under quantization, with NF4 showing less than one percent overall degradation. Interventional queries at rung 2 are the most sensitive to precision loss, whereas counterfactual reasoning at rung 3 is comparatively stable but exhibits heterogeneous weaknesses across query types such as collider bias and backdoor adjustment. Experiments on the CRASS benchmark show near identical performance across precisions, indicating that existing commonsense counterfactual datasets lack the structural sensitivity needed to reveal quantization induced reasoning drift. We further evaluate Graph Retrieval Augmented Generation using ground truth causal graphs and observe a consistent improvement in NF4 interventional accuracy of plus 1.7 percent, partially offsetting compression related degradation. These results suggest that causal reasoning is unexpectedly robust to four bit quantization, graph structured augmentation can selectively reinforce interventional reasoning, and current counterfactual benchmarks fail to capture deeper causal brittleness. This work provides an initial empirical map of compressed causal reasoning and practical guidance for deploying efficient and structurally supported causal AI systems. Read More  

News
AI News & Insights Featured Image

AutoBaxBuilder: Bootstrapping Code Security Benchmarking AI updates on arXiv.org

AutoBaxBuilder: Bootstrapping Code Security Benchmarkingcs.AI updates on arXiv.org arXiv:2512.21132v1 Announce Type: cross
Abstract: As LLMs see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work has demonstrated that security is often overlooked, exposing that LLMs are prone to generating code with security vulnerabilities. These insights were enabled by specialized benchmarks, crafted through significant manual effort by security experts. However, relying on manually-crafted benchmarks is insufficient in the long term, because benchmarks (i) naturally end up contaminating training data, (ii) must extend to new tasks to provide a more complete picture, and (iii) must increase in difficulty to challenge more capable LLMs. In this work, we address these challenges and present AutoBaxBuilder, a framework that generates tasks and tests for code security benchmarking from scratch. We introduce a robust pipeline with fine-grained plausibility checks, leveraging the code understanding capabilities of LLMs to construct functionality tests and end-to-end security-probing exploits. To confirm the quality of the generated benchmark, we conduct both a qualitative analysis and perform quantitative experiments, comparing it against tasks constructed by human experts. We use AutoBaxBuilder to construct entirely new tasks and release them to the public as AutoBaxBench, together with a thorough evaluation of the security capabilities of LLMs on these tasks. We find that a new task can be generated in under 2 hours, costing less than USD 10.

 arXiv:2512.21132v1 Announce Type: cross
Abstract: As LLMs see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work has demonstrated that security is often overlooked, exposing that LLMs are prone to generating code with security vulnerabilities. These insights were enabled by specialized benchmarks, crafted through significant manual effort by security experts. However, relying on manually-crafted benchmarks is insufficient in the long term, because benchmarks (i) naturally end up contaminating training data, (ii) must extend to new tasks to provide a more complete picture, and (iii) must increase in difficulty to challenge more capable LLMs. In this work, we address these challenges and present AutoBaxBuilder, a framework that generates tasks and tests for code security benchmarking from scratch. We introduce a robust pipeline with fine-grained plausibility checks, leveraging the code understanding capabilities of LLMs to construct functionality tests and end-to-end security-probing exploits. To confirm the quality of the generated benchmark, we conduct both a qualitative analysis and perform quantitative experiments, comparing it against tasks constructed by human experts. We use AutoBaxBuilder to construct entirely new tasks and release them to the public as AutoBaxBench, together with a thorough evaluation of the security capabilities of LLMs on these tasks. We find that a new task can be generated in under 2 hours, costing less than USD 10. Read More  

News
AI News & Insights Featured Image

A Physics Informed Neural Network For Deriving MHD State Vectors From Global Active Regions Observations AI updates on arXiv.org

A Physics Informed Neural Network For Deriving MHD State Vectors From Global Active Regions Observationscs.AI updates on arXiv.org arXiv:2512.20747v1 Announce Type: cross
Abstract: Solar active regions (ARs) do not appear randomly but cluster along longitudinally warped toroidal bands (‘toroids’) that encode information about magnetic structures in the tachocline, where global-scale organization likely originates. Global MagnetoHydroDynamic Shallow-Water Tachocline (MHD-SWT) models have shown potential to simulate such toroids, matching observations qualitatively. For week-scale early prediction of flare-producing AR emergence, forward-integration of these toroids is necessary. This requires model initialization with a dynamically self-consistent MHD state-vector that includes magnetic, flow fields, and shell-thickness variations. However, synoptic magnetograms provide only geometric shape of toroids, not the state-vector needed to initialize MHD-SWT models. To address this challenging task, we develop PINNBARDS, a novel Physics-Informed Neural Network (PINN)-Based AR Distribution Simulator, that uses observational toroids and MHD-SWT equations to derive initial state-vector. Using Feb-14-2024 SDO/HMI synoptic map, we show that PINN converges to physically consistent, predominantly antisymmetric toroids, matching observed ones. Although surface data provides north and south toroids’ central latitudes, and their latitudinal widths, they cannot determine tachocline field strengths, connected to AR emergence. We explore here solutions across a broad parameter range, finding hydrodynamically-dominated structures for weak fields (~2 kG) and overly rigid behavior for strong fields (~100 kG). We obtain best agreement with observations for 20-30 kG toroidal fields, and ~10 degree bandwidth, consistent with low-order longitudinal mode excitation. To our knowledge, PINNBARDS serves as the first method for reconstructing state-vectors for hidden tachocline magnetic structures from surface patterns; potentially leading to weeks ahead prediction of flare-producing AR-emergence.

 arXiv:2512.20747v1 Announce Type: cross
Abstract: Solar active regions (ARs) do not appear randomly but cluster along longitudinally warped toroidal bands (‘toroids’) that encode information about magnetic structures in the tachocline, where global-scale organization likely originates. Global MagnetoHydroDynamic Shallow-Water Tachocline (MHD-SWT) models have shown potential to simulate such toroids, matching observations qualitatively. For week-scale early prediction of flare-producing AR emergence, forward-integration of these toroids is necessary. This requires model initialization with a dynamically self-consistent MHD state-vector that includes magnetic, flow fields, and shell-thickness variations. However, synoptic magnetograms provide only geometric shape of toroids, not the state-vector needed to initialize MHD-SWT models. To address this challenging task, we develop PINNBARDS, a novel Physics-Informed Neural Network (PINN)-Based AR Distribution Simulator, that uses observational toroids and MHD-SWT equations to derive initial state-vector. Using Feb-14-2024 SDO/HMI synoptic map, we show that PINN converges to physically consistent, predominantly antisymmetric toroids, matching observed ones. Although surface data provides north and south toroids’ central latitudes, and their latitudinal widths, they cannot determine tachocline field strengths, connected to AR emergence. We explore here solutions across a broad parameter range, finding hydrodynamically-dominated structures for weak fields (~2 kG) and overly rigid behavior for strong fields (~100 kG). We obtain best agreement with observations for 20-30 kG toroidal fields, and ~10 degree bandwidth, consistent with low-order longitudinal mode excitation. To our knowledge, PINNBARDS serves as the first method for reconstructing state-vectors for hidden tachocline magnetic structures from surface patterns; potentially leading to weeks ahead prediction of flare-producing AR-emergence. Read More  

News
AI News & Insights Featured Image

How to Build an AI-Powered Weather ETL Pipeline with Databricks and GPT-4o: From API To Dashboard Towards Data Science

How to Build an AI-Powered Weather ETL Pipeline with Databricks and GPT-4o: From API To DashboardTowards Data Science A step-by-step guide from weather API ETL to dashboard on Databricks
The post How to Build an AI-Powered Weather ETL Pipeline with Databricks and GPT-4o: From API To Dashboard appeared first on Towards Data Science.

 A step-by-step guide from weather API ETL to dashboard on Databricks
The post How to Build an AI-Powered Weather ETL Pipeline with Databricks and GPT-4o: From API To Dashboard appeared first on Towards Data Science. Read More