Over 10 years we help companies reach their financial and branding goals. Engitech is a values-driven technology agency dedicated.

Gallery

Contacts

411 University St, Seattle, USA

engitech@oceanthemes.net

+1 -800-456-478-23

News
AI News & Insights Featured Image

LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration AI updates on arXiv.org

LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integrationcs.AI updates on arXiv.org arXiv:2512.22010v1 Announce Type: cross
Abstract: Unmanned aerial vehicles (UAVs) are crucial tools for post-disaster search and rescue, facing challenges such as high information density, rapid changes in viewpoint, and dynamic structures, especially in long-horizon navigation. However, current UAV vision-and-language navigation(VLN) methods struggle to model long-horizon spatiotemporal context in complex environments, resulting in inaccurate semantic alignment and unstable path planning. To this end, we propose LongFly, a spatiotemporal context modeling framework for long-horizon UAV VLN. LongFly proposes a history-aware spatiotemporal modeling strategy that transforms fragmented and redundant historical data into structured, compact, and expressive representations. First, we propose the slot-based historical image compression module, which dynamically distills multi-view historical observations into fixed-length contextual representations. Then, the spatiotemporal trajectory encoding module is introduced to capture the temporal dynamics and spatial structure of UAV trajectories. Finally, to integrate existing spatiotemporal context with current observations, we design the prompt-guided multimodal integration module to support time-based reasoning and robust waypoint prediction. Experimental results demonstrate that LongFly outperforms state-of-the-art UAV VLN baselines by 7.89% in success rate and 6.33% in success weighted by path length, consistently across both seen and unseen environments.

 arXiv:2512.22010v1 Announce Type: cross
Abstract: Unmanned aerial vehicles (UAVs) are crucial tools for post-disaster search and rescue, facing challenges such as high information density, rapid changes in viewpoint, and dynamic structures, especially in long-horizon navigation. However, current UAV vision-and-language navigation(VLN) methods struggle to model long-horizon spatiotemporal context in complex environments, resulting in inaccurate semantic alignment and unstable path planning. To this end, we propose LongFly, a spatiotemporal context modeling framework for long-horizon UAV VLN. LongFly proposes a history-aware spatiotemporal modeling strategy that transforms fragmented and redundant historical data into structured, compact, and expressive representations. First, we propose the slot-based historical image compression module, which dynamically distills multi-view historical observations into fixed-length contextual representations. Then, the spatiotemporal trajectory encoding module is introduced to capture the temporal dynamics and spatial structure of UAV trajectories. Finally, to integrate existing spatiotemporal context with current observations, we design the prompt-guided multimodal integration module to support time-based reasoning and robust waypoint prediction. Experimental results demonstrate that LongFly outperforms state-of-the-art UAV VLN baselines by 7.89% in success rate and 6.33% in success weighted by path length, consistently across both seen and unseen environments. Read More  

News
AI News & Insights Featured Image

NEMO-4-PAYPAL: Leveraging NVIDIA’s Nemo Framework for empowering PayPal’s Commerce Agent AI updates on arXiv.org

NEMO-4-PAYPAL: Leveraging NVIDIA’s Nemo Framework for empowering PayPal’s Commerce Agentcs.AI updates on arXiv.org arXiv:2512.21578v1 Announce Type: new
Abstract: We present the development and optimization of PayPal’s Commerce Agent, powered by NEMO-4-PAYPAL, a multi-agent system designed to revolutionize agentic commerce on the PayPal platform. Through our strategic partnership with NVIDIA, we leveraged the NeMo Framework for LLM model fine-tuning to enhance agent performance. Specifically, we optimized the Search and Discovery agent by replacing our base model with a fine-tuned Nemotron small language model (SLM).
We conducted comprehensive experiments using the llama3.1-nemotron-nano-8B-v1 architecture, training LoRA-based models through systematic hyperparameter sweeps across learning rates, optimizers (Adam, AdamW), cosine annealing schedules, and LoRA ranks. Our contributions include: (1) the first application of NVIDIA’s NeMo Framework to commerce-specific agent optimization, (2) LLM powered fine-tuning strategy for retrieval-focused commerce tasks, (3) demonstration of significant improvements in latency and cost while maintaining agent quality, and (4) a scalable framework for multi-agent system optimization in production e-commerce environments. Our results demonstrate that the fine-tuned Nemotron SLM effectively resolves the key performance issue in the retrieval component, which represents over 50% of total agent response time, while maintaining or enhancing overall system performance.

 arXiv:2512.21578v1 Announce Type: new
Abstract: We present the development and optimization of PayPal’s Commerce Agent, powered by NEMO-4-PAYPAL, a multi-agent system designed to revolutionize agentic commerce on the PayPal platform. Through our strategic partnership with NVIDIA, we leveraged the NeMo Framework for LLM model fine-tuning to enhance agent performance. Specifically, we optimized the Search and Discovery agent by replacing our base model with a fine-tuned Nemotron small language model (SLM).
We conducted comprehensive experiments using the llama3.1-nemotron-nano-8B-v1 architecture, training LoRA-based models through systematic hyperparameter sweeps across learning rates, optimizers (Adam, AdamW), cosine annealing schedules, and LoRA ranks. Our contributions include: (1) the first application of NVIDIA’s NeMo Framework to commerce-specific agent optimization, (2) LLM powered fine-tuning strategy for retrieval-focused commerce tasks, (3) demonstration of significant improvements in latency and cost while maintaining agent quality, and (4) a scalable framework for multi-agent system optimization in production e-commerce environments. Our results demonstrate that the fine-tuned Nemotron SLM effectively resolves the key performance issue in the retrieval component, which represents over 50% of total agent response time, while maintaining or enhancing overall system performance. Read More  

News
AI News & Insights Featured Image

When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamics AI updates on arXiv.org

When Does Learning Renormalize? Sufficient Conditions for Power Law Spectral Dynamicscs.AI updates on arXiv.org arXiv:2512.18209v2 Announce Type: replace-cross
Abstract: Empirical power–law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution–Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse–grained dynamical description of training. Within GRSD, power–law scaling corresponds to a particularly simple renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process.
In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse–grained description. These conditions constrain the learning configuration at multiple levels, including boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution along training, and log–shift invariance of renormalized shell couplings. We further show that power–law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence: once log–shift invariance is combined with the intrinsic time–rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power–law form.

 arXiv:2512.18209v2 Announce Type: replace-cross
Abstract: Empirical power–law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution–Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse–grained dynamical description of training. Within GRSD, power–law scaling corresponds to a particularly simple renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process.
In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse–grained description. These conditions constrain the learning configuration at multiple levels, including boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution along training, and log–shift invariance of renormalized shell couplings. We further show that power–law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence: once log–shift invariance is combined with the intrinsic time–rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power–law form. Read More  

News
AI News & Insights Featured Image

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning AI updates on arXiv.org

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learningcs.AI updates on arXiv.org arXiv:2512.16917v2 Announce Type: replace
Abstract: Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plausible but invalid steps. In this paper, we introduce Generative Adversarial Reasoner, an on-policy joint training framework designed to enhance reasoning by co-evolving an LLM reasoner and an LLM-based discriminator through adversarial reinforcement learning. A compute-efficient review schedule partitions each reasoning chain into logically complete slices of comparable length, and the discriminator evaluates each slice’s soundness with concise, structured justifications. Learning couples complementary signals: the LLM reasoner is rewarded for logically consistent steps that yield correct answers, while the discriminator earns rewards for correctly detecting errors or distinguishing traces in the reasoning process. This produces dense, well-calibrated, on-policy step-level rewards that supplement sparse exact-match signals, improving credit assignment, increasing sample efficiency, and enhancing overall reasoning quality of LLMs. Across various mathematical benchmarks, the method delivers consistent gains over strong baselines with standard RL post-training. Specifically, on AIME24, we improve DeepSeek-R1-Distill-Qwen-7B from 54.0 to 61.3 (+7.3) and DeepSeek-R1-Distill-Llama-8B from 43.7 to 53.7 (+10.0). The modular discriminator also enables flexible reward shaping for objectives such as teacher distillation, preference alignment, and mathematical proof-based reasoning.

 arXiv:2512.16917v2 Announce Type: replace
Abstract: Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plausible but invalid steps. In this paper, we introduce Generative Adversarial Reasoner, an on-policy joint training framework designed to enhance reasoning by co-evolving an LLM reasoner and an LLM-based discriminator through adversarial reinforcement learning. A compute-efficient review schedule partitions each reasoning chain into logically complete slices of comparable length, and the discriminator evaluates each slice’s soundness with concise, structured justifications. Learning couples complementary signals: the LLM reasoner is rewarded for logically consistent steps that yield correct answers, while the discriminator earns rewards for correctly detecting errors or distinguishing traces in the reasoning process. This produces dense, well-calibrated, on-policy step-level rewards that supplement sparse exact-match signals, improving credit assignment, increasing sample efficiency, and enhancing overall reasoning quality of LLMs. Across various mathematical benchmarks, the method delivers consistent gains over strong baselines with standard RL post-training. Specifically, on AIME24, we improve DeepSeek-R1-Distill-Qwen-7B from 54.0 to 61.3 (+7.3) and DeepSeek-R1-Distill-Llama-8B from 43.7 to 53.7 (+10.0). The modular discriminator also enables flexible reward shaping for objectives such as teacher distillation, preference alignment, and mathematical proof-based reasoning. Read More  

News
AI News & Insights Featured Image

Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications AI updates on arXiv.org

Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applicationscs.AI updates on arXiv.org arXiv:2512.22113v1 Announce Type: cross
Abstract: Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents cost on average over $2M per hour. Prior research identifies code- and configuration-related issues as the predominant category of root causes in cloud incidents. This paper introduces PRAXIS, an orchestrator that manages and deploys an agentic workflow for diagnosing code- and configuration-caused cloud incidents. PRAXIS employs an LLM-driven structured traversal over two types of graph: (1) a service dependency graph (SDG) that captures microservice-level dependencies; and (2) a hammock-block program dependence graph (PDG) that captures code-level dependencies for each microservice. Together, these graphs encode microservice- and code-level dependencies and the LLM acts as a traversal policy over these graphs, moving between services and code dependencies to localize and explain failures. Compared to state-of-the-art ReAct baselines, PRAXIS improves RCA accuracy by up to 3.1x while reducing token consumption by 3.8x. PRAXIS is demonstrated on a set of 30 comprehensive real-world incidents that is being compiled into an RCA benchmark.

 arXiv:2512.22113v1 Announce Type: cross
Abstract: Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents cost on average over $2M per hour. Prior research identifies code- and configuration-related issues as the predominant category of root causes in cloud incidents. This paper introduces PRAXIS, an orchestrator that manages and deploys an agentic workflow for diagnosing code- and configuration-caused cloud incidents. PRAXIS employs an LLM-driven structured traversal over two types of graph: (1) a service dependency graph (SDG) that captures microservice-level dependencies; and (2) a hammock-block program dependence graph (PDG) that captures code-level dependencies for each microservice. Together, these graphs encode microservice- and code-level dependencies and the LLM acts as a traversal policy over these graphs, moving between services and code dependencies to localize and explain failures. Compared to state-of-the-art ReAct baselines, PRAXIS improves RCA accuracy by up to 3.1x while reducing token consumption by 3.8x. PRAXIS is demonstrated on a set of 30 comprehensive real-world incidents that is being compiled into an RCA benchmark. Read More  

News
AI News & Insights Featured Image

Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism AI updates on arXiv.org

Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelismcs.AI updates on arXiv.org arXiv:2512.21487v1 Announce Type: cross
Abstract: The mixture-of-experts (MoE) architecture scales model size with sublinear computational increase but suffers from memory-intensive inference due to KV caches and sparse expert activation. Recent disaggregated expert parallelism (DEP) distributes attention and experts to dedicated GPU groups but lacks support for shared experts and efficient task scheduling, limiting performance.
We propose FinDEP, a fine-grained task scheduling algorithm for DEP that maximizes task overlap to improve MoE inference throughput. FinDEP introduces three innovations: 1) partitioning computation/communication into smaller tasks for fine-grained pipelining, 2) formulating a scheduling optimization supporting variable granularity and ordering, and 3) developing an efficient solver for this large search space.
Experiments on four GPU systems with DeepSeek-V2 and Qwen3-MoE show FinDEP improves throughput by up to 1.61x over prior methods, achieving up to 1.24x speedup on a 32-GPU system.

 arXiv:2512.21487v1 Announce Type: cross
Abstract: The mixture-of-experts (MoE) architecture scales model size with sublinear computational increase but suffers from memory-intensive inference due to KV caches and sparse expert activation. Recent disaggregated expert parallelism (DEP) distributes attention and experts to dedicated GPU groups but lacks support for shared experts and efficient task scheduling, limiting performance.
We propose FinDEP, a fine-grained task scheduling algorithm for DEP that maximizes task overlap to improve MoE inference throughput. FinDEP introduces three innovations: 1) partitioning computation/communication into smaller tasks for fine-grained pipelining, 2) formulating a scheduling optimization supporting variable granularity and ordering, and 3) developing an efficient solver for this large search space.
Experiments on four GPU systems with DeepSeek-V2 and Qwen3-MoE show FinDEP improves throughput by up to 1.61x over prior methods, achieving up to 1.24x speedup on a 32-GPU system. Read More  

News
AI News & Insights Featured Image

Feasible strategies in three-way conflict analysis with three-valued ratings AI updates on arXiv.org

Feasible strategies in three-way conflict analysis with three-valued ratingscs.AI updates on arXiv.org arXiv:2512.21420v1 Announce Type: new
Abstract: Most existing work on three-way conflict analysis has focused on trisecting agent pairs, agents, or issues, which contributes to understanding the nature of conflicts but falls short in addressing their resolution. Specifically, the formulation of feasible strategies, as an essential component of conflict resolution and mitigation, has received insufficient scholarly attention. Therefore, this paper aims to investigate feasible strategies from two perspectives of consistency and non-consistency. Particularly, we begin with computing the overall rating of a clique of agents based on positive and negative similarity degrees. Afterwards, considering the weights of both agents and issues, we propose weighted consistency and non-consistency measures, which are respectively used to identify the feasible strategies for a clique of agents. Algorithms are developed to identify feasible strategies, $L$-order feasible strategies, and the corresponding optimal ones. Finally, to demonstrate the practicality, effectiveness, and superiority of the proposed models, we apply them to two commonly used case studies on NBA labor negotiations and development plans for Gansu Province and conduct a sensitivity analysis on parameters and a comparative analysis with existing state-of-the-art conflict analysis approaches. The comparison results demonstrate that our conflict resolution models outperform the conventional approaches by unifying weighted agent-issue evaluation with consistency and non-consistency measures to enable the systematic identification of not only feasible strategies but also optimal solutions.

 arXiv:2512.21420v1 Announce Type: new
Abstract: Most existing work on three-way conflict analysis has focused on trisecting agent pairs, agents, or issues, which contributes to understanding the nature of conflicts but falls short in addressing their resolution. Specifically, the formulation of feasible strategies, as an essential component of conflict resolution and mitigation, has received insufficient scholarly attention. Therefore, this paper aims to investigate feasible strategies from two perspectives of consistency and non-consistency. Particularly, we begin with computing the overall rating of a clique of agents based on positive and negative similarity degrees. Afterwards, considering the weights of both agents and issues, we propose weighted consistency and non-consistency measures, which are respectively used to identify the feasible strategies for a clique of agents. Algorithms are developed to identify feasible strategies, $L$-order feasible strategies, and the corresponding optimal ones. Finally, to demonstrate the practicality, effectiveness, and superiority of the proposed models, we apply them to two commonly used case studies on NBA labor negotiations and development plans for Gansu Province and conduct a sensitivity analysis on parameters and a comparative analysis with existing state-of-the-art conflict analysis approaches. The comparison results demonstrate that our conflict resolution models outperform the conventional approaches by unifying weighted agent-issue evaluation with consistency and non-consistency measures to enable the systematic identification of not only feasible strategies but also optimal solutions. Read More  

News
AI News & Insights Featured Image

A Comparison of DeepSeek and Other LLMs AI updates on arXiv.org

A Comparison of DeepSeek and Other LLMscs.AI updates on arXiv.org arXiv:2502.03688v3 Announce Type: replace-cross
Abstract: Recently, DeepSeek has been the focus of attention in and beyond the AI community. An interesting problem is how DeepSeek compares to other large language models (LLMs). There are many tasks an LLM can do, and in this paper, we use the task of “predicting an outcome using a short text” for comparison. We consider two settings, an authorship classification setting and a citation classification setting. In the first one, the goal is to determine whether a short text is written by human or AI. In the second one, the goal is to classify a citation to one of four types using the textual content. For each experiment, we compare DeepSeek with $4$ popular LLMs: Claude, Gemini, GPT, and Llama.
We find that, in terms of classification accuracy, DeepSeek outperforms Gemini, GPT, and Llama in most cases, but underperforms Claude. We also find that DeepSeek is comparably slower than others but with a low cost to use, while Claude is much more expensive than all the others. Finally, we find that in terms of similarity, the output of DeepSeek is most similar to those of Gemini and Claude (and among all $5$ LLMs, Claude and Gemini have the most similar outputs).
In this paper, we also present a fully-labeled dataset collected by ourselves, and propose a recipe where we can use the LLMs and a recent data set, MADStat, to generate new data sets. The datasets in our paper can be used as benchmarks for future study on LLMs.

 arXiv:2502.03688v3 Announce Type: replace-cross
Abstract: Recently, DeepSeek has been the focus of attention in and beyond the AI community. An interesting problem is how DeepSeek compares to other large language models (LLMs). There are many tasks an LLM can do, and in this paper, we use the task of “predicting an outcome using a short text” for comparison. We consider two settings, an authorship classification setting and a citation classification setting. In the first one, the goal is to determine whether a short text is written by human or AI. In the second one, the goal is to classify a citation to one of four types using the textual content. For each experiment, we compare DeepSeek with $4$ popular LLMs: Claude, Gemini, GPT, and Llama.
We find that, in terms of classification accuracy, DeepSeek outperforms Gemini, GPT, and Llama in most cases, but underperforms Claude. We also find that DeepSeek is comparably slower than others but with a low cost to use, while Claude is much more expensive than all the others. Finally, we find that in terms of similarity, the output of DeepSeek is most similar to those of Gemini and Claude (and among all $5$ LLMs, Claude and Gemini have the most similar outputs).
In this paper, we also present a fully-labeled dataset collected by ourselves, and propose a recipe where we can use the LLMs and a recent data set, MADStat, to generate new data sets. The datasets in our paper can be used as benchmarks for future study on LLMs. Read More  

News
AI News & Insights Featured Image

FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinationscs.AI updates on arXiv.org

FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinationscs.AI updates on arXiv.org arXiv:2512.07015v2 Announce Type: replace-cross
Abstract: Retrieval-Augmented Generation (RAG) reduces hallucinations by grounding answers in retrieved evidence, yet standard retrievers often exhibit retrieval sycophancy: they preferentially surface evidence that supports a user’s premise, even when the premise is false. We propose FVA-RAG (Falsification-Verification Alignment RAG), a pipeline that inverts the standard RAG workflow by treating the initial response as a draft hypothesis and explicitly retrieving anti-context to stress-test it. We evaluate on the full TruthfulQA-Generation benchmark (N=817) under a fully frozen protocol with 0 live web calls and identical retrieval budgets across methods. Using gpt-4o for generation and deterministic judging, FVA-RAG achieves 79.80-80.05% accuracy across two independently built frozen corpora , significantly outperforming prompted variants of Self-RAG (71.11-72.22%) and CRAG (71.36-73.93%) with p < 10^-6 according to McNemar’s test. FVA-RAG triggers falsification on 24.5-29.3% of queries, demonstrating that targeted counter-evidence retrieval is decisive for mitigating premise-confirming hallucinations.

 arXiv:2512.07015v2 Announce Type: replace-cross
Abstract: Retrieval-Augmented Generation (RAG) reduces hallucinations by grounding answers in retrieved evidence, yet standard retrievers often exhibit retrieval sycophancy: they preferentially surface evidence that supports a user’s premise, even when the premise is false. We propose FVA-RAG (Falsification-Verification Alignment RAG), a pipeline that inverts the standard RAG workflow by treating the initial response as a draft hypothesis and explicitly retrieving anti-context to stress-test it. We evaluate on the full TruthfulQA-Generation benchmark (N=817) under a fully frozen protocol with 0 live web calls and identical retrieval budgets across methods. Using gpt-4o for generation and deterministic judging, FVA-RAG achieves 79.80-80.05% accuracy across two independently built frozen corpora , significantly outperforming prompted variants of Self-RAG (71.11-72.22%) and CRAG (71.36-73.93%) with p < 10^-6 according to McNemar’s test. FVA-RAG triggers falsification on 24.5-29.3% of queries, demonstrating that targeted counter-evidence retrieval is decisive for mitigating premise-confirming hallucinations. Read More  

News
AI News & Insights Featured Image

How to Build Production-Grade Agentic Workflows with GraphBit Using Deterministic Tools, Validated Execution Graphs, and Optional LLM OrchestrationMarkTechPost

How to Build Production-Grade Agentic Workflows with GraphBit Using Deterministic Tools, Validated Execution Graphs, and Optional LLM OrchestrationMarkTechPost In this tutorial, we build an end-to-end, production-style agentic workflow using GraphBit that demonstrates how graph-structured execution, tool calling, and optional LLM-driven agents can coexist in a single system. We start by initializing and inspecting the GraphBit runtime, then define a realistic customer-support ticket domain with typed data structures and deterministic, offline-executable tools. We show
The post How to Build Production-Grade Agentic Workflows with GraphBit Using Deterministic Tools, Validated Execution Graphs, and Optional LLM Orchestration appeared first on MarkTechPost.

 In this tutorial, we build an end-to-end, production-style agentic workflow using GraphBit that demonstrates how graph-structured execution, tool calling, and optional LLM-driven agents can coexist in a single system. We start by initializing and inspecting the GraphBit runtime, then define a realistic customer-support ticket domain with typed data structures and deterministic, offline-executable tools. We show
The post How to Build Production-Grade Agentic Workflows with GraphBit Using Deterministic Tools, Validated Execution Graphs, and Optional LLM Orchestration appeared first on MarkTechPost. Read More