News
The Machine Learning “Advent Calendar” Day 3: GNB, LDA and QDA in Excel Towards Data Science

From local distance to global probability
The post The Machine Learning “Advent Calendar” Day 3: GNB, LDA and QDA in Excel appeared first on Towards Data Science.

News
How to Turn Your LLM Prototype into a Production-Ready System Towards Data Science

The most famous applications of LLMs are the ones that I like to call the “wow effect LLMs.” There are plenty of viral LinkedIn posts about them, and they all sound like this: “I built [x] that does [y] in [z] minutes using AI.” Where: If you notice carefully, the focus of the sentence is
The post How to Turn Your LLM Prototype into a Production-Ready System appeared first on Towards Data Science.

News
HTB AI Range offers experiments in cyber-resilience training AI News

The cybersecurity training provider Hack The Box (HTB) has launched the HTB AI Range, designed to let organisations test autonomous AI security agents under realistic conditions, albeit with oversight from human cybersecurity professionals. Its goal is to help users assess how well AI, and mixed human–AI teams, might defend infrastructure. Vulnerabilities in AI models add
The post HTB AI Range offers experiments in cyber-resilience training appeared first on AI News.

News
How confessions can keep language models honest OpenAI News

OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.

News
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful? AI updates on arXiv.org

arXiv:2512.02261v1 Announce Type: new
Abstract: LLM-based trading agents are increasingly deployed in real-world financial markets to perform autonomous analysis and execution. However, their reliability and robustness under adversarial or faulty conditions remain largely unexamined, despite operating in high-risk, irreversible financial environments. We propose TradeTrap, a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. TradeTrap targets four core components of autonomous trading agents: market intelligence, strategy formulation, portfolio and ledger handling, and trade execution, and evaluates their robustness under controlled system-level perturbations. All evaluations are conducted in a closed-loop historical backtesting setting on real US equity market data with identical initial conditions, enabling fair and reproducible comparisons across agents and attacks. Extensive experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns across both agent types, demonstrating that current autonomous trading agents can be systematically misled at the system level. Our code is available at https://github.com/Yanlewen/TradeTrap.

News
DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses AI updates on arXiv.org

arXiv:2512.02282v1 Announce Type: new
Abstract: Large language models (LLMs) now mediate many web-based mental-health, crisis, and other emotionally sensitive services, yet their psychosocial safety in these settings remains poorly understood and weakly evaluated. We present DialogGuard, a multi-agent framework for assessing psychosocial risks in LLM-generated responses along five high-severity dimensions: privacy violations, discriminatory behaviour, mental manipulation, psychological harm, and insulting behaviour. DialogGuard can be applied to diverse generative models through four LLM-as-a-judge pipelines, including single-agent scoring, dual-agent correction, multi-agent debate, and stochastic majority voting, grounded in a shared three-level rubric usable by both human annotators and LLM judges. Using PKU-SafeRLHF with human safety annotations, we show that multi-agent mechanisms detect psychosocial risks more accurately than non-LLM baselines and single-agent judging; dual-agent correction and majority voting provide the best trade-off between accuracy, alignment with human ratings, and robustness, while debate attains higher recall but over-flags borderline cases. We release DialogGuard as open-source software with a web interface that provides per-dimension risk scores and explainable natural-language rationales. A formative study with 12 practitioners illustrates how it supports prompt design, auditing, and supervision of web-facing applications for vulnerable users.
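
As a rough illustration of the "stochastic majority voting" idea mentioned in the abstract, the sketch below samples a judge several times and keeps the most common rubric level. It is not DialogGuard's implementation: the level encoding (0, 1, 2), the `judge` callable, the sample count, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common rubric level among the votes.

    Assumption: levels are integers (e.g. 0, 1, 2) and ties are broken
    toward the higher level, i.e. the more cautious rating.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    top = max(counts.values())
    return max(level for level, n in counts.items() if n == top)


def judge_by_voting(
    response: str,
    judge: Callable[[str], int],  # hypothetical stochastic LLM judge
    n_samples: int = 5,
) -> int:
    """Query the judge n_samples times and aggregate by majority vote."""
    votes = [judge(response) for _ in range(n_samples)]
    return majority_vote(votes)
```

In practice `judge` would wrap a sampled LLM call for one risk dimension; any callable that maps a response string to a rubric level fits this sketch.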

News
Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision AI updates on arXiv.org

arXiv:2512.02339v1 Announce Type: cross
Abstract: Distinguishing visually similar objects by their motion remains a critical challenge in computer vision. Although supervised trackers show promise, contemporary self-supervised trackers struggle when visual cues become ambiguous, limiting their scalability and generalization without extensive labeled data. We find that pre-trained video diffusion models inherently learn motion representations suitable for tracking without task-specific training. This ability arises because their denoising process isolates motion in early, high-noise stages, distinct from later appearance refinement. Capitalizing on this discovery, our self-supervised tracker significantly improves performance in distinguishing visually similar objects, an underexplored failure point for existing methods. Our method achieves up to a 6-point improvement over recent self-supervised approaches on established benchmarks and our newly introduced tests focused on tracking visually similar items. Visualizations confirm that these diffusion-derived motion representations enable robust tracking of even identical objects across challenging viewpoint changes and deformations.

News
Model Recovery at the Edge under Resource Constraints for Physical AI cs.AI updates on arXiv.org

arXiv:2512.02283v1 Announce Type: new
Abstract: Model Recovery (MR) enables safe, explainable decision making in mission-critical autonomous systems (MCAS) by learning governing dynamical equations, but its deployment on edge devices is hindered by the iterative nature of neural ordinary differential equations (NODEs), which are inefficient on FPGAs. Memory and energy consumption are the main concerns when applying MR on edge devices for real-time operation. We propose MERINDA, a novel FPGA-accelerated MR framework that replaces iterative solvers with a parallelizable neural architecture equivalent to NODEs. MERINDA achieves nearly 11x lower DRAM usage and 2.2x faster runtime compared to mobile GPUs. Experiments reveal an inverse relationship between memory and energy at fixed accuracy, highlighting MERINDA’s suitability for resource-constrained, real-time MCAS.

News
GRAFT: GRaPH and Table Reasoning for Textual Alignment — A Benchmark for Structured Instruction Following and Visual Reasoning AI updates on arXiv.org

arXiv:2508.15690v4 Announce Type: replace
Abstract: GRAFT is a structured multimodal benchmark designed to probe how well LLMs handle instruction following, visual reasoning, and tasks requiring tight visual-textual alignment. The dataset is built around programmatically generated charts and synthetically rendered tables, each paired with a carefully constructed, multi-step analytical question that depends solely on what can be inferred from the image itself. Responses are formatted in structured outputs such as JSON or YAML, enabling consistent and fine-grained evaluation of both reasoning processes and adherence to output specifications. The benchmark further introduces a taxonomy of reasoning operations, ranging from comparison and trend identification to ranking, aggregation, proportional estimation, and anomaly detection, to support a comprehensive assessment of model capabilities. Taken together, GRAFT provides a unified and scalable framework for evaluating multimodal LLMs on visually grounded, structured reasoning tasks, offering a more rigorous standard for future benchmarking efforts.
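
To make the point about structured outputs concrete, here is a minimal sketch of how a JSON response could be scored separately for format adherence and answer correctness. It is not GRAFT's evaluation code: the `answer` and `reasoning` keys, the exact-match comparison, and the function name are assumptions chosen for illustration.

```python
import json
from typing import Sequence


def score_response(
    raw: str,
    expected_answer: str,
    required_keys: Sequence[str] = ("answer", "reasoning"),  # hypothetical schema
) -> dict:
    """Score format adherence and answer correctness as two separate flags."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Output is not valid JSON, so it fails the format check outright.
        return {"valid_format": False, "correct": False}
    if not isinstance(parsed, dict) or any(k not in parsed for k in required_keys):
        # Valid JSON, but not the object shape the (assumed) spec asks for.
        return {"valid_format": False, "correct": False}
    # Assumption: correctness is a case-insensitive exact match on the answer.
    correct = str(parsed["answer"]).strip().lower() == expected_answer.strip().lower()
    return {"valid_format": True, "correct": correct}
```

Keeping the two flags apart lets a model that reasons correctly but ignores the output specification be distinguished from one that follows the format but gets the answer wrong.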
