
News

You Only Need 3 Things to Turn AI Experiments into AI Advantage
Towards Data Science, September 15, 2025 at 6:14 pm

Trapped in a purgatory of POCs, enterprises need to focus on building just 3 pillars to realize value from AI.


News

HypoGeneAgent: A Hypothesis Language Agent for Gene-Set Cluster Resolution Selection Using Perturb-seq Datasets
cs.AI updates on arXiv.org, September 15, 2025 at 4:00 am
arXiv:2509.09740v1, Announce Type: cross

Abstract: Large-scale single-cell and Perturb-seq investigations routinely involve clustering cells and subsequently annotating each cluster with Gene-Ontology (GO) terms to elucidate the underlying biological programs. However, both stages, resolution selection and functional annotation, are inherently subjective, relying on heuristics and expert curation. We present HYPOGENEAGENT, a large language model (LLM)-driven framework that transforms cluster annotation into a quantitatively optimizable task. Initially, an LLM functioning as a gene-set analyst analyzes the content of each gene program or perturbation module and generates a ranked list of GO-based hypotheses, accompanied by calibrated confidence scores. Subsequently, we embed every predicted description with a sentence-embedding model, compute pair-wise cosine similarities, and let the agent referee panel score (i) the internal consistency of the predictions (high average similarity within the same cluster), termed intra-cluster agreement, and (ii) their external distinctiveness (low similarity between clusters), termed inter-cluster separation. These two quantities are combined to produce an agent-derived resolution score, which is maximized when clusters exhibit simultaneous coherence and mutual exclusivity. When applied to a public K562 CRISPRi Perturb-seq dataset as a preliminary test, our Resolution Score selects clustering granularities that align more closely with known pathways than those selected by classical metrics such as silhouette score and modularity score for gene functional enrichment summary. These findings establish LLM agents as objective adjudicators of cluster resolution and functional annotation, thereby paving the way for fully automated, context-aware interpretation pipelines in single-cell multi-omics studies.
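
The scoring idea in this abstract (high similarity within a cluster, low similarity between clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two quantities are combined, so the difference `intra - inter` used here is an assumption, and the function names and the toy 2-D "embeddings" are hypothetical stand-ins for real sentence embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def resolution_score(clusters):
    """clusters maps a cluster id to a list of embedding vectors,
    one per predicted description. Returns (intra, inter, score)."""
    # Intra-cluster agreement: mean similarity over pairs in the same cluster.
    intra_sims = [
        cosine(u, v)
        for vectors in clusters.values()
        for u, v in combinations(vectors, 2)
    ]
    # Cross-cluster similarity: mean over pairs drawn from different clusters.
    inter_sims = [
        cosine(u, v)
        for (_, vecs_a), (_, vecs_b) in combinations(clusters.items(), 2)
        for u in vecs_a
        for v in vecs_b
    ]
    if not intra_sims or not inter_sims:
        raise ValueError("need at least two clusters and one same-cluster pair")
    intra = sum(intra_sims) / len(intra_sims)
    inter = sum(inter_sims) / len(inter_sims)
    # ASSUMPTION: combine by subtraction, so coherent and well-separated
    # clusterings score highest. The paper's actual rule may differ.
    return intra, inter, intra - inter


# Toy data: the same four vectors grouped two different ways.
coherent = {"a": [[1.0, 0.0], [0.9, 0.1]], "b": [[0.0, 1.0], [0.1, 0.9]]}
mixed = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[0.9, 0.1], [0.1, 0.9]]}

coherent_score = resolution_score(coherent)[2]
mixed_score = resolution_score(mixed)[2]
```

Under this assumed rule, selecting a clustering resolution amounts to computing the score for each candidate clustering and keeping the one with the highest value; here the `coherent` grouping scores higher than the `mixed` one.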


News

ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)
cs.AI updates on arXiv.org, September 15, 2025 at 4:00 am
arXiv:2509.09787v1, Announce Type: cross

Abstract: Split Learning (SL) is a distributed learning approach that enables resource-constrained clients to collaboratively train deep neural networks (DNNs) by offloading most layers to a central server while keeping input and output layers on the client side. This setup enables SL to leverage server computation capacities without sharing data, making it highly effective in resource-constrained environments dealing with sensitive data. However, the distributed nature enables malicious clients to manipulate the training process. By sending poisoned intermediate gradients, they can inject backdoors into the shared DNN. Existing defenses are limited because they often focus on server-side protection and introduce additional overhead for the server. A significant challenge for client-side defenses is forcing malicious clients to correctly execute the defense algorithm.
We present ZORRO, a private, verifiable, and robust SL defense scheme. Through our novel design and application of interactive zero-knowledge proofs (ZKPs), clients prove their correct execution of a client-located defense algorithm, resulting in proofs of computational integrity attesting to the benign nature of locally trained DNN portions. Leveraging the frequency representation of model partitions enables ZORRO to conduct an in-depth inspection of the locally trained models in an untrusted environment, ensuring that each client forwards a benign checkpoint to its succeeding client. In our extensive evaluation, covering different model architectures as well as various attack strategies and data scenarios, we show ZORRO's effectiveness: it reduces the attack success rate to less than 6% while causing an overhead of less than 10 seconds, even for models storing 1,000,000 parameters on the client side.


News

How to Become a Machine Learning Engineer (Step-by-Step)
Towards Data Science, September 15, 2025 at 12:00 pm

Your one-stop guide to becoming a machine learning engineer.


News

MultimodalHugs: Enabling Sign Language Processing in Hugging Face
cs.AI updates on arXiv.org, September 15, 2025 at 4:00 am
arXiv:2509.09729v1, Announce Type: cross

Abstract: In recent years, sign language processing (SLP) has gained importance in the general field of Natural Language Processing. However, compared to research on spoken languages, SLP research is hindered by complex ad-hoc code, inadvertently leading to low reproducibility and unfair comparisons. Existing tools that are built for fast and reproducible experimentation, such as Hugging Face, are not flexible enough to seamlessly integrate sign language experiments. This view is confirmed by a survey we conducted among SLP researchers.
To address these challenges, we introduce MultimodalHugs, a framework built on top of Hugging Face that enables more diverse data modalities and tasks, while inheriting the well-known advantages of the Hugging Face ecosystem. Even though sign languages are our primary focus, MultimodalHugs adds a layer of abstraction that makes it more widely applicable to other use cases that do not fit one of the standard templates of Hugging Face. We provide quantitative experiments to illustrate how MultimodalHugs can accommodate diverse modalities such as pose estimation data for sign languages, or pixel data for text characters.


News

Meta-Learning Reinforcement Learning for Crypto-Return Prediction
cs.AI updates on arXiv.org, September 15, 2025 at 4:00 am
arXiv:2509.09751v1, Announce Type: cross

Abstract: Predicting cryptocurrency returns is notoriously difficult: price movements are driven by a fast-shifting blend of on-chain activity, news flow, and social sentiment, while labeled training data are scarce and expensive. In this paper, we present Meta-RL-Crypto, a transformer-based architecture that unifies meta-learning and reinforcement learning (RL) to create a fully self-improving trading agent. Starting from a vanilla instruction-tuned LLM, the agent iteratively alternates between three roles (actor, judge, and meta-judge) in a closed-loop architecture. This learning process requires no additional human supervision. It can leverage multimodal market inputs and internal preference feedback. The agent in the system continuously refines both the trading policy and evaluation criteria. Experiments across diverse market regimes demonstrate that Meta-RL-Crypto shows good performance on the technical indicators of the real market and outperforms other LLM-based baselines.


News

The Rise of Semantic Entity Resolution
Towards Data Science, September 14, 2025 at 4:00 pm

Semantic entity resolution uses language models to bring an increased level of automation to schema alignment, blocking (grouping records into smaller, efficient blocks for all-pairs comparison at quadratic, n² complexity), matching, and even merging duplicate nodes and edges. In the past, entity resolution systems relied on statistical tricks such as string distance, static rules, or complex ETL to schema align, block, match, and merge records. Semantic entity resolution uses representation learning to gain a deeper understanding of records' meaning in the domain of a business, automating the same process as part of a knowledge graph factory.
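
To make the blocking step concrete, the sketch below shows why grouping records into blocks cuts the number of all-pairs comparisons. It is only an illustration under stated assumptions: the article describes semantic blocking driven by language models, whereas the blocking key here (`first_three_letters`) is a hypothetical string-prefix stand-in, and the records are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Every pair of records: n * (n - 1) / 2 comparisons."""
    return list(combinations(records, 2))


def blocked_pairs(records, key):
    """Only pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key(record)].append(record)
    pairs = []
    for members in blocks.values():
        # All-pairs comparison is still quadratic, but only inside a block.
        pairs.extend(combinations(members, 2))
    return pairs


def first_three_letters(record):
    # Hypothetical blocking key; a semantic system would instead group
    # records whose embeddings are close to each other.
    return record["name"][:3].lower()


records = [
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "ACME Corporation"},
    {"id": 3, "name": "Apex Ltd"},
    {"id": 4, "name": "Globex Inc"},
    {"id": 5, "name": "Globex Incorporated"},
    {"id": 6, "name": "Initech"},
]

candidate_ids = {(a["id"], b["id"]) for a, b in blocked_pairs(records, first_three_letters)}
```

With six records, the unblocked approach compares 15 pairs, while this key leaves only the two pairs that plausibly refer to the same company. The trade-off is recall: a blocking key that is too strict can separate true duplicates, which is the gap semantic blocking aims to close.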


News

Building Research Agents for Tech Insights
Towards Data Science, September 13, 2025 at 2:30 pm

Using a controlled workflow, unique data, and prompt chaining.


News

No Peeking Ahead: Time-Aware Graph Fraud Detection
Towards Data Science, September 14, 2025 at 2:30 pm

How to implement leak-free graph fraud detection.


News
Discovering physical laws with parallel symbolic enumeration
cs.AI updates on arXiv.org, September 12, 2025 at 4:00 am
arXiv:2407.04405v4, Announce Type: replace-cross

Abstract: Symbolic regression plays a crucial role in modern scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. A key challenge lies in searching an infinite space for parsimonious and generalizable mathematical formulas that also fit the training data. For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems, which hinders the pace of applying symbolic regression for scientific exploration across interdisciplinary domains. To this end, we introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data. Experiments show that PSE achieves higher accuracy and faster computation compared to the state-of-the-art baseline algorithms across over 200 synthetic and experimental problem sets (e.g., improving the recovery accuracy by up to 99% and reducing runtime by an order of magnitude). PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models (e.g., underlying physical laws), and improves the scalability of symbolic learning.
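
As a rough illustration of what symbolic regression by enumeration means, the sketch below exhaustively lists small expressions over `x` and keeps the one that best fits the data, preferring shorter expressions on ties. It is a toy, serial brute-force search, not the PSE algorithm from the paper; the operator set, the constants, the depth limit, and all function names are assumptions chosen to keep the example small.

```python
import itertools

# Assumed operator set for the toy search.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def enumerate_expressions(depth):
    """Return (text, function) pairs for every expression up to `depth`."""
    leaves = [("x", lambda x: x), ("1", lambda x: 1.0), ("2", lambda x: 2.0)]
    if depth == 0:
        return leaves
    smaller = enumerate_expressions(depth - 1)
    candidates = list(smaller)
    for (text_a, fn_a), (text_b, fn_b) in itertools.product(smaller, repeat=2):
        for symbol, op in BINARY_OPS.items():
            candidates.append((
                f"({text_a} {symbol} {text_b})",
                # Default arguments bind the current sub-expressions.
                lambda x, fa=fn_a, fb=fn_b, op=op: op(fa(x), fb(x)),
            ))
    return candidates


def squared_error(fn, xs, ys):
    """Sum of squared residuals of `fn` on the data."""
    return sum((fn(x) - y) ** 2 for x, y in zip(xs, ys))


def best_expression(xs, ys, depth=2):
    """Lowest-error expression; ties go to the shorter text (parsimony)."""
    return min(
        enumerate_expressions(depth),
        key=lambda cand: (squared_error(cand[1], xs, ys), len(cand[0])),
    )


# Data generated from the hidden law y = x^2 + 1.
xs = [-2, -1, 0, 1, 2, 3]
ys = [x * x + 1 for x in xs]
best_text, best_fn = best_expression(xs, ys)
```

Even this tiny grammar produces 2,730 candidates at depth 2, and the count grows roughly with the square of the previous level at each extra depth. That growth is the search-space problem the abstract refers to, and it is why practical methods need smarter enumeration than this sketch.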
