
News

When F1 Fails: Granularity-Aware Evaluation for Dialogue Topic Segmentation | cs.AI updates on arXiv.org

arXiv:2512.17083v2 Announce Type: replace-cross
Abstract: Dialogue topic segmentation supports summarization, retrieval, memory management, and conversational continuity. Despite decades of work, evaluation practice remains dominated by strict boundary matching and F1-based metrics. Modern large language model (LLM) based conversational systems increasingly rely on segmentation to manage conversation history beyond fixed context windows. In such systems, unstructured context accumulation degrades efficiency and coherence.
This paper introduces an evaluation framework that reports boundary density and segment alignment diagnostics (purity and coverage) alongside window-tolerant F1 (W-F1). By separating boundary scoring from boundary selection, we evaluate segmentation quality across density regimes rather than at a single operating point. Cross-dataset evaluation shows that reported performance differences often reflect annotation granularity mismatch rather than boundary placement quality alone.
We evaluate structurally distinct segmentation strategies across eight dialogue datasets spanning task-oriented, open-domain, meeting-style, and synthetic interactions. Boundary-based metrics are strongly coupled to boundary density: threshold sweeps produce larger W-F1 changes than switching between methods. These findings support viewing topic segmentation as a granularity selection problem rather than prediction of a single correct boundary set. This motivates separating boundary scoring from boundary selection for analyzing and tuning segmentation under varying annotation granularities.
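
As an illustration of the window-tolerant scoring the abstract refers to, here is a minimal sketch of a W-F1 computation, assuming the common formulation in which a predicted boundary counts as a hit when an unmatched reference boundary lies within a fixed window of utterance positions; the paper's exact matching rule, window size, and purity/coverage diagnostics may differ.

    # Minimal sketch of window-tolerant boundary F1 (W-F1); assumptions as noted above.
    def window_tolerant_f1(pred, ref, window=2):
        """pred, ref: sorted lists of boundary indices (utterance positions)."""
        unmatched = list(ref)
        tp = 0
        for b in pred:
            # Closest still-unmatched reference boundary within the tolerance window.
            candidates = [r for r in unmatched if abs(r - b) <= window]
            if candidates:
                unmatched.remove(min(candidates, key=lambda r: abs(r - b)))
                tp += 1
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(ref) if ref else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Example: a denser prediction with slightly shifted boundaries.
    print(window_tolerant_f1(pred=[3, 9, 15, 21], ref=[4, 14], window=2))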


News

Learning Treatment Policies From Multimodal Electronic Health Records | cs.AI updates on arXiv.org

arXiv:2507.20993v2 Announce Type: replace-cross
Abstract: We study how to learn effective treatment policies from multimodal electronic health records (EHRs) that consist of tabular data and clinical text. These policies can help physicians make better treatment decisions and allocate healthcare resources more efficiently. Causal policy learning methods prioritize patients with the largest expected treatment benefit. Yet, existing estimators assume tabular covariates that satisfy strong causal assumptions, which are typically violated in the multimodal setting. As a result, predictive models of baseline risk are commonly used in practice to guide such decisions, as they extend naturally to multimodal data. However, such risk-based policies are not designed to identify which patients benefit most from treatment. We propose an extension of causal policy learning that uses expert-provided annotations during training to supervise treatment effect estimation, while using only multimodal representations as input during inference. We show that the proposed method achieves strong empirical performance across synthetic, semi-synthetic, and real-world EHR datasets, thereby offering practical insights into applying causal machine learning to realistic clinical data.
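
As a rough illustration of the recipe described above (supervise treatment-effect estimation with expert annotations at training time, use only multimodal representations at inference), here is a minimal sketch; it is not the paper's estimator, and the encoder, the expert_benefit annotations, and all data shapes are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n_train, n_test, d = 200, 50, 32

    # Stand-ins for multimodal EHR representations (e.g., fused tabular + clinical-text embeddings).
    X_train = rng.normal(size=(n_train, d))
    X_test = rng.normal(size=(n_test, d))

    # Expert-provided annotations of expected treatment benefit, used only during training.
    expert_benefit = X_train @ rng.normal(size=d) + 0.1 * rng.normal(size=n_train)

    # Training: regress the annotated benefit on the multimodal representation.
    effect_model = Ridge(alpha=1.0).fit(X_train, expert_benefit)

    # Inference: no annotations; prioritise patients with the largest predicted benefit.
    predicted_benefit = effect_model.predict(X_test)
    treat_order = np.argsort(-predicted_benefit)
    print("Patients ranked by predicted treatment benefit:", treat_order[:10])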


News

Zero-Shot Segmentation through Prototype-Guidance for Multi-Label Plant Species Identification | cs.AI updates on arXiv.org

arXiv:2512.19957v1 Announce Type: new
Abstract: This paper presents an approach developed to address the PlantCLEF 2025 challenge, which consists of fine-grained multi-label species identification over high-resolution images. Our solution employs class prototypes obtained from the training dataset as proxy guidance for training a segmentation Vision Transformer (ViT) on the test set images. To obtain these representations, the proposed method extracts features from training dataset images and creates clusters by applying K-Means, with $K$ equal to the number of classes in the dataset. The segmentation model is a customized narrow ViT, built by replacing the patch embedding layer with a frozen DinoV2 pre-trained on the training dataset for individual species classification. This model is trained to reconstruct the class prototypes of the training dataset from the test dataset images. We then use this model to obtain attention scores that identify and localize areas of interest and consequently guide the classification process. The proposed approach enables domain adaptation from multi-class identification of individual species to multi-label classification of high-resolution vegetation plots. Our method achieved fifth place in the PlantCLEF 2025 challenge on the private leaderboard, with an F1 score of 0.33331. In absolute terms, our method scored 0.03 lower than the top-performing submission, suggesting that it achieved competitive performance on the benchmark task. Our code is available at https://github.com/ADAM-UEFS/PlantCLEF2025.
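
A minimal sketch of the prototype-construction step described above: run K-Means over training-image features with $K$ equal to the number of classes and treat the centroids as class prototypes. The random vectors standing in for DINOv2 embeddings and the dataset sizes are illustrative assumptions, not the authors' pipeline.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    num_classes = 10                    # K = number of species classes in the dataset
    num_images, feat_dim = 500, 64

    # Stand-in for DINOv2 features extracted from training-set images.
    train_features = rng.normal(size=(num_images, feat_dim))

    kmeans = KMeans(n_clusters=num_classes, n_init=10, random_state=0)
    kmeans.fit(train_features)

    # Cluster centroids serve as class prototypes; the segmentation ViT is then
    # trained to reconstruct these prototypes from test-set images.
    class_prototypes = kmeans.cluster_centers_   # shape: (num_classes, feat_dim)
    print(class_prototypes.shape)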


News

FGDCC: Fine-Grained Deep Cluster Categorization — A Framework for Intra-Class Variability Problems in Plant Classification | cs.AI updates on arXiv.org

arXiv:2512.19960v1 Announce Type: new
Abstract: Intra-class variability refers to the degree of dissimilarity between images within a class. Depending on its intensity, intra-class variability can hinder the learning process of deep learning models, especially when the affected classes are also underrepresented, a very common scenario in Fine-Grained Visual Categorization (FGVC) tasks. This paper proposes a novel method that aims to improve classification performance in FGVC tasks by learning fine-grained features via classification of class-wise cluster assignments. Our goal is to apply clustering over each class individually, which allows us to discover pseudo-labels that encode a latent degree of similarity between images. In turn, those labels can be employed in a hierarchical classification process that learns more fine-grained visual features and thereby mitigates intra-class variability issues. Initial experiments on PlantNet300k shed light on several key points that future work will have to address in order to find more conclusive evidence regarding the effectiveness of our method. Our method still achieves state-of-the-art performance on the PlantNet300k dataset even though some of its components have not been shown to be fully optimized. Our code is available at https://github.com/ADAM-UEFS/FGDCC.
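
A minimal sketch of the class-wise clustering idea described above: cluster each class separately and use the (class, sub-cluster) assignment as a fine-grained pseudo-label for hierarchical classification. The number of sub-clusters per class and the stand-in features are illustrative assumptions, not the FGDCC configuration.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    feat_dim, subclusters_per_class = 64, 4
    labels = rng.integers(0, 5, size=300)            # coarse class labels
    features = rng.normal(size=(300, feat_dim))      # stand-in image features

    pseudo_labels = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        km = KMeans(n_clusters=subclusters_per_class, n_init=10, random_state=0)
        sub = km.fit_predict(features[idx])
        # Fine-grained pseudo-label id = class * sub-clusters-per-class + sub-cluster.
        pseudo_labels[idx] = c * subclusters_per_class + sub

    # The model can then predict both the coarse class and the fine-grained
    # pseudo-label, encouraging features that separate intra-class modes.
    print(np.unique(pseudo_labels).size, "fine-grained pseudo-classes")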


News

Keeping Probabilities Honest: The Jacobian Adjustment | Towards Data Science

An intuitive explanation of transforming random variables correctly.
The post Keeping Probabilities Honest: The Jacobian Adjustment appeared first on Towards Data Science.
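
For reference, the change-of-variables (Jacobian) rule the post's title refers to: for an invertible, differentiable transformation $Y = g(X)$,

    p_Y(y) = p_X\big(g^{-1}(y)\big)\,\left|\frac{d}{dy}\,g^{-1}(y)\right|,
    \qquad
    p_{\mathbf Y}(\mathbf y) = p_{\mathbf X}\big(g^{-1}(\mathbf y)\big)\,\big|\det J_{g^{-1}}(\mathbf y)\big| \quad \text{(multivariate case).}

For example, if $X \sim \mathcal{N}(0,1)$ and $Y = e^{X}$, then $p_Y(y) = \frac{1}{y\sqrt{2\pi}} \exp\!\big(-(\ln y)^2/2\big)$ for $y > 0$; dropping the $1/y$ Jacobian factor leaves a function that does not even integrate to 1.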


News
MiniMax Releases M2.1: An Enhanced M2 Version with Features like Multi-Coding Language Support, API Integration, and Improved Tools for Structured Coding | MarkTechPost


Just months after releasing M2—a fast, low-cost model designed for agents and code—MiniMax has introduced an enhanced version: MiniMax M2.1. M2 already stood out for its efficiency, running at roughly 8% of the cost of Claude Sonnet while delivering significantly higher speed. More importantly, it introduced a different computational and reasoning pattern, particularly in how
The post MiniMax Releases M2.1: An Enhanced M2 Version with Features like Multi-Coding Language Support, API Integration, and Improved Tools for Structured Coding appeared first on MarkTechPost.


News

Why MAP and MRR Fail for Search Ranking (and What to Use Instead) | Towards Data Science

MAP and MRR look intuitive, but they quietly break ranking evaluation. Here’s why these metrics mislead—and how better alternatives fix it.
The post Why MAP and MRR Fail for Search Ranking (and What to Use Instead) appeared first on Towards Data Science.
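
For context, the textbook definitions of the two metrics the post critiques (standard formulas, not taken from the post itself): with $\mathrm{rank}_q$ the position of the first relevant result for query $q$, $R_q$ the number of relevant documents, $P_q(k)$ precision at cut-off $k$, and $\mathrm{rel}_q(k) \in \{0,1\}$,

    \mathrm{MRR} = \frac{1}{|Q|}\sum_{q \in Q} \frac{1}{\mathrm{rank}_q},
    \qquad
    \mathrm{AP}(q) = \frac{1}{R_q}\sum_{k} P_q(k)\,\mathrm{rel}_q(k),
    \qquad
    \mathrm{MAP} = \frac{1}{|Q|}\sum_{q \in Q} \mathrm{AP}(q).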


News

Dialectics for Artificial Intelligence | cs.AI updates on arXiv.org

arXiv:2512.17373v2 Announce Type: replace
Abstract: Can artificial intelligence discover, from raw experience and without human supervision, concepts that humans have discovered? One challenge is that human concepts themselves are fluid: conceptual boundaries can shift, split, and merge as inquiry progresses (e.g., Pluto is no longer considered a planet). To make progress, we need a definition of “concept” that is not merely a dictionary label, but a structure that can be revised, compared, and aligned across agents. We propose an algorithmic-information viewpoint that treats a concept as an information object defined only through its structural relation to an agent’s total experience. The core constraint is determination: a set of parts forms a reversible consistency relation if any missing part is recoverable from the others (up to the standard logarithmic slack in Kolmogorov-style identities). This reversibility prevents “concepts” from floating free of experience and turns concept existence into a checkable structural claim. To judge whether a decomposition is natural, we define excess information, measuring the redundancy overhead introduced by splitting experience into multiple separately described parts. On top of these definitions, we formulate dialectics as an optimization dynamics: as new patches of information appear (or become contested), competing concepts bid to explain them via shorter conditional descriptions, driving systematic expansion, contraction, splitting, and merging. Finally, we formalize low-cost concept transmission and multi-agent alignment using small grounds/seeds that allow another agent to reconstruct the same concept under a shared protocol, making communication a concrete compute-bits trade-off.
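
One plausible formalization of the determination constraint and of excess information in Kolmogorov-complexity notation (our reading of the abstract; the paper's precise statements may differ): a decomposition of total experience into parts $x_1, \dots, x_n$ with total description length $\ell$ is determined if

    K(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = O(\log \ell) \quad \text{for every } i,

and its excess information is

    E(x_1, \dots, x_n) = \sum_{i=1}^{n} K(x_i) \;-\; K(x_1, \dots, x_n),

the redundancy overhead of describing the parts separately rather than jointly.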


News

CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support | cs.AI updates on arXiv.org

arXiv:2508.13256v2 Announce Type: replace
Abstract: Cardiovascular diseases (CVDs) remain the foremost cause of mortality worldwide, a burden worsened by a severe deficit of healthcare workers. Artificial intelligence (AI) agents have shown potential to alleviate this gap through automated detection and proactive screening, yet their clinical application remains limited by: 1) rigid sequential workflows, whereas clinical care often requires adaptive reasoning that selects specific tests and, based on their results, guides personalised next steps; 2) reliance solely on intrinsic model capabilities to perform role assignment without domain-specific tool support; 3) general and static knowledge bases without continuous learning capability; and 4) fixed unimodal or bimodal inputs and lack of on-demand visual outputs when clinicians require visual clarification. In response, a multimodal framework, CardAIc-Agents, was proposed to augment models with external tools and adaptively support diverse cardiac tasks. First, a CardiacRAG agent generated task-aware plans from updatable cardiac knowledge, while the Chief agent integrated tools to autonomously execute these plans and deliver decisions. Second, to enable adaptive and case-specific customization, a stepwise update strategy was developed to dynamically refine plans based on preceding execution results, once the task was assessed as complex. Third, a multidisciplinary discussion team was proposed, automatically invoked to interpret challenging cases and thereby support further adaptation. In addition, visual review panels were provided to assist validation when clinicians raised concerns. Experiments across three datasets showed the efficiency of CardAIc-Agents compared to mainstream Vision-Language Models (VLMs) and state-of-the-art agentic systems.
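
As a rough sketch of the hierarchical adaptation loop described above (plan, execute, refine when the case is assessed as complex, escalate to a discussion team), not the authors' implementation: retrieve_plan, execute_step, is_complex, refine_plan, and invoke_discussion_team are hypothetical stand-ins for the CardiacRAG agent, the Chief agent's tools, the complexity assessment, and the multidisciplinary discussion team.

    def retrieve_plan(case, knowledge_base):
        # CardiacRAG-style step: build a task-aware plan from cardiac knowledge.
        return ["collect history", "run ECG tool", "summarise findings"]

    def execute_step(step, case):
        return {"step": step, "result": f"result of {step}"}

    def is_complex(case, results):
        return len(results) >= 2                     # placeholder complexity check

    def refine_plan(plan, results):
        return plan                                  # stepwise update: adapt remaining steps

    def invoke_discussion_team(case, results):
        return "consensus interpretation"

    def cardaic_loop(case, knowledge_base):
        plan, results = retrieve_plan(case, knowledge_base), []
        i = 0
        while i < len(plan):
            results.append(execute_step(plan[i], case))
            if is_complex(case, results):
                plan = refine_plan(plan, results)    # adaptive, case-specific refinement
            i += 1
        if is_complex(case, results):
            results.append(invoke_discussion_team(case, results))
        return results

    print(cardaic_loop({"patient": "demo"}, knowledge_base=None))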


News

Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | cs.AI updates on arXiv.org

arXiv:2512.20573v1 Announce Type: cross
Abstract: Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressive (AR) verifiers. Our core insight is that dLLM’s speed from parallel decoding drastically lowers the risk of costly rejections, providing a practical mechanism to effectively realize the (elusive) lengthy drafts that lead to large speedups with speculative decoding. We present FailFast, a dLLM-based speculative decoding framework that realizes this approach by dynamically adapting its speculation length. It “fails fast” by spending minimal compute in hard-to-speculate regions to shrink speculation latency and “wins big” by aggressively extending draft lengths in easier regions to reduce verification latency (in many cases, speculating and accepting 70 tokens at a time!). Without any fine-tuning, FailFast delivers lossless acceleration of AR LLMs and achieves up to 4.9× speedup over vanilla decoding, 1.7× over the best naive dLLM drafter, and 1.4× over EAGLE-3 across diverse models and workloads. We open-source FailFast at https://github.com/ruipeterpan/failfast.
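
As an illustration of the "fail fast / win big" drafting idea, here is a minimal sketch of speculative decoding with an adaptive speculation length; it is not the FailFast implementation, and draft_tokens (the dLLM drafter) and verify_prefix (the AR verifier, returning how many draft tokens it accepts) are hypothetical stubs.

    import random

    def draft_tokens(context, k):
        return [f"tok{len(context) + i}" for i in range(k)]   # parallel dLLM draft (stub)

    def verify_prefix(context, draft):
        # AR verifier accepts some prefix of the draft (stubbed with randomness).
        return random.randint(0, len(draft))

    def speculative_decode(prompt, max_tokens=64, k_min=4, k_max=64):
        context, k = list(prompt), 16
        while len(context) < max_tokens:
            draft = draft_tokens(context, k)
            accepted = verify_prefix(context, draft)
            # Always make progress: fall back to a single token if nothing is accepted.
            context.extend(draft[:accepted] or draft_tokens(context, 1))
            if accepted == len(draft):
                k = min(k * 2, k_max)      # "win big": easy region, lengthen drafts
            elif accepted < len(draft) // 2:
                k = max(k // 2, k_min)     # "fail fast": hard region, shorten drafts
        return context

    print(len(speculative_decode(["<s>"])))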
