Over 10 years we help companies reach their financial and branding goals. Engitech is a values-driven technology agency dedicated.

Gallery

Contacts

411 University St, Seattle, USA

engitech@oceanthemes.net

+1 -800-456-478-23

Daily AI News
AI News & Insights Featured Image

FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models AI updates on arXiv.org

FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Modelscs.AI updates on arXiv.org arXiv:2509.20624v2 Announce Type: replace-cross
Abstract: Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion Language Models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM, Few-Step Discrete Flow-Matching. A discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so one big move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language modeling benchmarks, FS-DFM with 8 sampling steps achieves perplexity parity with a 1,024-step discrete-flow baseline for generating 1,024 tokens using a similar-size model, delivering up to 128 times faster sampling and corresponding latency/throughput gains.

 arXiv:2509.20624v2 Announce Type: replace-cross
Abstract: Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion Language Models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM, Few-Step Discrete Flow-Matching. A discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so one big move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language modeling benchmarks, FS-DFM with 8 sampling steps achieves perplexity parity with a 1,024-step discrete-flow baseline for generating 1,024 tokens using a similar-size model, delivering up to 128 times faster sampling and corresponding latency/throughput gains. Read More  

Daily AI News
AI News & Insights Featured Image

PixelArena: A benchmark for Pixel-Precision Visual Intelligence AI updates on arXiv.org

PixelArena: A benchmark for Pixel-Precision Visual Intelligencecs.AI updates on arXiv.org arXiv:2512.16303v2 Announce Type: replace-cross
Abstract: Omni-modal models that have multimodal input and output are emerging. However, benchmarking their multimodal generation, especially in image generation, is challenging due to the subtleties of human preferences and model biases. Many image generation benchmarks focus on aesthetics instead of the fine-grained generation capabilities of these models, failing to evaluate their visual intelligence with objective metrics. In PixelArena, we propose using semantic segmentation tasks to objectively examine their fine-grained generative intelligence with pixel precision. With our benchmark and experiments, we find the latest Gemini 3 Pro Image has emergent image generation capabilities that generate semantic masks with high fidelity under zero-shot settings, showcasing visual intelligence unseen before and true generalization in new image generation tasks. We further investigate its results, compare them qualitatively and quantitatively with those of other models, and present failure cases. The findings not only signal exciting progress in the field but also provide insights into future research related to dataset development, omni-modal model development, and the design of metrics.

 arXiv:2512.16303v2 Announce Type: replace-cross
Abstract: Omni-modal models that have multimodal input and output are emerging. However, benchmarking their multimodal generation, especially in image generation, is challenging due to the subtleties of human preferences and model biases. Many image generation benchmarks focus on aesthetics instead of the fine-grained generation capabilities of these models, failing to evaluate their visual intelligence with objective metrics. In PixelArena, we propose using semantic segmentation tasks to objectively examine their fine-grained generative intelligence with pixel precision. With our benchmark and experiments, we find the latest Gemini 3 Pro Image has emergent image generation capabilities that generate semantic masks with high fidelity under zero-shot settings, showcasing visual intelligence unseen before and true generalization in new image generation tasks. We further investigate its results, compare them qualitatively and quantitatively with those of other models, and present failure cases. The findings not only signal exciting progress in the field but also provide insights into future research related to dataset development, omni-modal model development, and the design of metrics. Read More  

Daily AI News
AI News & Insights Featured Image

Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features AI updates on arXiv.org

Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Featurescs.AI updates on arXiv.org arXiv:2502.17749v3 Announce Type: replace
Abstract: Recent progress in large language models (LLMs) for code generation has raised serious concerns about intellectual property protection. Malicious users can exploit LLMs to produce paraphrased versions of proprietary code that closely resemble the original. While the potential for LLM-assisted code paraphrasing continues to grow, research on detecting it remains limited, underscoring an urgent need for detection system. We respond to this need by proposing two tasks. The first task is to detect whether code generated by an LLM is a paraphrased version of original human-written code. The second task is to identify which LLM is used to paraphrase the original code. For these tasks, we construct a dataset LPcode consisting of pairs of human-written code and LLM-paraphrased code using various LLMs.
We statistically confirm significant differences in the coding styles of human-written and LLM-paraphrased code, particularly in terms of naming consistency, code structure, and readability. Based on these findings, we develop LPcodedec, a detection method that identifies paraphrase relationships between human-written and LLM-generated code, and discover which LLM is used for the paraphrasing. LPcodedec outperforms the best baselines in two tasks, improving F1 scores by 2.64% and 15.17% while achieving speedups of 1,343x and 213x, respectively. Our code and data are available at https://github.com/Shinwoo-Park/detecting_llm_paraphrased_code_via_coding_style_features.

 arXiv:2502.17749v3 Announce Type: replace
Abstract: Recent progress in large language models (LLMs) for code generation has raised serious concerns about intellectual property protection. Malicious users can exploit LLMs to produce paraphrased versions of proprietary code that closely resemble the original. While the potential for LLM-assisted code paraphrasing continues to grow, research on detecting it remains limited, underscoring an urgent need for detection system. We respond to this need by proposing two tasks. The first task is to detect whether code generated by an LLM is a paraphrased version of original human-written code. The second task is to identify which LLM is used to paraphrase the original code. For these tasks, we construct a dataset LPcode consisting of pairs of human-written code and LLM-paraphrased code using various LLMs.
We statistically confirm significant differences in the coding styles of human-written and LLM-paraphrased code, particularly in terms of naming consistency, code structure, and readability. Based on these findings, we develop LPcodedec, a detection method that identifies paraphrase relationships between human-written and LLM-generated code, and discover which LLM is used for the paraphrasing. LPcodedec outperforms the best baselines in two tasks, improving F1 scores by 2.64% and 15.17% while achieving speedups of 1,343x and 213x, respectively. Our code and data are available at https://github.com/Shinwoo-Park/detecting_llm_paraphrased_code_via_coding_style_features. Read More  

Daily AI News
AI News & Insights Featured Image

Differential syntactic and semantic encoding in LLMs AI updates on arXiv.org

Differential syntactic and semantic encoding in LLMscs.AI updates on arXiv.org arXiv:2601.04765v2 Announce Type: replace-cross
Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging hidden-representation vectors of sentences sharing syntactic structure or meaning, we obtain vectors that capture a significant proportion of the syntactic and semantic information contained in the representations. In particular, subtracting these syntactic and semantic “centroids” from sentence vectors strongly affects their similarity with syntactically and semantically matched sentences, respectively, suggesting that syntax and semantics are, at least partially, linearly encoded. We also find that the cross-layer encoding profiles of syntax and semantics are different, and that the two signals can to some extent be decoupled, suggesting differential encoding of these two types of linguistic information in LLM representations.

 arXiv:2601.04765v2 Announce Type: replace-cross
Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging hidden-representation vectors of sentences sharing syntactic structure or meaning, we obtain vectors that capture a significant proportion of the syntactic and semantic information contained in the representations. In particular, subtracting these syntactic and semantic “centroids” from sentence vectors strongly affects their similarity with syntactically and semantically matched sentences, respectively, suggesting that syntax and semantics are, at least partially, linearly encoded. We also find that the cross-layer encoding profiles of syntax and semantics are different, and that the two signals can to some extent be decoupled, suggesting differential encoding of these two types of linguistic information in LLM representations. Read More  

Daily AI News
AI News & Insights Featured Image

Optimizing Data Transfer in Batched AI/ML Inference Workloads Towards Data Science

Optimizing Data Transfer in Batched AI/ML Inference WorkloadsTowards Data Science A deep dive on data transfer bottlenecks, their identification, and their resolution with the help of NVIDIA Nsight™ Systems – part 2
The post Optimizing Data Transfer in Batched AI/ML Inference Workloads appeared first on Towards Data Science.

 A deep dive on data transfer bottlenecks, their identification, and their resolution with the help of NVIDIA Nsight™ Systems – part 2
The post Optimizing Data Transfer in Batched AI/ML Inference Workloads appeared first on Towards Data Science. Read More  

Daily AI News
Retailers like Kroger and Lowe’s test AI agents without handing control to Google AI News

Retailers like Kroger and Lowe’s test AI agents without handing control to Google AI News

Retailers like Kroger and Lowe’s test AI agents without handing control to GoogleAI News Retailers are starting to confront a problem that sits behind much of the hype around AI shopping: as customers turn to chatbots and automated assistants to decide what to buy, retailers risk losing control over how their products are shown, sold, and bundled. That concern is pushing some large chains to build or support their
The post Retailers like Kroger and Lowe’s test AI agents without handing control to Google appeared first on AI News.

 Retailers are starting to confront a problem that sits behind much of the hype around AI shopping: as customers turn to chatbots and automated assistants to decide what to buy, retailers risk losing control over how their products are shown, sold, and bundled. That concern is pushing some large chains to build or support their
The post Retailers like Kroger and Lowe’s test AI agents without handing control to Google appeared first on AI News. Read More  

Daily AI News
The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk AI News

The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk AI News

The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance riskAI News Meta’s US$2 billion acquisition of AI agent startup Manus has become every enterprise CTO’s cross-border compliance risk lesson. China’s Ministry of Commerce announced on January 9 that it would assess whether the deal violated export controls, technology transfer rules, and overseas investment regulations, despite Manus relocating from Beijing to Singapore in 2025. The investigation exposes
The post The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk appeared first on AI News.

 Meta’s US$2 billion acquisition of AI agent startup Manus has become every enterprise CTO’s cross-border compliance risk lesson. China’s Ministry of Commerce announced on January 9 that it would assess whether the deal violated export controls, technology transfer rules, and overseas investment regulations, despite Manus relocating from Beijing to Singapore in 2025. The investigation exposes
The post The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk appeared first on AI News. Read More  

Daily AI News
AI News & Insights Featured Image

Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents cs.AI updates on arXiv.org

Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agentscs.AI updates on arXiv.org arXiv:2509.01022v2 Announce Type: replace
Abstract: We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Despite the exponential relation between search space size and block number, our methods demonstrate efficiency in creating rearrangement plans for deeply buried blocks in up to 80×80 grids.

 arXiv:2509.01022v2 Announce Type: replace
Abstract: We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Despite the exponential relation between search space size and block number, our methods demonstrate efficiency in creating rearrangement plans for deeply buried blocks in up to 80×80 grids. Read More  

Daily AI News
AI News & Insights Featured Image

SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models AI updates on arXiv.org

SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Modelscs.AI updates on arXiv.org arXiv:2511.09993v2 Announce Type: replace
Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across six calendars. To enable time-variant and contamination-free evaluation, we propose a template-driven protocol for dynamic instance generation that enables assessment on a user-specified Gregorian date. We conduct extensive experiments on both open- and closed-source state-of-the-art (SOTA) LLMs over a range of dates spanning 100 years from 1960 to 2060. Our evaluations show that these LLMs achieve an average accuracy of only 34.5%, with none exceeding 80%, indicating that this task remains challenging. Through in-depth analysis of reasoning types, question formats, and temporal reasoning directions, we identify two key obstacles for LLMs: Future-Date Degradation and Calendar Asymmetry Bias. To strengthen LLMs’ cross-calendar temporal reasoning capability, we further develop an LLM-powered Time Agent that leverages tool-augmented code generation. Empirical results show that Time Agent achieves an average accuracy of 95.31%, outperforming several competitive baselines, highlighting the potential of tool-augmented code generation to advance cross-calendar temporal reasoning. We hope this work will inspire further efforts toward more temporally and culturally adaptive LLMs.

 arXiv:2511.09993v2 Announce Type: replace
Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across six calendars. To enable time-variant and contamination-free evaluation, we propose a template-driven protocol for dynamic instance generation that enables assessment on a user-specified Gregorian date. We conduct extensive experiments on both open- and closed-source state-of-the-art (SOTA) LLMs over a range of dates spanning 100 years from 1960 to 2060. Our evaluations show that these LLMs achieve an average accuracy of only 34.5%, with none exceeding 80%, indicating that this task remains challenging. Through in-depth analysis of reasoning types, question formats, and temporal reasoning directions, we identify two key obstacles for LLMs: Future-Date Degradation and Calendar Asymmetry Bias. To strengthen LLMs’ cross-calendar temporal reasoning capability, we further develop an LLM-powered Time Agent that leverages tool-augmented code generation. Empirical results show that Time Agent achieves an average accuracy of 95.31%, outperforming several competitive baselines, highlighting the potential of tool-augmented code generation to advance cross-calendar temporal reasoning. We hope this work will inspire further efforts toward more temporally and culturally adaptive LLMs. Read More