
News
Thinking Machines Lab Makes Tinker Generally Available: Adds Kimi K2 Thinking And Qwen3-VL Vision Input (MarkTechPost)


Thinking Machines Lab has moved its Tinker training API into general availability and added three major capabilities: support for the Kimi K2 Thinking reasoning model, OpenAI-compatible sampling, and image input through Qwen3-VL vision-language models. For AI engineers, this turns Tinker into a practical way to fine-tune frontier models without building distributed training infrastructure.

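As a rough illustration of what OpenAI-compatible sampling means in practice, the sketch below points the standard openai Python client at a custom base URL and requests a completion. The base URL, model name, and API key shown are placeholders chosen for this example, not Tinker's actual endpoint or identifiers; the real values come from the Tinker documentation.

# Hypothetical sketch of OpenAI-compatible sampling against a custom endpoint.
# The base_url, api_key, and model name below are placeholders, not Tinker's
# real configuration.
from openai import OpenAI

client = OpenAI(
    base_url="https://tinker.example.com/v1",   # placeholder endpoint
    api_key="YOUR_API_KEY",                     # placeholder credential
)

response = client.chat.completions.create(
    model="my-finetuned-model",                 # placeholder model identifier
    messages=[{"role": "user", "content": "Give a one-sentence status update."}],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)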

News

Standards-Compliant DM-RS Allocation via Temporal Channel Prediction for Massive MIMO Systems (cs.AI updates on arXiv.org)

arXiv:2507.11064v2 (Announce Type: replace-cross)
Abstract: Reducing feedback overhead in beyond 5G networks is a critical challenge, as the growing number of antennas in modern massive MIMO systems substantially increases the channel state information (CSI) feedback demand in frequency division duplex (FDD) systems. To address this, extensive research has focused on CSI compression and prediction, with neural network-based approaches gaining momentum and being considered for integration into the 3GPP 5G-Advanced standards. While deep learning has been effectively applied to CSI-limited beamforming and handover optimization, reference signal allocation under such constraints remains surprisingly underexplored. To fill this gap, we introduce the concept of channel prediction-based reference signal allocation (CPRS), which jointly optimizes channel prediction and DM-RS allocation to improve data throughput without requiring CSI feedback. We further propose a standards-compliant ViViT/CNN-based architecture that implements CPRS by treating evolving CSI matrices as sequential image-like data, enabling efficient and adaptive transmission in dynamic environments. Simulation results using ray-tracing channel data generated in NVIDIA Sionna validate the proposed method, showing up to 36.60% throughput improvement over benchmark strategies.

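To make the "CSI matrices as sequential image-like data" idea concrete, here is a minimal PyTorch sketch (not the paper's ViViT/CNN architecture) that stacks the real and imaginary parts of a CSI sequence into channels and scores a fixed set of candidate DM-RS allocation patterns. Tensor shapes and the number of patterns are illustrative assumptions.

# Illustrative sketch, not the paper's model: treat a sequence of complex CSI
# matrices as 2-channel "frames" and score candidate DM-RS allocation patterns.
import torch
import torch.nn as nn

class CsiFrameScorer(nn.Module):
    def __init__(self, num_patterns: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=(3, 3, 3), padding=1),  # channels = real/imag
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(16, num_patterns)  # one score per allocation pattern

    def forward(self, csi: torch.Tensor) -> torch.Tensor:
        # csi: complex tensor of shape (batch, time, antennas, subcarriers)
        frames = torch.stack([csi.real, csi.imag], dim=1)  # -> (batch, 2, T, A, S)
        return self.head(self.backbone(frames).flatten(1))

scores = CsiFrameScorer()(torch.randn(4, 8, 16, 32, dtype=torch.cfloat))
print(scores.shape)  # torch.Size([4, 8]): logits over candidate DM-RS patterns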

News

Rethinking the Reliability of Multi-agent System: A Perspective from Byzantine Fault Tolerance (cs.AI updates on arXiv.org)

arXiv:2511.10400v2 (Announce Type: replace-cross)
Abstract: Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large language models (LLMs) have established LLM-based agents as a major branch of MAS, enabling major breakthroughs in complex problem solving and world modeling. However, the reliability implications of this shift remain largely unexplored, i.e., whether substituting traditional agents with LLM-based agents can effectively enhance the reliability of MAS. In this work, we investigate and quantify the reliability of LLM-based agents from the perspective of Byzantine fault tolerance. We observe that LLM-based agents demonstrate stronger skepticism when processing erroneous message flows, a characteristic that enables them to outperform traditional agents across different topological structures. Motivated by the results of the pilot experiment, we design CP-WBFT, a confidence probe-based weighted Byzantine Fault Tolerant consensus mechanism to enhance the stability of MAS with different topologies. It capitalizes on the intrinsic reflective and discriminative capabilities of LLMs by employing a probe-based, weighted information flow transmission method to improve the reliability of LLM-based agents. Extensive experiments demonstrate that CP-WBFT achieves superior performance across diverse network topologies under extreme Byzantine conditions (85.7% fault rate). Notably, our approach surpasses traditional methods by attaining remarkable accuracy on various topologies and maintaining strong reliability in both mathematical reasoning and safety assessment tasks.

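The core intuition, confidence-weighted aggregation of possibly faulty agent outputs, can be sketched in a few lines. The voting rule below is a toy stand-in, not the CP-WBFT mechanism itself, and the probe confidence scores are assumed to be given.

# Toy sketch of confidence-weighted voting among agents, in the spirit of a
# weighted BFT aggregation step; illustrative only, not the paper's algorithm.
from collections import defaultdict

def weighted_consensus(votes):
    """votes: list of (answer, confidence) pairs, confidence in [0, 1]."""
    totals = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence      # each agent's vote counts by its probe score
    return max(totals, key=totals.get)    # answer with the largest confidence mass

# Three honest agents outweigh two confident Byzantine agents.
print(weighted_consensus([("42", 0.9), ("42", 0.8), ("42", 0.7), ("13", 0.95), ("13", 0.9)]))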

News

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models (cs.AI updates on arXiv.org)

arXiv:2509.22284v3 (Announce Type: replace)
Abstract: Modern state-space models (SSMs) often utilize transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs. Our method, PD-SSM, parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$). Consequently, the computational cost of parallel scans scales linearly with the state size. Theoretically, the model is BIBO-stable and can emulate any $N$-state FSA with one layer of dimension $N$ and a linear readout of size $N \times N$, significantly improving on all current structured SSM guarantees. Experimentally, the model significantly outperforms a wide collection of modern SSM variants on various FSA state tracking tasks. On multiclass time-series classification, the performance is comparable to that of neural controlled differential equations, a paradigm explicitly built for time-series analysis. Finally, we integrate PD-SSM into a hybrid Transformer-SSM architecture and demonstrate that the model can effectively track the states of a complex FSA in which transitions are encoded as a set of variable-length English sentences. The code is available at https://github.com/IBM/expressive-sparse-state-space-model

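The P times D parametrization is what makes the recurrence cheap: multiplying by a diagonal matrix is elementwise, and multiplying by a column one-hot matrix is a scatter of entries into their target rows. The NumPy sketch below illustrates this O(N) update and checks it against the explicit matrix product; it is a simplified illustration, not the released PD-SSM implementation linked above.

# O(N) state update x' = P @ (D @ x) when P is column one-hot and D is diagonal.
import numpy as np

rng = np.random.default_rng(0)
N = 6
row_of_col = rng.integers(0, N, size=N)                    # column j of P has its single 1 at this row
d = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # complex diagonal of D
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # current state

# Elementwise multiply by D, then scatter-add columns into their target rows.
y = d * x
x_next = np.zeros(N, dtype=complex)
np.add.at(x_next, row_of_col, y)

# Check against the explicit O(N^2) matrix product.
P = np.zeros((N, N))
P[row_of_col, np.arange(N)] = 1.0
assert np.allclose(x_next, P @ np.diag(d) @ x)
print(x_next)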

News

Online Multi-modal Root Cause Identification in Microservice Systems (cs.AI updates on arXiv.org)

arXiv:2410.10021v2 (Announce Type: replace-cross)
Abstract: Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems. Traditional data-driven RCA methods are typically limited to offline applications due to high computational demands, and existing online RCA methods handle only single-modal data, overlooking complex interactions in multi-modal systems. In this paper, we introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization. OCEAN employs a dilated convolutional neural network to capture long-term temporal dependencies and graph neural networks to learn causal relationships among system entities and key performance indicators. We further design a multi-factor attention mechanism to analyze and reassess the relationships among different metrics and log indicators/attributes for enhanced online causal graph learning. Additionally, a contrastive mutual information maximization-based graph fusion module is developed to effectively model the relationships across various modalities. Extensive experiments on three real-world datasets demonstrate the effectiveness and efficiency of our proposed method.

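As a small illustration of the dilated-convolution component described above (the sketch is generic, not the OCEAN architecture), stacking 1-D convolutions with doubling dilation rates lets a short network see a long window of metric history at constant per-layer cost.

# Generic dilated 1-D convolution stack for long-range temporal context over
# metric streams; illustrative only, not the OCEAN model.
import torch
import torch.nn as nn

def dilated_block(channels: int, dilation: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation, padding=dilation),
        nn.ReLU(),
    )

# Doubling dilations (1, 2, 4, 8) give an exponentially growing receptive field.
encoder = nn.Sequential(*[dilated_block(8, d) for d in (1, 2, 4, 8)])
metrics = torch.randn(2, 8, 128)   # (batch, metric channels, time steps)
print(encoder(metrics).shape)      # torch.Size([2, 8, 128]): same length, wider context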

News

ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making (cs.AI updates on arXiv.org)

arXiv:2512.13716v1 (Announce Type: new)
Abstract: Personalized decision-making is essential for human-AI interaction, enabling AI agents to act in alignment with individual users’ value preferences. As AI systems expand into real-world applications, adapting to personalized values beyond task completion or collective alignment has become a critical challenge. We address this by proposing a value-driven approach to personalized decision-making. Human values serve as stable, transferable signals that support consistent and generalizable behavior across contexts. Compared to task-oriented paradigms driven by external rewards and incentives, value-driven decision-making enhances interpretability and enables agents to act appropriately even in novel scenarios. We introduce ValuePilot, a two-phase framework consisting of a dataset generation toolkit (DGT) and a decision-making module (DMM). DGT constructs diverse, value-annotated scenarios from a human-LLM collaborative pipeline. DMM learns to evaluate actions based on personal value preferences, enabling context-sensitive, individualized decisions. When evaluated on previously unseen scenarios, DMM outperforms strong LLM baselines, including GPT-5, Claude-Sonnet-4, Gemini-2-flash, and Llama-3.1-70b, in aligning with human action choices. Our results demonstrate that value-driven decision-making is an effective and extensible engineering pathway toward building interpretable, personalized AI agents.

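A toy way to picture value-driven action selection: represent both the user's preferences and each candidate action as vectors over a handful of value dimensions and rank actions by their match. The dimensions, numbers, and dot-product scoring rule below are illustrative assumptions, not the DMM described in the paper.

# Toy value-driven scoring: rank actions by how well their (hypothetical) value
# profiles match a user's preference vector. Not ValuePilot's decision module.
import numpy as np

value_dims = ["care", "fairness", "achievement", "security"]
user_prefs = np.array([0.9, 0.6, 0.2, 0.7])              # per-dimension preference weights

actions = {
    "help the stranger":   np.array([0.9, 0.5, 0.1, 0.3]),
    "finish the report":   np.array([0.1, 0.2, 0.9, 0.4]),
    "call security first": np.array([0.3, 0.4, 0.2, 0.9]),
}

scores = {name: float(profile @ user_prefs) for name, profile in actions.items()}
print(max(scores, key=scores.get))  # action whose value profile best matches the user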

News

3 Techniques to Effectively Utilize AI Agents for Coding (Towards Data Science)

Learn how to be an effective engineer with coding agents.


News

Roblox brings AI into the Studio to speed up game creation (AI News)

Roblox is often seen as a games platform, but its day-to-day reality looks closer to a production studio. Small teams release new experiences on a rolling basis and then monetise them at scale. That pace creates two persistent problems: time lost to repeatable production work, and friction when moving outputs between tools. Roblox's 2025 updates…


News

Assessing Greenspace Attractiveness with ChatGPT, Claude, and Gemini: Do AI Models Reflect Human Perceptions? (cs.AI updates on arXiv.org)

arXiv:2512.11827v2 (Announce Type: replace-cross)
Abstract: Understanding greenspace attractiveness is essential for designing livable and inclusive urban environments, yet existing assessment approaches often overlook informal or transient spaces and remain too resource intensive to capture subjective perceptions at scale. This study examines the ability of multimodal large language models (MLLMs), ChatGPT (GPT-4o), Claude 3.5 Haiku, and Gemini 2.0 Flash, to assess greenspace attractiveness similarly to humans using Google Street View imagery. We compared model outputs with responses from a geo-questionnaire of residents in Lodz, Poland, across both formal (for example, parks and managed greenspaces) and informal (for example, meadows and wastelands) greenspaces. Survey respondents and models indicated whether each greenspace was attractive or unattractive and provided up to three free-text explanations. Analyses examined how often their attractiveness judgments aligned and compared their explanations after classifying them into shared reasoning categories. Results show high AI-human agreement for attractive formal greenspaces and unattractive informal spaces, but low alignment for attractive informal and unattractive formal greenspaces. Models consistently emphasized aesthetic and design-oriented features, underrepresenting safety, functional infrastructure, and locally embedded qualities valued by survey respondents. While these findings highlight the potential for scalable pre-assessment, they also underscore the need for human oversight and complementary participatory approaches. We conclude that MLLMs can support, but not replace, context-sensitive greenspace evaluation in planning practice.

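For context, the kind of query such a study issues to a multimodal model can be sketched as below using the OpenAI Python client with an image attached. The prompt wording, model choice, and image URL are placeholders, not the study's actual protocol.

# Sketch of a multimodal attractiveness query; prompt and image URL are
# placeholders, not the study's protocol.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Is the greenspace in this street-view image attractive or "
                     "unattractive? Answer with one word, then give up to three short reasons."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street_view_sample.jpg"}},
        ],
    }],
    max_tokens=150,
)
print(response.choices[0].message.content)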

News

Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks (cs.AI updates on arXiv.org)

arXiv:2505.05638v2 (Announce Type: replace-cross)
Abstract: Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning-based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/aumovio/pred2plan.

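One of the factors the authors highlight, temporal consistency of predictions, can be illustrated with a simple metric: compare the overlapping portion of trajectories predicted one cycle apart. The definition below is an illustrative choice, not the evaluation protocol used in the paper.

# Toy temporal-consistency metric for consecutive trajectory predictions;
# illustrative only, not the paper's evaluation protocol.
import numpy as np

def temporal_inconsistency(pred_prev: np.ndarray, pred_curr: np.ndarray, step: int = 1) -> float:
    """Mean displacement between two predictions made step cycles apart,
    compared on the timestamps they both cover. Arrays have shape (horizon, 2)."""
    overlap_prev = pred_prev[step:]    # part of the older prediction still in the future
    overlap_curr = pred_curr[:-step]   # the newer prediction for those same timestamps
    return float(np.linalg.norm(overlap_prev - overlap_curr, axis=1).mean())

# Older prediction: straight line along x. Newer prediction: same line shifted
# one step ahead in time, with a 0.2 m lateral offset.
prev_pred = np.stack([np.arange(10.0), np.zeros(10)], axis=1)
curr_pred = np.stack([np.arange(1.0, 11.0), np.full(10, 0.2)], axis=1)
print(temporal_inconsistency(prev_pred, curr_pred))  # 0.2: disagreement on shared timestamps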