Over 10 years we help companies reach their financial and branding goals. Engitech is a values-driven technology agency dedicated.

Gallery

Contacts

411 University St, Seattle, USA

engitech@oceanthemes.net

+1 -800-456-478-23

News
AI News & Insights Featured Image

Estimating LLM Consistency: A User Baseline vs Surrogate Metrics AI updates on arXiv.org

Estimating LLM Consistency: A User Baseline vs Surrogate Metricscs.AI updates on arXiv.org arXiv:2505.23799v4 Announce Type: replace-cross
Abstract: Large language models (LLMs) are prone to hallucinations and sensitive to prompt perturbations, often resulting in inconsistent or unreliable generated text. Different methods have been proposed to mitigate such hallucinations and fragility, one of which is to measure the consistency of LLM responses — the model’s confidence in the response or likelihood of generating a similar response when resampled. In previous work, measuring LLM response consistency often relied on calculating the probability of a response appearing within a pool of resampled responses, analyzing internal states, or evaluating logits of responses. However, it was not clear how well these approaches approximated users’ perceptions of consistency of LLM responses. To find out, we performed a user study ($n=2,976$) demonstrating that current methods for measuring LLM response consistency typically do not align well with humans’ perceptions of LLM consistency. We propose a logit-based ensemble method for estimating LLM consistency and show that our method matches the performance of the best-performing existing metric in estimating human ratings of LLM consistency. Our results suggest that methods for estimating LLM consistency without human evaluation are sufficiently imperfect to warrant broader use of evaluation with human input; this would avoid misjudging the adequacy of models because of the imperfections of automated consistency metrics.

 arXiv:2505.23799v4 Announce Type: replace-cross
Abstract: Large language models (LLMs) are prone to hallucinations and sensitive to prompt perturbations, often resulting in inconsistent or unreliable generated text. Different methods have been proposed to mitigate such hallucinations and fragility, one of which is to measure the consistency of LLM responses — the model’s confidence in the response or likelihood of generating a similar response when resampled. In previous work, measuring LLM response consistency often relied on calculating the probability of a response appearing within a pool of resampled responses, analyzing internal states, or evaluating logits of responses. However, it was not clear how well these approaches approximated users’ perceptions of consistency of LLM responses. To find out, we performed a user study ($n=2,976$) demonstrating that current methods for measuring LLM response consistency typically do not align well with humans’ perceptions of LLM consistency. We propose a logit-based ensemble method for estimating LLM consistency and show that our method matches the performance of the best-performing existing metric in estimating human ratings of LLM consistency. Our results suggest that methods for estimating LLM consistency without human evaluation are sufficiently imperfect to warrant broader use of evaluation with human input; this would avoid misjudging the adequacy of models because of the imperfections of automated consistency metrics. Read More  

News
New Microsoft cloud updates support Indonesia’s long-term AI goals AI News

New Microsoft cloud updates support Indonesia’s long-term AI goals AI News

New Microsoft cloud updates support Indonesia’s long-term AI goalsAI News Indonesia’s push into AI-led growth is gaining momentum as more local organisations look for ways to build their own applications, update their systems, and strengthen data oversight. The country now has broader access to cloud and AI tools after Microsoft expanded the services available in the Indonesia Central cloud region, which first went live six
The post New Microsoft cloud updates support Indonesia’s long-term AI goals appeared first on AI News.

 Indonesia’s push into AI-led growth is gaining momentum as more local organisations look for ways to build their own applications, update their systems, and strengthen data oversight. The country now has broader access to cloud and AI tools after Microsoft expanded the services available in the Indonesia Central cloud region, which first went live six
The post New Microsoft cloud updates support Indonesia’s long-term AI goals appeared first on AI News. Read More  

News
Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines MarkTechPost

Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines MarkTechPost

Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image PipelinesMarkTechPost Black Forest Labs has released FLUX.2, its second generation image generation and editing system. FLUX.2 targets real world creative workflows such as marketing assets, product photography, design layouts, and complex infographics, with editing support up to 4 megapixels and strong control over layout, logos, and typography. FLUX.2 product family and FLUX.2 [dev] The FLUX.2 family
The post Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines appeared first on MarkTechPost.

 Black Forest Labs has released FLUX.2, its second generation image generation and editing system. FLUX.2 targets real world creative workflows such as marketing assets, product photography, design layouts, and complex infographics, with editing support up to 4 megapixels and strong control over layout, logos, and typography. FLUX.2 product family and FLUX.2 [dev] The FLUX.2 family
The post Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines appeared first on MarkTechPost. Read More  

Security News
million dollar soc ikcIMJ

When Your $2M Security Detection Fails: Can your SOC Save You? The Hacker Newsinfo@thehackernews.com (The Hacker News)

Enterprises today are expected to have at least 6-8 detection tools, as detection is considered a standard investment and the first line of defense. Yet security leaders struggle to justify dedicating resources further down the alert lifecycle to their superiors. As a result, most organizations’ security investments are asymmetrical, robust detection tools paired with an […]

Security News
deals m6gWIv

FBI Reports $262M in ATO Fraud as Researchers Cite Growing AI Phishing and Holiday Scams The Hacker Newsinfo@thehackernews.com (The Hacker News)

The U.S. Federal Bureau of Investigation (FBI) has warned that cybercriminals are impersonating financial institutions with an aim to steal money or sensitive information to facilitate account takeover (ATO) fraud schemes. The activity targets individuals, businesses, and organizations of varied sizes and across sectors, the agency said, adding the fraudulent schemes have led to more […]