Preference Learning with Lie Detectors can Induce Honesty or Evasion
arXiv:2505.13787v2 Announce Type: replace-cross
Abstract: As AI systems become more capable, deceptive behaviors can undermine evaluation and mislead users at deployment. Recent work has shown that lie detectors can accurately classify deceptive behavior, but they are not typically used in the training pipeline due to concerns around contamination and objective hacking. We examine these concerns by incorporating a lie detector into the labelling step of LLM post-training and evaluating whether the learned policy is genuinely more honest, or instead learns to fool the lie detector while remaining deceptive. Using DolusChat, a novel 65k-example dataset with paired truthful/deceptive responses, we identify three key factors that determine the honesty of learned policies: amount of exploration during preference learning, lie detector accuracy, and KL regularization strength. We find that preference learning with lie detectors and GRPO can lead to policies which evade lie detectors, with deception rates of over 85%. However, if the lie detector true positive rate (TPR) or KL regularization is sufficiently high, GRPO learns honest policies. In contrast, off-policy algorithms (DPO) consistently lead to deception rates under 25% for realistic TPRs. Our results illustrate a more complex picture than previously assumed: depending on the context, lie-detector-enhanced training can be a powerful tool for scalable oversight, or a counterproductive method encouraging undetectable misalignment.
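A minimal sketch may help picture the labelling step described above: a lie detector's verdict overrides the raw preference signal when building DPO-style preference pairs. The `lie_detector` and `base_preference` interfaces and the 0.5 threshold are illustrative assumptions, not the paper's actual pipeline.

```python
def detector_filtered_pair(prompt, truthful, deceptive,
                           lie_detector, base_preference,
                           threshold=0.5):
    """Build one preference pair for DPO-style training.

    Assumed interfaces (illustrative, not from the paper):
      lie_detector(prompt, response) -> estimated probability that
        the response is deceptive.
      base_preference(prompt, a, b) -> True if a is preferred over b
        by the raw reward signal (e.g. apparent helpfulness).
    """
    # The raw preference may favour the deceptive answer, since
    # deception can look more helpful on the surface.
    if base_preference(prompt, deceptive, truthful):
        chosen, rejected = deceptive, truthful
    else:
        chosen, rejected = truthful, deceptive

    # The labelling step consults the lie detector: a chosen response
    # flagged as deceptive gets demoted. A detector with an imperfect
    # true positive rate misses some deceptive responses, leaving them
    # labelled as "chosen" -- the leak the paper studies.
    if lie_detector(prompt, chosen) > threshold:
        chosen, rejected = rejected, chosen

    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Under this kind of labelling, an on-policy learner can explore toward responses the detector misses, while off-policy pairs stay closer to the original data, which is consistent with the deception-rate gap the abstract reports for GRPO versus DPO.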
SlotMatch: Distilling Object-Centric Representations for Unsupervised Video Segmentation
arXiv:2508.03411v3 Announce Type: replace-cross
Abstract: Unsupervised video segmentation is a challenging computer vision task, especially due to the lack of supervisory signals coupled with the complexity of visual scenes. To overcome this challenge, state-of-the-art models based on slot attention often have to rely on large and computationally expensive neural architectures. To address this, we propose a simple knowledge distillation framework that effectively transfers object-centric representations to a lightweight student. The proposed framework, called SlotMatch, aligns corresponding teacher and student slots via cosine similarity, requiring no additional distillation objectives or auxiliary supervision. The simplicity of SlotMatch is confirmed via theoretical and empirical evidence, both indicating that integrating additional losses is redundant. We conduct experiments on three datasets to compare the state-of-the-art teacher model, SlotContrast, with our distilled student. The results show that our SlotMatch-based student matches and even outperforms its teacher, while using 3.6x fewer parameters and running up to 2.7x faster. Moreover, our student surpasses all other state-of-the-art unsupervised video segmentation models.
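As a rough reading of the objective described above, the PyTorch-style sketch below aligns corresponding teacher and student slots with a cosine-similarity loss. It assumes the slots are already in one-to-one correspondence and the teacher is frozen; tensor shapes and names are placeholders, not the released SlotMatch code.

```python
import torch
import torch.nn.functional as F

def slot_distillation_loss(student_slots: torch.Tensor,
                           teacher_slots: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity alignment between corresponding slots.

    Both inputs are (batch, num_slots, dim) tensors assumed to be in
    one-to-one correspondence (illustrative assumption).
    """
    s = F.normalize(student_slots, dim=-1)
    t = F.normalize(teacher_slots.detach(), dim=-1)  # teacher gets no gradient
    cos = (s * t).sum(dim=-1)   # per-slot cosine similarity, (batch, num_slots)
    return (1.0 - cos).mean()   # maximising similarity <=> minimising this loss
```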
Towards Automatic Evaluation and Selection of PHI De-identification Models via Multi-Agent Collaboration
arXiv:2510.16194v2 Announce Type: replace
Abstract: Protected health information (PHI) de-identification is critical for enabling the safe reuse of clinical notes, yet evaluating and comparing PHI de-identification models typically depends on costly, small-scale expert annotations. We present TEAM-PHI, a multi-agent evaluation and selection framework that uses large language models (LLMs) to automatically measure de-identification quality and select the best-performing model without heavy reliance on gold labels. TEAM-PHI deploys multiple Evaluation Agents, each independently judging the correctness of PHI extractions and outputting structured metrics. Their results are then consolidated through an LLM-based majority voting mechanism that integrates diverse evaluator perspectives into a single, stable, and reproducible ranking. Experiments on a real-world clinical note corpus demonstrate that TEAM-PHI produces consistent and accurate rankings: despite variation across individual evaluators, LLM-based voting reliably converges on the same top-performing systems. Further comparison with ground-truth annotations and human evaluation confirms that the framework’s automated rankings closely match supervised evaluation. By combining independent evaluation agents with LLM majority voting, TEAM-PHI offers a practical, secure, and cost-effective solution for automatic evaluation and best-model selection in PHI de-identification, even when ground-truth labels are limited.
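To make the consolidation step concrete, here is a minimal stand-in for aggregating per-evaluator rankings. TEAM-PHI performs this step with an LLM-based majority vote; the first-place-vote rule below, with average rank as tie-break, is only an illustration and assumes every Evaluation Agent ranks every candidate model.

```python
from collections import Counter

def consolidate_rankings(evaluator_rankings):
    """Merge per-evaluator rankings into a single ordering.

    evaluator_rankings: list of lists, each ordering model names from
    best to worst as judged by one Evaluation Agent. Assumes every
    evaluator ranks every model (illustrative simplification).
    """
    models = {m for ranking in evaluator_rankings for m in ranking}
    first_place = Counter(ranking[0] for ranking in evaluator_rankings)

    def avg_rank(model):
        # Lower is better: mean position across all evaluators.
        return sum(r.index(model) for r in evaluator_rankings) / len(evaluator_rankings)

    # More first-place votes wins; ties broken by better average rank.
    return sorted(models, key=lambda m: (-first_place[m], avg_rank(m)))

rankings = [["model_a", "model_b", "model_c"],
            ["model_a", "model_c", "model_b"],
            ["model_b", "model_a", "model_c"]]
print(consolidate_rankings(rankings))  # ['model_a', 'model_b', 'model_c']
```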
OpenAI has started rolling out GPT-5.1-Codex-Max on Codex, with better performance on coding tasks. […]
A new Android banking trojan named Sturnus can capture communications from end-to-end encrypted messaging platforms like Signal, WhatsApp, and Telegram, as well as take complete control of the device. […]
When people discuss the security implications of Unicode, Internationalized Domain Names (IDNs) are often highlighted as a risk. However, while visible and often talked about, IDNs are probably not what you should really worry about when it comes to Unicode. Several other issues affect application security beyond confusing domain names. At first sight, […]
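The item is truncated here, but two small self-contained examples illustrate the kinds of Unicode issues it alludes to: a homograph built from a Cyrillic letter, and a compatibility-normalization (NFKC) change that can bypass a filter applied before normalization. Both are generic illustrations, not taken from the article.

```python
import unicodedata

# Homograph risk: Cyrillic 'а' (U+0430) renders like Latin 'a', so two
# visually identical domain names are different strings entirely.
spoofed = "pаypal.com"                      # second letter is U+0430
print(spoofed == "paypal.com")              # False
print([hex(ord(c)) for c in spoofed[:2]])   # ['0x70', '0x430']

# Beyond domains: NFKC folds the fullwidth solidus (U+FF0F) into an
# ordinary '/', so a path-traversal filter that runs before
# normalization never sees the separator it is looking for.
payload = "..／..／etc／passwd"
print("/" in payload)                                   # False
print("/" in unicodedata.normalize("NFKC", payload))    # True
```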
A researcher shows how agentic AI can be hijacked to subvert an agent's goals, and how agent interactions can be altered to compromise whole networks.
In a move that could redefine the web, Google is testing AI-powered, UI-based answers for its AI mode. […]
Today, the United States, the United Kingdom, and Australia announced sanctions targeting Russian bulletproof hosting (BPH) providers that have supported ransomware gangs and other cybercrime operations. […]