
News

LLMs can hide text in other text of the same length
cs.AI updates on arXiv.org | arXiv:2510.20075v2 | Announce Type: replace
Abstract: A meaningful text can be hidden inside another, completely different yet still coherent and plausible, text of the same length. For example, a tweet containing a harsh political critique could be embedded in a tweet that celebrates the same political leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present a simple and efficient protocol to achieve it. We show that even modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results, and a message as long as this abstract can be encoded and decoded locally on a laptop in seconds. The existence of such a protocol demonstrates a radical decoupling of text from authorial intent, further eroding trust in written communication, already shaken by the rise of LLM chatbots. We illustrate this with a concrete scenario: a company could covertly deploy an unfiltered LLM by encoding its answers within the compliant responses of a safe model. This possibility raises urgent questions for AI safety and challenges our understanding of what it means for a Large Language Model to know something.

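The abstract leaves the encoding mechanism implicit, so the sketch below shows one simple way same-length hiding can work with a shared open-source causal LM: map each secret token's rank under the secret context onto the token with the same rank under the cover context. The model choice (gpt2), the prompts, and the rank-mapping scheme itself are illustrative assumptions, not the authors' protocol.

```python
# Rank-mapping sketch of same-length text-in-text hiding (illustration only;
# not the protocol of arXiv:2510.20075). Sender and receiver share the model
# and both prompts. If the secret text is natural, its tokens have low ranks
# (high probability), so the same-rank cover tokens also tend to be plausible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # assumed shared model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def ranked_tokens(context_ids):
    """Token ids sorted from most to least likely given the context."""
    with torch.no_grad():
        logits = model(torch.tensor([context_ids])).logits[0, -1]
    return torch.argsort(logits, descending=True).tolist()

def encode(secret_ids, secret_prompt_ids, cover_prompt_ids):
    cover_ids, sec_ctx, cov_ctx = [], list(secret_prompt_ids), list(cover_prompt_ids)
    for s in secret_ids:
        rank = ranked_tokens(sec_ctx).index(s)              # rank of the secret token
        c = ranked_tokens(cov_ctx)[rank]                    # same-rank token in the cover context
        cover_ids.append(c)
        sec_ctx.append(s); cov_ctx.append(c)
    return cover_ids

def decode(cover_ids, secret_prompt_ids, cover_prompt_ids):
    secret_ids, sec_ctx, cov_ctx = [], list(secret_prompt_ids), list(cover_prompt_ids)
    for c in cover_ids:
        rank = ranked_tokens(cov_ctx).index(c)              # recover the rank from the cover token
        s = ranked_tokens(sec_ctx)[rank]                    # same-rank token in the secret context
        secret_ids.append(s)
        sec_ctx.append(s); cov_ctx.append(c)
    return secret_ids

# Hypothetical prompts; the mapping is a bijection, so decoding is exact.
secret = tok.encode("meet at noon")
cover = encode(secret, tok.encode("Review: "), tok.encode("Weather: "))
assert decode(cover, tok.encode("Review: "), tok.encode("Weather: ")) == secret
```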

News

What Does It Take to Build a Performant Selective Classifier?
cs.AI updates on arXiv.org | arXiv:2510.20242v2 | Announce Type: replace-cross
Abstract: Selective classifiers improve model reliability by abstaining on inputs the model deems uncertain. However, few practical approaches achieve the gold-standard performance of a perfect-ordering oracle that accepts examples exactly in order of correctness. Our work formalizes this shortfall as the selective-classification gap and presents the first finite-sample decomposition of this gap into five distinct sources of looseness: Bayes noise, approximation error, ranking error, statistical noise, and implementation- or shift-induced slack. Crucially, our analysis reveals that monotone post-hoc calibration, often believed to strengthen selective classifiers, has limited impact on closing this gap, since it rarely alters the model's underlying score ranking. Bridging the gap therefore requires scoring mechanisms that can effectively reorder predictions rather than merely rescale them. We validate our decomposition on synthetic two-moons data and on real-world vision and language benchmarks, isolating each error component through controlled experiments. Our results confirm that (i) Bayes noise and limited model capacity can account for substantial gaps, (ii) only richer, feature-aware calibrators meaningfully improve score ordering, and (iii) data shift introduces a separate slack that demands distributionally robust training. Together, our decomposition yields a quantitative error budget as well as actionable design guidelines that practitioners can use to build selective classifiers which approximate ideal oracle behavior more closely.

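For readers new to the setting, here is a minimal sketch of a confidence-threshold selective classifier on two-moons data, reporting coverage and selective accuracy. It is only the baseline the paper analyzes, not the gap decomposition or the feature-aware calibrators it proposes.

```python
# A selective classifier abstains whenever its top softmax score falls below a
# threshold; we report coverage (fraction accepted) and selective accuracy
# (accuracy on the accepted subset) at a few thresholds.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=4000, noise=0.3, random_state=0)  # synthetic two-moons data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)
pred, conf = proba.argmax(axis=1), proba.max(axis=1)

for tau in (0.5, 0.7, 0.9):
    accept = conf >= tau                                       # abstain below the threshold
    coverage = accept.mean()
    sel_acc = (pred[accept] == y_te[accept]).mean() if accept.any() else float("nan")
    print(f"tau={tau:.1f}  coverage={coverage:.2f}  selective accuracy={sel_acc:.2f}")
```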

News

Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics
cs.AI updates on arXiv.org | arXiv:2506.00138v2 | Announce Type: replace-cross
Abstract: Autonomy is a hallmark of animal intelligence, enabling adaptive and intelligent behavior in complex environments without relying on external reward or task structure. Existing reinforcement learning approaches to exploration in reward-free environments, including a class of methods known as model-based intrinsic motivation, exhibit inconsistent exploration patterns and do not converge to an exploratory policy, thus failing to capture robust autonomous behaviors observed in animals. Moreover, systems neuroscience has largely overlooked the neural basis of autonomy, focusing instead on experimental paradigms where animals are motivated by external reward rather than engaging in ethological, naturalistic and task-independent behavior. To bridge these gaps, we introduce a novel model-based intrinsic drive explicitly designed after the principles of autonomous exploration in animals. Our method (3M-Progress) achieves animal-like exploration by tracking divergence between an online world model and a fixed prior learned from an ecological niche. To the best of our knowledge, we introduce the first autonomous embodied agent that predicts brain data entirely from self-supervised optimization of an intrinsic goal, without any behavioral or neural training data, demonstrating that 3M-Progress agents capture the explainable variance in behavioral patterns and whole-brain neural-glial dynamics recorded from autonomously behaving larval zebrafish, thereby providing the first goal-driven, population-level model of neural-glial computation. Our findings establish a computational framework connecting model-based intrinsic motivation to naturalistic behavior, providing a foundation for building artificial agents with animal-like autonomy.

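To make the core idea concrete, the following is a generic model-based intrinsic-motivation sketch in the spirit described above: an online world model is continually fitted to experience, a prior world model stays frozen, and the intrinsic reward is the divergence between their predictions. The dimensions, architectures, and reward form are assumptions for illustration; the paper's 3M-Progress objective is more specific.

```python
# Generic sketch: intrinsic reward = divergence between an online world model
# (kept training on experience) and a frozen prior world model (assumed to have
# been pre-trained on an "ecological niche").
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2                                  # hypothetical sizes

def world_model():
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))

online = world_model()
prior = world_model()                                    # stand-in for a pre-trained, frozen prior
for p in prior.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

def intrinsic_step(obs, act, next_obs):
    x = torch.cat([obs, act], dim=-1)
    pred_online, pred_prior = online(x), prior(x)
    # intrinsic reward: how far the online model has moved away from the prior
    reward = torch.mean((pred_online - pred_prior) ** 2).detach()
    # keep the online model fitted to what actually happened
    loss = torch.mean((pred_online - next_obs) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
    return reward

r = intrinsic_step(torch.randn(obs_dim), torch.randn(act_dim), torch.randn(obs_dim))
print(float(r))
```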

News

BLEUBERI: BLEU is a surprisingly effective reward for instruction following
cs.AI updates on arXiv.org | arXiv:2505.11080v3 | Announce Type: replace-cross
Abstract: Reward models are central to aligning LLMs with human preferences, but they are costly to train, requiring large-scale human-labeled preference data and powerful pretrained LLM backbones. Meanwhile, the increasing availability of high-quality synthetic instruction-following datasets raises the question: can simpler, reference-based metrics serve as viable alternatives to reward models during RL-based alignment? In this paper, we show first that BLEU, a basic string-matching metric, surprisingly matches strong reward models in agreement with human preferences on general instruction-following datasets. Based on this insight, we develop BLEUBERI, a method that first identifies challenging instructions and then applies Group Relative Policy Optimization (GRPO) using BLEU directly as the reward function. We demonstrate that BLEUBERI-trained models are competitive with models trained via reward model-guided RL across four challenging instruction-following benchmarks and three different base language models. A human evaluation further supports that the quality of BLEUBERI model outputs is on par with those from reward model-aligned models. Moreover, BLEUBERI models generate outputs that are more factually grounded than competing methods. Overall, we show that given access to high-quality reference outputs (easily obtained via existing instruction-following datasets or synthetic data generation), string matching-based metrics are cheap yet effective proxies for reward models during alignment. We release our code and data at https://github.com/lilakk/BLEUBERI.

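The sketch below shows the reward-side idea only: score a group of candidate responses against a reference with sentence-level BLEU and normalize the scores into group-relative advantages, as GRPO would. The sacrebleu dependency, the example strings, and the advantage formula are illustrative assumptions; the full BLEUBERI training loop lives in the linked repository.

```python
# BLEU-as-reward scoring with group-relative normalization (illustration only).
import numpy as np
from sacrebleu import sentence_bleu   # assumption: sacrebleu is installed

def group_relative_advantages(candidates, reference):
    rewards = np.array([sentence_bleu(c, [reference]).score for c in candidates])
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8), rewards

reference = "Boil the pasta for ten minutes, then drain it and add the sauce."
candidates = [
    "Cook the pasta for ten minutes, drain, and stir in the sauce.",
    "Pasta should be boiled, drained, and served with sauce.",
    "Preheat the oven to 200 degrees and bake for an hour.",
]
adv, rewards = group_relative_advantages(candidates, reference)
for c, r, a in zip(candidates, rewards, adv):
    print(f"BLEU={r:5.1f}  advantage={a:+.2f}  {c}")
```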

News

Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression
cs.AI updates on arXiv.org | arXiv:2501.03674v2 | Announce Type: replace-cross
Abstract: Action Quality Assessment (AQA), which aims at automatic and fair evaluation of athletic performance, has gained increasing attention in recent years. However, athletes are often in rapid movement and the corresponding visual appearance variances are subtle, making it challenging to capture fine-grained pose differences and leading to poor estimation performance. Furthermore, most common AQA tasks, such as diving in sports, are usually divided into multiple sub-actions, each of which has a different duration. Existing methods focus on segmenting the video into fixed frames, which disrupts the temporal continuity of sub-actions, resulting in unavoidable prediction errors. To address these challenges, we propose a novel action quality assessment method through hierarchically pose-guided multi-stage contrastive regression. Firstly, we introduce a multi-scale dynamic visual-skeleton encoder to capture fine-grained spatio-temporal visual and skeletal features. Then, a procedure segmentation network is introduced to separate different sub-actions and obtain segmented features. Afterwards, the segmented visual and skeletal features are both fed into a multi-modal fusion module as physics structural priors, to guide the model in learning refined activity similarities and variances. Finally, a multi-stage contrastive learning regression approach is employed to learn discriminative representations and output prediction results. In addition, we introduce a newly-annotated FineDiving-Pose Dataset to improve the current low-quality human pose labels. In experiments, the results on FineDiving and MTL-AQA datasets demonstrate the effectiveness and superiority of our proposed approach. Our source code and dataset are available at https://github.com/Lumos0507/HP-MCoRe.

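For context on the regression style the paper builds on, here is a toy sketch of contrastive (exemplar-relative) score regression: instead of regressing an absolute quality score, a head predicts the score difference between a query clip and an exemplar clip with a known score. The feature dimension and architecture are placeholders; the actual model uses pose-guided video features and multi-stage refinement.

```python
# Toy contrastive regression head: predicted score = exemplar score + predicted delta.
import torch
import torch.nn as nn

feat_dim = 128                                   # hypothetical clip-feature size
delta_head = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def predict_score(query_feat, exemplar_feat, exemplar_score):
    # predict how much better or worse the query is than the exemplar, then offset
    delta = delta_head(torch.cat([query_feat, exemplar_feat], dim=-1))
    return exemplar_score + delta

query, exemplar = torch.randn(1, feat_dim), torch.randn(1, feat_dim)
print(predict_score(query, exemplar, exemplar_score=torch.tensor([[82.5]])))
```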

News

Building a Geospatial Lakehouse with Open Source and Databricks
Towards Data Science
An example workflow for vector geospatial data science.


News

Data Visualization Explained (Part 4): A Review of Python Essentials
Towards Data Science
Learn the foundations of Python to take your data visualization game to the next level.


News

How to Build, Train, and Compare Multiple Reinforcement Learning Agents in a Custom Trading Environment Using Stable-Baselines3
MarkTechPost
In this tutorial, we explore advanced applications of Stable-Baselines3 in reinforcement learning. We design a fully functional, custom trading environment, integrate multiple algorithms such as PPO and A2C, and develop our own training callbacks for performance tracking. As we progress, we train, evaluate, and visualize agent performance to compare algorithmic efficiency, learning curves, and decision

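As a taste of what the tutorial covers, here is a minimal sketch of a custom trading-style environment trained with both PPO and A2C via Stable-Baselines3. The synthetic random-walk prices, the hold/buy/sell action space, and the reward are placeholder assumptions; the tutorial's environment and callbacks are more elaborate.

```python
# Toy trading environment (gymnasium) trained with PPO and A2C for comparison.
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO, A2C

class ToyTradingEnv(gym.Env):
    def __init__(self, n_steps=200):
        super().__init__()
        self.n_steps = n_steps
        self.action_space = gym.spaces.Discrete(3)     # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.prices = 100 + np.cumsum(self.np_random.normal(0, 1, self.n_steps))
        self.t, self.position = 0, 0.0
        return self._obs(), {}

    def _obs(self):
        return np.array([self.prices[self.t], self.position], dtype=np.float32)

    def step(self, action):
        if action == 1:
            self.position = 1.0                        # enter a long position
        elif action == 2:
            self.position = 0.0                        # exit to cash
        self.t += 1
        reward = self.position * (self.prices[self.t] - self.prices[self.t - 1])  # P&L of holding
        done = self.t >= self.n_steps - 1
        return self._obs(), float(reward), done, False, {}

for algo in (PPO, A2C):                                # train both algorithms on the same task
    model = algo("MlpPolicy", ToyTradingEnv(), verbose=0)
    model.learn(total_timesteps=5_000)
```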

News

AI Agents: From Assistants for Efficiency to Leaders of Tomorrow?
Towards Data Science
How artificial intelligence is evolving from "simple" assistants into potential architects of our future, even CEOs and governors.


News
5 Common LLM Parameters Explained with Examples
MarkTechPost
Large language models (LLMs) offer several parameters that let you fine-tune their behavior and control how they generate responses. If a model isn't producing the desired output, the issue often lies in how these parameters are configured. In this tutorial, we'll explore some of the most commonly used ones: max_completion_tokens, temperature, top_p, presence_penalty, and

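For a quick look at where these parameters are set in practice, here is a minimal sketch using the OpenAI Python client with the four parameters named in the teaser. The model name and values are placeholders, not recommendations from the article.

```python
# Passing common sampling parameters to a chat completion request.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",                     # hypothetical model choice
    messages=[{"role": "user", "content": "Suggest three names for a hiking app."}],
    max_completion_tokens=100,   # cap on the length of the generated reply (older SDKs use max_tokens)
    temperature=0.7,             # lower = more deterministic, higher = more varied
    top_p=0.9,                   # sample only from the smallest token set covering 90% of probability
    presence_penalty=0.5,        # discourage reusing tokens that already appeared
)
print(response.choices[0].message.content)
```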