Dissecting Physics Reasoning in Small Language Models: A Multi-Dimensional Analysis from an Educational Perspective AI updates on arXiv.org

January 8, 2026, Tech Jacks Solutions

arXiv:2505.20707v2 Announce Type: replace-cross
Abstract: Small Language Models (SLMs) offer privacy and efficiency for educational deployment, yet their utility depends on reliable multistep reasoning. Existing benchmarks often prioritize final-answer accuracy, obscuring ‘right answer, wrong procedure’ failures that can reinforce student misconceptions. This work investigates SLM physics reasoning reliability, stage-wise failure modes, and robustness under paired contextual variants. We introduce Physbench, comprising 3,162 high school and AP-level physics questions derived from OpenStax in a structured reference solution format with Bloom’s Taxonomy annotations, plus 2,700 paired culturally contextualized variants. Using P-REFS, a stage-wise evaluation rubric, we assess 10 SLMs across 58,000 responses. Results reveal a substantial reliability gap: among final-answer-correct solutions, 75 to 98% contain at least one reasoning error. Failure modes shift with model capability: weaker models fail primarily at interpretation or modeling, while stronger models often fail during execution. Paired contextual variations have minimal impact on top models but degrade the performance of mid-tier models. These findings demonstrate that safe educational AI requires evaluation paradigms that prioritize reasoning fidelity over final-answer correctness.
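The reliability gap quoted in the abstract is a conditional rate: of the responses whose final answer is correct, the share that still contain at least one reasoning error. The following is a minimal sketch of that arithmetic only, not the P-REFS rubric itself. The `GradedResponse` record, the function name, and the toy grades are all hypothetical; the stage labels simply reuse the words the abstract uses (interpretation, modeling, execution).

```python
from dataclasses import dataclass, field


@dataclass
class GradedResponse:
    """One graded model response (hypothetical record, not the paper's schema)."""
    final_answer_correct: bool
    # Stages where a grader flagged an error; labels are assumed for illustration.
    stage_errors: set = field(default_factory=set)


def flawed_but_correct_rate(responses):
    """Share of final-answer-correct responses with at least one stage error.

    Returns None when no response has a correct final answer, since the
    conditional rate is undefined in that case.
    """
    correct = [r for r in responses if r.final_answer_correct]
    if not correct:
        return None
    flawed = sum(1 for r in correct if r.stage_errors)
    return flawed / len(correct)


# Toy grades invented for this sketch; they are not results from the paper.
toy = [
    GradedResponse(True, {"execution"}),
    GradedResponse(True, set()),
    GradedResponse(True, {"modeling", "execution"}),
    GradedResponse(False, {"interpretation"}),
]
rate = flawed_but_correct_rate(toy)
print(f"Correct answers with flawed reasoning: {rate:.0%}")
```

Final-answer accuracy on the same toy data would be 3 of 4, which hides the two flawed solutions; conditioning on correct answers is what exposes them.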
