arXiv:2510.17108v4 Announce Type: replace
Abstract: This study investigated LLM-based automation for analyzing non-financial data in corporate credit evaluation. Two systems were developed and compared: a Single-Agent System (SAS), in which one LLM agent infers favorable and adverse repayment signals, and a Popperian Multi-agent Debate System (PMADS), which structures the dual-perspective analysis as adversarial argumentation under the Karl Popper Debate protocol. Evaluation addressed three fronts: (i) work productivity compared with human experts; (ii) perceived report quality and usability, rated by credit risk professionals for system-generated reports; and (iii) reasoning characteristics quantified via reasoning-tree analysis. Both systems drastically reduced task completion time relative to human experts. Professionals rated SAS reports as adequate, while PMADS reports exceeded neutral benchmarks and scored significantly higher in explanatory adequacy, practical applicability, and usability. Reasoning-tree analysis showed PMADS produced deeper, more elaborated structures, whereas SAS yielded single-layered trees. These findings suggest that structured multi-agent debate enhances analytical rigor and perceived usefulness, though at the cost of longer computation time. Overall, the results demonstrate that reasoning-centered automation represents a promising approach for developing useful AI systems in decision-critical financial contexts.