arXiv:2510.12047v3 Announce Type: replace
Abstract: Current code generation benchmarks measure functional correctness on well-formed inputs, as test cases are curated to satisfy input preconditions. This leaves a gap: generated programs may appear correct yet fail to satisfy contracts, the assertion-level validity constraints that reject ill-formed inputs. We introduce ContractEval, a benchmark for evaluating contract-satisfying assertions in code generation, i.e., whether generated code rejects contract-violating inputs by triggering the intended assertions. Built on HumanEval+ and MBPP+, ContractEval augments each task with contract-violation tests derived from reference assertions. We synthesize these tests via a neuro-symbolic pipeline: an LLM converts assertion clauses into constraints, and an SMT solver enumerates satisfiable violation combinations to generate inputs that violate selected clauses while satisfying the rest. Across five code LLMs, standard prompting yields 0% contract satisfaction, whereas adding a few contract-violation examples raises it to 49-53% while retaining 92% of the original pass@1. Our code is available at https://github.com/suhanmen/ContractEval.
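
As a rough illustration of the SMT enumeration step described in the abstract, the following Python sketch (using the z3-solver package) shows how one might enumerate inputs that falsify a chosen subset of contract clauses while satisfying the rest. The clauses, the variable n, and the helper violating_inputs are hypothetical placeholders for illustration only; they are not taken from the ContractEval pipeline.

    # Illustrative sketch (not the authors' implementation): for each subset of
    # hypothetical contract clauses, negate that subset, assert the remaining
    # clauses, and ask the SMT solver for a satisfying input value.
    from itertools import combinations
    from z3 import Int, Solver, Not, sat

    n = Int("n")
    # Hypothetical contract clauses over a single integer parameter `n`.
    clauses = [n > 0, n < 100, n % 2 == 0]

    def violating_inputs(clauses):
        """Yield (violated_indices, model) pairs where the model falsifies
        exactly the selected clauses and satisfies all the others."""
        for k in range(1, len(clauses) + 1):
            for violated in combinations(range(len(clauses)), k):
                solver = Solver()
                for i, clause in enumerate(clauses):
                    solver.add(Not(clause) if i in violated else clause)
                if solver.check() == sat:  # skip unsatisfiable combinations
                    yield violated, solver.model()

    for violated, model in violating_inputs(clauses):
        print(f"violates clauses {violated}: n = {model[n]}")

Infeasible combinations (e.g., violating both n > 0 and n < 100 at once) come back unsatisfiable and are skipped, so only realizable violation patterns produce test inputs.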