What if an AI agent could localize a root cause, prove a candidate fix via automated analysis and testing, and proactively rewrite related code to eliminate the entire vulnerability class, then open an upstream patch for review? Google DeepMind introduces CodeMender, an AI agent that generates, validates, and upstreams fixes for real-world vulnerabilities using Gemini “Deep Think.”
The post Google DeepMind Introduces CodeMender: A New AI Agent that Uses Gemini Deep Think to Automatically Patch Critical Software Vulnerabilities appeared first on MarkTechPost.
BC
October 8, 2025

The validation pipeline (root-cause fixes, functional correctness, regression testing, style compliance) assumes each of those properties is independently verifiable. In testing similar automated patching locally, “functional correctness” verification proved trivial for pure functions but extremely difficult for stateful systems where side effects matter. The “LLM-judge check for functional equivalence” is particularly suspect: LLMs cannot actually prove equivalence; they pattern-match against similar code transformations.
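To make the asymmetry concrete, here is a minimal C sketch (the clamp variants are hypothetical stand-ins, not CodeMender output): randomized differential testing can probe equivalence of pure functions precisely because nothing depends on hidden state.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Original routine and its "patched" replacement; both are hypothetical
 * stand-ins, not actual CodeMender output. */
static int clamp_orig(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

static int clamp_patched(int x, int lo, int hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

int main(void) {
    srand(42);
    /* Differential testing: sample random inputs, demand identical outputs.
     * This works only because clamp has no hidden state or side effects. */
    for (int i = 0; i < 1000000; i++) {
        int lo = rand() % 100;
        int hi = lo + rand() % 100;
        int x  = rand() % 1000 - 500;
        assert(clamp_orig(x, lo, hi) == clamp_patched(x, lo, hi));
    }
    puts("variants agree on 1,000,000 random inputs");
    /* For stateful code (globals, I/O, allocator behavior), no such
     * input/output comparison captures "equivalence". */
    return 0;
}
```

For stateful systems, the same harness would need to compare entire execution histories and environments, which is exactly what an LLM judge cannot do.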
The proactive hardening example (adding Clang bounds-safety annotations) demonstrates the core limitation: the approach works for well-understood vulnerability classes that have compiler-level mitigations. It doesn’t help with logic bugs, authentication bypasses, or application-specific vulnerabilities for which no automated annotation exists. The libwebp CVE-2023-4863 example cherry-picks a memory-safety bug that compilers can catch, ignoring the vast majority of vulnerabilities that require semantic understanding.
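For context, the annotation in question looks roughly like this. Clang’s experimental -fbounds-safety extension lets __counted_by tie a pointer to its element count so that out-of-bounds indexing traps at runtime. The struct below is a hypothetical illustration, not libwebp’s actual code, with a fallback macro so it compiles without the extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fallback so this compiles on toolchains without -fbounds-safety. */
#ifndef __counted_by
#define __counted_by(n)
#endif

/* Hypothetical buffer type; the annotation ties data to len, so the
 * compiler can insert a bounds check on every indexing operation. */
struct pixel_row {
    size_t len;
    uint8_t *data __counted_by(len);
};

static uint8_t get_pixel(const struct pixel_row *row, size_t i) {
    /* Under -fbounds-safety, i >= row->len traps here instead of
     * silently reading out of bounds. */
    return row->data[i];
}

int main(void) {
    struct pixel_row row = { .len = 4, .data = calloc(4, 1) };
    (void)get_pixel(&row, 2);
    free(row.data);
    return 0;
}
```

Note what the annotation cannot express: whether get_pixel is called with the right index in the first place. That is the semantic gap the compiler never sees.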
The multi-agent critique, in which reviewer agents check patches for regressions, adds another layer of LLM calls, each with its own potential for hallucination. One model generates the patch; another critiques it, triggering self-corrections. In testing similar architectures locally, this creates circular reasoning in which the models validate each other’s mistakes rather than catching them.
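A minimal sketch of that loop, with hypothetical stubs standing in for the model calls, shows the structural problem: nothing outside the two LLM calls ever gates the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct { char text[256]; } Patch;

/* Hypothetical stand-ins for model calls; a real system would hit an
 * LLM API here. */
static Patch llm_generate_patch(const char *prompt) {
    Patch p;
    snprintf(p.text, sizeof p.text, "candidate fix for: %s", prompt);
    return p;
}

static bool llm_critique(const Patch *p) {
    /* The critic is itself an LLM call. If generator and critic share
     * the same blind spot, approval here proves nothing. */
    return strlen(p->text) > 0;
}

int main(void) {
    Patch p = llm_generate_patch("heap overflow in decode()");
    for (int attempt = 0; attempt < 3 && !llm_critique(&p); attempt++) {
        p = llm_generate_patch("revise: critic rejected previous attempt");
    }
    /* No ground truth (tests, static analysis) gates this loop; the
     * models validate each other. */
    puts(p.text);
    return 0;
}
```

Unless an independent oracle such as a test suite or static analyzer sits inside that loop, a shared blind spot between generator and critic passes review unchallenged.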