Hirundo's 4B Gemma 4 Variant Claims Prompt Injection Resistance Beyond 685B Models, Benchmarks Are Self-Reported

May 21, 2026 3 min read Business Wire (Hirundo Press Release) Partial Moderate

G S

Tech Jacks Solutions AI News Coverage

Hirundo has released open weights for a security-hardened variant of Google's Gemma 4 E4B instruction-tuned model, using a technique it calls weight-level machine unlearning to reduce prompt injection susceptibility. Hirundo claims the 4B model outperforms models including DeepSeek V3.2-Exp (685B) on adversarial evaluations, results that come exclusively from Hirundo's own testing and have not been independently verified.

open-source-ai ai-safety gemma-4 hirundo machine-unlearning prompt-injection security-hardened google-deepmind gemmaverse

Self-reported size advantage, 170x

Key Takeaways

Hirundo released open weights for Gemma-4-E4B-IT (Unlearned), a 4B security-hardened Gemma 4 variant using weight-level machine unlearning, available free on Hugging Face
Hirundo claims the model outperforms models up to 685B parameters on prompt injection resistance, per Hirundo's own evaluation only; no independent verification exists
Google DeepMind reportedly featured the model in Gemmaverse, which is an endorsement of novelty, not a technical evaluation of the benchmark claims
No Epoch AI evaluation, no arXiv technical paper, and no third-party security firm assessment is currently available, treat comparative benchmarks as vendor-reported until independently verified

Model Release

Gemma-4-E4B-IT (Unlearned)

OrganizationHirundo (base: Google DeepMind Gemma 4)

TypeOpen Source LLM

Parameters4B

Benchmark[SELF-REPORTED] Outperforms DeepSeek V3.2-Exp (685B), Qwen3-235B, GPT-OSS-120B on adversarial prompt injection evaluation, per Hirundo internal testing only

AvailabilityOpen weights, Hugging Face (free)

Disputed Claim

4B model outperforms models 170x its size on prompt injection resistance, including DeepSeek V3.2-Exp (685B) and Qwen3-235B

All benchmark data is self-reported by Hirundo via its own press release. No independent evaluation by Epoch AI, peer-reviewed arXiv submission, or third-party security research firm exists.

Add to internal red-team evaluation queue. Do not adopt for production security use cases based on vendor benchmarks alone. Wait for Epoch AI or independent security firm evaluation.

Self-reported benchmarks. Read carefully.

Hirundo announced the release of Gemma-4-E4B-IT (Unlearned) on May 21, 2026, a security-hardened fine-tune of Google’s Gemma 4 E4B instruction-tuned model, available as open weights on Hugging Face. The methodology Hirundo describes is weight-level machine unlearning: rather than applying inference-time guardrails or system-prompt filters, Hirundo states its process directly modifies model weights to eliminate the patterns responsible for adversarial susceptibility. That’s a technically interesting distinction. Guardrails can be circumvented by prompt engineering around them. Weight-level modifications, if the technique works as described, are harder to route around because the susceptibility is removed from the model itself rather than filtered at the surface.

The benchmark claim is striking: according to Hirundo’s press release distributed via Business Wire, the 4B model resists prompt injection attacks more robustly than DeepSeek V3.2-Exp (685B), Qwen3-235B, and GPT-OSS-120B, models 170 times its parameter count. Hirundo also states, per its own evaluation, that general instruction-following performance is preserved after the unlearning process. The weight-level approach is described as using adversarial evaluation frameworks in Hirundo’s methodology.

The part nobody mentions in coverage like this: every one of those claims comes from Hirundo’s own testing. No Epoch AI evaluation. No arXiv technical paper with peer-reviewable methodology. No independent security firm has published results. Google DeepMind reportedly featured the model in its Gemmaverse showcase, which is meaningful as an endorsement of the work’s novelty, but Gemmaverse is a Google curation and promotion program, not a technical evaluation. Being featured there doesn’t validate the benchmark claims.

Machine unlearning as a concept is a legitimate and active research area, the idea that you can surgically modify what a model has learned, rather than training it from scratch, has real theoretical backing. Security-specific model variants are an emerging category that AI security engineers should track. What’s unresolved is whether Hirundo’s specific implementation achieves the claimed results at the scale and consistency that production security environments require.

The cost picture is clear on one dimension: open weights on Hugging Face means no licensing fee. For teams running on-premise or in air-gapped environments where commercial model APIs aren’t viable, a 4B model with credible (if unverified) security hardening is worth evaluating. The resource requirement for inference at 4B parameters is modest compared to the 685B models it’s being benchmarked against.

What to Watch

Epoch AI publishes independent evaluation of Gemma-4-E4B-ITTBD

arXiv technical paper on weight-level unlearning methodology from HirundoTBD

Third-party security firm red-team results publishedTBD

What to watch

Epoch AI has not yet evaluated this model. If an independent evaluation appears, from Epoch AI, a named security research firm, or a peer-reviewed arXiv submission, that’s the trigger to revisit the benchmark claims seriously. Until then, treat the 170x comparison as a vendor-framed data point, not an established result.

TJS synthesis

The technique is worth watching; the benchmarks aren’t ready to act on. AI security engineers evaluating open-source models for prompt injection resistance should add Gemma-4-E4B-IT to their testing queue, the weight-level unlearning approach is architecturally distinct from guardrail-based alternatives and deserves a fair evaluation. Don’t deploy to production based on Hirundo’s internal numbers alone. Run your own red-team evaluation against your specific threat model. Independent verification will determine whether this is a genuine security advance or a well-framed press release.