Technology Daily Brief

AI Research News: USC Shows AI Can Learn Low-Resource Languages With Far Less Training Data

2 min read · Source: USC Viterbi School of Engineering
USC Viterbi researchers have demonstrated that an AI model can substantially improve its performance in a programming language it was barely trained on, using a fraction of the data available for mainstream languages. The paper has been accepted at IEEE SoutheastCon 2026.

Researchers at the USC Viterbi School of Engineering have shown that AI models don’t need massive training corpora to perform well in low-resource programming languages. Their work, led by undergraduate researcher Minda Li under faculty advisor Bhaskar Krishnamachari, focused on Idris, a functional programming language with a small developer community and limited public code repositories compared to mainstream languages like Python.

USC Viterbi’s news office reported that the team’s method allowed the AI model to “dramatically improve its performance in territory it was barely trained on, pushing well past what its training data alone would ever allow.” The paper has been accepted at IEEE SoutheastCon 2026, scheduled for March 12-15; the formal presentation will follow this news release.

The USC team describes the data imbalance in stark terms: by the researchers’ own figures, Idris has roughly 10,000 times less available training data than Python, with approximately 2,000 public code repositories versus Python’s 24 million. Those specific numbers come from the USC source and have not been independently confirmed from the available published material; the general claim (dramatically less data for an obscure language) is consistent with what is publicly known about Idris’s footprint.
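As a quick arithmetic check on the reported counts: 24,000,000 / 2,000 = 12,000, so the stated repository figures are consistent, to the order of magnitude, with the “roughly 10,000 times” claim.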

The research addresses a practical limitation of current code-generating AI systems: they work well for the languages they’ve seen at scale. For everything else, quality drops sharply. A method that pushes past training data limits in low-resource languages has direct implications for developers working in specialized, domain-specific, or legacy language environments where large corpora simply don’t exist.

The work comes from a USC undergraduate, and that is worth noting. It doesn’t diminish the research; IEEE SoutheastCon acceptance is a legitimate peer-review bar. But it does mean this is early-stage work rather than an established lab’s production output. Replication and extension by other groups will determine whether the approach generalizes.
