
Build a Coding Agent That Knows Your Code, for Less Than the Cost of a Weekend Project
The Allen Institute for AI (AI2) has introduced a new open-source project aimed at making powerful coding agents more accessible and affordable. Called Open Coding Agents, the release centers on a method to train these systems for a fraction of previously reported costs.
The flagship model family is named SERA (Soft-verified Efficient Repository Agents). According to the announcement, the strongest model, SERA-32B, can solve 54.2% of problems in the SWE-Bench Verified benchmark. The key claim is that this performance was achieved with a training cost of roughly 40 GPU days on a small cluster.
The core challenge the project addresses is specializing an agent for a private codebase. Closed models lack knowledge of internal APIs, conventions, and data pipelines. AI2's proposed solution involves two main innovations:
1. Soft-verified generation (SVG): This method builds synthetic training data from code patches that need only be partially correct, instead of requiring every patch to pass exhaustive tests for full correctness. AI2 reports that this "soft-verified" data trains agents as effectively as fully correct data, which sharply reduces the cost and complexity of data generation.
2. A "bug-type menu": Using a taxonomy of 51 common bug patterns, the method can generate many varied training examples from a single code repository, creating diverse training data at low cost.
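The two ideas above can be sketched in a few lines of Python. This is an illustrative sketch only, not AI2's released pipeline: the article does not list the 51 bug patterns or say how partial correctness is scored, so the menu entries, the line-overlap score, the 0.5 threshold, and every function name below are assumptions.

```python
import random

# Hypothetical stand-ins: the article cites a taxonomy of 51 bug patterns
# but does not list them, so these three names are invented for illustration.
BUG_MENU = ["off-by-one", "missing null check", "wrong default argument"]


def make_prompts(functions, per_function, seed=0):
    """Pair each function name with randomly drawn bug types from the menu,
    so one repository yields many distinct synthetic tasks."""
    rng = random.Random(seed)
    return [
        (fn, rng.choice(BUG_MENU))
        for fn in functions
        for _ in range(per_function)
    ]


def changed_lines(patch):
    """Collect added/removed lines from a unified-diff-style patch,
    skipping the '+++' / '---' file header lines."""
    return {
        line.rstrip()
        for line in patch.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    }


def soft_score(reference_patch, candidate_patch):
    """Fraction of the reference patch's changed lines that the candidate
    also contains. Assumed scoring rule, not taken from the article."""
    ref = changed_lines(reference_patch)
    if not ref:
        return 0.0
    return len(ref & changed_lines(candidate_patch)) / len(ref)


def soft_verified(reference_patch, candidate_patch, threshold=0.5):
    """Accept a sample on partial agreement instead of a full test run.
    The 0.5 threshold is an arbitrary illustrative value."""
    return soft_score(reference_patch, candidate_patch) >= threshold


if __name__ == "__main__":
    prompts = make_prompts(["parse_config", "load_rows"], per_function=3)
    print(len(prompts), "synthetic tasks from 2 functions")

    reference = "-a = 1\n+a = 2\n+b = 3"
    candidate = "-a = 1\n+a = 2"
    print(soft_verified(reference, candidate))
```

The design point the sketch tries to capture is that no test suite is executed anywhere: a sample is kept when it partially agrees with a reference, which is what makes generation cheap.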
The result, as presented, is a pipeline that allows teams to fine-tune a capable coding agent on their own private code. AI2 provides an example where a SERA-32B model, trained on 8,000 samples from a specific codebase at a cost of $1,300, matched or exceeded the performance of a much larger 110B-parameter "teacher" model on that codebase.
Reported Performance and Cost:
· AI2 states the method reproduces the performance of a prior leading open-source model for approximately $400 in compute.
· Achieving performance competitive with top industry open-weight models is cited at a cost of roughly $12,000.
· SERA models are optimized for NVIDIA hardware and are reported to be compatible with Claude Code "out of the box."
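To put the quoted figures side by side, the short calculation below derives a few ratios from them. It uses only numbers stated in this article; the variable names are illustrative, and the article does not say which dollar figure corresponds to the 40 GPU days, so no hourly GPU price is inferred.

```python
# Figures quoted in the article.
GPU_DAYS = 40                    # reported training cost for SERA-32B
SPECIALIZATION_COST_USD = 1_300  # private-codebase example
SPECIALIZATION_SAMPLES = 8_000
REPRODUCTION_COST_USD = 400      # matching a prior leading open-source model
COMPETITIVE_COST_USD = 12_000    # competitive with top open-weight models

gpu_hours = GPU_DAYS * 24
cost_per_sample = SPECIALIZATION_COST_USD / SPECIALIZATION_SAMPLES
cost_ratio = COMPETITIVE_COST_USD / REPRODUCTION_COST_USD

print(f"{gpu_hours} GPU-hours")                        # 960 GPU-hours
print(f"${cost_per_sample:.4f} per training sample")   # $0.1625 per training sample
print(f"{cost_ratio:.0f}x from reproduction to competitive")  # 30x
```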
The Stated Goal: To democratize access to coding agent technology. By open-sourcing the models, training recipes, and data generation code, AI2 aims to put this capability within reach of individual researchers, small teams, and organizations wanting to build agents tailored to their unique code.
The release includes models ranging from 8B to 32B parameters and emphasizes a simple setup that can be launched with a single line of code.