In April 2025, Microsoft's CEO announced that artificial intelligence now writes almost a third of the company's code. Last October, Google's CEO put the figure at 25 percent. Other tech companies are probably not far behind. Now researchers have created an algorithm that lets coding agents improve themselves.
Researchers have long hoped to close the loop fully by creating coding agents that recursively improve themselves. The new study is an impressive demonstration of such a system, says Jürgen Schmidhuber, a computer scientist at King Abdullah University of Science and Technology (KAUST). Depending on whom you ask, that prospect could supercharge the productivity of AI systems, or it could promise humanity a much darker future.
In 2003, he developed problem solvers that rewrote their own code only if they could first formally prove that the update would be useful. He called them Gödel machines, after Kurt Gödel, a mathematician who worked on self-referential systems. For complex agents, however, such proofs are all but impossible to produce.
The new system relies on empirical evidence in place of such proofs. In a nod to Schmidhuber, its creators call it the Darwin Gödel Machine (DGM). A DGM starts with a coding agent that can read, write, and execute code, using an LLM for the reading and writing. It then applies an evolutionary algorithm to create many new agents. In each iteration, the DGM selects one agent from the population and instructs the LLM to make one change that improves the agent's coding ability. LLMs have a kind of intuition about what might help, because they are trained on vast amounts of human-written code. The result is guided evolution, somewhere between random mutation and provably useful improvement. The DGM then tests the new agent on a coding benchmark, scoring its ability to solve programming problems.
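To make that loop concrete, here is a minimal, runnable sketch of a DGM-style cycle in Python. Everything in it is a toy stand-in rather than the paper's actual machinery: `propose_patch` plays the role of the LLM's single code change, `benchmark` plays the role of SWE-bench, and the "agent" is just a list of numbers instead of real agent code.

```python
import random

def propose_patch(agent):
    """Stand-in for the LLM: make one small change to the agent."""
    child = list(agent)
    child[random.randrange(len(child))] += random.gauss(0.0, 0.1)
    return child

def benchmark(agent):
    """Stand-in for SWE-bench: empirically score the agent (higher is better)."""
    return -sum(x * x for x in agent)

seed = [1.0, -0.5, 0.3]                         # toy "agent": a list of numbers
archive = [(seed, benchmark(seed))]             # the archive keeps every agent ever made
for _ in range(80):                             # the paper also ran 80 iterations
    parent, _ = max(archive, key=lambda e: e[1])    # naive pick: current best (refined below)
    child = propose_patch(parent)               # one proposed modification
    archive.append((child, benchmark(child)))   # archived even if it scores worse
best, best_score = max(archive, key=lambda e: e[1])
print(f"best toy score after 80 iterations: {best_score:.4f}")
```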
Some evolutionary algorithms keep only the best performers in the population, on the assumption that progress moves ever forward. DGMs, however, keep them all, in case an innovation that initially fails in fact holds the key to a later breakthrough once refined further. It is a form of open-ended exploration that closes off no paths to progress. (When choosing progenitors, the DGM still gives preference to higher-scoring agents, as sketched below.)
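The paper's actual selection rule is more involved, but the contrast with elitist selection can be sketched like this: instead of always taking the best agent (the `max(...)` line in the sketch above), sample from the whole archive with a bias toward higher scores, so that apparent dead ends remain reachable. The weighting below is an illustrative choice, not the authors'.

```python
import random

def pick_parent(archive):
    """Sample a parent from the WHOLE archive, biased toward higher scores.

    Unlike elitist selection, a low-scoring agent can still be chosen,
    so an innovation that fails at first stays available for refinement.
    Drop-in replacement for the `max(...)` pick in the previous sketch.
    """
    scores = [score for _, score in archive]
    floor = min(scores)
    weights = [score - floor + 1e-6 for score in scores]  # shift into a positive range
    parent, _score = random.choices(archive, weights=weights, k=1)[0]
    return parent
```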
The researchers ran the DGM for 80 iterations on a coding benchmark called SWE-bench and for 80 iterations on one called Polyglot. Agent performance improved on SWE-bench from 20 to 50 percent, and on Polyglot from 14 to 31 percent. "We were really surprised that the coding agent could write such complex code on its own," said Jenny Zhang, a computer scientist at the University of British Columbia and lead author of the paper. "It can edit multiple files, create new files, and build really complex systems."
One problem with both evolutionary search and self-improving systems, and especially with their combination in the DGM, is safety. Agents might become uninterpretable or stop following human directives. So Zhang and her colleagues added safeguards. They kept the DGM in sandboxes without access to the internet or the host operating system, and they logged and reviewed every code change. They suggest that in the future, self-improving systems could even be rewarded for becoming more interpretable and aligned. (In their experiments, they found that agents falsely reported using certain tools, so they created a DGM variant that rewarded agents for not making things up, which partially solved the problem. One agent, however, hacked the method that tracked whether it was making things up.)
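A hedged sketch of what such guardrails can look like in practice, assuming a Docker-based sandbox: each candidate agent runs in a container with networking disabled and a hard timeout, and every self-modification is written to a reviewable diff log. The image name and file paths here are hypothetical, not taken from the paper.

```python
import difflib
import subprocess

def run_sandboxed(agent_dir: str, timeout_s: int = 600) -> subprocess.CompletedProcess:
    """Execute the agent inside Docker with networking disabled."""
    return subprocess.run(
        ["docker", "run", "--rm",
         "--network", "none",             # no internet access
         "-v", f"{agent_dir}:/agent:ro",  # read-only mount of the agent's code
         "sandbox-image",                 # hypothetical container image
         "python", "/agent/agent.py"],
        capture_output=True, text=True, timeout=timeout_s,
    )

def log_code_change(old_src: str, new_src: str, log_path: str = "changes.log") -> None:
    """Record every self-modification as a reviewable unified diff."""
    diff = difflib.unified_diff(old_src.splitlines(), new_src.splitlines(),
                                "before", "after", lineterm="")
    with open(log_path, "a") as f:
        f.write("\n".join(diff) + "\n")
```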
Whether digital evolution will overtake biological evolution is an open question. One thing is certain: evolution in any medium is full of surprises.