Programmers, are you fighting bugs every day? Good news is here!
Recently, the Doubao Team from ByteDance open-sourced Multi-SWE-bench, a benchmark built specifically to test the “automatic bug-fixing” capabilities of large models across multiple programming languages! Now you can find out whether your model can handle not only Python but also languages such as Java and C++.
To be honest, large models are developing rapidly, and code generation is becoming an increasingly important capability. The earlier SWE-bench could only assess bug fixing in Python, which was too narrow! Its tasks also lacked the complexity of intricate real-world projects, which limited how far it could push the development of large models.
Multi-SWE-bench: Not Just Python, Covering Seven Languages!
This time things are different: Multi-SWE-bench covers seven mainstream languages: Java, TypeScript, C, C++, Go, Rust, and JavaScript! It includes a total of 1632 real bug-fixing tasks, all sourced from open-source projects for reliable quality. Even better, it sorts the tasks into three difficulty levels (easy, medium, and hard), so you can see clearly where a model still needs improvement.
The experimental results are quite interesting: current large models fix Python bugs reasonably well, but across the other languages the average fix rate is less than 10%! Multi-language bug fixing is clearly still a significant challenge!
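To make the "fix rate" idea concrete, here is a minimal sketch of how per-language and per-difficulty fix rates could be tallied. The record layout (`language`, `difficulty`, `resolved`) and the sample records are assumptions made up for illustration; they are not the benchmark's actual output format or real results.

```python
from collections import defaultdict


def fix_rates(results, key):
    """Return {group: fraction of tasks resolved}, grouping records by `key`."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for record in results:
        group = record[key]
        totals[group] += 1
        if record["resolved"]:
            resolved[group] += 1
    return {group: resolved[group] / totals[group] for group in totals}


# Hypothetical records for illustration only; not real benchmark results.
sample_results = [
    {"language": "java", "difficulty": "easy", "resolved": True},
    {"language": "java", "difficulty": "hard", "resolved": False},
    {"language": "rust", "difficulty": "medium", "resolved": False},
    {"language": "rust", "difficulty": "easy", "resolved": True},
    {"language": "go", "difficulty": "hard", "resolved": False},
]

by_language = fix_rates(sample_results, "language")
by_difficulty = fix_rates(sample_results, "difficulty")
```

Grouping the same records by difficulty as well as by language is what lets you see whether a model fails everywhere or only on the hard tasks.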
Reinforcement Learning Is Also Supported, and an Open Source Community Awaits Your Participation!
To enable reinforcement learning for automatic programming, the team has also open-sourced Multi-SWE-RL, which provides 4723 examples along with a Docker environment for one-click startup and automatic evaluation. It is tailor-made for RL training!
Even better, they have established an open-source community, welcoming developers and researchers to participate, expand the dataset, test new methods, and collaboratively build an ecosystem for RL in code. Just imagine fixing bugs together!
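For RL training, an automatic evaluation result has to be turned into a reward signal. The sketch below shows one simple way that could look: a binary reward that is 1.0 only when every previously failing test passes after the patch and no previously passing test breaks. The function name, the test-result dictionaries, and the reward rule itself are assumptions for illustration, not Multi-SWE-RL's actual interface.

```python
def binary_reward(before, after):
    """Score a candidate patch from test outcomes (hypothetical helper).

    `before` and `after` map test names to True (passed) or False (failed),
    recorded before and after applying the patch. A test missing from
    `after` is treated as failed.
    """
    targets = [name for name, passed in before.items() if not passed]
    must_keep = [name for name, passed in before.items() if passed]
    if not targets:
        # Nothing was failing, so there is no bug fix to reward.
        return 0.0
    fixed = all(after.get(name, False) for name in targets)
    no_regression = all(after.get(name, False) for name in must_keep)
    return 1.0 if fixed and no_regression else 0.0


# Made-up test outcomes for illustration.
before_patch = {"test_parse": False, "test_format": True}
good_patch = {"test_parse": True, "test_format": True}
regressing_patch = {"test_parse": True, "test_format": False}

reward_good = binary_reward(before_patch, good_patch)
reward_regressing = binary_reward(before_patch, regressing_patch)
```

A binary reward is the simplest choice; a real training setup might prefer partial credit, but that is a design decision outside what the article describes.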
Conclusion: The Future of Automatic Programming Awaits Your Participation!
The Doubao Team hopes that Multi-SWE-bench will take automatic programming technology to new heights, and it will keep expanding the benchmark's coverage to help large models achieve greater breakthroughs in automated software engineering. So, programmers, get involved and help shape the future of automatic programming!