OpenAI Launches the "Reinforcement Fine-Tuning" Program, Making It Easier to Create Expert Large Models | Frontline
Written by | Wang Fangyu
Edited by | Su Jianxun
At 2 a.m. Beijing time on December 7, OpenAI held the second livestream in its 12 consecutive days of launch events.
In this livestream, OpenAI presented a new solution: Reinforcement Fine-Tuning. The solution and its features are expected to officially launch in 2025.
Reinforcement Fine-Tuning is a brand-new model customization method: it takes a pre-trained general-purpose model and further trains it on a small, domain-specific dataset to adapt it to a particular task. Simply put, it lets a large model that has already "learned a bit of everything" do "focused practice" on one specific task, making it much better suited to that task.
According to OpenAI executives, Reinforcement Fine-Tuning can lift a large language model's capabilities from "high-school level" to "doctoral-level expert," and is aimed at universities, researchers, and enterprises that want to create their own AI solutions. For example, OpenAI is collaborating with Thomson Reuters to build a specialized legal model for the company.
OpenAI CEO Sam Altman, who did not take part in the livestream, wrote on social media: "It works remarkably well; it is my biggest surprise of 2024. I look forward to seeing what people build!"
"Reinforcement Fine-Tuning makes it easier to realize large-scale model for industry experts." The founder of an AI large-scale model application enterprise told 36Kr that this is a new solution that has little relevance to ordinary users but is of great value to professionals in the field.
During the livestream, OpenAI showcased a representative case: research on rare genetic diseases.
OpenAI collaborated with researchers from Berkeley Lab and Germany's Charité hospital to train the o1-mini model using Reinforcement Fine-Tuning. The tuned model learned to reason effectively about the causes of rare diseases and outperformed the larger o1 model, demonstrating the technique's potential for diagnosing and understanding complex conditions.
It is worth noting that Reinforcement Fine-Tuning differs significantly from earlier fine-tuning methods. Rather than simply having the model "memorize the answers," it trains the model to reason within a specific domain so that it can work out the correct answers on its own.
Specifically, Reinforcement Fine-Tuning uses two separate datasets: a fine-tuning (training) dataset and a held-out validation dataset. The model is first trained on the training data and then checked against the validation data; through repeated rounds of self-reasoning, grading, and reinforcement, its performance climbs to a high level. This is why Reinforcement Fine-Tuning can deliver a significant performance boost even with very little data, sometimes only a few dozen examples.
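To make that loop concrete, below is a minimal toy sketch in Python of the train-then-verify cycle just described. Everything in it (the symptom-and-diagnosis data, the grader, the update rule, and all names) is a hypothetical stand-in for illustration; OpenAI has not disclosed how Reinforcement Fine-Tuning is implemented internally.

```python
# Toy sketch of the loop described above: train on one small graded dataset,
# then verify on a held-out one. All data, names, and the update rule are
# hypothetical stand-ins; this is not OpenAI's actual implementation.
import random
from collections import defaultdict

# Dataset 1: the fine-tuning (training) set, a reference answer per prompt.
train_set = [
    ({"fever", "rash"}, "DISEASE_A"),
    ({"fever", "cough"}, "DISEASE_B"),
]
# Dataset 2: the held-out validation set, used to verify generalization.
valid_set = [
    ({"rash", "headache"}, "DISEASE_A"),
    ({"cough", "headache"}, "DISEASE_B"),
]

ANSWERS = ["DISEASE_A", "DISEASE_B"]
weights = defaultdict(float)  # (symptom, answer) -> learned preference

def sample_answer(symptoms):
    """Stand-in 'model': sample an answer, biased by what has been learned."""
    scores = [1.0 + sum(weights[(s, a)] for s in symptoms) for a in ANSWERS]
    return random.choices(ANSWERS, weights=scores, k=1)[0]

def grade(candidate, reference):
    """Grader: score 1.0 if the answer matches the reference, else 0.0."""
    return 1.0 if candidate == reference else 0.0

# Training: repeated rounds of answering, grading, and reinforcement.
for _ in range(200):
    for symptoms, reference in train_set:
        answer = sample_answer(symptoms)
        reward = grade(answer, reference)
        for s in symptoms:
            # Crude reinforcement update: upweight only well-graded answers.
            weights[(s, answer)] += reward

def best_answer(symptoms):
    """Deterministic readout: pick the answer with the highest learned score."""
    return max(ANSWERS, key=lambda a: sum(weights[(s, a)] for s in symptoms))

# Verification: score the tuned "model" on the unseen validation set.
correct = sum(grade(best_answer(sym), ref) for sym, ref in valid_set)
print(f"validation accuracy: {correct / len(valid_set):.0%}")
```

Because only well-graded answers are reinforced, the toy "model" learns which symptom points to which diagnosis and then generalizes to validation prompts it never saw during training, mirroring in miniature the learn-to-reason-rather-than-memorize behavior OpenAI described.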
However, the Reinforcement Fine-Tuning solution is still in the research preview stage, and OpenAI plans to fully launch it in 2025.
Currently, OpenAI is inviting research institutions, universities, and enterprises to join the Reinforcement Fine-Tuning research program, and it hopes to work with organizations willing to share datasets to further optimize the model's performance.