TL;DR:
- KAIST researchers introduce SelFee, a language model designed for self-feedback and self-revision generation.
- SelFee achieves high-quality responses without relying on external models and continuously revises its answers within a single inference.
- Diverse instruction data is collected from various sources and augmented using ChatGPT for generating feedback and revision instances.
- SelFee is trained using the FastChat framework; at inference, requiring a minimum number of revisions (at least three) improves answer quality.
- SelFee shows comparable performance to ChatGPT in the Vicuna evaluation setting but has limitations in math, reasoning, factuality, and coding.
- Iterative revision enhances language model responses, and prioritizing inference computation is more effective than simply increasing model size.
Main AI News:
In a recent study conducted by a team of researchers from KAIST, the effectiveness of natural language feedback in improving language model performance has been underscored. These researchers have introduced an innovative model called SelFee, which is specifically designed for self-feedback and self-revision generation. What sets SelFee apart from previous approaches is its ability to generate high-quality responses without relying on external language or task-specific models.
SelFee is a fine-tuned, LLaMA-based instruction-following model that revises its own answers within a single inference until it reaches a high-quality response. Given an instruction, SelFee generates an initial answer along with self-feedback sequences. By analyzing the generated feedback, the model determines whether a revision is necessary; if so, it generates a revised answer based on that feedback. Because this iterative revision happens within a single inference pass, SelFee produces better solutions than existing LLaMA-based models.
To build SelFee, the researchers collected diverse instruction data from various sources, including ShareGPT, Alpaca, Math, Code, and Flan Collection. To overcome the scarcity of feedback and revision data, they employed a distillation process using a teacher model named ChatGPT. This approach allowed them to generate additional instances of feedback and revision at a more affordable cost.
The training data for SelFee was built through data augmentation using OpenAI API calls. The researchers gathered instructions from multiple sources and fed them to ChatGPT to generate corresponding answers. They then queried ChatGPT again to obtain feedback on those answers. If a revision was deemed necessary, ChatGPT revised the answer based on its own feedback, and the process repeated until no further modifications were needed.
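The augmentation loop described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: `build_chain`, `get_answer`, `get_feedback`, and `get_revision` are hypothetical stand-ins for the ChatGPT API calls, stubbed here with toy functions so the example runs standalone.

```python
def build_chain(instruction, get_answer, get_feedback, get_revision,
                max_rounds=5):
    """Build one answer/feedback/revision training chain for an instruction.

    The three callables are hypothetical stand-ins for ChatGPT API calls;
    get_feedback returns a (feedback_text, needs_revision) pair.
    """
    chain = []
    answer = get_answer(instruction)
    for _ in range(max_rounds):
        feedback, needs_revision = get_feedback(instruction, answer)
        chain.append({"answer": answer, "feedback": feedback})
        if not needs_revision:
            break  # teacher sees no further modifications to make
        answer = get_revision(instruction, answer, feedback)
    return chain

# Toy teacher that asks for exactly one revision, then accepts.
calls = {"n": 0}

def get_answer(instruction):
    return "draft answer"

def get_feedback(instruction, answer):
    calls["n"] += 1
    return ("add more detail", True) if calls["n"] == 1 else ("looks good", False)

def get_revision(instruction, answer, feedback):
    return answer + " (revised)"

chain = build_chain("Explain self-revision.", get_answer, get_feedback,
                    get_revision)
print(chain[-1]["answer"])  # draft answer (revised)
```

Each resulting chain (answer, feedback, revision, feedback, ...) then serves as one training instance.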
SelFee was trained using the FastChat framework: given an instruction, the model was fine-tuned to generate the complete answer-and-feedback chain, including revisions. The researchers found that raising the minimum number of revisions required during inference significantly improved answer quality, with a minimum of three revisions yielding the best performance. Surprisingly, a 7B SelFee model forced to generate at least three revisions outperformed a 13B SelFee model that required none.
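One way a revision floor like this could be enforced at inference time is sketched below. The interface is an assumption for illustration, not SelFee's actual decoding logic: the loop forces a revision whenever the minimum count has not yet been reached, even if the model's self-feedback is already satisfied.

```python
def generate_with_min_revisions(instruction, generate_answer, generate_feedback,
                                min_revisions=3, max_revisions=8):
    """Revise until at least `min_revisions` revisions have been produced
    and the self-feedback no longer requests one (capped at max_revisions).

    The two callables are hypothetical stand-ins for the model's own
    answer-generation and feedback-generation steps.
    """
    answer = generate_answer(instruction, previous=None, feedback=None)
    revisions = 0
    while revisions < max_revisions:
        feedback, wants_revision = generate_feedback(instruction, answer)
        if revisions >= min_revisions and not wants_revision:
            break  # floor met and the model is satisfied with its answer
        answer = generate_answer(instruction, previous=answer, feedback=feedback)
        revisions += 1
    return answer, revisions

# Toy model that is satisfied immediately: the floor still forces 3 revisions.
drafts = iter(["v0", "v1", "v2", "v3", "v4"])

def generate_answer(instruction, previous, feedback):
    return next(drafts)

def generate_feedback(instruction, answer):
    return ("no issues found", False)

answer, n = generate_with_min_revisions("Explain X.", generate_answer,
                                        generate_feedback)
print(answer, n)  # v3 3
```

Even with feedback that never requests changes, the toy model is pushed through three revision rounds, mirroring the finding that forcing extra revisions improves answer quality.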
To evaluate SelFee, the researchers adopted the Vicuna evaluation setting of 80 diverse queries. Rather than relying on human evaluation, they ran a pilot evaluation using GPT-4 as the evaluator, reporting scores relative to ChatGPT while accounting for GPT-4's positional bias.
While SelFee showed performance comparable to ChatGPT in the Vicuna evaluation setting, it fell short in areas such as mathematics, reasoning, factuality, and coding.
Overall, SelFee introduces a pioneering approach to self-feedback and self-revision generation in language models. By fine-tuning the model to iteratively revise its own answers, SelFee achieves enhanced performance compared to existing models. These findings emphasize the significance of iterative revision in improving the quality of language model responses, and suggest that increasing a model's inference computation may be more effective than simply increasing its size.
Conclusion:
The introduction of SelFee represents a significant breakthrough in the field of language models. Its self-feedback and self-revision capabilities pave the way for improved response quality and reliability. Businesses operating in industries reliant on language models, such as customer support, content generation, and data analysis, can benefit greatly from SelFee’s ability to continuously refine and enhance its answers.
By leveraging SelFee’s iterative revision process, companies can deliver more accurate and contextually appropriate responses, enhancing customer satisfaction and overall operational efficiency. The research findings emphasize the importance of iterative revision and suggest that optimizing inference computation offers a more efficient path than merely scaling up model size. As SelFee continues to evolve, it has the potential to reshape the market by setting a new standard for language model performance and driving innovation in natural language processing applications.