SCoReLoRA: Self-Correct via Reinforcement Learning
SCoReLoRA fine-tunes language models with Low-Rank Adaptation (LoRA) combined with reinforcement learning, so that the model learns to correct its own outputs. A two-stage training process aims to make the model's revised responses more accurate than its first attempts.
Features
- Implements a two-stage training process for self-correction
- Utilizes reinforcement learning to improve model outputs
- Compatible with Hugging Face's Transformers library and PEFT
- Supports quantized models for efficient fine-tuning
- Includes evaluation metrics for self-correction performance
How It Works
SCoReLoRA uses a two-stage training process:
Stage I: The model is trained to generate an initial response and then correct it, while minimizing the KL divergence between the fine-tuned model and the base model.
Stage II: The model is further trained using reinforcement learning techniques, with rewards based on the quality of self-corrections.
The training process uses shaped rewards and a KL divergence term to balance improving the corrections against staying close to the original model's behavior.
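As a rough illustration of how a shaped reward and a KL term can be combined, here is a minimal sketch in plain Python. The shaping form (second-attempt reward plus a bonus for improvement over the first attempt), the function names, and the `alpha` and `beta` coefficients are all assumptions made for this example; they are not taken from the SCoReLoRA code.

```python
import math


def shaped_reward(first_reward, second_reward, alpha=1.0):
    """Second-attempt reward plus a bonus for improving on the first attempt.

    Assumed shaping form: a correction that makes things worse is penalized,
    one that makes things better is rewarded beyond its raw score.
    """
    return second_reward + alpha * (second_reward - first_reward)


def kl_divergence(policy_probs, reference_probs, eps=1e-12):
    """KL(policy || reference) for two discrete distributions of equal length."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(policy_probs, reference_probs)
        if p > 0
    )


def penalized_objective(first_reward, second_reward,
                        policy_probs, reference_probs,
                        alpha=1.0, beta=0.1):
    """Shaped reward minus a KL penalty that discourages drift from the base model."""
    reward = shaped_reward(first_reward, second_reward, alpha)
    penalty = beta * kl_divergence(policy_probs, reference_probs)
    return reward - penalty
```

With `alpha=2.0`, fixing a wrong answer (rewards 0 then 1) yields a shaped reward of 3.0, while breaking a correct answer (rewards 1 then 0) yields -2.0. A larger `beta` pulls the fine-tuned model more strongly toward the base model.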
Evaluation
The implementation includes functions to evaluate the model's self-correction capabilities, measuring metrics such as:
- Accuracy before and after correction
- Improvement rate
- Rate of successful corrections
- Rate of erroneous corrections
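The metrics above could be computed along the following lines. This is a sketch under stated assumptions, not the repository's evaluation code: the function name, the dictionary keys, and the exact definitions (improvement as the accuracy difference, correction rates as fractions of all examples) are hypothetical.

```python
def self_correction_metrics(first_correct, second_correct):
    """Summarize self-correction from per-example correctness flags.

    first_correct[i]  -- True if the initial response to example i was correct
    second_correct[i] -- True if the corrected response to example i was correct
    """
    if len(first_correct) != len(second_correct):
        raise ValueError("both lists must have the same length")
    n = len(first_correct)
    if n == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(first_correct, second_correct))
    accuracy_before = sum(1 for a, _ in pairs if a) / n
    accuracy_after = sum(1 for _, b in pairs if b) / n
    # Wrong first attempt that became right after correction.
    fixed = sum(1 for a, b in pairs if not a and b)
    # Right first attempt that became wrong after correction.
    broken = sum(1 for a, b in pairs if a and not b)

    return {
        "accuracy_before": accuracy_before,
        "accuracy_after": accuracy_after,
        "improvement": accuracy_after - accuracy_before,
        "successful_correction_rate": fixed / n,
        "erroneous_correction_rate": broken / n,
    }
```

Reporting the successful and erroneous correction rates separately matters because they can cancel out: a model that fixes as many answers as it breaks shows zero net improvement even though its corrections are changing outcomes.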