	---
	license: cc-by-nc-4.0
	base_model: snoels/FinGEITje-7B-sft
	datasets:
	- BramVanroy/ultra_feedback_dutch
	library_name: peft
	tags:
	- alignment-handbook
	- trl
	- dpo
	- generated_from_trainer
	- geitje
	- fingeitje
	- dutch
	- nl
	- finance
	model-index:
	- name: snoels/FinGEITje-7B-dpo
	  results: []
	language:
	- nl
	pipeline_tag: text-generation
	inference: false
	---

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/snoels/huggingface/runs/yng7mdb0)

	<p align="center" style="margin:0;padding:0">
	<img src="https://huggingface.co/snoels/FinGEITje-7B-dpo/resolve/main/fingeitje-banner-dpo.png" alt="FinGEITje DPO Banner" width="1000"/>
	</p>

	<div style="margin:auto; text-align:center">
	<h1 style="margin-bottom: 0; font-size: 2em;">🐐 FinGEITje 7B DPO</h1>
	<em style="font-size: 1em;">A large open Dutch financial language model aligned through AI feedback.</em>
	</div>

	This model is a fine-tuned version of [snoels/FinGEITje-7B-sft](https://huggingface.co/snoels/FinGEITje-7B-sft) on the [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) dataset.

	## 📖 Model Description

	[FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) is a large open Dutch financial language model with 7 billion parameters, based on Mistral 7B. Starting from the supervised fine-tuned checkpoint [FinGEITje-7B-sft](https://huggingface.co/snoels/FinGEITje-7B-sft), it was further trained with Direct Preference Optimization (DPO) on AI-generated Dutch preference data. This alignment step is intended to make the model's responses in financial contexts more helpful and coherent.

	## 📊 Training

	### Training Data

	[FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) was fine-tuned on the [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) dataset, which consists of synthetic preference data in Dutch. This dataset includes prompts along with preferred and less preferred responses, allowing the model to learn to generate more aligned responses through DPO.
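
To make the DPO objective concrete, here is a minimal sketch of the per-example loss, computed from the summed log-probabilities of a preferred and a less preferred response under the trained policy and a frozen reference model. The function name, the example log-probabilities, and `beta=0.1` are illustrative assumptions. This card does not list the `beta` used in training, and the sketch is not the training code itself.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss and implicit rewards (illustrative sketch).

    beta=0.1 is an assumed value, not taken from this model card.
    """
    # Implicit rewards: scaled log-ratio between the policy and the reference model
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin)
    x = -margin
    loss = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return loss, reward_chosen, reward_rejected

# Hypothetical log-probabilities, chosen only for illustration
loss, r_chosen, r_rejected = dpo_loss(-580.0, -890.0, -560.0, -840.0)
print(f"loss={loss:.4f} reward_chosen={r_chosen:.2f} reward_rejected={r_rejected:.2f}")
```

The loss shrinks as the reward margin between the preferred and the less preferred response grows, which is the signal DPO optimizes.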

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 64
	- total_eval_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
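
The total train batch size follows from the per-device batch size, the number of devices, and the gradient accumulation steps. The sketch below checks that arithmetic and illustrates a warmup-plus-cosine learning-rate schedule. Two things in it are assumptions rather than reported values: the total of roughly 754 optimizer steps (estimated from the training results, where 100 steps correspond to about 0.1327 epochs) and the linear shape of the warmup.

```python
import math

# Values taken from the hyperparameter list above
per_device_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 16
peak_lr = 5e-06
warmup_ratio = 0.1

# Effective batch size per optimizer step
total_train_batch_size = per_device_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 64

# Assumption: about 754 optimizer steps in the single epoch (estimated, not reported)
total_steps = 754
warmup_steps = math.ceil(warmup_ratio * total_steps)

def lr_at(step):
    """Learning rate with linear warmup followed by cosine decay (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```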

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.1029 | 0.1327 | 100 | 0.1099 | -1.8067 | -5.3683 | 0.9679 | 3.5616 | -892.3373 | -579.9115 | -2.4775 | -2.3705 |
	| 0.042 | 0.2654 | 200 | 0.0430 | -3.5129 | -10.6778 | 0.9828 | 7.1649 | -1423.2883 | -750.5289 | -1.9744 | -1.9895 |
	| 0.0278 | 0.3981 | 300 | 0.0344 | -3.7335 | -13.5153 | 0.9828 | 9.7818 | -1707.0360 | -772.5893 | -1.7454 | -1.8191 |
	| 0.0223 | 0.5308 | 400 | 0.0308 | -3.6554 | -13.7712 | 0.9858 | 10.1158 | -1732.6289 | -764.7831 | -1.8020 | -1.9184 |
	| 0.0378 | 0.6635 | 500 | 0.0297 | -4.0018 | -16.3285 | 0.9851 | 12.3266 | -1988.3542 | -799.4221 | -1.6924 | -1.8650 |
	| 0.0352 | 0.7962 | 600 | 0.0278 | -3.8104 | -15.6430 | 0.9836 | 11.8327 | -1919.8119 | -780.2752 | -1.7437 | -1.8978 |
	| 0.0238 | 0.9289 | 700 | 0.0279 | -3.8974 | -15.9642 | 0.9828 | 12.0668 | -1951.9310 | -788.9780 | -1.7371 | -1.8937 |
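
In the table above, `Rewards/margins` is the difference between `Rewards/chosen` and `Rewards/rejected`, and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen response receives the higher reward. As a quick sanity check, this sketch recomputes the margins from the reported rows. The variable names are illustrative, and the tolerance allows for the four-decimal rounding in the table.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from the table above
rows = [
    (100, -1.8067, -5.3683, 3.5616),
    (200, -3.5129, -10.6778, 7.1649),
    (300, -3.7335, -13.5153, 9.7818),
    (400, -3.6554, -13.7712, 10.1158),
    (500, -4.0018, -16.3285, 12.3266),
    (600, -3.8104, -15.6430, 11.8327),
    (700, -3.8974, -15.9642, 12.0668),
]

# Allow for rounding: each reported value is rounded to four decimals
tolerance = 2e-4

for step, chosen, rejected, reported_margin in rows:
    recomputed = chosen - rejected
    assert abs(recomputed - reported_margin) <= tolerance, step
    print(f"step {step}: margin {recomputed:.4f} (reported {reported_margin:.4f})")
```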

	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.42.4
	- Pytorch 2.3.1
	- Datasets 2.20.0
	- Tokenizers 0.19.1

	## 🛠️ How to Use

	[FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) can be used with the Hugging Face Transformers library, with PEFT loading the adapter weights on top of the base model.

	### Installation

	Ensure you have the necessary libraries installed:

	```bash
	pip install torch transformers peft accelerate
	```

	### Loading the Model

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import PeftModel

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained("BramVanroy/GEITje-7B-ultra", use_fast=False)

	# Load the base model
	base_model = AutoModelForCausalLM.from_pretrained("BramVanroy/GEITje-7B-ultra", device_map='auto')

	# Load the FinGEITje-7B-dpo model with PEFT adapters
	model = PeftModel.from_pretrained(base_model, "snoels/FinGEITje-7B-dpo", device_map='auto')
	```

	### Generating Text

	```python
	# Prepare the input; calling the tokenizer also returns the attention mask
	input_text = "Wat zijn de laatste trends in de Nederlandse banksector?"
	inputs = tokenizer(input_text, return_tensors='pt').to(model.device)

	# Generate a response; max_new_tokens limits only the newly generated tokens
	outputs = model.generate(**inputs, max_new_tokens=200, num_return_sequences=1)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print(response)
	```

	## 🙏 Acknowledgements

	We would like to thank:

	- Rijgersberg ([GitHub](https://github.com/Rijgersberg)) for creating [GEITje](https://github.com/Rijgersberg/GEITje), one of the first Dutch foundation models.
	- Bram Vanroy ([GitHub](https://github.com/BramVanroy)) for creating [GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra) and providing the ultra_feedback_dutch dataset.
	- Contributors of the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) for providing valuable resources that guided the development and training process of [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo).
	- Silverfin for their collaboration in this research. Silverfin, a Belgian scale-up focused on building an accountancy cloud service, provided valuable insights and resources that were instrumental in the development of FinGEITje. More about their work can be found at [Silverfin](https://silverfin.com/).

	## 📝 Citation
	- [Paper (ACM Digital Library)](https://dl.acm.org/doi/abs/10.1145/3677052.3698628)
	- [arXiv preprint](https://arxiv.org/abs/2410.18417)

	If you use [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) in your work, please cite:

	```bibtex
	@inproceedings{10.1145/3677052.3698628,
	author = {Noels, Sander and De Blaere, Jorne and De Bie, Tijl},
	title = {A Dutch Financial Large Language Model},
	year = {2024},
	isbn = {9798400710810},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	url = {https://doi.org/10.1145/3677052.3698628},
	doi = {10.1145/3677052.3698628},
	abstract = {This paper presents FinGEITje, the first Dutch financial Large Language Model (LLM) specifically designed and optimized for various financial tasks. Together with the model, we release a specialized Dutch financial instruction tuning dataset with over 140,000 samples, constructed employing an automated translation and data processing method. The open-source data construction method is provided, facilitating the creation of financial instruction datasets in different languages. To evaluate model performance, the study introduces the first Dutch financial evaluation benchmark, along with an automated evaluation method that utilizes an LLM as an independent evaluator, reducing manual intervention in performance evaluation. The experimental results highlight the superior performance of FinGEITje across five critical Dutch and English financial tasks.},
	booktitle = {Proceedings of the 5th ACM International Conference on AI in Finance},
	pages = {283–291},
	numpages = {9},
	keywords = {Financial Large Language Model, Instruction Tuning., Natural Language Processing},
	location = {Brooklyn, NY, USA},
	series = {ICAIF '24}
	}
	```

	## 📜 License

	This model is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/) license.

	## 📧 Contact

	For any inquiries or questions, please contact [Sander Noels](mailto:[email protected]).