---
license: apache-2.0
language:
- en
- hi
metrics:
- perplexity
base_model: meta-llama/Llama-2-7b-hf
pipeline_tag: text-generation
library_name: transformers
tags:
- code
datasets:
- zicsx/mC4-Hindi-Cleaned-3.0
---
# Fine-tuning Llama-2-7B-hf on a Hindi dataset after transtokenization
This model was trained for 3 hours on a 24 GB RTX A500 GPU, using 1% of the zicsx/mC4-Hindi-Cleaned-3.0 dataset.
We used Hugging Face PEFT (LoRA) with PyTorch for training.
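
For readers who want to reproduce a similar setup, the following is a minimal sketch of how a LoRA fine-tune of this kind might be configured with PEFT. It is not the exact training script: the LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`), the `train[:1%]` split slice, and the half-precision loading are all assumptions, since this card does not state them. The helper names `build_lora_model` and `load_subset` are hypothetical, and the transtokenized tokenizer is not handled here.

```python
"""Sketch: LoRA fine-tuning setup for a causal LM with Hugging Face PEFT."""

BASE_MODEL = "meta-llama/Llama-2-7b-hf"    # from the model card metadata
DATASET = "zicsx/mC4-Hindi-Cleaned-3.0"    # from the model card metadata
DATASET_SPLIT = "train[:1%]"               # assumption: first 1% of the train split

# Assumed hyperparameters; the card does not report the values actually used.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],  # attention projections in Llama
    "bias": "none",
    "task_type": "CAUSAL_LM",
}


def build_lora_model(base_model_id: str = BASE_MODEL):
    """Load the base model and wrap it with trainable LoRA adapters."""
    # Imported lazily so the constants above can be inspected without heavy deps.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    # Half precision is assumed so that a 7B model fits on a 24 GB GPU.
    model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16
    )
    model = get_peft_model(model, LoraConfig(**LORA_KWARGS))
    model.print_trainable_parameters()  # only the adapter weights are trainable
    return model


def load_subset():
    """Load the assumed 1% slice of the Hindi dataset."""
    from datasets import load_dataset

    return load_dataset(DATASET, split=DATASET_SPLIT)
```

Calling `build_lora_model()` requires access to the gated Llama 2 weights on the Hugging Face Hub. The returned model can then be passed to a standard `transformers` training loop together with the tokenized output of `load_subset()`.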
Transtokenization process in -- |