---
metrics:
- rouge
- meteor
---
# parseny/TinyLlama1.1B-Nvidia-QA

This repository contains parseny/TinyLlama1.1B-Nvidia-QA, a fine-tuned version of the TinyLlama language model for answering questions about NVIDIA documentation. The model was fine-tuned on a [dataset of question-answer pairs](https://www.kaggle.com/datasets/gondimalladeepesh/nvidia-documentation-question-and-answer-pairs) and evaluated with ROUGE and METEOR.

## Model Details

- **Model ID**: parseny/TinyLlama1.1B-Nvidia-QA
- **Model Type**: Causal Language Model
- **Base Model**: TinyLlama-1.1B
- **Quantization**: 4-bit quantization using BitsAndBytes (a loading sketch follows this list)
- **Fine-Tuning Framework**: Hugging Face Transformers and PEFT

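The card does not list the exact quantization or adapter hyperparameters. The following is a minimal sketch of how a 4-bit BitsAndBytes load combined with a PEFT LoRA adapter is typically set up; the base checkpoint name and all LoRA values (`r`, `lora_alpha`, `target_modules`, dropout) are illustrative assumptions, not the settings used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization (illustrative settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Base checkpoint is assumed; the card only states "TinyLlama-1.1B"
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical LoRA adapter configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```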

## Training Configuration

The model was fine-tuned with the following training arguments:

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./logs",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    fp16=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=5,
    load_best_model_at_end=True,
    learning_rate=5e-4
)
```

## Evaluation Metrics

The performance of the fine-tuned model was evaluated using the following metrics:

- **ROUGE Scores**:
  - **ROUGE-1**: 0.3122
  - **ROUGE-2**: 0.1228
  - **ROUGE-L**: 0.2599
  - **ROUGE-Lsum**: 0.2600
- **METEOR Score**: 0.27

These scores indicate that the model performs reasonably well in generating responses that are lexically and semantically similar to the reference answers.

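The evaluation script itself is not included in this card. As a rough sketch, ROUGE and METEOR scores like the ones above can be computed with the Hugging Face `evaluate` library, assuming parallel lists of generated answers and reference answers; the example strings and variable names below are illustrative placeholders.

```python
import evaluate

rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

# Placeholder model outputs and reference answers from the QA dataset
predictions = ["The DGX RAID array was set up as a cache for the data pipeline."]
references = ["The DGX RAID memory was configured as a cache in version 2 of the pipeline."]

rouge_scores = rouge.compute(predictions=predictions, references=references)
meteor_score = meteor.compute(predictions=predictions, references=references)

print(rouge_scores)   # rouge1, rouge2, rougeL, rougeLsum
print(meteor_score)   # {'meteor': ...}
```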

## Model Usage

You can use this model to generate responses in chat-based applications. Below is an example of how to load the model and generate a response:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
from peft import PeftModel, PeftConfig
import torch

# Load the model and tokenizer
model_id = "parseny/TinyLlama1.1B-Nvidia-QA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.to('cuda')

# Decoding settings for response generation
generation_config = GenerationConfig(
    penalty_alpha=0.6, do_sample=True,
    top_k=5, temperature=0.5, repetition_penalty=1.2,
    max_new_tokens=47, pad_token_id=tokenizer.eos_token_id
)

def generate_response(prompt):
    try:
        inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
        outputs = model.generate(**inputs, generation_config=generation_config)
        generated_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Keep only the text between the assistant marker and the end-of-turn marker
        start_idx = generated_response.find('<|im_start|>assistant\n') + len('<|im_start|>assistant\n')
        generated_response = generated_response[start_idx:]
        end_idx = generated_response.find('<|im_end|>')
        generated_response = generated_response[:end_idx]
        return generated_response
    except Exception:
        return ""

# Example usage
prompt = "What was the purpose of setting up the DGX RAID memory in version 2 of the pipeline?"
response = generate_response(prompt)
print(response)
```

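Note that `generate_response` extracts the answer between `<|im_start|>assistant` and `<|im_end|>` markers, which suggests the model was trained on ChatML-formatted conversations. If that is the case (an assumption, since the card does not show the training prompt template), wrapping the question in the same format before generation lines up with that post-processing:

```python
def build_chatml_prompt(question):
    # Hypothetical ChatML-style prompt; adjust if the model's actual
    # training template differs.
    return (
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

response = generate_response(build_chatml_prompt(
    "What was the purpose of setting up the DGX RAID memory in version 2 of the pipeline?"
))
print(response)
```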

## Training Procedure

The model was fine-tuned on the NVIDIA documentation question-answer dataset. The fine-tuning process involved:

1. Loading the pre-trained TinyLlama-1.1B model.
2. Quantizing the model to 4-bit precision to reduce memory usage and speed up inference.
3. Fine-tuning the model with the `SFTTrainer` and the training arguments shown above (see the sketch after this list).
4. Evaluating the model at the end of each epoch and saving the best-performing checkpoint.

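The full training script is not published in this card. The following is a minimal sketch of how steps 1-4 could be wired together with TRL's `SFTTrainer`, reusing the `training_arguments` from the Training Configuration section and the 4-bit, LoRA-wrapped `model` from the earlier sketch. The file name, column names (`question`, `answer`), text formatting, and `max_seq_length` are assumptions, not values taken from this repository.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import SFTTrainer

# Hypothetical data loading: file and column names are assumptions about the
# Kaggle QA dataset, not taken from the card.
raw = load_dataset("csv", data_files="nvidia_qa_pairs.csv")["train"]
dataset = raw.train_test_split(test_size=0.1)

def to_text(example):
    # Assumed ChatML-style formatting of one question-answer pair
    return {
        "text": f"<|im_start|>user\n{example['question']}<|im_end|>\n"
                f"<|im_start|>assistant\n{example['answer']}<|im_end|>\n"
    }

dataset = dataset.map(to_text)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # assumed base checkpoint

# `model` is the quantized LoRA model from the earlier sketch and
# `training_arguments` comes from the Training Configuration section.
# Argument names follow older TRL releases; newer versions move some of
# these options into SFTConfig.
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```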

## How to Cite

If you use this model in your research or applications, please cite it as follows:

```bibtex
@misc{parseny-tinyllama-nvidia-qa,
  author = {Your Name},
  title = {TinyLlama1.1B-Nvidia-QA: NVIDIA documentation helper},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/parseny/TinyLlama1.1B-Nvidia-QA},
}
```

## Contact

For any questions or issues, please open an issue on the Hugging Face model repository.