---
license: apache-2.0
datasets:
- mlabonne/FineTome-100k
---
# Distilled Google Gemma-2-2b-it

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/G0Q--v5zaiCKW96xm8Mhr.png)

## Model Description

This model is a distilled version of Google's Gemma-2-2b-it, created through knowledge distillation from the larger Gemma-2-9b-it model. The distillation was performed with arcee-ai's DistilKit, with the aim of preserving the capabilities of the larger model in a more compact form.

### Key Features

- **Base Model**: Google Gemma-2-2b-it
- **Teacher Model**: Google Gemma-2-9b-it
- **Distillation Tool**: arcee-ai DistilKit
- **Training Data**: Subset of the mlabonne/Tome dataset (30,000 rows)
- **Distillation Method**: Logit-based distillation

## Distillation Process

The distillation process transfers knowledge from the larger Gemma-2-9b-it model to the smaller Gemma-2-2b-it model. This was done with arcee-ai DistilKit, which offers two features that matter here:

1. **Logit-based Distillation**: This method trains the student model (Gemma-2-2b-it) to mimic the output distribution of the teacher model (Gemma-2-9b-it), as sketched below.

2. **Architectural Consistency**: The teacher and student models share the same architecture and vocabulary, which is what makes direct logit-based distillation possible.
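
DistilKit drives the training loop itself, but the core objective of logit-based distillation can be sketched in a few lines of PyTorch: a temperature-scaled KL divergence between the teacher's and the student's next-token distributions. This is a generic illustration of the technique, not DistilKit's actual implementation; the temperature and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor,
                            temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    next-token distributions, averaged over all token positions."""
    # Flatten (batch, seq, vocab) -> (batch * seq, vocab) so every token
    # position counts as one sample.
    vocab = student_logits.size(-1)
    s = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    t = F.softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy shapes only; in practice both models must share a vocabulary,
# which is why the architectural consistency above matters.
student_logits = torch.randn(2, 8, 256)  # (batch, seq_len, vocab_size)
teacher_logits = torch.randn(2, 8, 256)
print(logit_distillation_loss(student_logits, teacher_logits))
```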

## Dataset

The model was trained on a subset of the mlabonne/Tome dataset, using 30,000 rows due to computational constraints. This dataset was chosen for its quality and relevance to the target tasks of the model.
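
The card's metadata tags mlabonne/FineTome-100k; a 30,000-row subset like the one described above can be drawn with the `datasets` library as follows. The shuffle seed and selection strategy are illustrative assumptions, not the exact split used for training.

```python
from datasets import load_dataset

# Load the dataset tagged in the card metadata and keep 30,000 rows.
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
subset = dataset.shuffle(seed=42).select(range(30_000))
print(subset)  # num_rows: 30000
```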

## Model Limitations

While this distilled model retains much of the capability of its larger counterpart, users should be aware of potential limitations:

- Slightly reduced performance compared to the Gemma-2-9b-it teacher model
- Limited to the scope of tasks covered in the training data
- May not perform as well on highly specialized or domain-specific tasks

## Usage

Below are some code snippets to help you get started with the model quickly. First, install the Transformers library:

```sh
pip install -U transformers
```

Then copy the snippet from the section that is relevant for your use case.

#### Running with the `pipeline` API

```python
import torch
from transformers import pipeline

# Build a chat-style text-generation pipeline for the distilled model.
pipe = pipeline(
    "text-generation",
    model="Syed-Hasan-8503/Gemma-2-2b-it-distilled",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the reply.
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world. So, what be yer pleasure, eh? 🦜
```
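
#### Running the model directly

If you need more control than the `pipeline` API offers, the standard `transformers` auto classes should work as well. This is an untested sketch that reuses the repository name from the snippet above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syed-Hasan-8503/Gemma-2-2b-it-distilled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

# Format the conversation with Gemma's chat template before generating.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```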