Syed-Hasan-8503 committed
Commit 280bf01
1 parent: 8d7551b

Create README.md

Files changed (1): README.md (+73, -0)

README.md ADDED:
---
license: apache-2.0
datasets:
- mlabonne/FineTome-100k
---
# Distilled Google Gemma-2-2b-it

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/G0Q--v5zaiCKW96xm8Mhr.png)

## Model Description

This model is a distilled version of Google's Gemma-2-2b-it, created through knowledge distillation from the larger Gemma-2-9b-it model. The distillation was performed with arcee-ai's DistilKit, with the goal of preserving the capabilities of the larger model in a more compact form.

### Key Features

- **Base Model**: Google Gemma-2-2b-it
- **Teacher Model**: Google Gemma-2-9b-it
- **Distillation Tool**: arcee-ai DistilKit
- **Training Data**: Subset of the mlabonne/Tome dataset (30,000 rows)
- **Distillation Method**: Logit-based distillation

## Distillation Process

The distillation process transfers knowledge from the larger Gemma-2-9b-it model to the smaller Gemma-2-2b-it model. It was carried out with arcee-ai DistilKit, which provides two properties that matter here:

1. **Logit-based Distillation**: The student model (Gemma-2-2b-it) is trained to mimic the output distribution of the teacher model (Gemma-2-9b-it); see the sketch after this list.

2. **Architectural Consistency**: The teacher and student models share the same architecture and tokenizer, so their output logits line up token for token and can be compared directly.

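To make the logit-based objective concrete, here is a minimal sketch of the loss such a setup typically optimizes. This is plain PyTorch for illustration only, not DistilKit's actual implementation; the temperature, the loss weighting `alpha`, and the `ignore_index` convention are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft KL term (mimic the teacher) with the usual hard
    cross-entropy term. Temperature and alpha are illustrative choices."""
    # Soften both distributions with the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between teacher and student, rescaled by T^2 as in
    # Hinton et al.'s classic distillation formulation.
    kd_loss = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (temperature ** 2)

    # Standard next-token cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

Because both models share the Gemma-2 vocabulary, the teacher's and student's logit tensors have the same last dimension, which is what makes this direct comparison possible.
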
## Dataset

The model was trained on a subset of the mlabonne/Tome dataset, utilizing 30,000 rows due to computational constraints. This dataset was chosen for its quality and relevance to the target tasks of the model.

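As a rough illustration of how a subset of that size can be selected, the snippet below uses split slicing in the Hugging Face `datasets` library. Note that the card metadata lists `mlabonne/FineTome-100k` while the text above mentions mlabonne/Tome; the sketch uses the dataset named in the metadata, and the exact selection and preprocessing used for training may differ.

```python
from datasets import load_dataset

# Take the first 30,000 rows via split slicing; a random sample would use
# .shuffle(seed=42).select(range(30_000)) instead.
subset = load_dataset("mlabonne/FineTome-100k", split="train[:30000]")

print(subset)     # row count and column names
print(subset[0])  # inspect a single conversation
```
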
## Model Limitations

While this distilled model retains much of the capability of its larger counterpart, users should be aware of potential limitations:

- Slightly reduced performance compared to the original Gemma-2-9b-it model
- Limited to the scope of tasks covered in the training data
- May not perform as well on highly specialized or domain-specific tasks

## Usage

Below are some code snippets to help you get started quickly with the model. First, install the Transformers library:

```sh
pip install -U transformers
```

Then, copy the snippet from the section that is relevant to your use case.

#### Running with the `pipeline` API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Syed-Hasan-8503/Gemma-2-2b-it-distilled",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world. So, what be yer pleasure, eh? 🦜
```
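
#### Running directly with `AutoModelForCausalLM`

For lower-level control than the `pipeline` API offers, the model can also be loaded directly and prompted through its chat template. The sketch below follows the standard Transformers pattern for Gemma-2 chat models rather than anything specific to this repository, so treat the dtype, device placement, and `max_new_tokens` as adjustable assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syed-Hasan-8503/Gemma-2-2b-it-distilled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`; otherwise drop this and call .to("cuda")
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

# Build the Gemma chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```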