Pankaj Mathur committed
Commit bb961f4 · 1 Parent(s): ad636d6
Update README.md

README.md CHANGED

# Dataset

We trained the [OpenLLaMa-3B model](https://github.com/openlm-research/open_llama) on a custom explain-tuned Alpaca dataset (~52K samples) created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).

We leverage all 15 system instructions provided in the [Orca Research Paper](https://arxiv.org/abs/2306.02707) to generate the custom Alpaca dataset, in contrast to the vanilla instruction-tuning approach used by the original [Alpaca research paper](https://crfm.stanford.edu/2023/03/13/alpaca.html).

This helps the student model, [alpaca_orca_open_llama_3b](psmathur/alpaca_orca_open_llama_3b), learn the ***thought*** process of the teacher model, ChatGPT (gpt-3.5-turbo-0301 version).

Please see the example usage below for how the **System** prompt is added before each *instruction*.
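
For illustration only, here is a minimal sketch of how a system prompt can be prepended to each instruction when building a prompt. The `generate_prompt` helper, the field labels, and the sample system instruction are assumptions for this sketch, not the verbatim template used for this model.

```python
# Minimal sketch (assumed template, not the model's verbatim prompt format):
# an Orca-style system instruction is placed before the Alpaca-style
# instruction/input fields, and the model completes the response.
def generate_prompt(system: str, instruction: str, input_text: str = "") -> str:
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt

# Example with a paraphrased Orca-style system instruction:
system = "You are an AI assistant. Explain your reasoning step by step."
print(generate_prompt(system, "Why is the sky blue?"))
```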

# Training

The training configurations are provided in the table below.

Training ran on 4x A600 (50G) GPUs and took around 20 hours, at a cost of $66, using [Lambda Labs](https://lambdalabs.com).

We used DeepSpeed with ZeRO-3 for parallel GPU training, writing our own fine-tuning scripts and leveraging some of the model training code provided by the amazing [OpenAlpaca repo](https://github.com/yxuansu/OpenAlpaca).

Here are some of the params used during training:

|Parameter|Value|
|:-------------:|:-------------:|
|*batch_size*|16|
|*train_micro_batch_size_per_gpu*|2|
|*gradient_accumulation_steps*|2|
|*Learning rate*|2e-5|
|*Max length*|1024|
|*Epochs*|3|
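
For reference, the numbers above are internally consistent: train_micro_batch_size_per_gpu (2) × gradient_accumulation_steps (2) × 4 GPUs gives the effective batch_size of 16. Below is a minimal, assumed sketch of how such parameters could be expressed as a DeepSpeed ZeRO-3 configuration; the optimizer choice, precision setting, and other fields are illustrative assumptions, not the authors' actual config file.

```python
# Hypothetical DeepSpeed ZeRO-3 config sketch (assumed, not the actual file
# used for this model). The values from the table above are filled in; the
# optimizer and precision settings are illustrative guesses.
ds_config = {
    "train_batch_size": 16,                 # 2 micro-batch x 2 grad-accum x 4 GPUs
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "optimizer": {
        "type": "AdamW",                    # assumed optimizer
        "params": {"lr": 2e-5},
    },
    "zero_optimization": {
        "stage": 3,                         # ZeRO-3: shard params, grads, optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},              # assumed mixed-precision setting
}
# A training script would typically pass this dict (or an equivalent JSON file)
# to deepspeed.initialize(...), truncating/padding sequences to the max length of 1024.
```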

# Example Usage

Below is an example of how to use [alpaca_orca_open_llama_3b](psmathur/alpaca_orca_open_llama_3b):

```python
import torch