amanrangapur committed on
Commit 43f5c98 · verified · 1 Parent(s): 627e96b

Update README.md

Files changed (1)
  1. README.md +36 -38
README.md CHANGED
@@ -1,7 +1,7 @@
1
  ---
2
  license: apache-2.0
3
  datasets:
4
- - allenai/dolma
5
  language:
6
  - en
7
  ---
@@ -16,7 +16,7 @@ language:
16
  OLMo2 7B November 2024 is an updated version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model, with a ____ point increase in ____, among other evaluation improvements, from an improved version of the Dolma dataset and staged training.
17
 
18
  OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
19
- The OLMo models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset.
20
  We release all code, checkpoints, logs (coming soon), and details involved in training these models.
21
 
22
 
@@ -27,6 +27,26 @@ The core models released in this batch are the following:
27
  | [OLMo2-7B November 2024](https://huggingface.co/allenai/OLMo2-7B-1124) | 4 Trillion | 32 | 4096 | 32 | 4096 |
28
  | [OLMo2-13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) | 5 Trillion | 40 | 5120 | 40 | 4096 |
29
 
30
  We have released checkpoints for these models, for every 1000 training steps.
31
  The naming convention is `stepXXX-tokensYYYB`.
32
 
@@ -42,6 +62,20 @@ out = list_repo_refs("allenai/OLMo2-7B-1124")
42
  branches = [b.name for b in out.branches]
43
  ```
44
 
45
  ### Model Description
46
 
47
  - **Developed by:** Allen Institute for AI (Ai2)
@@ -65,42 +99,6 @@ branches = [b.name for b in out.branches]
65
  - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
66
 
67
 
68
- ## Uses
69
-
70
- ### Inference
71
-
72
- Proceed as usual with HuggingFace:
73
- ```python
74
- from transformers import AutoModelForCausalLM, AutoTokenizer
75
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
76
- tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
77
- message = ["Language modeling is "]
78
- inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
79
- # optional: move the inputs and model to CUDA
80
- # inputs = {k: v.to('cuda') for k,v in inputs.items()}
81
- # olmo = olmo.to('cuda')
82
- response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
83
- print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
84
- # >> 'Language modeling is the first step to build natural language generation...'
85
- ```
86
-
87
- Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
88
- The quantized model is more sensitive to data types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
89
-
90
- ### Fine-tuning
91
- Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
92
- 1. Fine-tune with the OLMo repository:
93
- ```bash
94
- torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
95
- --data.paths=[{path_to_data}/input_ids.npy] \
96
- --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
97
- --load_path={path_to_checkpoint} \
98
- --reset_trainer_state
99
- ```
100
- For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
101
-
102
- 2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
103
-
104
  <!-- TODO -->
105
  ## Evaluation
106
 
 
1
  ---
2
  license: apache-2.0
3
  datasets:
4
+ - allenai/dolmino-mix-1124
5
  language:
6
  - en
7
  ---
 
16
  OLMo2 7B November 2024 is an updated version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model, with a ____ point increase in ____, among other evaluation improvements, from an improved version of the Dolma dataset and staged training.
17
 
18
  OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
19
+ The OLMo models are trained on the [Dolmino](https://huggingface.co/datasets/allenai/dolmino-mix-1124) dataset.
20
  We release all code, checkpoints, logs (coming soon), and details involved in training these models.
21
 
22
 
 
27
  | [OLMo2-7B November 2024](https://huggingface.co/allenai/OLMo2-7B-1124) | 4 Trillion | 32 | 4096 | 32 | 4096 |
28
  | [OLMo2-13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) | 5 Trillion | 40 | 5120 | 40 | 4096 |
29
 
30
+ ## Inference
31
+
32
+ Proceed as usual with HuggingFace:
33
+ ```python
34
+ from transformers import AutoModelForCausalLM, AutoTokenizer
35
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
36
+ tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
37
+ message = ["Language modeling is "]
38
+ inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
39
+ # optional: move the inputs and model to CUDA
40
+ # inputs = {k: v.to('cuda') for k,v in inputs.items()}
41
+ # olmo = olmo.to('cuda')
42
+ response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
43
+ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
44
+ # >> 'Language modeling is the first step to build natural language generation...'
45
+ ```
46
+
47
+ Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
48
+ The quantized model is more sensitive to data types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
49
+
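The quantized loading path described above can be sketched as follows. This is a minimal sketch, not the authors' recipe: it assumes a CUDA GPU with `bitsandbytes` and `accelerate` installed, and it uses `BitsAndBytesConfig(load_in_8bit=True)` with `device_map="auto"` in place of the bare `load_in_8bit=True` keyword. The helper names `load_olmo_8bit` and `generate_text` are hypothetical.

```python
def load_olmo_8bit(model_id="allenai/OLMo2-7B-1124"):
    """Load the model with 8-bit weights (assumes CUDA, bitsandbytes, accelerate)."""
    # Imported lazily so this sketch can be read without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",  # let accelerate place the quantized weights on the GPU
    )
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_new_tokens=100):
    """Generate a continuation, keeping the inputs on the same device as the model."""
    inputs = tokenizer([prompt], return_tensors="pt", return_token_type_ids=False)
    # Move the tensors explicitly, as recommended for the quantized model.
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    response = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_k=50, top_p=0.95
    )
    return tokenizer.batch_decode(response, skip_special_tokens=True)[0]
```

Calling `generate_text(*load_olmo_8bit(), "Language modeling is ")` would mirror the unquantized example; because sampling is enabled, the output differs from run to run.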
50
  We have released checkpoints for these models, for every 1000 training steps.
51
  The naming convention is `stepXXX-tokensYYYB`.
52
 
 
62
  branches = [b.name for b in out.branches]
63
  ```
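Once the branch names are listed, a specific checkpoint can be picked by parsing the `stepXXX-tokensYYYB` convention. The sketch below is illustrative only: the helper names `parse_checkpoint` and `latest_checkpoint` and the example branch names are hypothetical, and the pattern assumes the step and token counts are plain integers.

```python
import re

# Matches the `stepXXX-tokensYYYB` convention anywhere in a branch name.
CHECKPOINT_RE = re.compile(r"step(\d+)-tokens(\d+)B")


def parse_checkpoint(branch):
    """Return (step, tokens_in_billions) for a checkpoint branch, else None."""
    match = CHECKPOINT_RE.search(branch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(branches):
    """Return the branch name with the highest step, or None if none match."""
    parsed = [(parse_checkpoint(b), b) for b in branches]
    parsed = [(key, name) for key, name in parsed if key is not None]
    if not parsed:
        return None
    return max(parsed)[1]


# Hypothetical branch names, standing in for the `branches` list built above.
example_branches = ["main", "step1000-tokens4B", "step2000-tokens8B"]
revision = latest_checkpoint(example_branches)

# The chosen name can then be passed as `revision` when loading, e.g.:
# AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", revision=revision)
```

Branches that do not follow the convention, such as `main`, are skipped rather than raising an error.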
64
 
65
+ ### Fine-tuning
66
+ Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
67
+ 1. Fine-tune with the OLMo repository:
68
+ ```bash
69
+ torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
70
+ --data.paths=[{path_to_data}/input_ids.npy] \
71
+ --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
72
+ --load_path={path_to_checkpoint} \
73
+ --reset_trainer_state
74
+ ```
75
+ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
76
+
77
+ 2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
78
+
79
  ### Model Description
80
 
81
  - **Developed by:** Allen Institute for AI (Ai2)
 
99
  - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
100
 
101
 
102
  <!-- TODO -->
103
  ## Evaluation
104