---
license: apache-2.0
datasets:
- allenai/dolmino-mix-1124
- allenai/olmo-mix-1124
language:
- en
base_model: allenai/OLMo-2-1124-7B
tags:
- llama-cpp
- gguf-my-repo
---
|
|
|
# Triangle104/OLMo-2-1124-7B-Q8_0-GGUF |
|
This model was converted to GGUF format from [`allenai/OLMo-2-1124-7B`](https://huggingface.co/allenai/OLMo-2-1124-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
|
Refer to the [original model card](https://huggingface.co/allenai/OLMo-2-1124-7B) for more details on the model. |
|
|
|
--- |
|
## Model details
|
|
|
|
|
|
|
We introduce OLMo 2, a new family of 7B and 13B models featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original OLMo 7B model. These gains come from training on the OLMo-mix-1124 and Dolmino-mix-1124 datasets and from a staged training approach.
|
|
|
|
|
OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Installation
|
|
|
|
|
|
|
|
|
OLMo 2 will be supported in the next release of Transformers; until then, you need to install it from the main branch using:
|
|
|
|
|
```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
```
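A quick, optional sanity check that the source build is what got installed (main-branch builds typically report a `.dev0` version suffix):

```python
# Confirm the main-branch Transformers install
import transformers
print(transformers.__version__)
```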
|
|
|
|
|
## Inference
|
|
|
|
|
|
|
You can use OLMo with the standard HuggingFace transformers library: |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)

# Optional: run on CUDA by moving the inputs and the model to the GPU
# inputs = {k: v.to('cuda') for k, v in inputs.items()}
# olmo = olmo.to('cuda')

response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
# >> 'Language modeling is a key component of any text-based application, but its effectiveness...'
```
|
|
|
|
|
|
|
|
|
|
|
|
|
For faster performance, you can quantize the model using the following method: |
|
|
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.float16,
    load_in_8bit=True,  # requires the bitsandbytes package
)
```
|
|
|
|
|
The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:

```python
inputs.input_ids.to('cuda')
```
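Putting these pieces together, a minimal end-to-end sketch of 8-bit inference, assuming a CUDA-capable GPU with `bitsandbytes` (and `accelerate`) installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 8-bit load; requires bitsandbytes and a CUDA GPU
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
inputs = tokenizer(["Language modeling is "], return_tensors='pt', return_token_type_ids=False)

# Pass the input ids directly to CUDA, as recommended above
response = olmo.generate(inputs.input_ids.to('cuda'), max_new_tokens=100)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```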
|
|
|
|
|
We have released checkpoints for these models. For pretraining, the naming convention is `stepXXX-tokensYYYB`. For checkpoints with ingredients of the soup, the naming convention is `stage2-ingredientN-stepXXX-tokensYYYB`.
|
|
|
|
|
To load a specific model revision with HuggingFace, simply add the argument `revision`:
|
|
|
|
|
```python
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B", revision="step1000-tokens5B")
```
|
|
|
|
|
Or, you can access all the revisions for the models via the following code snippet: |
|
|
|
|
|
```python
from huggingface_hub import list_repo_refs

out = list_repo_refs("allenai/OLMo-2-1124-7B")
branches = [b.name for b in out.branches]
```
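From there, the branch names can be filtered against the naming convention above; for example, a small sketch listing only the stage-2 ingredient checkpoints:

```python
# Keep only branches named stage2-ingredientN-stepXXX-tokensYYYB
stage2 = [b for b in branches if b.startswith("stage2-")]
print(sorted(stage2))
```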
|
|
|
|
|
## Fine-tuning
|
|
|
|
|
|
|
Model fine-tuning can be done from the final checkpoint (the main revision of this model) or from many intermediate checkpoints. Two recipes for tuning are available.
|
|
|
|
|
Fine-tune with the OLMo repository: |
|
|
|
|
|
```bash
torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
    --data.paths=[{path_to_data}/input_ids.npy] \
    --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
    --load_path={path_to_checkpoint} \
    --reset_trainer_state
```
|
|
|
|
|
For more documentation, see the [GitHub README](https://github.com/allenai/OLMo).
|
|
|
|
|
Further fine-tuning support is being developed in AI2's [Open Instruct](https://github.com/allenai/open-instruct) repository.
|
|
|
|
|
## Model Description
|
|
|
|
|
|
|
- **Developed by:** Allen Institute for AI (Ai2)
- **Model type:** a Transformer-style autoregressive language model
- **Language(s) (NLP):** English
- **License:** the code and model are released under Apache 2.0
- **Contact:** technical inquiries: [email protected]; press: [email protected]
- **Date cutoff:** Dec. 2023
|
|
|
|
|
## Model Sources
|
|
|
|
|
|
|
- **Project Page:** https://allenai.org/olmo
- **Repositories:**
  - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
  - Evaluation code: https://github.com/allenai/OLMo-Eval
  - Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** Coming soon
|
|
|
|
|
## Pretraining
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| | OLMo 2 7B | OLMo 2 13B |
|---|---|---|
| Pretraining Stage 1 (OLMo-Mix-1124) | 4 trillion tokens (1 epoch) | 5 trillion tokens (1.2 epochs) |
| Pretraining Stage 2 (Dolmino-Mix-1124) | 50B tokens (3 runs), merged | 100B tokens (3 runs) + 300B tokens (1 run), merged |
| Post-training (Tulu 3 SFT OLMo mix) | SFT + DPO + PPO (preference mix) | SFT + DPO + PPO (preference mix) |
|
|
|
|
|
### Stage 1: Initial Pretraining
|
|
|
|
|
|
|
- Dataset: OLMo-Mix-1124 (3.9T tokens)
- Coverage: 90%+ of total pretraining budget
- 7B model: ~1 epoch
- 13B model: 1.2 epochs (5T tokens)
|
|
|
|
|
### Stage 2: Fine-tuning
|
|
|
|
|
|
|
- Dataset: Dolmino-Mix-1124 (843B tokens)
- Three training mixes: 50B, 100B, and 300B tokens
- Mix composition: 50% high-quality data + academic/Q&A/instruction/math content
|
|
|
|
|
### Model Merging
|
|
|
|
|
|
|
- 7B model: 3 versions trained on the 50B mix, merged via model souping
- 13B model: 3 versions on the 100B mix + 1 version on the 300B mix, merged for the final checkpoint
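For intuition, "model souping" refers to averaging the weights of several checkpoints in parameter space. A minimal sketch of uniform averaging, assuming checkpoints with identical architectures (the revision names below are hypothetical, for illustration only):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical ingredient revisions, for illustration only; each full
# checkpoint is loaded into memory, so this needs ample RAM.
revisions = [f"stage2-ingredient{i}-step1000-tokens5B" for i in (1, 2, 3)]
models = [
    AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B", revision=r)
    for r in revisions
]

# Uniformly average all ingredients' parameters into the first model
param_dicts = [dict(m.named_parameters()) for m in models]
souped = models[0]
with torch.no_grad():
    for name, param in souped.named_parameters():
        param.copy_(torch.stack([pd[name] for pd in param_dicts]).mean(dim=0))
```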
|
|
|
|
|
## Bias, Risks, and Limitations
|
|
|
|
|
|
|
Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from OLMo, as from any LLM, are often inaccurate, so facts should be verified.
|
|
|
|
|
## Citation
|
|
|
|
|
|
|
A technical manuscript is forthcoming! |
|
|
|
|
|
## Model Card Contact
|
|
|
|
|
|
|
For errors in this model card, contact [email protected]. |
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux) |
|
|
|
```bash
brew install llama.cpp
```
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash |
|
llama-cli --hf-repo Triangle104/OLMo-2-1124-7B-Q8_0-GGUF --hf-file olmo-2-1124-7b-q8_0.gguf -p "The meaning to life and the universe is" |
|
``` |
|
|
|
### Server: |
|
```bash |
|
llama-server --hf-repo Triangle104/OLMo-2-1124-7B-Q8_0-GGUF --hf-file olmo-2-1124-7b-q8_0.gguf -c 2048 |
|
``` |
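Once the server is running, you can query it over HTTP. A minimal sketch in Python, assuming llama-server's default port 8080 and its `/completion` endpoint:

```python
# Query a locally running llama-server (default port 8080 assumed)
import json
import urllib.request

payload = json.dumps({
    "prompt": "The meaning to life and the universe is",
    "n_predict": 64,
}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```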
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
```bash
git clone https://github.com/ggerganov/llama.cpp
```
|
|
|
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
|
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
|
|
|
Step 3: Run inference through the main binary. |
|
```bash
./llama-cli --hf-repo Triangle104/OLMo-2-1124-7B-Q8_0-GGUF --hf-file olmo-2-1124-7b-q8_0.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/OLMo-2-1124-7B-Q8_0-GGUF --hf-file olmo-2-1124-7b-q8_0.gguf -c 2048
```
|
|