shimmyshimmer committed on
Commit b549043 · verified · 1 Parent(s): 14a73f9

Update README.md

Files changed (1)
  1. README.md +16 -69
README.md CHANGED
@@ -10,6 +10,7 @@ tags:

  ---

  # Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!

  A reupload from https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
@@ -18,76 +19,22 @@ We have a Google Colab Tesla T4 notebook for TinyLlama with 4096 max sequence le

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/u54VK8m8tk)
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/buy%20me%20a%20coffee%20button.png" width="200"/>](https://ko-fi.com/unsloth)
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="400"/>](https://github.com/unslothai/unsloth)
-
- ```python
- from unsloth import FastLanguageModel
- import torch
- from trl import SFTTrainer
- from transformers import TrainingArguments
- from datasets import load_dataset
- max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
- # Get LAION dataset
- url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
- dataset = load_dataset("json", data_files = {"train" : url}, split = "train")
-
- # 4bit pre quantized models we support - 4x faster downloading!
- fourbit_models = [
-     "unsloth/mistral-7b-bnb-4bit",
-     "unsloth/llama-2-7b-bnb-4bit",
-     "unsloth/llama-2-13b-bnb-4bit",
-     "unsloth/codellama-34b-bnb-4bit",
-     "unsloth/tinyllama-bnb-4bit",
- ] # Go to https://huggingface.co/unsloth for more 4-bit models!

- # Load Llama model
- model, tokenizer = FastLanguageModel.from_pretrained(
-     model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!
-     max_seq_length = max_seq_length,
-     dtype = None,
-     load_in_4bit = True,
- )

- # Do model patching and add fast LoRA weights
- model = FastLanguageModel.get_peft_model(
-     model,
-     r = 16,
-     target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
-                       "gate_proj", "up_proj", "down_proj",],
-     lora_alpha = 16,
-     lora_dropout = 0, # Supports any, but = 0 is optimized
-     bias = "none", # Supports any, but = "none" is optimized
-     use_gradient_checkpointing = True,
-     random_state = 3407,
-     max_seq_length = max_seq_length,
-     use_rslora = False, # We support rank stabilized LoRA
-     loftq_config = None, # And LoftQ
- )

- trainer = SFTTrainer(
-     model = model,
-     train_dataset = dataset,
-     dataset_text_field = "text",
-     max_seq_length = max_seq_length,
-     tokenizer = tokenizer,
-     args = TrainingArguments(
-         per_device_train_batch_size = 2,
-         gradient_accumulation_steps = 4,
-         warmup_steps = 10,
-         max_steps = 60,
-         fp16 = not torch.cuda.is_bf16_supported(),
-         bf16 = torch.cuda.is_bf16_supported(),
-         logging_steps = 1,
-         output_dir = "outputs",
-         optim = "adamw_8bit",
-         seed = 3407,
-     ),
- )
- trainer.train()

- # Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
- # (1) Saving to GGUF / merging to 16bit for vLLM
- # (2) Continued training from a saved LoRA adapter
- # (3) Adding an evaluation loop / OOMs
- # (4) Customized chat templates
- ```
 
  ---

+
  # Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!

  A reupload from https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/u54VK8m8tk)
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/buy%20me%20a%20coffee%20button.png" width="200"/>](https://ko-fi.com/unsloth)
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+ ## Finetune for Free

+ All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a model finetuned 2x faster that can be exported to GGUF or vLLM, or uploaded to Hugging Face.

+ | Unsloth supports | Free Notebooks | Performance | Memory use |
+ |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
+ | **Gemma 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing) | 2.4x faster | 58% less |
+ | **Mistral 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | 2.2x faster | 62% less |
+ | **Llama-2 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing) | 2.2x faster | 43% less |
+ | **TinyLlama** | [▶️ Start on Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing) | 3.9x faster | 74% less |
+ | **CodeLlama 34b** A100 | [▶️ Start on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing) | 1.9x faster | 27% less |
+ | **Mistral 7b** 1xT4 | [▶️ Start on Kaggle](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook) | 5x faster\* | 62% less |
+ | **DPO - Zephyr** | [▶️ Start on Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) | 1.9x faster | 19% less |

+ - This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
+ - This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
+ - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
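
Reviewer note (not part of the commit): the "Performance" and "Memory use" columns in the added table can be hard to compare at a glance. The sketch below converts them into relative time and memory fractions. It rests on an assumption the README does not spell out: that "Nx faster" means training takes 1/N of the baseline time, and "P% less" means memory use is (1 - P/100) of the baseline. The names `BENCHMARKS` and `relative_cost` are hypothetical helpers for illustration, and the footnoted Kaggle row is left out.

```python
# Figures copied from the table added in this commit: (speedup, memory saved in %).
BENCHMARKS = {
    "Gemma 7b": (2.4, 58),
    "Mistral 7b": (2.2, 62),
    "Llama-2 7b": (2.2, 43),
    "TinyLlama": (3.9, 74),
    "CodeLlama 34b (A100)": (1.9, 27),
    "DPO - Zephyr": (1.9, 19),
}


def relative_cost(speedup, memory_saved_pct):
    """Return (time fraction, memory fraction) relative to a baseline of 1.0.

    Assumption: "Nx faster" means 1/N of the baseline time, and
    "P% less" means (1 - P/100) of the baseline memory.
    """
    return 1.0 / speedup, 1.0 - memory_saved_pct / 100.0


if __name__ == "__main__":
    for name, (speedup, saved) in BENCHMARKS.items():
        time_frac, mem_frac = relative_cost(speedup, saved)
        print(f"{name:<22} time x{time_frac:.2f}  memory x{mem_frac:.2f}")
```

Under that reading, the TinyLlama row would correspond to roughly a quarter of the baseline time and memory, though the baseline itself is not stated in the README.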