LeroyDyer committed on
Commit
a28f762
·
verified ·
1 Parent(s): 7ae8188

Upload README.md

Files changed (1)
  1. README.md +134 -0
README.md ADDED
@@ -0,0 +1,134 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: mit
5
+ library_name: transformers
6
+ tags:
7
+ - mergekit
8
+ - merge
9
+ - unsloth
10
+ base_model:
11
+ - LeroyDyer/Mixtral_AI_CyberBrain_2.0
12
+ - ezelikman/quietstar-8-ahead
13
+ ---
14
+
15
+ Actually, it is working; it just needs training data!
16
+
17
+
18
+ This project is implemented by patching the base Mistral implementation in Hugging Face transformers with a new modeling_mistral.py and a new configuration_mistral.py, and otherwise using standard transformers features (e.g. the default Trainer).
19
+
20
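+ That is: first clone the latest transformers repository,
21
+ go into the src/transformers/models/mistral folder and copy in the patched modeling_mistral.py (and configuration_mistral.py),
22
+ then install from the cloned folder by running pip install ./transformers from the directory that contains the clone.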
23
+
24
+ After that, the model can be loaded normally for training:
25
+
26
+ ```
27
+ from unsloth import FastLanguageModel
28
+ import torch
29
+ from transformers import AutoTokenizer
30
+ max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
31
+ dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
32
+ load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
33
+
34
+ # 4bit pre quantized models we support for 4x faster downloading + no OOMs.
35
+ fourbit_models = [
36
+     "unsloth/mistral-7b-bnb-4bit",
37
+     "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
38
+     "unsloth/llama-2-7b-bnb-4bit",
39
+     "unsloth/llama-2-13b-bnb-4bit",
40
+     "unsloth/codellama-34b-bnb-4bit",
41
+     "unsloth/tinyllama-bnb-4bit",
42
+     "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
43
+     "unsloth/gemma-2b-bnb-4bit",
44
+ ] # More models at https://huggingface.co/unsloth
45
+
46
+ model, _ = FastLanguageModel.from_pretrained( # returns (model, tokenizer); the tokenizer is reloaded below
47
+     model_name = "LeroyDyer/Mixtral_AI_CyberBrain_3.0", # Choose ANY! e.g. teknium/OpenHermes-2.5-Mistral-7B
48
+     max_seq_length = max_seq_length,
49
+     dtype = dtype,
50
+     load_in_4bit = load_in_4bit,
51
+     # trust_remote_code = True,
52
+     ignore_mismatched_sizes = True,
53
+     merged_talk_heads=True,
54
+     merged_lm_and_talk_heads=False,
55
+     merged_lm_and_think_heads=True,
56
+     use_concat_talk_head=True,
57
+     use_shallow_think=True,
58
+     use_shallow_talk=False,
59
+     use_complex_think_head=False,
60
+     use_complex_talk_head=True,
61
+     use_weighted_talk_head=True,
62
+
63
+     # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
64
+ )
65
+ tokenizer_id = "LeroyDyer/Mixtral_AI_CyberBrain_3.0" # assumption: the tokenizer comes from the same repo as the model
66
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, truncation=True, padding_side="right")
67
+ tokenizer.pad_token_id = tokenizer.eos_token_id # reuse EOS as the padding token
68
+
69
+
70
+
71
+ model.tokenizer = tokenizer
72
+
73
+ model.train() # switch the model to training mode
74
+
75
+
76
+ ```
77
+
78
+
79
+ Right now modeling_mistral.py still has problems loading remotely, hence the manual patching described above; once that is fixed, the workaround will no longer be needed.
80
+
81
+
82
+
83
+ # merge
84
+
85
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
86
+ Multiple versions of this model were merged in an attempt to collect the necessary tensors,
87
+ but the result did not build correctly because some parameters were not loading, i.e. it would not load the config file. Hopefully this will be rectified soon, so that remote loading works and further training becomes easier.
88
+ The model had already been trained before the merge, so it still works fine.
89
+ The LoRA was made so that it can later be loaded with the model for further training of the affected tensors.
90
+
91
+ ## Merge Details
92
+ ### Merge Method
93
+
94
+ This model was merged using the SLERP merge method.
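
For readers unfamiliar with SLERP, the following is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors. It is an illustration only and not mergekit's own code; the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation for zero or near-parallel vectors are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-6):
    """Spherically interpolate between equal-length vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp():
        # Plain linear interpolation, used when the angle is undefined or tiny
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 < eps or norm1 < eps:
        return lerp()

    # Angle between the two vectors, from the normalized dot product
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp()

    # Weights follow the arc between the two directions rather than the chord
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

In a model merge the same idea is applied tensor by tensor, with `t` controlling how far the result moves from one model's weights toward the other's.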
95
+
96
+ ### Models Merged
97
+
98
+ The following models were included in the merge:
99
+ * [LeroyDyer/Mixtral_AI_CyberBrain_2.0](https://huggingface.co/LeroyDyer/Mixtral_AI_CyberBrain_2.0)
100
+ * [ezelikman/quietstar-8-ahead](https://huggingface.co/ezelikman/quietstar-8-ahead)
101
+
102
+ ### Configuration
103
+
104
+ The following YAML configuration was used to produce this model:
105
+
106
+ ```yaml
107
+
108
+ slices:
109
+   - sources:
110
+       - model: LeroyDyer/Mixtral_AI_CyberBrain_2.0
111
+         layer_range: [0, 32]
112
+       - model: ezelikman/quietstar-8-ahead
113
+         layer_range: [0, 32]
114
+ # or, the equivalent models: syntax:
115
+ # models:
116
+ #   - model: mistralai/Mistral-7B-Instruct-v0.2
117
+ # The larger model must be the base, or
118
+ # the base model must be the one whose tokenizer you wish to adopt,
119
+ # so models with customized processing must be the base model.
120
+ # If the base model has remote code, that code must be collected and added
121
+ # to the repo afterwards, and the config file adjusted to allow automapping to your new repo.
122
+ #   - model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
123
+ merge_method: slerp
124
+ base_model: ezelikman/quietstar-8-ahead
125
+ parameters:
126
+   t:
127
+     - filter: self_attn
128
+       value: [0.3, 0.6, 0.3786, 0.6, 0.6]
129
+     - filter: mlp
130
+       value: [0.7, 0.4, 0.6, 0.4, 0.7]
131
+     - value: 0.5 # fallback for rest of tensors
132
+ dtype: float16
133
+
134
+ ```
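
The five-element `value` lists in the configuration act as a gradient across the layers rather than as per-layer settings. The sketch below shows one plausible reading, assuming the anchor values are spread evenly over the 32 layers and linearly interpolated in between. The helper `layer_values` is hypothetical and written for this example; mergekit's exact interpolation may differ in detail.

```python
def layer_values(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across num_layers layers."""
    if num_layers == 1 or len(anchors) == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        values.append((1 - frac) * anchors[lo] + frac * anchors[hi])
    return values


# Anchor values copied from the configuration above
self_attn_t = layer_values([0.3, 0.6, 0.3786, 0.6, 0.6], 32)
mlp_t = layer_values([0.7, 0.4, 0.6, 0.4, 0.7], 32)

for layer in (0, 8, 16, 24, 31):
    print(layer, round(self_attn_t[layer], 4), round(mlp_t[layer], 4))
```

Under this reading, the self-attention and MLP tensors of each layer receive different interpolation factors, while every other tensor uses the fallback of 0.5.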