LeroyDyer committed on
Commit
20dc4fd
·
verified ·
1 Parent(s): 81bf2f5

Update README.md

  1. README.md +108 -9
README.md CHANGED
@@ -1,22 +1,121 @@
1
  ---
2
  base_model: unsloth/mistral-nemo-instruct-2407-bnb-4bit
3
- language:
4
- - en
5
  license: apache-2.0
6
  tags:
7
  - text-generation-inference
8
- - transformers
9
- - unsloth
10
- - mistral
11
- - gguf
12
  ---
13
 
14
- # Uploaded model
 
15
 
16
  - **Developed by:** LeroyDyer
17
  - **License:** apache-2.0
18
  - **Finetuned from model :** unsloth/mistral-nemo-instruct-2407-bnb-4bit
19
 
20
- This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
21
 
22
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
1
  ---
2
  base_model: unsloth/mistral-nemo-instruct-2407-bnb-4bit
3
  license: apache-2.0
4
  tags:
5
+ - Mistral_Star
6
+ - Mistral_Quiet
7
+ - Mistral
8
+ - Mixtral
9
+ - Question-Answer
10
+ - Token-Classification
11
+ - Sequence-Classification
12
+ - SpydazWeb-AI
13
+ - chemistry
14
+ - biology
15
+ - legal
16
+ - code
17
+ - climate
18
+ - medical
19
  - text-generation-inference
20
+ language:
21
+ - en
22
+ - sw
23
+ - ig
24
+ - zu
25
+ - ca
26
+ - es
27
+ - pt
28
+ - ha
29
  ---
30
 
31
+ # Spydaz WEB AI
32
+
33
 
34
+ ## Model Architecture
35
+ Mistral Nemo is a transformer model, with the following architecture choices:
36
+ - **Layers:** 40
37
+ - **Dim:** 5,120
38
+ - **Head dim:** 128
39
+ - **Hidden dim:** 14,336
40
+ - **Activation Function:** SwiGLU
41
+ - **Number of heads:** 32
42
+ - **Number of kv-heads:** 8 (GQA)
43
+ - **Vocabulary size:** 2**17 ~= 128k
44
+ - **Rotary embeddings (theta = 1M)**
45
  - **Developed by:** LeroyDyer
46
  - **License:** apache-2.0
47
  - **Finetuned from model :** unsloth/mistral-nemo-instruct-2407-bnb-4bit
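The numbers in the architecture list can be cross-checked with a little arithmetic. The sketch below only restates values from that list; the class name `NemoShape` and the derived variable names are illustrative and are not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemoShape:
    """Values copied from the architecture list above (illustrative container)."""
    layers: int = 40
    dim: int = 5120
    head_dim: int = 128
    n_heads: int = 32
    n_kv_heads: int = 8
    vocab_size: int = 2 ** 17


shape = NemoShape()

# Grouped-query attention: several query heads share one key/value head.
queries_per_kv_head = shape.n_heads // shape.n_kv_heads

# Width of the query projection. It differs from `dim` because head_dim is
# listed explicitly instead of being derived as dim / n_heads.
q_width = shape.n_heads * shape.head_dim

# Width of the key (or value) projection under GQA.
kv_width = shape.n_kv_heads * shape.head_dim

print(queries_per_kv_head, q_width, kv_width, shape.vocab_size)
```

With the listed values, each key/value head serves 4 query heads, and the vocabulary size 2**17 is exactly 131,072 entries.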
48
 
49
+ <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/>
50
+ https://github.com/spydaz
51
+
52
+
53
+ # Introduction :
54
+
55
+ ## STAR REASONERS !
56
+
57
+ This gives the model a platform to communicate pre-response, so that an internal objective can be set, i.e. an extra planning stage is added to the model, improving its focus and output.
58
+ The thought head can be charged with a thought or methodology, such as a setting to take a step-by-step approach to the problem, or to build an object-oriented model first and consider the use cases before creating an output.
59
+ Each thought head can therefore be dedicated to a specific purpose, such as planning, artifact generation or use-case design, or even deciding which methodology should be applied before planning the potential solve route for the response.
60
+ Another head could be dedicated to retrieving content relevant to the query from the model itself, which can also be used in the pre-generation stages.
61
+ All pre-reasoners can be seen as self-guiding, essentially removing the requirement to give the model a system prompt and instead aligning the heads to thought pathways!
62
+ These chains produce data which can be considered thoughts, and which can be displayed by framing them with thought tokens, even allowing for editor's comments that give key guidance to the model during training.
63
+ These thoughts will be used in future generations, assisting the model as well as displaying explanatory information in the output.
64
+
65
+ These tokens can be displayed or withheld; this is also a setting in the model!
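As a rough illustration of framing thoughts with thought tokens and then showing or withholding them, here is a minimal post-processing sketch. The marker strings `<|startthought|>` and `<|endthought|>`, and the helper names `split_thoughts` and `render`, are assumptions made for this example; the README does not state which tokens or settings the model actually uses.

```python
import re

# Hypothetical marker strings; the model's real thought tokens are not given in the README.
START_THOUGHT = "<|startthought|>"
END_THOUGHT = "<|endthought|>"

_THOUGHT_SPAN = re.compile(
    re.escape(START_THOUGHT) + r"(.*?)" + re.escape(END_THOUGHT),
    re.DOTALL,
)


def split_thoughts(generated: str) -> tuple[list[str], str]:
    """Return (thoughts, answer) extracted from a generated string."""
    thoughts = [t.strip() for t in _THOUGHT_SPAN.findall(generated)]
    answer = _THOUGHT_SPAN.sub("", generated).strip()
    return thoughts, answer


def render(generated: str, show_thoughts: bool) -> str:
    """Show the answer alone, or the answer preceded by its framed thoughts."""
    thoughts, answer = split_thoughts(generated)
    if not show_thoughts:
        return answer
    notes = "\n".join(f"[thought] {t}" for t in thoughts)
    return f"{notes}\n{answer}" if notes else answer


sample = f"{START_THOUGHT}Plan: take it step by step.{END_THOUGHT}The answer is 4."
print(render(sample, show_thoughts=False))
print(render(sample, show_thoughts=True))
```

Here the display choice is a plain boolean on the caller's side; inside the model it would presumably be a config setting, as the text suggests.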
66
+ ### Can this be applied in other areas?
67
+
68
+ Yes! We can use this type of method to allow the model to generate code in another channel or head, potentially creating a head to produce artifacts for every output, or to produce entity lists for every output, framing the outputs in their respective code tags or function-call tags.
69
+ These can also be displayed or hidden in the response, but they can also be used internally in problem-solving tasks, which again enables the model to simulate the inputs and outputs of an interpreter!
70
+ It may even be prudent to include function execution internal to the model (allowing the model to execute functions in the background before responding); this would also have to be specified in the config, as autoexecute or not!
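The idea of tagged function calls that are either executed in the background or left untouched, depending on a setting, can be sketched outside the model as follows. Everything here is hypothetical: the `<function_call>` tag, the JSON payload shape, the `TOOLS` registry and the `auto_execute` flag are stand-ins chosen for illustration, not the model's actual format or config keys.

```python
import json
import re
from typing import Callable

# Hypothetical tag format, chosen only for this illustration.
CALL_TAG = re.compile(r"<function_call>(.*?)</function_call>", re.DOTALL)


def add(a: float, b: float) -> float:
    """Toy tool used by the example."""
    return a + b


# Hypothetical registry of functions the draft is allowed to call.
TOOLS: dict[str, Callable[..., object]] = {"add": add}


def resolve_calls(draft: str, auto_execute: bool) -> str:
    """Replace each tagged call with its result when auto_execute is on."""
    if not auto_execute:
        # Leave the tags in place for the calling interface to handle.
        return draft

    def run(match: re.Match) -> str:
        call = json.loads(match.group(1))
        result = TOOLS[call["name"]](**call["arguments"])
        return str(result)

    return CALL_TAG.sub(run, draft)


draft = 'Total: <function_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</function_call>'
print(resolve_calls(draft, auto_execute=True))
print(resolve_calls(draft, auto_execute=False))
```

Restricting execution to an explicit registry is one way to keep the "with constraints" part of the idea: the draft can only trigger functions that were deliberately exposed.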
71
+
72
+ #### AI AGI ?
73
+ So yes, we can see we are not far from an AI which can evolve: an advanced general intelligent system (still non-sentient, by the way).
74
+
75
+
76
+
77
+
78
+ ### Conclusion
79
+
80
+ The reasoner methodology might be seen as the way forward: adding internal functionality to the models, instead of external connectivity, enables faster and seamless model usage as well as enriched and informed responses, since outputs could even be cleansed and formatted internally to the model before being presented to the calling interface.
81
+ The takeaway is a question: are we seeing the decoder/encoder model as simply a function of the intelligence, which in truth needs to be autonomous?
82
+ I.e. internal functions and tools as well as disk interaction: an agent must have awareness of and control over its environment, with sensors and actuators. As a function-calling model it has actuators, and if it can read the directories it has sensors... it's a start, as we can get media in and out, but the model also needs its own control over input and output!
83
+
84
+ Fine tuning: again, this issue of fine tuning. The discussion above explains the requirement to control the environment from within the model (with constraints); does this eliminate the need to fine-tune a model?
85
+ In fact it should, as this gives transparency to the growth of the model, and if the model fine-tuned itself we would be in danger of a model evolving!
86
+ Hence an AGI!
87
+
88
+ # LOAD MODEL
89
+
90
+
91
+ ```
92
+ ! git clone https://github.com/huggingface/transformers.git
93
+ ## Copy modeling_mistral.py and configuration.py into the cloned transformers folder (src/transformers/models/mistral), overwriting the existing files first:
94
+ ## THEN :
95
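+ ## (run from the directory that contains the clone; a `!cd` in a notebook cell does not persist to the next command)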
96
+ !pip install ./transformers
97
+
98
+ ```
99
+
100
+ Then restart the environment: the model can then load without trust_remote_code and WILL work FINE!
101
+ It can even be trained, hence the 4-bit optimised version:
102
+
103
+ ```python
104
+
105
+
106
+ # Load model directly
107
+ from transformers import AutoTokenizer, AutoModelForCausalLM
108
+
109
+ tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/_Spydaz_Web_AI_MistralStar_V2", trust_remote_code=True)
110
+ model = AutoModelForCausalLM.from_pretrained("LeroyDyer/_Spydaz_Web_AI_MistralStar_V2", trust_remote_code=True)
111
+ model.tokenizer = tokenizer
112
+
113
+ ```
114
+
115
+
116
+
117
+
118
+
119
+
120
+
121