Quantization made by Richard Erkhov.

[GitHub](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit - GGUF
- Model creator: https://huggingface.co/Agnuxo/
- Original model: https://huggingface.co/Agnuxo/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q2_K.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q2_K.gguf) | Q2_K | 0.63GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ3_XS.gguf) | IQ3_XS | 0.68GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ3_S.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ3_S.gguf) | IQ3_S | 0.71GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K_S.gguf) | Q3_K_S | 0.71GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ3_M.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ3_M.gguf) | IQ3_M | 0.72GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K.gguf) | Q3_K | 0.77GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K_M.gguf) | Q3_K_M | 0.77GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q3_K_L.gguf) | Q3_K_L | 0.82GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ4_XS.gguf) | IQ4_XS | 0.84GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_0.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_0.gguf) | Q4_0 | 0.87GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.IQ4_NL.gguf) | IQ4_NL | 0.88GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_S.gguf) | Q4_K_S | 0.88GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K.gguf) | Q4_K | 0.92GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf) | Q4_K_M | 0.92GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_1.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_1.gguf) | Q4_1 | 0.95GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_0.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_0.gguf) | Q5_0 | 1.02GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_K_S.gguf) | Q5_K_S | 1.02GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_K.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_K.gguf) | Q5_K | 1.05GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_K_M.gguf) | Q5_K_M | 1.05GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_1.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q5_1.gguf) | Q5_1 | 1.1GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q6_K.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q6_K.gguf) | Q6_K | 1.19GB |
| [Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q8_0.gguf](https://huggingface.co/RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf/blob/main/Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q8_0.gguf) | Q8_0 | 1.53GB |

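The files above can be run with any GGUF-compatible runtime. As a minimal sketch, here is one way to download and run a quant with `huggingface_hub` and `llama-cpp-python`; the chosen file, context size, and prompt are illustrative, not prescriptive:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

# Download one quantized file from the table above (Q4_K_M is a common
# size/quality trade-off; any filename listed in the table works).
model_path = hf_hub_download(
    repo_id="RichardErkhov/Agnuxo_-_Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit-gguf",
    filename="Qwen2-1.5B-Instruct_MOE_CODE_assistant_16bit.Q4_K_M.gguf",
)

# Load the model; n_ctx sets the context window (illustrative value).
llm = Llama(model_path=model_path, n_ctx=2048)

output = llm("Write a Python function that reverses a string.", max_tokens=128)
print(output["choices"][0]["text"])
```
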
Original model description:
---
base_model: Agnuxo/Qwen2-1.5B-Instruct_MOE_assistant_16bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- sft
---

# Uploaded model

- **Developed by:** Agnuxo
- **License:** apache-2.0
- **Finetuned from model:** Agnuxo/Qwen2-1.5B-Instruct_MOE_assistant_16bit

This Qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

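The exact training script is not part of this card, but a typical Unsloth + TRL SFT setup looks roughly like the sketch below. The dataset, LoRA settings, and trainer arguments are illustrative assumptions, not the author's confirmed configuration, and recent TRL versions move several of these options into `SFTConfig`:

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model through Unsloth's patched loader (values illustrative).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Agnuxo/Qwen2-1.5B-Instruct_MOE_assistant_16bit",
    max_seq_length=2048,
)

# Attach LoRA adapters; rank and target modules are common choices,
# not the author's confirmed values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny in-memory dataset so the sketch is self-contained (hypothetical text).
train_dataset = Dataset.from_dict(
    {"text": ["Question: What does len() do?\nAnswer: It returns the number of items in a container."]}
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="outputs", per_device_train_batch_size=2, max_steps=10),
)
trainer.train()
```
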
## How the MOE System Works

This model is a core component of a larger Multi-Expert Question Answering System. Here's a breakdown of the system's functionality:

1. **Model Loading:** The system loads the "director" LLM and keeps other expert LLMs (e.g., for programming, biology, mathematics) ready for use.
2. **Expert Routing:** When a user asks a question, the system either:
   - Uses keyword matching to identify the relevant domain.
   - Consults the director LLM to classify the question's category.
3. **Dynamic Expert Loading:** The system loads the chosen expert LLM into memory, optimizing resource usage by releasing any previously active expert.
4. **Response Generation:** The selected expert LLM receives the question and generates a tailored answer.
5. **Chat Interface:** A user-friendly chat interface facilitates interaction with the MOE system.

This MOE approach enhances efficiency and accuracy compared to relying on a single, general-purpose LLM.

## Repository and Additional Information

- Full code: https://huggingface.co/Agnuxo/Qwen2-1.5B-Instruct_MOE_Director_16bit/resolve/main/MOE-LLMs3.py
- GitHub repository: https://github.com/Agnuxo1/NEBULA

## Code Example

The following code demonstrates the implementation of the Multi-Expert Question Answering System:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

MODEL_CONFIG = {
    "director": {
        "name": "Agnuxo/Qwen2-1.5B-Instruct_MOE_Director_16bit",
        "task": "text-generation",
    },
    "programming": {
        "name": "Qwen/Qwen2-1.5B-Instruct",
        "task": "text-generation",
    },
    "biology": {
        "name": "Agnuxo/Qwen2-1.5B-Instruct_MOE_BIOLOGY_assistant_16bit",
        "task": "text-generation",
    },
    "mathematics": {
        "name": "Qwen/Qwen2-Math-1.5B-Instruct",
        "task": "text-generation",
    }
}

# Keyword lists are bilingual (English and Spanish) so routing works in both languages.
KEYWORDS = {
    "biology": ["cell", "DNA", "protein", "evolution", "genetics", "ecosystem", "organism", "metabolism", "photosynthesis", "microbiology", "célula", "ADN", "proteína", "evolución", "genética", "ecosistema", "organismo", "metabolismo", "fotosíntesis", "microbiología"],
    "mathematics": ["math", "mathematics", "equation", "integral", "derivative", "function", "geometry", "algebra", "statistics", "probability", "ecuación", "derivada", "función", "geometría", "álgebra", "estadística", "probabilidad"],
    "programming": ["python", "java", "C++", "HTML", "script", "code", "dataset", "API", "framework", "debugging", "algorithm", "compiler", "database", "CSS", "JSON", "XML", "encryption", "IDE", "repository", "Git", "version control", "front-end", "back-end", "stack trace", "REST", "machine learning"]
}

class MOELLM:
    def __init__(self):
        self.current_expert = None
        self.current_model = None
        self.current_tokenizer = None
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")
        self.load_director_model()

    def load_director_model(self):
        """Loads the director model."""
        print("Loading director model...")
        model_name = MODEL_CONFIG["director"]["name"]
        self.director_tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.director_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to(self.device)
        self.director_pipeline = pipeline(
            MODEL_CONFIG["director"]["task"],
            model=self.director_model,
            tokenizer=self.director_tokenizer,
            device=self.device
        )
        print("Director model loaded.")

    def load_expert_model(self, expert):
        """Dynamically loads an expert model, releasing memory from the previous model."""
        if expert not in MODEL_CONFIG:
            raise ValueError(f"Unknown expert: {expert}")

        if self.current_expert != expert:
            print(f"Loading expert model: {expert}...")

            # Free memory from the current model if it exists
            if self.current_model:
                del self.current_model
                del self.current_tokenizer
                torch.cuda.empty_cache()

            model_config = MODEL_CONFIG[expert]
            self.current_tokenizer = AutoTokenizer.from_pretrained(model_config["name"])
            self.current_model = AutoModelForCausalLM.from_pretrained(model_config["name"], torch_dtype=torch.float16).to(self.device)
            self.current_expert = expert

            print(f"{expert.capitalize()} model loaded.")

        return pipeline(
            MODEL_CONFIG[expert]["task"],
            model=self.current_model,
            tokenizer=self.current_tokenizer,
            device=self.device
        )

    def determine_expert_by_keywords(self, question):
        """Determines the expert based on keywords in the question."""
        question_lower = question.lower()
        for expert, keywords in KEYWORDS.items():
            # Compare case-insensitively so keywords like "DNA" or "C++" still match.
            if any(keyword.lower() in question_lower for keyword in keywords):
                return expert
        return None

    def determine_expert(self, question):
        """Determines which expert should answer the question."""
        expert = self.determine_expert_by_keywords(question)
        if expert:
            print(f"Expert determined by keyword: {expert}")
            return expert

        # No keyword match: fall back to the director model for classification.
        prompt = f"Classify the following question into one of these categories: programming, biology, mathematics. Question: {question}\nCategory:"
        response = self.director_pipeline(prompt, max_length=100, num_return_sequences=1)[0]['generated_text']
        expert = response.split(":")[-1].strip().lower()
        if expert not in MODEL_CONFIG:
            expert = "director"
        print(f"Redirecting question to: {expert}")
        return expert

    def generate_response(self, question, expert):
        """Generates a response using the appropriate model."""
        try:
            model = self.load_expert_model(expert)
            prompt = f"Answer the following question as an expert in {expert}: {question}\nAnswer:"
            response = model(prompt, max_length=200, num_return_sequences=1)[0]['generated_text']
            return response.split("Answer:")[-1].strip()
        except Exception as e:
            print(f"Error generating response: {str(e)}")
            return "Sorry, there was an error processing your request. Please try again."

    def chat_interface(self):
        """Simple chat interface."""
        print("Welcome to the MOE-LLM chat. Type 'exit' to quit.")
        while True:
            question = input("\nYou: ")
            if question.lower() in ['exit', 'quit']:
                break

            try:
                expert = self.determine_expert(question)
                response = self.generate_response(question, expert)
                print(f"\n{expert.capitalize()}: {response}")
            except Exception as e:
                print(f"Error in chat: {str(e)}")
                print("Please try asking another question.")

if __name__ == "__main__":
    moe_llm = MOELLM()
    moe_llm.chat_interface()
```

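The chat loop above is interactive, but the same class can be driven programmatically. A short sketch (note that constructing `MOELLM` downloads the director model, and each expert is fetched on first use):

```python
# Route a question and generate an answer without the chat loop.
moe = MOELLM()
question = "How do I reverse a list in Python?"
expert = moe.determine_expert(question)   # "programming", matched by keyword
print(moe.generate_response(question, expert))
```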