prithivMLmods committed on
Commit 0487762 · verified · 1 Parent(s): 7897a98

Update README.md

Files changed (1)
  1. README.md +130 -2
README.md CHANGED
@@ -32,7 +32,7 @@
  \/ \//_____/ \/
 </pre>
 
-# **Triangulum 10B: Multilingual Large Language Models (LLMs)**
+# **Triangulum 10B GGUF: Multilingual Large Language Models (LLMs)**
 
 Triangulum 10B is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.
 
@@ -68,7 +68,7 @@ pipe = pipeline(
     device_map="auto",
 )
 messages = [
-    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+    {"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
     {"role": "user", "content": "Who are you?"},
 ]
 outputs = pipe(
@@ -77,6 +77,46 @@ outputs = pipe(
 )
 print(outputs[0]["generated_text"][-1])
 ```
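For reference, the README example above uses the `transformers` text-generation pipeline, but the hunk elides its construction. A minimal end-to-end sketch, assuming the `prithivMLmods/Triangulum-10B` checkpoint and illustrative generation settings (the README's exact elided arguments may differ):

```python
import torch
from transformers import pipeline

# Chat-style text generation; everything besides the model id is an
# illustrative assumption, not the README's elided code.
pipe = pipeline(
    "text-generation",
    model="prithivMLmods/Triangulum-10B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])  # last message in the generated chat
```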
+# **Demo Inference with LlamaForCausalLM**
+```python
+import torch
+from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM
+
+# Load tokenizer and model (4-bit quantization via BitsAndBytesConfig,
+# FlashAttention-2 via attn_implementation)
+tokenizer = AutoTokenizer.from_pretrained('prithivMLmods/Triangulum-10B', trust_remote_code=True)
+model = LlamaForCausalLM.from_pretrained(
+    "prithivMLmods/Triangulum-10B",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
+    attn_implementation="flash_attention_2",
+)
+
+# Define a list of system and user prompts
+prompts = [
+    """<|im_start|>system
+You are the kind and tri-intelligent assistant helping people to understand complex concepts.<|im_end|>
+<|im_start|>user
+Can you explain the concept of eigenvalues and eigenvectors in a simple way?<|im_end|>
+<|im_start|>assistant"""
+]
+
+# Generate a response for each prompt
+for chat in prompts:
+    print(f"Prompt:\n{chat}\n")
+    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to(model.device)
+    generated_ids = model.generate(
+        input_ids,
+        max_new_tokens=750,
+        temperature=0.8,
+        repetition_penalty=1.1,
+        do_sample=True,
+        eos_token_id=tokenizer.eos_token_id,
+    )
+    response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
+    print(f"Response:\n{response}\n{'-'*80}\n")
+```
+
+# **Key Adjustments**
+1. **System Prompts:** Each prompt defines a different role or persona for the AI to adopt.
+2. **User Prompts:** These specify the context or task for the assistant, ranging from teaching to storytelling or career advice.
+3. **Looping Through Prompts:** Each prompt is processed in a loop to showcase the model's versatility.
+
+You can expand the list of prompts to explore a variety of scenarios and responses; a chat-template variant is sketched below.
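An aside on the prompt format: the hand-written `<|im_start|>`/`<|im_end|>` markers assume a ChatML-style template. If the tokenizer ships with a chat template, the equivalent prompt can be built programmatically — a minimal sketch, reusing the `tokenizer` and `model` loaded in the block above:

```python
# Build the same ChatML-style prompt from structured messages.
# Assumes `tokenizer` and `model` are already loaded as above and that
# the tokenizer defines a chat template.
messages = [
    {"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
    {"role": "user", "content": "Can you explain the concept of eigenvalues and eigenvectors in a simple way?"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=750, do_sample=True, temperature=0.8)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```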
 # **Use Cases**
 
 - Multilingual content generation
@@ -87,3 +127,91 @@ print(outputs[0]["generated_text"][-1])
 # **Technical Details**
 
 Triangulum 10B employs a state-of-the-art autoregressive architecture inspired by LLaMA. The optimized transformer framework ensures both efficiency and scalability, making it suitable for a variety of use cases.
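To see what that architecture looks like concretely, you can inspect the checkpoint's configuration — a small sketch assuming the standard `transformers` AutoConfig API:

```python
from transformers import AutoConfig

# Fetch only the config (no weights) and print a few architecture fields.
config = AutoConfig.from_pretrained("prithivMLmods/Triangulum-10B")
print(config.model_type)  # expected to be a llama-style architecture
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```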
+
+# **How to Run Triangulum 10B on Ollama Locally**
+
+This guide demonstrates how to run open-source LLMs locally, with examples covering different models and use cases. By the end, you'll be equipped to run future open-source LLMs with ease.
+
+---
+
+## Example 1: How to Run the Triangulum-10B Model
+
+The **Triangulum-10B** model is an open-source LLM known for its capabilities across text-based tasks. We'll interact with it much as we would with ChatGPT, but running locally with support for quantized variants.
+
+### Step 1: Download the Model
+
+First, download the **Triangulum-10B-F16.gguf** model using the following command:
+
+```bash
+ollama run triangulum-10b-f16.gguf
+```
+
152
+ ### Step 2: Model Initialization and Download
153
+
154
+ Upon running the command, Ollama will initialize and download the model files. You should see output similar to the following:
155
+
156
+ ```plaintext
157
+ pulling manifest
158
+ pulling 8934d96d3f08... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 3.8 GB
159
+ pulling 8c17c2ebb0ea... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 7.0 KB
160
+ pulling 7c23fb36d801... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 4.8 KB
161
+ pulling 2e0493f67d0c... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆοΏ½οΏ½β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 59 B
162
+ pulling fa304d675061... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 91 B
163
+ pulling 42ba7f8a01dd... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 557 B
164
+ verifying sha256 digest
165
+ writing manifest
166
+ removing any unused layers
167
+ success
168
+ >>> Send a message (/? for help)
169
+ ```
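Optionally, verify that the model is now registered locally before chatting:

```bash
# Show locally installed models and their sizes.
ollama list
```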
+
+### Step 3: Interact with the Model
+
+Once the model is ready, you can send a prompt. For example, let's ask:
+
+```plaintext
+>>> What can you do for me?
+```
+
+If you ask the same question, you should receive a response like this:
+
+```plaintext
+As a responsible AI language model, I am here to assist you with any questions or tasks you may have. Here are some examples of things I can help with:
+
+1. Answering questions: I can provide information on a wide range of topics, from science and technology to history and culture.
+2. Generating ideas: I can help you brainstorm ideas for creative projects, or provide suggestions for solving problems.
+3. Writing assistance: I can help you with writing tasks such as proofreading, editing, and suggesting alternative words or phrases.
+4. Translation: I can translate text from one language to another.
+5. Summarizing content: I can summarize long pieces of text, such as articles or documents, into shorter, more digestible versions.
+6. Creativity: I can help you generate creative ideas for stories, poems, or other forms of writing.
+7. Language learning: I can assist you in learning a new language by providing grammar explanations, vocabulary lists, and practice exercises.
+8. Chatting: I'm here to chat with you and provide a response to any question or topic you'd like to discuss.
+
+Please let me know if there is anything specific you would like me to help you with.
+```
+
+### Step 4: Exit the Program
+
+To exit the program, simply type:
+
+```plaintext
+/bye
+```
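Beyond the interactive REPL, a locally running Ollama server can also be called programmatically. A minimal sketch using the optional `ollama` Python client (`pip install ollama`); the model name is the placeholder from the import sketch above:

```python
import ollama

# Send one chat turn to the local Ollama server (default: localhost:11434).
response = ollama.chat(
    model="triangulum-10b",  # placeholder; use whatever `ollama list` shows
    messages=[{"role": "user", "content": "What can you do for me?"}],
)
print(response["message"]["content"])
```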
+
+## Example 2: Running Multi-Modal Models (Future Use)
+
+Ollama also supports multi-modal models, which can take images alongside text prompts. This section will be updated as more such models become available.
+
+## Notes on Using Quantized Models
+
+GGUF builds such as **triangulum-10b-f16.gguf** keep the weights in half precision; quantized variants (e.g., Q4 or Q5) trade a little accuracy for much lower memory use, making local inference practical on resource-constrained hardware.
+
+1. Ensure your system has sufficient VRAM or CPU resources (see the estimate below).
+2. Use the `.gguf` model format for compatibility with Ollama.
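As a rough rule of thumb for point 1, weight memory is parameter count × bytes per weight, ignoring the KV cache and other runtime overhead; the quant bit-widths below are approximate llama.cpp values:

```python
# Approximate weight-only memory for a 10B-parameter model.
params = 10e9
for name, bits_per_weight in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    gib = params * bits_per_weight / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```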
+
+## Conclusion
+
+Running the **Triangulum-10B** model with Ollama provides a robust way to leverage open-source LLMs locally for diverse use cases. By following these steps, you can explore the capabilities of other open-source models in the future.