Update README.md
README.md
CHANGED
@@ -23,6 +23,45 @@ FastThink-0.5B-Tiny is a reasoning-focused model based on Qwen2.5. We have relea
**Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
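These choices can be confirmed from the model's configuration. A quick, optional check (the attribute names follow the standard Qwen2 config in `transformers`, not this model card):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("prithivMLmods/FastThink-0.5B-Tiny")

# Qwen2-style configs expose the RoPE base, RMSNorm epsilon, and embedding tying
print(config.model_type)           # expected "qwen2" for a Qwen2.5-based model
print(config.rope_theta)           # RoPE base frequency
print(config.rms_norm_eps)         # RMSNorm epsilon
print(config.tie_word_embeddings)  # True when word embeddings are tied
```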

# **Quickstart with Transformers**

Here is a code snippet that shows how to load the tokenizer and model and generate content using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/FastThink-0.5B-Tiny"

# Load the model weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the chat prompt with the model's chat template
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a reply and strip the prompt tokens from the output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
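For interactive use, the same `generate` call can stream tokens as they are produced. A small optional variation using `TextStreamer` from `transformers` (not part of the snippet above):

```python
from transformers import TextStreamer

# Prints the reply token-by-token instead of decoding it after generation finishes
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```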

# **Dataset Preparation**

This script loads, processes, and combines multiple datasets into a single, standardized format suitable for training conversational AI models. It uses the `datasets` library to load and manipulate the datasets, and the `chat_templates` library to standardize the conversation format.
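Only the end of the script is visible below. As a rough sketch of the overall pattern it describes (the dataset names and conversation layout are placeholders; `formatting_prompts_func` and `combined_dataset` are the names the script itself uses):

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder sources -- the actual datasets are defined in the full script
dataset_a = load_dataset("example/chat-dataset-a", split="train")
dataset_b = load_dataset("example/chat-dataset-b", split="train")

def formatting_prompts_func(examples):
    # Assumed layout: each example holds a list of {"role", "content"} turns
    texts = []
    for conversation in examples["conversations"]:
        text = "\n".join(f"{turn['role']}: {turn['content']}" for turn in conversation)
        texts.append(text)
    return {"text": texts}

# Combine the sources and render every conversation into a single training string
combined_dataset = concatenate_datasets([dataset_a, dataset_b])
combined_dataset = combined_dataset.map(formatting_prompts_func, batched=True)
```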

@@ -54,4 +93,23 @@ combined_dataset = combined_dataset.map(formatting_prompts_func, batched=True)

```python
# Print the first few examples to verify the output
print(combined_dataset[:50000])
```

# **Intended Use**

1. **Reasoning Tasks**: FastThink-0.5B-Tiny is optimized for reasoning-focused applications, such as logical problem-solving, decision-making, and analytical workflows.
2. **Instruction Following**: Ideal for scenarios where precise adherence to instructions is required, including generating structured outputs like JSON or tables (see the sketch after this list).
3. **Multilingual Support**: Suitable for multilingual environments, supporting over 29 languages, making it versatile for global applications.
4. **Coding and Mathematics**: Highly effective in tasks involving coding, debugging, or solving mathematical problems, leveraging expert domain knowledge.
5. **Role-play Scenarios**: Can simulate conversational agents or personas for role-play, enhancing chatbot and virtual assistant implementations.
6. **Long-form Content Creation**: Designed to generate and manage long-form text (up to 8K tokens) while maintaining context, making it ideal for tasks like report writing or storytelling.
7. **Understanding and Processing Structured Data**: Efficient at interpreting and working with structured data, such as tables or hierarchical formats.
8. **Low-Resource Applications**: With a smaller parameter count (0.5B), it is well suited to applications with limited computational resources or edge deployment.
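To illustrate the structured-output case from item 2, the `model` and `tokenizer` from the quickstart can be reused. The system prompt and schema below are illustrative, not taken from the model card:

```python
import json

# Reuses `model` and `tokenizer` from the quickstart above
messages = [
    {"role": "system", "content": "You are a helpful assistant. Respond only with valid JSON."},
    {"role": "user", "content": 'List three prime numbers in the form {"primes": [...]}.'}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Small models can drift from the requested schema, so parse defensively
try:
    print(json.loads(reply))
except json.JSONDecodeError:
    print("Non-JSON output:", reply)
```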

# **Limitations**

1. **Limited Model Size**: As a 0.5B-parameter model, its reasoning and comprehension capabilities are less advanced than those of larger models, particularly for highly complex tasks.
2. **Contextual Limitations**: Although it supports a context length of up to 128K tokens, its ability to use such a long context effectively may vary, particularly in tasks requiring intricate cross-referencing of earlier inputs.
3. **Accuracy in Domain-Specific Tasks**: While capable in coding and mathematics, it may struggle with highly specialized or esoteric domain knowledge compared to models fine-tuned specifically for those areas.
4. **Ambiguity Handling**: May misinterpret vague or poorly structured prompts, leading to less accurate or unintended results.
5. **Long-Context Tradeoffs**: Generating or processing very long outputs (e.g., close to the 8K-token limit) can result in decreased coherence or relevance toward the end.
6. **Multilingual Performance**: Although it supports 29 languages, proficiency and fluency may vary across languages, with underrepresented languages possibly seeing reduced performance.
7. **Resource-Intensive for Long Contexts**: Using its long-context capability (128K tokens) can be computationally demanding, requiring significant memory and processing power.
8. **Dependence on Fine-Tuning**: For highly specialized tasks or domains, additional fine-tuning may be necessary to achieve optimal performance.