prithivMLmods committed
Commit c07fd94 · verified · 1 Parent(s): fc7cfd9

Update README.md

Files changed (1): README.md (+59 -1)

README.md CHANGED
@@ -23,6 +23,45 @@ FastThink-0.5B-Tiny is a reasoning-focused model based on Qwen2.5. We have relea
 
 **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
 
+# **Quickstart with Transformers**
+
+The following code snippet shows how to load the tokenizer and model and how to generate content using `apply_chat_template`.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "prithivMLmods/FastThink-0.5B-Tiny"
+
+# Load the model with automatic dtype selection and device placement
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+# Build a chat-formatted prompt
+prompt = "Give me a short introduction to large language model."
+messages = [
+    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+# Generate, then strip the prompt tokens from the returned sequences
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
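As an optional, illustrative follow-up that is not part of this commit: the decoded `response` can be printed directly, or generation can be streamed token by token with `transformers.TextStreamer`. This is a minimal sketch that assumes the `model`, `tokenizer`, `model_inputs`, and `response` objects from the snippet above.

```python
# Illustrative follow-up; assumes `model`, `tokenizer`, `model_inputs`,
# and `response` from the quickstart snippet above.
print(response)

# Optional: stream tokens to stdout as they are generated instead of
# decoding the full sequence at the end.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```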
+
 # **Dataset Preparation**
 
 This script is designed to load, process, and combine multiple datasets into a single, standardized format suitable for training conversational AI models. The script uses the `datasets` library to load and manipulate the datasets, and the `chat_templates` library to standardize the conversation format.
 
@@ -54,4 +93,23 @@ combined_dataset = combined_dataset.map(formatting_prompts_func, batched=True)
 
 # Print the first few examples to verify the output
 print(combined_dataset[:50000])
-```
+```
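The diff shows only the tail of the dataset-preparation script, so the following is a minimal, hypothetical sketch of how such a pipeline is commonly assembled with the `datasets` library. The dataset names, the shared `conversations` column, and the body of `formatting_prompts_func` are illustrative assumptions, and the tokenizer's built-in `apply_chat_template` stands in for the `chat_templates` helper mentioned above; this is not the exact code in this repository.

```python
# Hypothetical sketch only: dataset names, columns, and the formatting
# function are illustrative assumptions, not the repository's actual script.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/FastThink-0.5B-Tiny")

# Load two placeholder conversational datasets, assumed to share a
# "conversations" column of [{"role": ..., "content": ...}, ...] messages.
ds_a = load_dataset("example/chat-dataset-a", split="train")
ds_b = load_dataset("example/chat-dataset-b", split="train")
combined_dataset = concatenate_datasets([ds_a, ds_b])

def formatting_prompts_func(examples):
    # Render each conversation into a single training string via the chat template.
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}

combined_dataset = combined_dataset.map(formatting_prompts_func, batched=True)

# Print a few examples to verify the output
print(combined_dataset[:5])
```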
+# **Intended Use**
+1. **Reasoning Tasks**: FastThink-0.5B-Tiny is optimized for reasoning-focused applications such as logical problem-solving, decision-making, and analytical workflows.
+2. **Instruction Following**: Ideal for scenarios that require precise adherence to instructions, including generating structured outputs like JSON or tables (see the sketch after this list).
+3. **Multilingual Support**: Suitable for multilingual environments, supporting over 29 languages, which makes it versatile for global applications.
+4. **Coding and Mathematics**: Highly effective for tasks involving coding, debugging, or solving mathematical problems, leveraging expert domain knowledge.
+5. **Role-play Scenarios**: Can simulate conversational agents or personas for role-playing, enhancing chatbot and virtual assistant implementations.
+6. **Long-form Content Creation**: Designed to generate and manage long-form text (up to 8K tokens) while maintaining context, making it well suited to tasks like report writing or storytelling.
+7. **Understanding and Processing Structured Data**: Efficient at interpreting and working with structured data, such as tables or hierarchical formats.
+8. **Low-Resource Applications**: With a small parameter count (0.5B), it is well suited to applications with limited computational resources or edge deployment.
+
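As a rough illustration of the structured-output use case in item 2 (again, not part of this commit), the sketch below reuses the quickstart setup, asks the model for JSON, and validates the reply with `json.loads`. The prompt wording is an assumption for illustration, and a 0.5B model may still need retries or stricter validation.

```python
# Illustrative only; assumes `model` and `tokenizer` from the quickstart above.
import json

messages = [
    {"role": "system", "content": "You are a helpful assistant that replies only with valid JSON."},
    {"role": "user", "content": 'List three prime numbers as a JSON object with a single key "primes".'},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=128)
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
reply = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

try:
    print(json.loads(reply))  # Succeeds only if the model emitted valid JSON
except json.JSONDecodeError:
    print("Model output was not valid JSON:", reply)
```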
+# **Limitations**
+1. **Limited Model Size**: As a 0.5B-parameter model, its reasoning and comprehension capabilities are less advanced than those of larger models, particularly for highly complex tasks.
+2. **Contextual Limitations**: Although it supports a context length of up to 128K tokens, its ability to use such a long context effectively may vary, particularly in tasks that require intricate cross-referencing of earlier inputs.
+3. **Accuracy in Domain-Specific Tasks**: While capable in coding and mathematics, it may struggle with highly specialized or esoteric domain knowledge compared to models fine-tuned specifically for those areas.
+4. **Ambiguity Handling**: May misinterpret vague or poorly structured prompts, leading to less accurate or unintended results.
+5. **Long-Context Tradeoffs**: Generating or processing very long outputs (e.g., close to the 8K-token generation limit) can reduce coherence or relevance toward the end.
+6. **Multilingual Performance**: Although it supports 29 languages, proficiency and fluency vary across languages, and underrepresented languages may see reduced performance.
+7. **Resource-Intensive for Long Contexts**: Using the full long-context capability (128K tokens) can be computationally demanding, requiring significant memory and processing power.
+8. **Dependence on Fine-Tuning**: For highly specialized tasks or domains, additional fine-tuning may be necessary to achieve optimal performance.