FastThink-0.5B-Tiny

FastThink-0.5B-Tiny is a reasoning-focused model fine-tuned from Qwen2.5-0.5B. The Qwen2.5 family covers base and instruction-tuned language models ranging from 0.5 to 72 billion parameters and introduces the following improvements over Qwen2:

  • Significantly enhanced knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
  • Major improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. It is more resilient to diverse system prompts, enhancing role-play implementation and condition-setting for chatbots.
  • Long-context support for up to 128K tokens and the ability to generate outputs up to 8K tokens.
  • Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
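
To confirm these architectural details on the released checkpoint, you can inspect its configuration with transformers. This is a quick check rather than part of the original card; the field names assume the Qwen2-style config used by this model family:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("prithivMLmods/FastThink-0.5B-Tiny")

print(config.model_type)           # expected "qwen2"
print(config.hidden_act)           # "silu", the activation used inside the SwiGLU MLP
print(config.rms_norm_eps)         # RMSNorm epsilon
print(config.rope_theta)           # RoPE base frequency
print(config.tie_word_embeddings)  # whether input and output embeddings are tied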

Quickstart with Transformers

The snippet below shows how to load the tokenizer and model and how to generate content using apply_chat_template.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/FastThink-0.5B-Tiny"

# Load the model and tokenizer; device_map="auto" places the weights on the available GPU/CPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the chat template and append the generation prompt
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
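
To stream tokens as they are generated rather than waiting for the full completion, you can pass transformers' TextStreamer to generate. A minimal sketch that reuses model, tokenizer, and model_inputs from the snippet above:

from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **model_inputs,
    max_new_tokens=512,
    streamer=streamer
)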

Dataset Preparation

This script loads, processes, and combines multiple datasets into a single, standardized format suitable for training conversational AI models. It uses the datasets library to load and manipulate the datasets, and chat-template utilities (get_chat_template, standardize_sharegpt) to standardize the conversation format.

Example

from datasets import load_dataset, concatenate_datasets
# get_chat_template / standardize_sharegpt are assumed to come from Unsloth's chat_templates module
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

# Load the initial three datasets
dataset1 = load_dataset("PowerInfer/LONGCOT-Refine-500K", split="train")
dataset2 = load_dataset("amphora/QwQ-LongCoT-130K", split="train")
dataset3 = load_dataset("AI-MO/NuminaMath-CoT", split="train")

# Map conversation columns for all datasets
dataset1 = dataset1.map(add_conversations_column, batched=False)
dataset2 = dataset2.map(add_conversations_column_prompt_qwq, batched=False)
dataset3 = dataset3.map(add_conversations_column_prompt_solution, batched=False)

# Combine all datasets
combined_dataset = concatenate_datasets([dataset1, dataset2, dataset3])

# Standardize using the ShareGPT format
combined_dataset = standardize_sharegpt(combined_dataset)

# Initialize the tokenizer's chat template (the tokenizer itself is assumed to be loaded earlier)
tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")

# Apply formatting function to the combined dataset
combined_dataset = combined_dataset.map(formatting_prompts_func, batched=True)

# Print a few examples to verify the output
print(combined_dataset[:5])
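
The helper functions referenced above (the column-mapping functions and formatting_prompts_func) are defined elsewhere in the script. A rough sketch of what they might look like is given below; the source column names are assumptions inferred from the function names and should be checked against each dataset's actual schema:

# Hypothetical helpers: column names are assumptions, adjust to the real dataset schemas
def add_conversations_column(example):
    # PowerInfer/LONGCOT-Refine-500K (assumed "prompt"/"response" columns)
    example["conversations"] = [
        {"from": "human", "value": example["prompt"]},
        {"from": "gpt", "value": example["response"]},
    ]
    return example

def add_conversations_column_prompt_qwq(example):
    # amphora/QwQ-LongCoT-130K (assumed "prompt"/"qwq" columns)
    example["conversations"] = [
        {"from": "human", "value": example["prompt"]},
        {"from": "gpt", "value": example["qwq"]},
    ]
    return example

def add_conversations_column_prompt_solution(example):
    # AI-MO/NuminaMath-CoT (assumed "problem"/"solution" columns)
    example["conversations"] = [
        {"from": "human", "value": example["problem"]},
        {"from": "gpt", "value": example["solution"]},
    ]
    return example

def formatting_prompts_func(examples):
    # Render each standardized conversation with the tokenizer's chat template
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}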

Intended Use

  1. Reasoning Tasks: FastThink-0.5B-Tiny is optimized for reasoning-focused applications, such as logical problem-solving, decision-making, and analytical workflows.
  2. Instruction Following: Ideal for scenarios where precise adherence to instructions is required, including generating structured outputs like JSON or tables (a short JSON sketch follows this list).
  3. Multilingual Support: Suitable for use in multilingual environments, supporting over 29 languages, making it versatile for global applications.
  4. Coding and Mathematics: Highly effective in tasks involving coding, debugging, or solving mathematical problems, leveraging expert domain knowledge.
  5. Role-play Scenarios: Can simulate conversational agents or personas for role-playing, enhancing chatbot and virtual assistant implementations.
  6. Long-form Content Creation: Designed to generate and manage long-form text (up to 8K tokens) while maintaining context, making it ideal for tasks like report writing or storytelling.
  7. Understanding and Processing Structured Data: Efficient at interpreting and working with structured data, such as tables or hierarchical formats.
  8. Low-Resource Applications: With a smaller parameter size (0.5B), it is well-suited for applications with limited computational resources or edge deployment.
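
As an illustration of the structured-output use case above, the following sketch reuses the model and tokenizer from the quickstart to request a JSON-only answer and then validates it. The prompt wording and the expected schema are illustrative assumptions, and a model of this size may still need output validation or retries:

import json

messages = [
    {"role": "system", "content": "You are a helpful assistant. Reply with valid JSON only."},
    {"role": "user", "content": "Extract the city and year from: 'The conference was held in Lisbon in 2023.' "
                                "Return an object of the form {\"city\": ..., \"year\": ...}."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

# Check that the reply is actually parseable JSON before using it downstream
try:
    print(json.loads(reply))
except json.JSONDecodeError:
    print("Model did not return valid JSON:\n", reply)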

Limitations

  1. Limited Model Size: As a 0.5B-parameter model, its reasoning and comprehension capabilities are less advanced compared to larger models, particularly for highly complex tasks.
  2. Contextual Limitations: Although it supports a context length of up to 128K tokens, its ability to effectively utilize such a long context may vary, particularly in tasks requiring intricate cross-referencing of earlier inputs.
  3. Accuracy in Domain-Specific Tasks: While capable in coding and mathematics, it may struggle with highly specialized or esoteric domain knowledge compared to models fine-tuned specifically for those areas.
  4. Ambiguity Handling: May misinterpret vague or poorly structured prompts, leading to less accurate or unintended results.
  5. Long-Context Tradeoffs: Generating or processing very long outputs (e.g., close to the 8K token limit) could result in decreased coherence or relevance toward the end.
  6. Multilingual Performance: Although it supports 29 languages, its proficiency and fluency may vary across languages, with some underrepresented languages possibly seeing reduced performance.
  7. Resource-Intensive for Long Contexts: Using its long-context capabilities (128K tokens) can be computationally demanding, requiring significant memory and processing power (a rough estimate follows this list).
  8. Dependence on Fine-Tuning: For highly specialized tasks or domains, additional fine-tuning may be necessary to achieve optimal performance.
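
To put the long-context memory point in perspective, here is a rough back-of-the-envelope estimate of KV-cache size at 128K tokens. The layer and head counts are assumptions based on the Qwen2.5-0.5B configuration and should be verified against the model's config.json:

# Rough KV-cache estimate (assumed shape: 24 layers, 2 KV heads, head_dim 64, FP16)
num_layers = 24
num_kv_heads = 2
head_dim = 64
bytes_per_value = 2          # FP16
context_len = 128_000

# Both keys and values are cached, hence the leading factor of 2
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
total_gb = kv_bytes_per_token * context_len / 1e9
print(f"{kv_bytes_per_token} bytes per token, ~{total_gb:.1f} GB at {context_len} tokens")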

Model Details

  • Base model: Qwen/Qwen2.5-0.5B
  • Model size: 494M parameters (Safetensors, FP16)