Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit

This is a 4-bit quantized version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, intended for memory-efficient inference. The quantization was performed with the bitsandbytes library, using NF4 weights with double quantization (see the loading configuration later in this card).
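For reference, a comparable 4-bit checkpoint can be produced roughly as sketched below. This is a minimal sketch, assuming NF4 with double quantization (matching the loading configuration later in this card) and a recent transformers/bitsandbytes with 4-bit serialization support; it is not a record of the exact export used for this repository.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the base model's linear layers to 4 bits at load time
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # matrix multiplies run in bfloat16
    bnb_4bit_use_double_quant=True,         # also quantize the scaling constants
)
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
base.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-4bit")  # writes the quantized weights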

Model Details

Model Description

  • Model type: Transformer-based Language Model
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Direct Use

This model is intended for research and practical applications where memory efficiency is critical. It can be used for the following (a minimal chat example appears after the list):

  • Text generation
  • Language understanding tasks
  • Chatbots and conversational AI
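For conversational use, the sketch below formats the prompt with the tokenizer's chat template. It assumes the repository ships a chat template (the R1 distill tokenizers do); the prompt and sampling settings are illustrative only.

from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

# Render the conversation with the model's chat template
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.6)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))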

Downstream Use

This model can be fine-tuned for specific tasks such as the following (a QLoRA-style sketch appears after the list):

  • Sentiment analysis
  • Text classification
  • Summarization
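A common way to fine-tune a 4-bit bitsandbytes checkpoint is QLoRA-style adapter training with the peft library: the quantized base stays frozen and only small LoRA adapters are updated. The sketch below is a hedged example; the target module names assume Qwen2-style attention projections, and the hyperparameters are illustrative.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads

lora_config = LoraConfig(
    r=16,                  # adapter rank (illustrative)
    lora_alpha=32,         # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Qwen2 names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

From here the wrapped model can be passed to a standard Trainer loop; the 4-bit base weights are never updated, which keeps fine-tuning memory low.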

Out-of-Scope Use

This model is not suitable for:

  • High-precision tasks requiring full 16-bit or 32-bit precision
  • Applications requiring extremely low latency

Bias, Risks, and Limitations

The model may inherit biases present in the training data. Users should be cautious when deploying the model in sensitive applications.

Recommendations

Users should evaluate the model's performance on their specific tasks and datasets before deployment. Consider fine-tuning the model for better alignment with your use case.
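As a quick pre-deployment sanity check, you can compare perplexity on a handful of your own texts against the full-precision base model. A minimal sketch (the sample sentence is illustrative):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

text = "Quantization trades a small amount of accuracy for large memory savings."
enc = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Teacher-forced loss: mean cross-entropy over the sequence
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")  # lower is better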

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
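To confirm the memory savings on your hardware, you can query the loaded model's parameter footprint:

# Report the loaded model's parameter memory in gigabytes
print(f"memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")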