Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit

This is a 4-bit quantized version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, intended for memory-efficient inference. The quantization was performed with the bitsandbytes library, using NF4 weights with double quantization (see the loading configuration later in this card).
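For reference, a comparable 4-bit checkpoint can be produced roughly as sketched below. This is a minimal sketch, assuming NF4 with double quantization (matching the loading configuration later in this card) and a recent transformers/bitsandbytes with 4-bit serialization support; it is not a record of the exact export used for this repository.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the base model's linear layers to 4 bits at load time
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # matrix multiplies run in bfloat16
    bnb_4bit_use_double_quant=True,         # also quantize the scaling constants
)
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
base.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-4bit")  # writes the quantized weights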

Model Details

Model Description

  • Model type: Transformer-based Language Model
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Direct Use

This model is intended for research and practical applications where memory efficiency is critical. It can be used for the following (a minimal chat example appears after the list):

  • Text generation
  • Language understanding tasks
  • Chatbots and conversational AI
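For conversational use, the sketch below formats the prompt with the tokenizer's chat template. It assumes the repository ships a chat template (the R1 distill tokenizers do); the prompt and sampling settings are illustrative only.

from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

# Render the conversation with the model's chat template
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.6)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))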

Downstream Use

This model can be fine-tuned for specific tasks such as the following (a QLoRA-style sketch appears after the list):

  • Sentiment analysis
  • Text classification
  • Summarization
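A common way to fine-tune a 4-bit bitsandbytes checkpoint is QLoRA-style adapter training with the peft library: the quantized base stays frozen and only small LoRA adapters are updated. The sketch below is a hedged example; the target module names assume Qwen2-style attention projections, and the hyperparameters are illustrative.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads

lora_config = LoraConfig(
    r=16,                  # adapter rank (illustrative)
    lora_alpha=32,         # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Qwen2 names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

From here the wrapped model can be passed to a standard Trainer loop; the 4-bit base weights are never updated, which keeps fine-tuning memory low.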

Out-of-Scope Use

This model is not suitable for:

  • High-precision tasks requiring full 16-bit or 32-bit precision
  • Applications requiring extremely low latency

Bias, Risks, and Limitations

The model may inherit biases present in the training data. Users should be cautious when deploying the model in sensitive applications.

Recommendations

Users should evaluate the model's performance on their specific tasks and datasets before deployment. Consider fine-tuning the model for better alignment with your use case.
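As a quick pre-deployment sanity check, you can compare perplexity on a handful of your own texts against the full-precision base model. A minimal sketch (the sample sentence is illustrative):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

text = "Quantization trades a small amount of accuracy for large memory savings."
enc = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Teacher-forced loss: mean cross-entropy over the sequence
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")  # lower is better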

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
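To confirm the memory savings on your hardware, you can query the loaded model's parameter footprint:

# Report the loaded model's parameter memory in gigabytes
print(f"memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")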