# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit
This is a 4-bit quantized version of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model, optimized for memory-efficient inference. Quantization was performed with the bitsandbytes library using the NF4 data type with double quantization (see the configuration in the quickstart below).
## Model Details

### Model Description
- Model type: Transformer-based Language Model
- Language(s) (NLP): English
- License: MIT
- Quantized from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
## Direct Use
This model is intended for research and practical applications where memory efficiency is critical. It can be used for:
- Text generation
- Language understanding tasks
- Chatbots and conversational AI (see the chat-template sketch below)
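
For conversational use, here is a minimal sketch, assuming this checkpoint ships the chat template of the DeepSeek-R1 distill base model; `apply_chat_template` is the standard `transformers` method for formatting chat messages:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Repo id from this card; swap in a local path if you keep the weights on disk.
model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# apply_chat_template formats the conversation the way the model expects.
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```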
## Downstream Use
This model can be fine-tuned for specific tasks (a QLoRA-style sketch follows the list), such as:
- Sentiment analysis
- Text classification
- Summarization
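
Since the weights are already 4-bit, parameter-efficient fine-tuning in the QLoRA style is a natural fit. The sketch below uses the `peft` library; the target module names are an assumption based on the Qwen2 attention projections and should be verified against this checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quantization_config, device_map="auto"
)

# Cast norm layers and enable gradient-checkpointing hooks for stable k-bit training.
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters. The target modules below are an assumption based on
# the Qwen2 architecture; check the module names of this checkpoint before use.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, train the adapters with your usual Trainer or training loop.
```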
## Out-of-Scope Use
This model is not suitable for:
- High-precision tasks requiring full 16-bit or 32-bit precision
- Applications requiring extremely low latency
## Bias, Risks, and Limitations
The model may inherit biases present in its training data, and 4-bit quantization can introduce a small loss in output quality relative to the full-precision base model. Users should be cautious when deploying the model in sensitive applications.
### Recommendations
Users should evaluate the model's performance on their specific tasks and datasets before deployment; a minimal perplexity probe is sketched below. Consider fine-tuning the model for better alignment with your use case.
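
As one quick sanity check, the sketch below computes a rough perplexity over sample texts, reusing the `model` and `tokenizer` loaded as in the quickstart in the next section. The placeholder text is illustrative only; substitute a held-out sample from your own data:

```python
import math
import torch

# Illustrative only: replace with held-out texts from your own domain.
texts = ["The quick brown fox jumps over the lazy dog."]

model.eval()
losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        # Passing input_ids as labels yields the mean causal-LM loss.
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())

mean_loss = sum(losses) / len(losses)
print(f"mean loss: {mean_loss:.3f}, perplexity: {math.exp(mean_loss):.1f}")
```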
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with double quantization; compute in bfloat16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
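
To confirm the memory savings, you can inspect the loaded model's size with `get_memory_footprint`, a standard `transformers` model method:

```python
# Rough check of the loaded model's memory footprint, in gigabytes.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```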