---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- cot
- r1
- deepseek
- text
---

# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit

This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed with the `bitsandbytes` library.

## Model Details

### Model Description

- **Model type:** Transformer-based Language Model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`

## Uses

### Direct Use

This model is intended for research and practical applications where memory efficiency is critical. It can be used for:

- Text generation
- Language understanding tasks
- Chatbots and conversational AI

### Downstream Use

This model can be fine-tuned for specific tasks such as:

- Sentiment analysis
- Text classification
- Summarization

A hedged fine-tuning sketch appears after the getting-started example below.

### Out-of-Scope Use

This model is not suitable for:

- High-precision tasks that require full 16-bit or 32-bit precision
- Applications requiring extremely low latency

## Bias, Risks, and Limitations

The model may inherit biases present in its training data. Users should be cautious when deploying it in sensitive applications.

### Recommendations

Evaluate the model's performance on your specific tasks and datasets before deployment, and consider fine-tuning it for better alignment with your use case.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
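Because the R1 distill checkpoints are chat-tuned and emit their reasoning inside `<think>` tags, prompts generally work better when wrapped in the tokenizer's bundled chat template rather than passed as raw text. Below is a minimal sketch that reuses the `model` and `tokenizer` loaded above; the sampling settings are illustrative assumptions, not tuned values.

```python
# Chat-style prompting via the tokenizer's bundled chat template.
messages = [
    {"role": "user", "content": "Explain 4-bit quantization in one paragraph."}
]

# apply_chat_template formats the conversation and appends the
# assistant turn marker so the model starts generating its reply.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,  # illustrative sampling settings, not tuned values
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```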
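For the downstream fine-tuning use cases listed earlier (sentiment analysis, classification, summarization), a 4-bit checkpoint is typically trained with parameter-efficient adapters rather than full fine-tuning. The following is a minimal QLoRA-style sketch using the `peft` library, reusing the quantized `model` loaded above; the rank, alpha, dropout, and target module names are hypothetical starting points for a Qwen2-based architecture, not settings validated for this checkpoint.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for training (casts norm layers,
# enables gradient checkpointing, etc.).
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters to the attention projections.
# Hypothetical starting values; tune for your task and dataset.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with your preferred loop or with a trainer
# such as transformers.Trainer on your task dataset.
```

Only the adapter weights are updated during training, which keeps memory usage close to inference levels and sidesteps updating the frozen 4-bit base weights directly.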