Model Card: Custom LLaMA-3 Model with 4-bit Quantization

Model Details

  • Base Model: LLaMA-3 8B
  • Fine-Tuning Method: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit

Model Description

This is a custom version of the LLaMA-3 8B language model, fine-tuned on Turkish data in 4-bit precision. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning, which reduces memory usage and training time without a significant loss in performance.

Training Configuration

The model was trained using the following configuration (a minimal code sketch follows the list):

  • Learning Rate: 2e-4
  • Optimizer: AdamW (8-bit)
  • Weight Decay: 0.01
  • LR Scheduler: Linear
  • Mixed Precision: FP16/BF16 (depending on hardware support)
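
For reference, the hyperparameters above map onto Hugging Face TrainingArguments roughly as in this minimal sketch; output_dir is a placeholder, and "adamw_bnb_8bit" is an assumed spelling of the 8-bit AdamW optimizer name:

import torch
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",                     # placeholder path
    learning_rate=2e-4,
    optim="adamw_bnb_8bit",                   # 8-bit AdamW via bitsandbytes
    weight_decay=0.01,
    lr_scheduler_type="linear",
    fp16=not torch.cuda.is_bf16_supported(),  # FP16 where BF16 is unsupported
    bf16=torch.cuda.is_bf16_supported(),      # BF16 where the hardware allows it
)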

LoRA Configuration

The model uses LoRA for efficient parameter adaptation with the following settings (see the sketch after the list):

  • Rank (r): 16
  • Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • LoRA Alpha: 16
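
In peft terms, these settings correspond to a LoraConfig along the lines of the sketch below; lora_dropout, bias, and task_type are assumptions, as the card does not state them:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,       # assumed; not stated in this card
    bias="none",            # assumed; not stated in this card
    task_type="CAUSAL_LM",
)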

Training Dataset

  • Dataset: Custom dataset containing Turkish text data
  • Max Sequence Length: 1024 (see the tokenization sketch below)
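
Once the tokenizer from the Usage section below is loaded, the 1024-token limit can be enforced at tokenization time; a minimal sketch, where text stands in for your own data:

# Truncate inputs to the model's 1024-token maximum sequence length
encoded = tokenizer(
    text,              # placeholder for your own Turkish input text
    max_length=1024,
    truncation=True,
    return_tensors="pt",
)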

Usage

To use this model, you can load it using the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("erythropygia/LLAMA3-8B-Turkish-4bit-Quantized")
model = AutoModelForCausalLM.from_pretrained(
    "erythropygia/LLAMA3-8B-Turkish-4bit-Quantized",
    low_cpu_mem_usage=True,
    load_in_4bit=True,
    device_map="auto",  # place the quantized weights on the available GPU
)

prompt_format = """Aşağıda bir görevi tanımlayan bir talimat ve daha fazla bağlam sağlayan bir girdi bulunmaktadır. Talebi uygun şekilde tamamlayan bir yanıt yazın.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        prompt_format.format(
            "",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.75,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1,
)
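
Newer releases of transformers deprecate passing load_in_4bit directly to from_pretrained in favor of an explicit BitsAndBytesConfig. An equivalent load would look like the sketch below; the NF4 quantization type and FP16 compute dtype are common defaults, not values confirmed by this card:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # common default; assumed
    bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "erythropygia/LLAMA3-8B-Turkish-4bit-Quantized",
    quantization_config=bnb_config,
    device_map="auto",
)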

Performance

  • Training Loss: 1.385300
  • Evaluation Metrics: To be updated based on evaluation results
  • Limitations and Biases: This model inherits biases present in the training data. It is important to evaluate the model thoroughly for your specific use case and consider any ethical implications of its deployment.

Model Files

  • Format: Safetensors
  • Model Size: 4.65B params
  • Tensor Types: FP16, F32, U8