Model Card: Custom LLaMA-3 Model with 4-bit Quantization

Model Details

  • Base Model: LLaMA-3 8B
  • Fine-Tuning Method: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit

Model Description

This is a custom version of the LLaMA-3 8B language model, fine-tuned on Turkish data in 4-bit precision. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning, which reduces memory usage and training time without a significant loss in performance.

Training Configuration

The model was trained using the following configuration (a minimal code sketch follows the list):

  • Learning Rate: 2e-4
  • Optimizer: AdamW (8-bit)
  • Weight Decay: 0.01
  • LR Scheduler: Linear
  • Mixed Precision: FP16/BF16 (depending on hardware support)
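
For reference, the hyperparameters above map onto Hugging Face TrainingArguments roughly as in this minimal sketch; output_dir is a placeholder, and "adamw_bnb_8bit" is an assumed spelling of the 8-bit AdamW optimizer name:

import torch
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",                     # placeholder path
    learning_rate=2e-4,
    optim="adamw_bnb_8bit",                   # 8-bit AdamW via bitsandbytes
    weight_decay=0.01,
    lr_scheduler_type="linear",
    fp16=not torch.cuda.is_bf16_supported(),  # FP16 where BF16 is unsupported
    bf16=torch.cuda.is_bf16_supported(),      # BF16 where the hardware allows it
)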

LoRA Configuration

The model uses LoRA for efficient parameter adaptation with the following settings (see the sketch after the list):

  • Rank (r): 16
  • Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • LoRA Alpha: 16
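
In peft terms, these settings correspond to a LoraConfig along the lines of the sketch below; lora_dropout, bias, and task_type are assumptions, as the card does not state them:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,       # assumed; not stated in this card
    bias="none",            # assumed; not stated in this card
    task_type="CAUSAL_LM",
)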

Training Dataset

  • Dataset: Custom dataset containing Turkish text data
  • Max Sequence Length: 1024 (see the tokenization sketch below)
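
Once the tokenizer from the Usage section below is loaded, the 1024-token limit can be enforced at tokenization time; a minimal sketch, where text stands in for your own data:

# Truncate inputs to the model's 1024-token maximum sequence length
encoded = tokenizer(
    text,              # placeholder for your own Turkish input text
    max_length=1024,
    truncation=True,
    return_tensors="pt",
)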

Usage

To use this model, you can load it using the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("erythropygia/LLAMA3-8B-Turkish-4bit-Quantized")
model = AutoModelForCausalLM.from_pretrained(
    "erythropygia/LLAMA3-8B-Turkish-4bit-Quantized",
    low_cpu_mem_usage=True,
    load_in_4bit=True,
    device_map="auto",  # place the quantized weights on the available GPU
)

prompt_format = """Aşağıda bir görevi tanımlayan bir talimat ve daha fazla bağlam sağlayan bir girdi bulunmaktadır. Talebi uygun şekilde tamamlayan bir yanıt yazın.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        prompt_format.format(
            "",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.75,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1,
)
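
Newer releases of transformers deprecate passing load_in_4bit directly to from_pretrained in favor of an explicit BitsAndBytesConfig. An equivalent load would look like the sketch below; the NF4 quantization type and FP16 compute dtype are common defaults, not values confirmed by this card:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # common default; assumed
    bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "erythropygia/LLAMA3-8B-Turkish-4bit-Quantized",
    quantization_config=bnb_config,
    device_map="auto",
)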

Performance

  • Training Loss: 1.385300
  • Evaluation Metrics: To be updated based on evaluation results
  • Limitations and Biases: This model inherits biases present in the training data. It is important to evaluate the model thoroughly for your specific use case and consider any ethical implications of its deployment.

Model Files

  • Format: Safetensors
  • Model Size: 4.65B params
  • Tensor Types: FP16, F32, U8