# mistral-7b-distilabel-truthy-dpo

**mistral-7b-distilabel-truthy-dpo** is a DPO fine-tune of mistralai/Mistral-7B-v0.1, trained on the mlabonne/distilabel-truthy-dpo-v0.1 preference dataset.
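A quick-start sketch for loading the model with `transformers`. The full hub repo id is not stated on this card, so `<user>` is a placeholder you must replace with the actual namespace:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id is an assumption -- replace <user> with the actual hub namespace.
model_id = "<user>/mistral-7b-distilabel-truthy-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What happens if you swallow gum?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```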
## LoRA
- r: 16
- LoRA alpha: 16
- LoRA dropout: 0.05
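The adapter settings above map directly onto a `peft` `LoraConfig`. This is a sketch: `target_modules` is an assumption (a common choice for Mistral-style models), as the card does not list which modules were adapted:

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; target_modules is an
# assumption, not taken from the card.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```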
## Training arguments
- Batch size: 4
- Gradient accumulation steps: 4
- Optimizer: paged_adamw_32bit
- Max steps: 100
- Learning rate: 5e-05
- Learning rate scheduler type: cosine
- Beta: 0.1
- Max prompt length: 1024
- Max length: 1536
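Put together, the run could be reproduced roughly as follows with trl's `DPOTrainer`. This is a sketch under assumptions: it targets a recent trl where `DPOConfig` carries the DPO-specific arguments (older versions passed `beta` and the length limits to `DPOTrainer` directly, and used `tokenizer=` instead of `processing_class=`), and `target_modules` in the LoRA config is a guess, since the card does not list it:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("mlabonne/distilabel-truthy-dpo-v0.1", split="train")

# Training arguments as listed above.
args = DPOConfig(
    output_dir="mistral-7b-distilabel-truthy-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    max_steps=100,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)

# LoRA settings as listed above; target_modules is an assumption.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With `peft_config` supplied, `DPOTrainer` wraps the base model in the adapter and uses the frozen base weights as the implicit reference model, so no separate `ref_model` needs to be loaded.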