PhigRange-DPO

PhigRange-DPO is a Direct Preference Optimization (DPO) fine-tune of johnsnowlabs/PhigRange-2.7B-Slerp, trained on the mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha preference dataset for 1080 steps.

🏆 Evaluation results

Coming Soon

💻 Usage

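# Notebook cell syntax; in a shell, run the same command without the leading "!"
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "johnsnowlabs/PhigRange-DPO"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into one prompt string using the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision; device_map="auto" requires accelerate
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; the output contains the prompt followed by the completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])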

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-04
  • train_batch_size: 1
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: AdamOptimizer32bit
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1080
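
To make these numbers concrete, here is a minimal sketch of how the effective batch size and the learning rate at a given step could be derived from the values listed above. It assumes a single training device and linear warmup followed by a cosine decay to zero, which is the usual behavior of a `cosine` scheduler with warmup steps; the actual training script is not shown in this card, so `NUM_DEVICES`, `effective_batch_size`, and `lr_at` are illustrative names rather than part of the original setup.

```python
import math

# Values copied from the hyperparameter list above
BASE_LR = 2e-4
WARMUP_STEPS = 100
TOTAL_STEPS = 1080
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8

# Assumption: one device; the card does not state the device count
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Number of examples contributing to each optimizer update."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step, assuming linear warmup then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1]
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("examples seen over training:", effective_batch_size() * TOTAL_STEPS)
    for step in (0, 50, 100, 590, 1080):
        print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions, 1 example per step with 8 accumulation steps reproduces the listed total_train_batch_size of 8, and the learning rate peaks at 2e-04 once warmup ends at step 100.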

Framework versions

  • Transformers 4.38.0.dev0
  • Pytorch 2.1.2+cu118
  • Datasets 2.17.0
  • Tokenizers 0.15.0

Model size: 2.78B params, tensor type FP16 (Safetensors)