|
---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---
|
|
|
This model was created by applying DPO (Direct Preference Optimization) to TinyLlama-1.1B-intermediate-step-1431k-3T, using the Intel/orca_dpo_pairs dataset.

This is an experimental model only, created by following the instructions in the blog post [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac).
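
For reference, the sketch below outlines the kind of DPO training setup that blog post describes, adapted to this base model and dataset. It is a minimal, hedged example: the `trl` `DPOTrainer` call, the prompt formatting, and all hyperparameters shown here are assumptions (argument names also vary across `trl` versions), not the exact script used to train this model.

```python
# Hedged sketch of the DPO fine-tuning setup -- NOT the exact script used for this model.
# Assumes trl's DPOTrainer (pre-DPOConfig API); argument names differ in newer trl releases.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Intel/orca_dpo_pairs provides "system", "question", "chosen", "rejected" columns.
# DPOTrainer expects "prompt", "chosen", "rejected", so the prompt is rebuilt here
# with a Llama-2-style template (an assumption, matching the inference example below).
def to_dpo_format(sample):
    prompt = f"<s>[INST] <<SYS>>\n{sample['system']}\n<</SYS>>\n\n{sample['question']} [/INST]"
    return {"prompt": prompt, "chosen": sample["chosen"], "rejected": sample["rejected"]}

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(to_dpo_format, remove_columns=["system", "question"])

training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-dpo",   # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,                     # illustrative values only
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,            # trl builds a frozen reference copy when None
    args=training_args,
    beta=0.1,                  # DPO temperature; 0.1 is a common default
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
trainer.save_model("tinyllama-1.1b-dpo")
```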
|
|
|
You can run this model with the following code (set `new_model` to this model's repository id or a local path):
|
|
|
```python
import transformers
from transformers import AutoTokenizer

# This model's Hugging Face repository id or local path (placeholder)
new_model = "path/to/this/model"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

# Example output:
# <s>[INST] <<SYS>>
# You are a helpful assistant chatbot.
# <</SYS>>
#
# What is a Large Language Model? [/INST]
# <LANG-LMT>
# Largely, it is a machine learning model that is trained on a large dataset and is capable of generating large amounts of text with a certain degree of accuracy.
#
# A: If you are talking about a computer program that can generate texts, you can look at the topic of Natural Language Generation (NLG) for a more precise definition.
# The main difference between NLG and machine learning is that NLG is a subfield of AI and is used to generate text from an input, while machine learning is used to analyze data, make predictions and classify it.
```
|
|
|
Results on the GPT4All benchmark suite:
|
|
|
| Task          | Metric   |  Value |  Stderr |
|---------------|----------|-------:|--------:|
| arc_challenge | acc      | 0.2807 | ±0.0131 |
|               | acc_norm | 0.3106 | ±0.0135 |
| arc_easy      | acc      | 0.6107 | ±0.0100 |
|               | acc_norm | 0.5547 | ±0.0102 |
| boolq         | acc      | 0.5865 | ±0.0086 |
| hellaswag     | acc      | 0.4478 | ±0.0050 |
|               | acc_norm | 0.5924 | ±0.0049 |
| openbookqa    | acc      | 0.2160 | ±0.0184 |
|               | acc_norm | 0.3600 | ±0.0215 |
| piqa          | acc      | 0.7280 | ±0.0104 |
|               | acc_norm | 0.7301 | ±0.0104 |
| winogrande    | acc      | 0.5856 | ±0.0138 |
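
The task names above follow the EleutherAI lm-evaluation-harness. As a rough pointer for reproducing this kind of table, here is a hedged sketch: it assumes the harness's `lm_eval.simple_evaluate` Python API (lm-eval >= 0.4), and the harness version, shot count, and batch size actually used for the numbers above are not documented here.

```python
# Hedged sketch: re-running the GPT4All-style task set with lm-evaluation-harness.
# Assumes lm-eval >= 0.4 (pip install lm-eval); replace the model path placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/this/model",  # placeholder repo id or local path
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,   # assumption; the shot count for the table above is not stated
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```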