---
license: mit
language:
- en
library_name: transformers
tags:
- ppo
- phi3
- chatml
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
metrics:
- hellaswag
- arc_challenge
- m_mmlu 5 shot
---

# Model Information

Phi-3-mini-128k-instruct-PPO is an updated version of [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct), aligned with PPO.

- It was trained on [ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned), a preference dataset of chosen/rejected response pairs (see the sketch below).
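
A minimal sketch of what the preference data looks like, assuming the dataset exposes `prompt`, `chosen` and `rejected` columns (column names may differ between dataset revisions):

```python
from datasets import load_dataset

# Preference dataset used for the PPO alignment; the "prompt", "chosen" and
# "rejected" column names are assumptions and may differ between revisions
ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")

example = ds[0]
print(example["prompt"])    # the user instruction
print(example["chosen"])    # preferred (chat-formatted) response
print(example["rejected"])  # dispreferred (chat-formatted) response
```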

# Evaluation

We evaluated the model on the same test sets used for the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

| hellaswag acc_norm | arc_challenge acc_norm | m_mmlu 5-shot acc | Average |
|:-------------------|:-----------------------|:------------------|:--------|
| 0.7621             | 0.5375                 | 0.6824            | 0.6606  |

## Usage

Be sure to install these dependencies before running the program:

```bash
pip install transformers torch sentencepiece
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"  # change to "cuda" if a GPU and the CUDA toolkit are available

model = AutoModelForCausalLM.from_pretrained("MoxoffSpA/Phi-3-mini-128k-instruct-PPO")
tokenizer = AutoTokenizer.from_pretrained("MoxoffSpA/Phi-3-mini-128k-instruct-PPO")

question = """How tall is the Tower of Pisa?"""
context = """
The Tower of Pisa is a 12th-century bell tower, famous for its tilt. It is about 56 metres tall.
"""

prompt = f"Question: {question}, context: {context}"

messages = [
    {"role": "user", "content": prompt}
]

# Build the chat-formatted input and append the assistant turn marker so the
# model answers the question instead of continuing the user turn
encodeds = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(
    model_inputs,                        # the tokenized chat prompt
    max_new_tokens=128,                  # limit the number of newly generated tokens
    do_sample=True,                      # enable sampling to introduce randomness
    temperature=0.1,                     # low temperature keeps the output close to deterministic
    top_p=0.95,                          # nucleus sampling for more coherent generation
    eos_token_id=tokenizer.eos_token_id  # token that marks the end of a sequence
)

decoded_output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
trimmed_output = decoded_output.strip()
print(trimmed_output)
```
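
Note that `decoded_output` includes the prompt as well as the model's answer. If you only want the newly generated text, one option (an illustrative addition, not part of the original snippet) is to slice off the prompt tokens before decoding:

```python
# Decode only the tokens generated after the prompt
answer = tokenizer.decode(
    generated_ids[0][model_inputs.shape[-1]:],
    skip_special_tokens=True,
)
print(answer.strip())
```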

## Bias, Risks and Limitations

Phi-3-mini-128k-instruct-PPO has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of
responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and
composition of the corpus used to train the base model were; however, it likely included a mix of web data and technical sources such as books and code.

## Links to resources

- ultrafeedback-binarized-preferences-cleaned dataset: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned
- Phi-3-mini-128k-instruct model: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
- Open LLM Leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

## The Moxoff Team

Jacopo Abate, Marco D'Ambra, Dario Domanin, Luigi Simeone, Gianpaolo Francesco Trotta