doberst committed
Commit 87aebf9 · 1 Parent(s): f52fe25

Update README.md

Files changed (1):
  1. README.md +28 -10
README.md CHANGED
@@ -79,14 +79,14 @@ Any model can provide inaccurate or incomplete information, and should be used i

The fastest way to get started with BLING is through direct import in transformers:

- from transformers import AutoTokenizer, AutoModelForCausalLM
- tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-1.3b-0.1")
- model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-1.3b-0.1")
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-1.3b-0.1")
+ model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-1.3b-0.1")


The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:

- full_prompt = "\<human>\: " + my_prompt + "\n" + "\<bot>\:"
+ full_prompt = "\<human>\: " + my_prompt + "\n" + "\<bot>\:"

The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:

@@ -95,7 +95,30 @@ The BLING model was fine-tuned with closed-context samples, which assume general

To get the best results, package "my_prompt" as follows:

- my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
+ my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
+
+
+ If you are using a HuggingFace generation script:
+
+ # prepare prompt packaging used in fine-tuning process
+ new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"
+
+ inputs = tokenizer(new_prompt, return_tensors="pt")
+ start_of_output = len(inputs.input_ids[0])
+
+ # temperature: set at 0.3 for consistency of output
+ # max_new_tokens: set at 100 - may prematurely stop a few of the summaries
+
+ outputs = model.generate(
+     inputs.input_ids.to(device),
+     eos_token_id=tokenizer.eos_token_id,
+     pad_token_id=tokenizer.eos_token_id,
+     do_sample=True,
+     temperature=0.3,
+     max_new_tokens=100,
+     )
+
+ output_only = tokenizer.decode(outputs[0][start_of_output:],skip_special_tokens=True)


## Citation [optional]
@@ -111,8 +134,3 @@ This BLING model was built on top of a "Sheared Llama" model base - for more inf
## Model Card Contact

Darren Oberst & llmware team
-
- Please reach out anytime if you are interested in this project!
-
-
-
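
The generation snippet added in this commit assumes that `tokenizer`, `model`, `device`, and an `entries` dict with "context" and "query" keys already exist. Below is a minimal end-to-end sketch with those pieces filled in; the device handling and the sample passage/question are illustrative assumptions, not part of the README itself:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# load the model and tokenizer as shown in the README
tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-1.3b-0.1")
model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-1.3b-0.1")

# assumption: run on GPU if available, otherwise CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# assumption: an illustrative closed-context sample (text passage + question)
entries = {
    "context": "Revenue in the third quarter was $12.5 million, an increase of 8% year-over-year.",
    "query": "What was the revenue in the third quarter?",
}

# package the prompt with the <human>/<bot> wrapper used in fine-tuning
new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"

inputs = tokenizer(new_prompt, return_tensors="pt")
start_of_output = len(inputs.input_ids[0])

outputs = model.generate(
    inputs.input_ids.to(device),
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.3,
    max_new_tokens=100,
)

# keep only the newly generated tokens, dropping the prompt
output_only = tokenizer.decode(outputs[0][start_of_output:], skip_special_tokens=True)
print(output_only)

As the README's comments note, temperature is kept at 0.3 for consistency of output, and max_new_tokens=100 may truncate longer answers.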