korywat committed
Commit 313dcbe · verified · 1 Parent(s): 2f45eef

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +4 -246
README.md CHANGED
@@ -17,11 +17,9 @@ tags:
 
 The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.
 
- This model is an implementation of Mistral-7B-Instruct-v0_3 found [here]({source_repo}).
- This repository provides scripts to run Mistral-7B-Instruct-v0_3 on Qualcomm® devices.
- More details on model performance across various devices can be found
- [here](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized).
-
+ This is based on the implementation of Mistral-7B-Instruct-v0_3 found
+ [here]({source_repo}). More details on model performance
+ across various devices can be found [here](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized).
 
 ### Model Details
 
@@ -51,243 +49,6 @@ More details on model performance across various devices can be found
 
 
 
- ## Installation
-
- This model can be installed as a Python package via pip.
-
- ```bash
- pip install qai-hub-models
- ```
-
-
- ## Configure Qualcomm® AI Hub to run this model on a cloud-hosted device
-
- Sign in to [Qualcomm® AI Hub](https://app.aihub.qualcomm.com/) with your
- Qualcomm® ID. Once signed in, navigate to `Account -> Settings -> API Token`.
-
- With this API token, you can configure your client to run models on
- cloud-hosted devices.
- ```bash
- qai-hub configure --api_token API_TOKEN
- ```
- Navigate to the [docs](https://app.aihub.qualcomm.com/docs/) for more information.
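A quick way to confirm the token is configured correctly is to list the cloud-hosted devices visible to your account; a minimal sketch using the qai_hub client's `get_devices()`:

```python
import qai_hub as hub

# If the API token is configured correctly, this prints the cloud-hosted
# devices your account can target (e.g. "Samsung Galaxy S23").
for device in hub.get_devices():
    print(device.name)
```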
-
-
-
- ## Demo on-device
-
- The package contains a simple end-to-end demo that downloads pre-trained
- weights and runs this model on a sample input.
-
- ```bash
- python -m qai_hub_models.models.mistral_7b_instruct_v0_3_quantized.demo
- ```
-
- The above demo runs a reference implementation of pre-processing, model
- inference, and post-processing.
-
- **NOTE**: To run the demo in a Jupyter Notebook or a Google Colab-like
- environment, add the following to your cell instead of the above:
- ```
- %run -m qai_hub_models.models.mistral_7b_instruct_v0_3_quantized.demo
- ```
-
-
- ### Run model on a cloud-hosted device
-
- In addition to the demo, you can also run the model on a cloud-hosted Qualcomm®
- device. This script does the following:
- * Checks model performance on a cloud-hosted device.
- * Downloads compiled assets that can be deployed on-device for Android.
- * Checks accuracy between PyTorch and on-device outputs.
-
- ```bash
- python -m qai_hub_models.models.mistral_7b_instruct_v0_3_quantized.export
- ```
- ```
- Profiling Results
- ------------------------------------------------------------
-
- Device                       : Snapdragon 8 Elite QRD (15)
- Runtime                      : QNN
- Response Rate (Tokens/Second): 10.73
- Time to First Token (Seconds): (0.18, 5.79)
- ```
-
-
- ## How does this work?
-
- This [export script](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized/qai_hub_models/models/Mistral-7B-Instruct-v0_3/export.py)
- leverages [Qualcomm® AI Hub](https://aihub.qualcomm.com/) to optimize, validate, and deploy this model
- on-device. Let's go through each step below in detail:
-
- Step 1: **Upload compiled models**
-
- Upload the compiled models from `qai_hub_models.models.mistral_7b_instruct_v0_3_quantized` to Hub.
- ```python
- import qai_hub as hub
- from qai_hub_models.models.mistral_7b_instruct_v0_3_quantized import Model
-
- # Load the pre-compiled model; it is split into four prompt-processor parts
- # and four token-generator parts.
- model = Model.from_precompiled()
-
- # Upload each part to Qualcomm® AI Hub.
- model_promptprocessor_part1 = hub.upload_model(model.prompt_processor_part1.get_target_model_path())
- model_promptprocessor_part2 = hub.upload_model(model.prompt_processor_part2.get_target_model_path())
- model_promptprocessor_part3 = hub.upload_model(model.prompt_processor_part3.get_target_model_path())
- model_promptprocessor_part4 = hub.upload_model(model.prompt_processor_part4.get_target_model_path())
- model_tokengenerator_part1 = hub.upload_model(model.token_generator_part1.get_target_model_path())
- model_tokengenerator_part2 = hub.upload_model(model.token_generator_part2.get_target_model_path())
- model_tokengenerator_part3 = hub.upload_model(model.token_generator_part3.get_target_model_path())
- model_tokengenerator_part4 = hub.upload_model(model.token_generator_part4.get_target_model_path())
- ```
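Since these eight uploaded handles are reused in the steps below, it can help to collect them in one place. A minimal bookkeeping sketch (assuming each handle exposes a `model_id` attribute, as the qai_hub client's `Model` object does):

```python
# Hypothetical bookkeeping: group the uploaded parts so the profiling and
# inference steps can iterate over them rather than repeating each name.
uploaded_parts = {
    "promptprocessor_part1": model_promptprocessor_part1,
    "promptprocessor_part2": model_promptprocessor_part2,
    "promptprocessor_part3": model_promptprocessor_part3,
    "promptprocessor_part4": model_promptprocessor_part4,
    "tokengenerator_part1": model_tokengenerator_part1,
    "tokengenerator_part2": model_tokengenerator_part2,
    "tokengenerator_part3": model_tokengenerator_part3,
    "tokengenerator_part4": model_tokengenerator_part4,
}
for name, handle in uploaded_parts.items():
    print(name, handle.model_id)  # `model_id` is assumed from the qai_hub client
```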
-
-
- Step 2: **Performance profiling on a cloud-hosted device**
-
- After uploading the compiled models in Step 1, each uploaded model can be
- profiled on-device. Note that this script runs the model on a device
- automatically provisioned in the cloud. Once a job is submitted, you can
- navigate to the provided job URL to view a variety of on-device performance
- metrics.
- ```python
-
- # Device
- device = hub.Device("Samsung Galaxy S23")
- profile_job_promptprocessor_part1 = hub.submit_profile_job(
-     model=model_promptprocessor_part1,
-     device=device,
- )
- profile_job_promptprocessor_part2 = hub.submit_profile_job(
-     model=model_promptprocessor_part2,
-     device=device,
- )
- profile_job_promptprocessor_part3 = hub.submit_profile_job(
-     model=model_promptprocessor_part3,
-     device=device,
- )
- profile_job_promptprocessor_part4 = hub.submit_profile_job(
-     model=model_promptprocessor_part4,
-     device=device,
- )
- profile_job_tokengenerator_part1 = hub.submit_profile_job(
-     model=model_tokengenerator_part1,
-     device=device,
- )
- profile_job_tokengenerator_part2 = hub.submit_profile_job(
-     model=model_tokengenerator_part2,
-     device=device,
- )
- profile_job_tokengenerator_part3 = hub.submit_profile_job(
-     model=model_tokengenerator_part3,
-     device=device,
- )
- profile_job_tokengenerator_part4 = hub.submit_profile_job(
-     model=model_tokengenerator_part4,
-     device=device,
- )
-
- ```
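Profile jobs run asynchronously. A minimal sketch for blocking until all eight jobs complete and printing a link to each report (assuming the job objects expose `wait()` and `url`, as the qai_hub client's job classes do):

```python
profile_jobs = [
    profile_job_promptprocessor_part1,
    profile_job_promptprocessor_part2,
    profile_job_promptprocessor_part3,
    profile_job_promptprocessor_part4,
    profile_job_tokengenerator_part1,
    profile_job_tokengenerator_part2,
    profile_job_tokengenerator_part3,
    profile_job_tokengenerator_part4,
]
for job in profile_jobs:
    job.wait()      # block until the on-device profiling run finishes
    print(job.url)  # web report with the on-device performance metrics
```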
-
- Step 3: **Verify on-device accuracy**
-
- To verify the accuracy of the model on-device, you can run on-device inference
- on sample input data on the same cloud-hosted device.
- ```python
-
- input_data_promptprocessor_part1 = model.prompt_processor_part1.sample_inputs()
- inference_job_promptprocessor_part1 = hub.submit_inference_job(
-     model=model_promptprocessor_part1,
-     device=device,
-     inputs=input_data_promptprocessor_part1,
- )
- on_device_output_promptprocessor_part1 = inference_job_promptprocessor_part1.download_output_data()
-
- input_data_promptprocessor_part2 = model.prompt_processor_part2.sample_inputs()
- inference_job_promptprocessor_part2 = hub.submit_inference_job(
-     model=model_promptprocessor_part2,
-     device=device,
-     inputs=input_data_promptprocessor_part2,
- )
- on_device_output_promptprocessor_part2 = inference_job_promptprocessor_part2.download_output_data()
-
- input_data_promptprocessor_part3 = model.prompt_processor_part3.sample_inputs()
- inference_job_promptprocessor_part3 = hub.submit_inference_job(
-     model=model_promptprocessor_part3,
-     device=device,
-     inputs=input_data_promptprocessor_part3,
- )
- on_device_output_promptprocessor_part3 = inference_job_promptprocessor_part3.download_output_data()
-
- input_data_promptprocessor_part4 = model.prompt_processor_part4.sample_inputs()
- inference_job_promptprocessor_part4 = hub.submit_inference_job(
-     model=model_promptprocessor_part4,
-     device=device,
-     inputs=input_data_promptprocessor_part4,
- )
- on_device_output_promptprocessor_part4 = inference_job_promptprocessor_part4.download_output_data()
-
- input_data_tokengenerator_part1 = model.token_generator_part1.sample_inputs()
- inference_job_tokengenerator_part1 = hub.submit_inference_job(
-     model=model_tokengenerator_part1,
-     device=device,
-     inputs=input_data_tokengenerator_part1,
- )
- on_device_output_tokengenerator_part1 = inference_job_tokengenerator_part1.download_output_data()
-
- input_data_tokengenerator_part2 = model.token_generator_part2.sample_inputs()
- inference_job_tokengenerator_part2 = hub.submit_inference_job(
-     model=model_tokengenerator_part2,
-     device=device,
-     inputs=input_data_tokengenerator_part2,
- )
- on_device_output_tokengenerator_part2 = inference_job_tokengenerator_part2.download_output_data()
-
- input_data_tokengenerator_part3 = model.token_generator_part3.sample_inputs()
- inference_job_tokengenerator_part3 = hub.submit_inference_job(
-     model=model_tokengenerator_part3,
-     device=device,
-     inputs=input_data_tokengenerator_part3,
- )
- on_device_output_tokengenerator_part3 = inference_job_tokengenerator_part3.download_output_data()
-
- input_data_tokengenerator_part4 = model.token_generator_part4.sample_inputs()
- inference_job_tokengenerator_part4 = hub.submit_inference_job(
-     model=model_tokengenerator_part4,
-     device=device,
-     inputs=input_data_tokengenerator_part4,
- )
- on_device_output_tokengenerator_part4 = inference_job_tokengenerator_part4.download_output_data()
-
- ```
- With the model's output, you can compute metrics such as PSNR and relative
- error, or spot-check the output against the expected output, as in the
- sketch below.
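For example, a PSNR check over one part's output might look like the following sketch; `torch_output` and the `"output_0"` key are hypothetical placeholders for your own local reference and output naming.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (in dB) between two equally shaped arrays."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # outputs match exactly
    return 10.0 * np.log10((max_val ** 2) / mse)

# Hypothetical usage: compare one part's on-device output against a PyTorch
# reference you computed locally (both names are placeholders).
# score = psnr(torch_output, on_device_output_promptprocessor_part1["output_0"][0])
# print(f"PSNR: {score:.2f} dB")
```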
-
- **Note**: On-device profiling and inference require access to Qualcomm®
- AI Hub. [Sign up for access](https://myaccount.qualcomm.com/signup).
-
-
-
-
- ## Deploying the compiled model to Android
-
-
- The models can be deployed using multiple runtimes:
- - TensorFlow Lite (`.tflite` export): [This
-   tutorial](https://www.tensorflow.org/lite/android/quickstart) provides a
-   guide to deploying the `.tflite` model in an Android application.
-
-
- - QNN (`.so` / `.bin` export): This [sample
-   app](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/sample_app.html)
-   provides instructions on how to use the `.so` shared library or `.bin` context binary in an Android application.
-
-
- ## View on Qualcomm® AI Hub
- Get more details on Mistral-7B-Instruct-v0_3's performance across various devices [here](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized).
- Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/).
-
-
 ## License
 * The license for the original implementation of Mistral-7B-Instruct-v0_3 can be found [here](https://github.com/mistralai/mistral-inference/blob/main/LICENSE).
 * The license for the compiled assets for on-device deployment can be found [here](https://github.com/mistralai/mistral-inference/blob/main/LICENSE).
@@ -301,10 +62,9 @@ Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/).
 
 
 ## Community
- * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
+ * Join [our AI Hub Slack community](https://qualcomm-ai-hub.slack.com/join/shared_invite/zt-2d5zsmas3-Sj0Q9TzslueCjS31eXG2UA#/shared-invite/email) to collaborate, post questions, and learn more about on-device AI.
 * For questions or feedback, please [reach out to us](mailto:[email protected]).
 
-
 ## Usage and Limitations
 
 Model may not be used for or in connection with any of the following applications:
@@ -325,5 +85,3 @@ Model may not be used for or in connection with any of the following applications:
 - Recommender systems of social media platforms;
 - Scraping of facial images (from the internet or otherwise); and/or
 - Subliminal manipulation
-
-