abhinavkulkarni/psmathur-orca_mini_v2_13b-w4-g128-awq

@@ -1,27 +1,24 @@
 ---
-license: cc-by-nc-sa-4.0
 language:
 - en
-library_name: transformers
-pipeline_tag: text-generation
 tags:
-- Orca
 - AWQ
 inference: false
 ---
-# orca_mini_v2_13b
-An **Uncensored** LLaMA-13b model in collaboration with [Eric Hartford](https://huggingface.co/ehartford), trained on explain tuned datasets, created using Instructions and Input from WizardLM, Alpaca & Dolly-V2 datasets and applying Orca Research Paper dataset construction approaches.
 This model is a 4-bit AWQ-quantized model with a group size of 128. For more information about AWQ quantization, see the [llm-awq repository](https://github.com/mit-han-lab/llm-awq).
 ## Model Date
-July 8, 2023
 ## Model License
-Please refer to original Orca Mini v2 model license ([link](https://huggingface.co/psmathur/orca_mini_v2_13b)).
 Please refer to the AWQ quantization license ([link](https://github.com/mit-han-lab/llm-awq/blob/main/LICENSE)).
@@ -29,6 +26,8 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability 8.0 (`sm_80`) or higher.
 ## How to Use
 ```bash
@@ -47,7 +46,7 @@ from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
 from accelerate import init_empty_weights, load_checkpoint_and_dispatch
 from huggingface_hub import snapshot_download
-model_name = "psmathur/orca_mini_v2_13b"
 # Config
 config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
@@ -62,10 +61,10 @@ q_config = {
     "q_group_size": 128,
 }
-load_quant = snapshot_download('abhinavkulkarni/psmathur-orca_mini_v2_13b-w4-g128-awq')
 with init_empty_weights():
-    model = AutoModelForCausalLM.from_pretrained(model_name, config=config,
                                                  torch_dtype=torch.float16, trust_remote_code=True)
 real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
@@ -93,81 +92,49 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evaluation-harness).
-[orca_mini_v2_13b](https://huggingface.co/psmathur/orca_mini_v2_13b)
 |  Task  |Version|    Metric     | Value |   |Stderr|
 |--------|------:|---------------|------:|---|------|
-|wikitext|      1|word_perplexity|23.8997|   |      |
-|        |       |byte_perplexity| 1.8104|   |      |
-|        |       |bits_per_byte  | 0.8563|   |      |
-[orca_mini_v2_13b (4-bit 128-group AWQ)](https://huggingface.co/abhinavkulkarni/psmathur-orca_mini_v2_13b-w4-g128-awq)
 |  Task  |Version|    Metric     | Value |   |Stderr|
 |--------|------:|---------------|------:|---|------|
-|wikitext|      1|word_perplexity|27.4695|   |      |
-|        |       |byte_perplexity| 1.8581|   |      |
-|        |       |bits_per_byte  | 0.8938|   |      |
 ## Acknowledgements
-If you found `orca_mini_v2_13b` useful in your research or applications, please kindly cite using the following BibTeX:
-```
-@misc{orca_mini_v2_13b,
-  author = {Pankaj Mathur},
-  title = {orca_mini_v2_13b: An explain tuned LLaMA-13b model on uncensored wizardlm, alpaca, & dolly datasets},
-  year = {2023},
-  publisher = {GitHub, HuggingFace},
-  journal = {GitHub repository, HuggingFace repository},
-  howpublished = {\url{https://https://huggingface.co/psmathur/orca_mini_v2_13b},
-}
 ```
-```
-@software{touvron2023llama,
-  title={LLaMA: Open and Efficient Foundation Language Models},
-  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
-  journal={arXiv preprint arXiv:2302.13971},
-  year={2023}
 }
 ```
 ```
-@misc{openalpaca,
-  author = {Yixuan Su and Tian Lan and Deng Cai},
-  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
-  year = {2023},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
 }
 ```
 ```
-@misc{alpaca,
-  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
-  title = {Stanford Alpaca: An Instruction-following LLaMA model},
-  year = {2023},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
-}
-```
-```
-@online{DatabricksBlog2023DollyV2,
-    author    = {Mike Conover and Matt Hayes and Ankit Mathur and Jianwei Xie and Jun Wan and Sam Shah and Ali Ghodsi and Patrick Wendell and Matei Zaharia and Reynold Xin},
-    title     = {Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM},
-    year      = {2023},
-    url       = {https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm},
-    urldate   = {2023-06-30}
-}
-```
-```
-@misc{xu2023wizardlm,
-      title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
-      author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
-      year={2023},
-      eprint={2304.12244},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
 }
 ```

 ---
+license: cc
 language:
 - en
 tags:
 - AWQ
 inference: false
 ---
+# VMware/open-llama-13B-open-instruct (4-bit 128g AWQ Quantized)
+An [instruction-tuned version](https://huggingface.co/VMware/open-llama-13b-open-instruct) of the fully trained [OpenLLaMA 13B](https://huggingface.co/openlm-research/open_llama_13b) model.
 This model is a 4-bit AWQ-quantized model with a group size of 128. For more information about AWQ quantization, see the [llm-awq repository](https://github.com/mit-han-lab/llm-awq).
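To make "4-bit, group size 128" concrete, the following is a minimal sketch of plain round-to-nearest group-wise quantization in pure Python. It is not the AWQ algorithm itself: AWQ additionally rescales weights based on activation statistics before quantizing, which this sketch omits. The function names and the synthetic weights are illustrative assumptions, not part of the `llm-awq` API.

```python
# Illustrative sketch only: plain round-to-nearest group-wise quantization.
# AWQ's activation-aware scaling step is intentionally left out.

def quantize_group(weights, w_bit=4):
    """Map one group of floats to integer codes in [0, 2**w_bit - 1]."""
    qmax = 2 ** w_bit - 1
    lo, hi = min(weights), max(weights)
    # One scale and one offset are stored per group.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


def quantize_groupwise(weights, w_bit=4, group_size=128):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], w_bit)
        for start in range(0, len(weights), group_size)
    ]


# Synthetic weights (hypothetical data) spanning roughly [-1, 1].
weights = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]
groups = quantize_groupwise(weights, w_bit=4, group_size=128)
restored = [w for group in groups for w in dequantize_group(*group)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.4f}")
```

A smaller group size means each scale covers fewer weights, so the reconstruction error tends to fall, at the cost of storing more per-group metadata.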
 ## Model Date
+July 5, 2023
 ## Model License
+Please refer to the original Open-LLaMA-13B-Instruct model license ([link](https://huggingface.co/VMware/open-llama-13b-open-instruct)).
 Please refer to the AWQ quantization license ([link](https://github.com/mit-han-lab/llm-awq/blob/main/LICENSE)).
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability 8.0 (`sm_80`) or higher.
+For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image ships CUDA runtime v12.1 but otherwise matches the configuration above, and it has also been verified to work.
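Before installing the AWQ kernels, it can help to confirm that the local GPU meets the hardware requirement. The sketch below assumes the stated requirement corresponds to CUDA compute capability 8.0 (Ampere or newer). The helper names are hypothetical; only `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls, and the script degrades gracefully when PyTorch or a GPU is absent.

```python
# Sketch: check whether a GPU meets the assumed compute capability 8.0 minimum.

def meets_awq_requirement(capability, minimum=(8, 0)):
    """Compare a (major, minor) compute capability against the minimum."""
    return tuple(capability) >= tuple(minimum)


def local_gpu_capability(device_index=0):
    """Return (major, minor) for the given GPU, or None if unavailable."""
    try:
        import torch  # imported lazily so the check runs without PyTorch too
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


capability = local_gpu_capability()
if capability is None:
    print("No CUDA-capable GPU (or no PyTorch) detected.")
elif meets_awq_requirement(capability):
    print(f"Compute capability {capability[0]}.{capability[1]}: OK.")
else:
    print(f"Compute capability {capability[0]}.{capability[1]}: too old.")
```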
 ## How to Use
 ```bash
 from accelerate import init_empty_weights, load_checkpoint_and_dispatch
 from huggingface_hub import snapshot_download
+model_name = "VMware/open-llama-13b-open-instruct"
 # Config
 config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
     "q_group_size": 128,
 }
+load_quant = snapshot_download('abhinavkulkarni/open-llama-13b-open-instruct-w4-g128-awq')
 with init_empty_weights():
+    model = AutoModelForCausalLM.from_config(config=config,
                                                  torch_dtype=torch.float16, trust_remote_code=True)
 real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
 This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evaluation-harness).
+[Open-LLaMA-13B-Instruct](https://huggingface.co/VMware/open-llama-13b-open-instruct)
 |  Task  |Version|    Metric     | Value |   |Stderr|
 |--------|------:|---------------|------:|---|------|
+|wikitext|      1|word_perplexity|11.6564|   |      |
+|        |       |byte_perplexity| 1.5829|   |      |
+|        |       |bits_per_byte  | 0.6626|   |      |
+[Open-LLaMA-13B-Instruct (4-bit 128-group AWQ)](https://huggingface.co/abhinavkulkarni/open-llama-13b-open-instruct-w4-g128-awq)
 |  Task  |Version|    Metric     | Value |   |Stderr|
 |--------|------:|---------------|------:|---|------|
+|wikitext|      1|word_perplexity|11.9652|   |      |
+|        |       |byte_perplexity| 1.5907|   |      |
+|        |       |bits_per_byte  | 0.6696|   |      |
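The two tables can be cross-checked and compared with a few lines of arithmetic. In LM-Eval, `bits_per_byte` is the base-2 logarithm of `byte_perplexity`, so each table is internally consistent if that identity holds to within rounding. The sketch below also computes the relative increase in word perplexity caused by quantization; the helper names are illustrative, and the numbers are copied from the tables above.

```python
import math

# Values copied from the two wikitext tables above.
baseline = {"word_perplexity": 11.6564, "byte_perplexity": 1.5829, "bits_per_byte": 0.6626}
quantized = {"word_perplexity": 11.9652, "byte_perplexity": 1.5907, "bits_per_byte": 0.6696}


def bits_per_byte(byte_perplexity):
    """bits_per_byte is log2 of byte_perplexity."""
    return math.log2(byte_perplexity)


def relative_increase(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before


for name, row in (("baseline", baseline), ("AWQ 4-bit", quantized)):
    derived = bits_per_byte(row["byte_perplexity"])
    print(f"{name}: reported {row['bits_per_byte']:.4f}, derived {derived:.4f}")

ppl_increase = relative_increase(baseline["word_perplexity"], quantized["word_perplexity"])
print(f"word perplexity increase after quantization: {ppl_increase:.2%}")
```

Small differences in the last digit between the reported and derived values are expected, because the published byte perplexities are themselves rounded to four decimal places.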
 ## Acknowledgements
+If you found OpenLLaMA useful in your research or applications, please cite using the following BibTeX:
 ```
+@software{openlm2023openllama,
+  author = {Geng, Xinyang and Liu, Hao},
+  title = {OpenLLaMA: An Open Reproduction of LLaMA},
+  month = May,
+  year = 2023,
+  url = {https://github.com/openlm-research/open_llama}
 }
 ```
 ```
+@software{together2023redpajama,
+  author = {Together Computer},
+  title = {RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset},
+  month = April,
+  year = 2023,
+  url = {https://github.com/togethercomputer/RedPajama-Data}
 }
 ```
 ```
+@article{touvron2023llama,
+  title={Llama: Open and efficient foundation language models},
+  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and others},
+  journal={arXiv preprint arXiv:2302.13971},
+  year={2023}
 }
 ```