munish0838 commited on
Commit
ad98d39
·
verified ·
1 Parent(s): 308bd2e

Create README.md

Files changed (1)
  1. README.md +170 -0
README.md ADDED
@@ -0,0 +1,170 @@
+ ---
+ pipeline_tag: text-generation
+ license: other
+ base_model: internlm/internlm2-chat-1_8b
+ ---
+ # QuantFactory/internlm2-chat-1_8b-GGUF
+ This is a quantized version of [internlm/internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b), created using llama.cpp.
+
+ # Model Description
+
+ <div align="center">
+
+ <img src="https://github.com/InternLM/InternLM/assets/22529082/b9788105-8892-4398-8b47-b513a292378e" width="200"/>
+   <div>&nbsp;</div>
+   <div align="center">
+     <b><font size="5">InternLM</font></b>
+     <sup>
+       <a href="https://internlm.intern-ai.org.cn/">
+         <i><font size="4">HOT</font></i>
+       </a>
+     </sup>
+     <div>&nbsp;</div>
+   </div>
+ </div>
+
+
+ ## Introduction
+ InternLM2-1.8B is the 1.8-billion-parameter version of the second-generation InternLM series. To support both applications and research, it is released as three open-source models:
+
+ - InternLM2-1.8B: Foundation model with high quality and high adaptation flexibility, which serves as a good starting point for downstream deep adaptations.
+ - InternLM2-Chat-1.8B-SFT: Chat model after supervised fine-tuning (SFT) on InternLM2-1.8B.
+ - InternLM2-Chat-1.8B: Further aligned on top of InternLM2-Chat-1.8B-SFT through online RLHF. InternLM2-Chat-1.8B exhibits better instruction following, chat experience, and function calling, and is recommended for downstream applications.
+
+ InternLM2 has the following technical features:
+
+ - Effective support for ultra-long contexts of up to 200,000 characters: The model achieves near-perfect "needle in a haystack" retrieval in long inputs of 200,000 characters. It also leads among open-source models in performance on long-text tasks such as LongBench and L-Eval.
+ - Comprehensive performance enhancement: Compared to the previous generation model, it shows significant improvements in various capabilities, including reasoning, mathematics, and coding.
+
+
+ ## InternLM2-1.8B
+
+ ### Performance Evaluation
+
+ We have evaluated InternLM2 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
+
+ | Dataset\Models | InternLM2-1.8B | InternLM2-Chat-1.8B-SFT | InternLM2-7B | InternLM2-Chat-7B |
+ | :---: | :---: | :---: | :---: | :---: |
+ | MMLU | 46.9 | 47.1 | 65.8 | 63.7 |
+ | AGIEval | 33.4 | 38.8 | 49.9 | 47.2 |
+ | BBH | 37.5 | 35.2 | 65.0 | 61.2 |
+ | GSM8K | 31.2 | 39.7 | 70.8 | 70.7 |
+ | MATH | 5.6 | 11.8 | 20.2 | 23.0 |
+ | HumanEval | 25.0 | 32.9 | 43.3 | 59.8 |
+ | MBPP(Sanitized) | 22.2 | 23.2 | 51.8 | 51.4 |
+
+
+ - The evaluation results were obtained with [OpenCompass](https://github.com/open-compass/opencompass), and the evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
+ - The numbers may differ between [OpenCompass](https://github.com/open-compass/opencompass) versions, so please refer to the latest [OpenCompass](https://github.com/open-compass/opencompass) evaluation results.
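
As a small worked example of reading the table above, the sketch below computes how much each benchmark score changes between InternLM2-1.8B and InternLM2-Chat-1.8B-SFT. The numbers are copied from the table; the helper name `sft_deltas` is only illustrative and not part of any InternLM or OpenCompass tooling.

```python
# Scores copied from the table above, as (InternLM2-1.8B, InternLM2-Chat-1.8B-SFT).
scores = {
    "MMLU": (46.9, 47.1),
    "AGIEval": (33.4, 38.8),
    "BBH": (37.5, 35.2),
    "GSM8K": (31.2, 39.7),
    "MATH": (5.6, 11.8),
    "HumanEval": (25.0, 32.9),
    "MBPP(Sanitized)": (22.2, 23.2),
}


def sft_deltas(table):
    """Return {benchmark: chat_sft - base}, rounded to one decimal place."""
    return {name: round(sft - base, 1) for name, (base, sft) in table.items()}


deltas = sft_deltas(scores)

# Print the benchmarks from largest gain to largest drop.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {delta:+.1f}")
```

On these figures, GSM8K shows the largest gain after SFT, and BBH is the only benchmark where the score drops.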
+
+
+
+ **Limitations:** Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
64
+ ### Import from Transformers
65
+
66
+ To load the InternLM2 1.8B Chat model using Transformers, use the following code:
67
+
68
+ ```python
69
+ import torch
70
+ from transformers import AutoTokenizer, AutoModelForCausalLM
71
+ tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-1_8b", trust_remote_code=True)
72
+ # Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and cause OOM Error.
73
+ model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-1_8b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
74
+ model = model.eval()
75
+ response, history = model.chat(tokenizer, "hello", history=[])
76
+ print(response)
77
+ # Hello! How can I help you today?
78
+ response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
79
+ print(response)
80
+ ```
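
The comment in the snippet above notes that float32 loading can run out of memory. A rough back-of-the-envelope sketch of why: the weights alone take about twice as much memory at 4 bytes per parameter as at 2. The parameter count used here is the nominal "1.8 billion" from the model name (an assumption, the exact count may differ), and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: nominal parameter count taken from the "1.8B" in the model name.
NOMINAL_PARAMS = 1.8e9

fp32_gb = weight_memory_gb(NOMINAL_PARAMS, 4)  # float32 uses 4 bytes per parameter
fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 2)  # float16 uses 2 bytes per parameter

print(f"float32 weights: ~{fp32_gb:.1f} GB")
print(f"float16 weights: ~{fp16_gb:.1f} GB")
```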
+
+ The responses can be streamed using `stream_chat`:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = "internlm/internlm2-chat-1_8b"
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+ model = model.eval()
+ length = 0
+ for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
+     print(response[length:], flush=True, end="")
+     length = len(response)
+ ```
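
The `response[length:]` slice in the loop above suggests that each yielded `response` is the full text generated so far, so only the unseen tail is printed. The sketch below isolates that pattern without loading a model. `fake_stream_chat` is a hypothetical stand-in that mimics this cumulative behaviour; it is not the real `stream_chat` implementation.

```python
def fake_stream_chat(chunks):
    """Hypothetical stand-in for model.stream_chat: yields (text_so_far, history)."""
    response = ""
    for chunk in chunks:
        response += chunk
        yield response, []


def iter_new_text(stream):
    """Yield only the text added since the previous step of a cumulative stream."""
    length = 0
    for response, _history in stream:
        yield response[length:]   # the part not seen before
        length = len(response)    # remember how much has been emitted


pieces = list(iter_new_text(fake_stream_chat(["Hel", "lo", "!"])))
print(pieces)
```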
+
+ ## Deployment
+
+ ### LMDeploy
+
+ LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.
+
+ ```bash
+ pip install lmdeploy
+ ```
+
+ You can run batch inference locally with the following Python code:
+
+ ```python
+ import lmdeploy
+ pipe = lmdeploy.pipeline("internlm/internlm2-chat-1_8b")
+ response = pipe(["Hi, pls intro yourself", "Shanghai is"])
+ print(response)
+ ```
+
+ Or you can launch an OpenAI-compatible server with the following command:
+
+ ```bash
+ lmdeploy serve api_server internlm/internlm2-chat-1_8b --model-name internlm2-chat-1_8b --server-port 23333
+ ```
+
+ Then you can send a chat request to the server:
+
+ ```bash
+ curl http://localhost:23333/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+     "model": "internlm2-chat-1_8b",
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Introduce deep learning to me."}
+     ]
+     }'
+ ```
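
The same request can be built from Python using only the standard library. This is a minimal sketch that mirrors the `curl` command above; `build_chat_request` is an illustrative helper name, and actually sending the request assumes the server from the previous step is running on port 23333.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message,
                       system_message="You are a helpful assistant."):
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:23333", "internlm2-chat-1_8b", "Introduce deep learning to me."
)

# Sending only works while the server is running, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```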
+
+ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/).
+
+ ### vLLM
+
+ Launch an OpenAI-compatible server with `vLLM>=0.3.2`:
+
+ ```bash
+ pip install vllm
+ ```
+
+ ```bash
+ python -m vllm.entrypoints.openai.api_server --model internlm/internlm2-chat-1_8b --served-model-name internlm2-chat-1_8b --trust-remote-code
+ ```
+
+ Then you can send a chat request to the server:
+
+ ```bash
+ curl http://localhost:8000/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+     "model": "internlm2-chat-1_8b",
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Introduce deep learning to me."}
+     ]
+     }'
+ ```
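
Because the server is OpenAI-compatible, its reply should follow the OpenAI chat-completion layout, with the assistant text under `choices[0].message.content`. The sketch below pulls that text out of a decoded JSON response. The `sample` dictionary is hand-written for illustration and is not real model output; a real response carries additional fields.

```python
def extract_reply(completion: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written illustrative response (assumption: trimmed to the relevant fields).
sample = {
    "model": "internlm2-chat-1_8b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Deep learning is ..."},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample))
```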
+
+ Find more details in the [vLLM documentation](https://docs.vllm.ai/en/latest/index.html).
+
+ ## Open Source License
+
+ The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <[email protected]>.