Feature(LLMLingua): add note

app.py
CHANGED
@@ -6,6 +6,9 @@ llm_lingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="cpu")
 INTRO = """
 # LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
 This is an early demo of the prompt compression method LLMLingua.
+
+It should be noted that due to limited resources, we only provide the **GPT2-Small** size language model in this demo. Using the **LLaMA2-7B** as a small language model would result in a significant performance improvement, especially at high compression ratios.
+
 To use it, upload your prompt and set the compression target.
 1. Set the different components of the prompt separately, including instruction, context, and question. Leave the corresponding field empty if a particular component does not exist.
 - Question: This refers to the directives given by the user to the LLMs, such as inquiries, questions, or requests. Positioned after the instruction and context modules, the question module has a high sensitivity to compression.