---
license: bsd
datasets:
- ManthanKulakarni/Text2JQL_v2
language:
- en
pipeline_tag: text-generation
tags:
- LLaMa
- JQL
- Jira
- GGML
- GGML-q8_0
- GPU
- CPU
- 7B
- llama.cpp
- text-generation-webui
---

GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).

## How to run in `llama.cpp`

```
./main -t 10 -ngl 32 -m ggml-model-q8_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write JQL(Jira query Language) for give input ### Input: stories assigned to manthan which are created in last 10 days with highest priority and label is set to release ### Response:"
```

Change `-t 10` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

To have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
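
If you prefer to drive the same model from Python rather than the CLI, the flags above map onto `llama-cpp-python` parameters: `-t` corresponds to `n_threads`, `-ngl` to `n_gpu_layers`, and `-c` to `n_ctx`. Below is a minimal sketch, assuming `llama-cpp-python` is installed (see the LangChain section below for installation) and the model file is in the current directory:

```python
# Minimal sketch mirroring the CLI flags above with llama-cpp-python.
# Assumes llama-cpp-python is installed (see the LangChain section below).
from llama_cpp import Llama

llm = Llama(
    model_path="./ggml-model-q8_0.bin",  # same file passed to -m
    n_ctx=2048,       # -c 2048
    n_threads=10,     # -t 10: set to your number of physical CPU cores
    n_gpu_layers=32,  # -ngl 32: set to 0 if you have no GPU acceleration
)

output = llm(
    "### Instruction: Write JQL(Jira query Language) for give input "
    "### Input: stories assigned to manthan which are created in last 10 days "
    "with highest priority and label is set to release ### Response:",
    max_tokens=256,      # the CLI uses -n -1 (unlimited); capped here for the example
    temperature=0.7,     # --temp 0.7
    repeat_penalty=1.1,  # --repeat_penalty 1.1
)
print(output["choices"][0]["text"])
```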

## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

## How to run using `LangChain`

##### Installation on CPU
```
pip install llama-cpp-python
```

##### Installation on GPU
```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```

```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
n_ctx = 2048

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="./ggml-model-q8_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=n_ctx,
)

llm("""### Instruction:
Write JQL(Jira query Language) for give input

### Input:
stories assigned to manthan which are created in last 10 days with highest priority and label is set to release

### Response:""")
```
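
The `PromptTemplate` and `LLMChain` imports above can wrap the same prompt format in a reusable template, so only the natural-language request changes between calls. A minimal sketch reusing the `llm` object created above; the sample request is purely illustrative:

```python
# Minimal sketch: wrap the model's prompt format in a PromptTemplate and LLMChain.
template = """### Instruction:
Write JQL(Jira query Language) for give input

### Input:
{request}

### Response:"""

prompt = PromptTemplate(template=template, input_variables=["request"])
chain = LLMChain(prompt=prompt, llm=llm)

# Example call; the request below is purely illustrative.
print(chain.run("bugs reported by manthan in the last 7 days that are still open"))
```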

For more information, refer to the [LangChain documentation](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/llamacpp).