# Local LangChain with FastChat

[LangChain](https://python.langchain.com/en/latest/index.html) is a library that facilitates the development of applications by leveraging large language models (LLMs) and enabling their composition with other sources of computation or knowledge.

FastChat's OpenAI-compatible [API server](openai_api.md) enables using LangChain with open models seamlessly.

## Launch RESTful API Server

Here are the steps to launch a local OpenAI API server for LangChain.

First, launch the controller

```bash
python3 -m fastchat.serve.controller
```

LangChain uses OpenAI model names by default, so we need to assign some faux OpenAI model names to our local model.
Here, we use Vicuna as an example and use it for three endpoints: chat completion, completion, and embedding.
`--model-path` can be a local folder or a Hugging Face repo name.
See a full list of supported models [here](../README.md#supported-models).

```bash
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
```
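
To confirm the worker registered the faux names, you can query the controller. This is an optional sketch, assuming the controller runs on its default address `http://localhost:21001` and exposes a `/list_models` route:

```python
# Optional sanity check (assumes the controller's default port 21001
# and its /list_models route).
import requests

resp = requests.post("http://localhost:21001/list_models")
print(resp.json())  # expect the faux names, e.g. "gpt-3.5-turbo"
```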

Finally, launch the RESTful API server

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
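
Once the server is up, a quick sanity check is to list the models it serves. A minimal sketch, assuming the server from the previous step is reachable at `http://localhost:8000`:

```python
# The OpenAI-compatible server should report the model names registered above.
import requests

print(requests.get("http://localhost:8000/v1/models").json())
```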

## Set OpenAI Environment

You can set your environment with the following commands.

Set the OpenAI base URL

```bash
export OPENAI_API_BASE=http://localhost:8000/v1
```

Set the OpenAI API key

```bash
export OPENAI_API_KEY=EMPTY
```
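
Alternatively, instead of environment variables, you can pass the base URL and key directly to the LangChain wrappers. A minimal sketch, assuming your LangChain version accepts the `openai_api_base` and `openai_api_key` keyword arguments:

```python
# Point the LangChain OpenAI wrappers at the local FastChat server explicitly.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="EMPTY",
)
embedding = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="EMPTY",
)
```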

If you encounter the following OOM error while creating embeddings, set a smaller batch size with an environment variable.

~~~bash
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\\n\\n(CUDA out of memory. Tried to allocate xxx MiB (GPU 0; xxx GiB total capacity; xxx GiB already allocated; xxx MiB free; xxx GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF)","code":50002}' (HTTP response code was 400)
~~~

You can try `export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1`.

## Try local LangChain

Here is a question answering example.

Download a text file.

```bash
wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
```
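
Optionally, verify the embedding endpoint before building an index. A minimal sketch, assuming the environment variables above are set:

```python
# Embed a single string to confirm the text-embedding-ada-002 alias works.
from langchain.embeddings import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
vector = embedding.embed_query("Hello, FastChat!")
print(len(vector))  # dimensionality of the local embedding
```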

Run LangChain.

~~~py
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
loader = TextLoader("state_of_the_union.txt")
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
llm = ChatOpenAI(model="gpt-3.5-turbo")

questions = [
    "Who is the speaker",
    "What did the president say about Ketanji Brown Jackson",
    "What are the threats to America",
    "Who are mentioned in the speech",
    "Who is the vice president",
    "How many projects were announced",
]

for query in questions:
    print("Query:", query)
    print("Answer:", index.query(query, llm=llm))
~~~
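
The example above exercises the chat completion and embedding endpoints. To hit the completion endpoint through the `text-davinci-003` alias registered earlier, a minimal sketch (assuming the same environment variables) is:

```python
# Use LangChain's completion-style wrapper against the local server.
from langchain.llms import OpenAI

completion_llm = OpenAI(model_name="text-davinci-003")
print(completion_llm("Summarize the state of the union speech in one sentence: "))
```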