Setting up langchain and text-generation-webui with local LLMs

text-generation-webui with langchain

Langchain is a framework (or rather a collection of useful classes) that can be used to build LLM-powered apps, mostly centered around retrieval-based use cases. It has out-of-the-box support for OpenAI, but in this post I’ll connect it with local LLMs, as they are getting better with every new release.

Though there are better alternatives like litellm and vllm, I’ll use text-generation-webui as it makes it simple to experiment with models. All of these tools partially or fully implement the same API spec that OpenAI uses. First, start text-generation-webui with the API extension:

python server.py --api --listen

It will listen for API calls on port 5000 by default; you can change this by passing --api-port.
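For example, to expose the API on port 5001 instead (the port number here is just an arbitrary choice for illustration):

python server.py --api --api-port 5001 --listen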

Then load your model from the UI; I used openchat/openchat-3.5-1210. Now all you have to do is import the packages and adjust your prompt template according to the model’s.

from langchain.chat_models import ChatOpenAI
from langchain.globals import set_debug
from langchain.prompts import ChatPromptTemplate

set_debug(True)

# This is crucial for open models: prompt templates differ from model to model
chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot."),
        ("human", "GPT4 Correct User: {question}<|end_of_turn|>"),
        ("ai", "GPT4 Correct Assistant:"),
    ]
)

# Adjust your base URL accordingly; the key is required but unused
llm = ChatOpenAI(
    base_url="http://127.0.0.1:5000/v1", 
    openai_api_key="some_dummy_key",
    temperature=0.3,
    max_tokens=2048,
)

llm(chat_template.format_messages(question="What is the capital of Iceland?"))
# AIMessage(content='The capital of Iceland is Reykjavik.')

That’s it! Now you can start building your apps with open models.
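As a next step, the same llm and chat_template can be composed into a chain. Below is a minimal sketch using LangChain’s LLMChain; the chain object and the example question are my own illustration, not part of the original setup.

from langchain.chains import LLMChain

# Reuse the llm and chat_template defined above
chain = LLMChain(llm=llm, prompt=chat_template)

# run() fills the {question} placeholder and returns the model's reply text
print(chain.run(question="What is the capital of Iceland?"))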

Using vllm

Here is how it can be done with vllm. First install vllm via pip install vllm. Then all you need to do is run:

python -u -m vllm.entrypoints.openai.api_server \
       --host 0.0.0.0 \
       --model '/home/user/text-generation-webui/models/openchat-16k'

Depending on where your model weights are stored, change the path passed to --model. You also need to ensure the model name in your code matches the model parameter you started vllm with:

llm = ChatOpenAI(
    base_url="http://127.0.0.1:8000/v1", 
    openai_api_key="some_dummy_key",
    temperature=0.6,
    max_tokens=2048,
    model="/home/user/text-generation-webui/models/openchat-16k"
)
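Once the vllm server is up, a quick way to sanity-check the OpenAI-compatible endpoint before wiring it into LangChain is to list the served models. This curl call is my own addition and assumes the default port 8000 used above:

curl http://127.0.0.1:8000/v1/models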