Use AutoGen for Local LLMs
TL;DR: We demonstrate how to use AutoGen for local LLM applications. As an example, we will initiate an endpoint using FastChat and perform inference on ChatGLM2-6B.
Preparations
Clone FastChat
FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs. However, its code needs minor modification in order to function properly.
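For example (the URL below is the official lm-sys repository on GitHub):

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```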
Download checkpoint
ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. ChatGLM2-6B is its second-generation version.
Before downloading from the Hugging Face Hub, make sure you have Git LFS installed.
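With Git LFS set up, the ChatGLM2-6B checkpoint can be pulled from the THUDM repository on the Hub, for example:

```bash
git lfs install
git clone https://huggingface.co/THUDM/chatglm2-6b
```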
Initiate server
First, launch the controller.
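Assuming FastChat and its dependencies are installed (or the command is run from the cloned repository), a typical invocation is:

```bash
python -m fastchat.serve.controller
```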
Then, launch the model worker(s).
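Assuming the ChatGLM2-6B checkpoint was cloned into `chatglm2-6b` in the current directory, a worker can be started along these lines:

```bash
python -m fastchat.serve.model_worker --model-path chatglm2-6b
```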
Finally, launch the RESTful API server.
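The host and port below are arbitrary choices; the rest of this post assumes the server listens on port 8000:

```bash
python -m fastchat.serve.openai_api_server --host localhost --port 8000
```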
Normally this will work. However, if you encounter an error related to `finish_reason`, commenting out all the lines containing `finish_reason` in `fastchat/protocol/api_protocol.py` and `fastchat/protocol/openai_api_protocol.py` will fix the problem.
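The modified code then looks roughly like the sketch below, based on the class definitions in FastChat's `openai_api_protocol.py` (exact fields may differ between FastChat versions, and the same edit applies to every other occurrence of `finish_reason` in both files):

```python
from typing import Literal, Optional

from pydantic import BaseModel


class CompletionResponseChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[int] = None
    # finish_reason: Optional[Literal["stop", "length"]]


class CompletionResponseStreamChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[float] = None
    # finish_reason: Optional[Literal["stop", "length"]]
```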
Interact with model using `oai.Completion` (requires openai<1)
Now the models can be accessed directly through the openai-python library, as well as `autogen.oai.Completion` and `autogen.oai.ChatCompletion`.
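The snippet below is a minimal sketch: it assumes the API server from the previous section is listening on `http://localhost:8000/v1`, uses the config key names expected with openai<1, and passes a placeholder `api_key` because FastChat does not check it.

```python
from autogen import oai

# configuration pointing at the local FastChat endpoint
local_config_list = [
    {
        "model": "chatglm2-6b",
        "api_base": "http://localhost:8000/v1",  # the RESTful API server started above
        "api_type": "open_ai",
        "api_key": "NULL",  # placeholder; FastChat does not validate it
    }
]

# text completion request
response = oai.Completion.create(config_list=local_config_list, prompt="Hi")
print(response)

# chat completion request
response = oai.ChatCompletion.create(
    config_list=local_config_list,
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```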
If you would like to switch to different models, download their checkpoints and specify the model paths when launching the model worker(s).
Interacting with multiple local LLMs
If you would like to interact with multiple LLMs on your local machine, replace the `model_worker` step above with a multi-model variant.
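For example, serving ChatGLM2-6B alongside a Vicuna checkpoint (the second model is purely illustrative; any supported checkpoint works):

```bash
python -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.3 \
    --model-names vicuna-7b-v1.3 \
    --model-path chatglm2-6b \
    --model-names chatglm2-6b
```

The controller and the RESTful API server are launched exactly as before.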
The inference code is similar to the single-model case.
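Again a sketch under the same assumptions: the local endpoint at `http://localhost:8000/v1` and the two model names registered with the worker above.

```python
from autogen import oai

# one config entry per model served by the multi-model worker
chatglm_config = {
    "model": "chatglm2-6b",
    "api_base": "http://localhost:8000/v1",
    "api_type": "open_ai",
    "api_key": "NULL",  # placeholder; FastChat does not validate it
}
vicuna_config = {
    "model": "vicuna-7b-v1.3",
    "api_base": "http://localhost:8000/v1",
    "api_type": "open_ai",
    "api_key": "NULL",
}

# send the same chat request to each local model
for config in (chatglm_config, vicuna_config):
    response = oai.ChatCompletion.create(
        config_list=[config],
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(config["model"], response)
```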
For Further Reading
- Documentation about autogen.
- Documentation about FastChat.