vLLM
vLLM is a locally run proxy and inference server that provides an OpenAI-compatible API. Because it handles both the proxying and the inference, you don't need to install a separate inference server.
Note: vLLM does not currently support OpenAI's function calling (which is usable with AG2). However, support is in development and may be available by the time you read this.
Running this stack requires the installation of:
- AG2 (installation instructions)
- vLLM
Note: We recommend using a virtual environment for your stack.
Installing vLLM
In your terminal:
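A typical installation looks like the following (a minimal sketch assuming a pip-based virtual environment; check the vLLM documentation for your platform's CUDA/PyTorch requirements):

```bash
pip install vllm
```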
Choosing models
vLLM will download new models when you run the server.
The models are sourced from Hugging Face: a filtered list of Text Generation models is here, and vLLM has a list of commonly used models. Use the full model name, e.g. `mistralai/Mistral-7B-Instruct-v0.2`.
Chat Template
vLLM uses a pre-defined chat template, unless the model has a chat template defined in its config file on Hugging Face.
This can cause an issue if the chat template doesn't allow `'role': 'system'` messages, as used in AG2.
Therefore, we will create a chat template for the Mistral AI Mistral 7B model we are using that allows roles of 'user', 'assistant', and 'system'.
Create a file named `ag2mistraltemplate.jinja` with the following content:
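The template below is a minimal sketch based on the standard Mistral instruct format, with 'system' messages handled the same way as 'user' messages; treat it as a starting point rather than the canonical template:

```jinja
{{ bos_token }}
{%- for message in messages %}
    {%- if message['role'] == 'user' or message['role'] == 'system' %}
        {{- '[INST] ' + message['content'] + ' [/INST]' }}
    {%- elif message['role'] == 'assistant' %}
        {{- message['content'] + eos_token }}
    {%- else %}
        {{- raise_exception('Only system, user, and assistant roles are supported!') }}
    {%- endif %}
{%- endfor %}
```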
Chat Templates are specific to the model/model family. The example shown here is for Mistral-based models like Mistral 7B and Mixtral 8x7B.
vLLM has a number of example templates for models that can be a starting point for your chat template. Just remember, the template may need to be adjusted to support ‘system’ role messages.
Running vLLM proxy server
To run vLLM with the chosen model and our chat template, in your terminal:
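For example (a sketch that assumes the template file is in the current directory; adjust the model name and path to your setup):

```bash
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --chat-template ag2mistraltemplate.jinja
```

Newer vLLM releases also provide an equivalent `vllm serve` command that accepts the same `--chat-template` option.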
By default, vLLM will run at ‘http://0.0.0.0:8000’.
Installing AG2 with OpenAI API support
Run the following command to install AG2 with the OpenAI package as vLLM supports the OpenAI API.
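For example, assuming a pip-based environment:

```bash
pip install "ag2[openai]"
```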
If you have been using `autogen` or `pyautogen`, all you need to do is upgrade it using one of the commands shown below, as `pyautogen`, `autogen`, and `ag2` are aliases for the same PyPI package.
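For example, using pip:

```bash
pip install -U autogen
```

or

```bash
pip install -U pyautogen
```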
Using vLLM with AG2
Now that we have the URL for the vLLM proxy server, you can use it within AG2 in the same way as OpenAI or cloud-based proxy servers.
As you are running this proxy server locally, no API key is required. However, as `api_key` is a mandatory field for configurations within AG2, we put a dummy value in it, as per the example below.
Although we specify the model when running the vLLM command, we must still set it as the `model` value in the AG2 configuration so that it is passed through to vLLM.
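A minimal sketch of such a configuration with a simple agent (the model name, port, and dummy `api_key` value are assumptions matching the earlier steps; note the `/v1` suffix on the base URL, which is where vLLM exposes its OpenAI-compatible endpoints):

```python
from autogen import ConversableAgent

# Point AG2 at the local vLLM server's OpenAI-compatible endpoint.
# The api_key is a dummy value: vLLM ignores it, but AG2 requires the field.
config_list = [
    {
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # must match the model vLLM is serving
        "base_url": "http://0.0.0.0:8000/v1",
        "api_key": "NotRequired",
    }
]

assistant = ConversableAgent(
    name="assistant",
    llm_config={"config_list": config_list},
    human_input_mode="NEVER",
)

reply = assistant.generate_reply(
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)
print(reply)
```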