RealtimeAgent in a Swarm Orchestration
AG2 supports RealtimeAgent, an agent type that connects directly to OpenAI’s Realtime API. With RealtimeAgent, you can add voice interaction and listening capabilities to your swarms, enabling dynamic and natural communication.
AG2 provides an intuitive programming interface to build and orchestrate swarms of agents. With RealtimeAgent, you can enhance swarm functionality, integrating real-time interactions alongside task automation. Check the Documentation and Blog for further insights.
In this notebook, we implement OpenAI’s airline customer service example in AG2 using the RealtimeAgent for enhanced interaction.
Install AG2 with Twilio dependencies
To use the RealtimeAgent, we will connect it to the Twilio service. This tutorial was inspired by Twilio's tutorial on connecting to OpenAI's Realtime API.
We have prepared a TwilioAdapter to enable you to connect your realtime agent to the Twilio service.
To run this notebook, you will need to install ag2 with the additional realtime and twilio dependencies.
Prepare your llm_config and realtime_llm_config
The config_list_from_json function loads a list of configurations from an environment variable or a JSON file.
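To make the loading step concrete, here is a minimal stdlib sketch of what a loader like config_list_from_json does. It is not AG2's code: load_config_list is a hypothetical stand-in, the resolution order (environment variable holding a file path or inline JSON, else a file path) and the filtering rule are assumptions, and the model names and tags are example values. In the notebook itself you would call AG2's config_list_from_json directly.

```python
import json
import os
from typing import Any, Optional


def load_config_list(
    env_or_file: str, filter_dict: Optional[dict[str, list[Any]]] = None
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for config_list_from_json (illustration only)."""
    value = os.environ.get(env_or_file)
    if value is not None:
        # Assumed: the env var holds either a path to a JSON file or inline JSON.
        if os.path.exists(value):
            with open(value, encoding="utf-8") as f:
                configs = json.load(f)
        else:
            configs = json.loads(value)
    else:
        # Assumed: otherwise the name itself is treated as a file path.
        with open(env_or_file, encoding="utf-8") as f:
            configs = json.load(f)

    if not filter_dict:
        return configs

    def matches(cfg: dict[str, Any]) -> bool:
        # Keep a config only if every filter key matches one of the accepted values.
        # List-valued fields (such as "tags") match if they share any element.
        for key, accepted in filter_dict.items():
            field = cfg.get(key)
            if isinstance(field, list):
                if not set(field) & set(accepted):
                    return False
            elif field not in accepted:
                return False
        return True

    return [cfg for cfg in configs if matches(cfg)]


# Example values only: one regular chat model and one tagged realtime model.
os.environ["OAI_CONFIG_LIST"] = json.dumps([
    {"model": "gpt-4o-mini", "api_key": "<your-api-key>"},
    {"model": "gpt-4o-mini-realtime-preview", "api_key": "<your-api-key>", "tags": ["realtime"]},
])

llm_config = {"config_list": load_config_list("OAI_CONFIG_LIST", {"model": ["gpt-4o-mini"]})}
realtime_llm_config = {"config_list": load_config_list("OAI_CONFIG_LIST", {"tags": ["realtime"]})}
print(llm_config["config_list"][0]["model"])
print(realtime_llm_config["config_list"][0]["model"])
```

Filtering one shared list into two configs keeps the swarm agents and the RealtimeAgent on separate models without maintaining two files.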
Prompts & Utility Functions
The prompts and utility functions remain unchanged from the original example.
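The utilities themselves are not reproduced in this section. As a rough illustration of their shape, tools in a demo like this can be plain stubs that return a confirmation string, so the conversation flow can be exercised without a real booking backend. The function names, return strings, and the TOOLS/call_tool dispatch helpers below are assumptions for illustration, not a copy of the original example.

```python
from typing import Callable, Optional


# Hypothetical stub tools: each returns a confirmation string instead of
# touching a real airline system.
def escalate_to_agent(reason: Optional[str] = None) -> str:
    return f"Escalating to agent: {reason}" if reason else "Escalating to agent"


def valid_to_change_flight() -> str:
    return "Customer is eligible to change flight"


def change_flight() -> str:
    return "Flight was successfully changed!"


def initiate_refund() -> str:
    return "Refund initiated"


def initiate_baggage_search() -> str:
    return "Baggage was found!"


def case_resolved() -> str:
    return "Case resolved. No further questions."


# Name -> function table: roughly how a tool-calling loop dispatches a
# function call that the model requested by name.
TOOLS: dict[str, Callable[..., str]] = {
    f.__name__: f
    for f in (
        escalate_to_agent,
        valid_to_change_flight,
        change_flight,
        initiate_refund,
        initiate_baggage_search,
        case_resolved,
    )
}


def call_tool(name: str, **kwargs: str) -> str:
    return TOOLS[name](**kwargs)


print(call_tool("escalate_to_agent", reason="refund above limit"))
print(call_tool("change_flight"))
```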
Define Agents and register functions
Register Handoffs
Now we register the handoffs for the agents. Note that you don’t need to define the transfer functions and pass them in. Instead, you can directly register the handoffs using the ON_CONDITION class.
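Conceptually, a conditional handoff can be pictured as an argument-less "transfer" tool whose description is the condition text: the LLM reads the description, calls the tool when the condition applies, and the orchestrator switches to the target agent. The toy model below illustrates that idea only; it is not AG2's implementation, and Handoff, handoffs_to_tools, resolve_transfer, the lost_baggage agent, and the condition strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Handoff:
    """Hypothetical stand-in for one ON_CONDITION entry."""

    target: str     # name of the agent that takes over
    condition: str  # plain-language description the LLM uses to decide


def handoffs_to_tools(handoffs: list[Handoff]) -> list[dict[str, Any]]:
    # One argument-less tool per handoff, in OpenAI function-tool format;
    # the condition becomes the description the model reads.
    return [
        {
            "type": "function",
            "function": {
                "name": f"transfer_to_{h.target}",
                "description": h.condition,
                "parameters": {"type": "object", "properties": {}},
            },
        }
        for h in handoffs
    ]


def resolve_transfer(tool_name: str, handoffs: list[Handoff]) -> str:
    # Map the tool the model called back to the agent that should speak next.
    for h in handoffs:
        if tool_name == f"transfer_to_{h.target}":
            return h.target
    raise KeyError(f"no handoff registered for {tool_name!r}")


triage_handoffs = [
    Handoff("flight_modification", "To modify a flight"),
    Handoff("lost_baggage", "To find lost baggage"),
]
tools = handoffs_to_tools(triage_handoffs)
print([t["function"]["name"] for t in tools])
print(resolve_transfer("transfer_to_lost_baggage", triage_handoffs))
```

This is why no hand-written transfer functions are needed: the condition text alone carries enough information to generate them.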
Before you start the server
To run the uvicorn server inside the notebook, you will need to use nest_asyncio. Jupyter already runs an asyncio event loop, and uvicorn tries to start its own event loop on top of it, which asyncio does not allow by default. nest_asyncio patches asyncio to permit nested event loops, allowing uvicorn to run in Jupyter.
Please install nest_asyncio by running the following cell.
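The failure that nest_asyncio works around can be reproduced with the standard library alone. In this sketch, the outer coroutine plays the role of Jupyter's already-running loop, and the inner asyncio.run call plays the role of uvicorn trying to start its own loop; the function names are illustrative.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are now inside a running event loop, just like code in a Jupyter cell.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # discard the never-started coroutine cleanly
        return f"RuntimeError: {exc}"


result = asyncio.run(outer())
print(result)
```

In the notebook, calling nest_asyncio.apply() once before starting uvicorn patches asyncio so that this nested start is allowed.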
Running the Code
Note: You may need to expose your machine to the internet through a tunnel, such as one provided by ngrok.
This code sets up the FastAPI server for the RealtimeAgent, enabling it to handle real-time voice interactions through Twilio. By executing this code, you’ll start the server and make it accessible for testing voice calls.
Here’s what happens when you run the code:
- Server Initialization: A FastAPI application is started, ready to process requests and WebSocket connections.
- Incoming Call Handling: The /incoming-call route processes incoming calls from Twilio, providing a TwiML response that connects the call to a real-time AI assistant.
- WebSocket Integration: The /media-stream WebSocket endpoint bridges the connection between Twilio’s media stream and OpenAI’s Realtime API through the RealtimeAgent.
- RealtimeAgent Configuration: The RealtimeAgent registers a swarm of agents (e.g., triage_agent, flight_modification) to handle complex tasks during the call.
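To show what "bridging" means at the message level, here is a rough sketch of the translation such a bridge performs between Twilio Media Streams messages and OpenAI Realtime API events. It is not TwilioAdapter's code: twilio_to_realtime and realtime_to_twilio are hypothetical helpers, the Realtime event names follow the API's beta naming (check the current reference), and the session is assumed to be configured for g711_ulaw audio so that payloads can be forwarded without re-encoding.

```python
import base64
import json
from typing import Optional


def twilio_to_realtime(raw: str) -> Optional[str]:
    """Turn one Twilio Media Streams message into a Realtime client event."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # "start", "stop", etc. carry no caller audio
    # Twilio sends base64 mu-law audio; forward it unchanged (assumes g711_ulaw input).
    return json.dumps({"type": "input_audio_buffer.append", "audio": msg["media"]["payload"]})


def realtime_to_twilio(raw: str, stream_sid: str) -> Optional[str]:
    """Turn one Realtime server audio event into a Twilio outbound media message."""
    event = json.loads(raw)
    if event.get("type") != "response.audio.delta":
        return None  # transcripts, function calls, etc. are not sent to the caller
    return json.dumps({"event": "media", "streamSid": stream_sid, "media": {"payload": event["delta"]}})


# Placeholder stream SID and a few bytes of fake audio.
payload = base64.b64encode(b"\xff\x7f\xff\x7f").decode()
incoming = json.dumps({"event": "media", "streamSid": "MZ123", "media": {"payload": payload}})
print(twilio_to_realtime(incoming))
print(realtime_to_twilio(json.dumps({"type": "response.audio.delta", "delta": payload}), "MZ123"))
```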
How to Execute
- Run the Code: Execute the provided code block in your Python environment (such as a Jupyter Notebook or directly in a Python script).
- Start the Server: The server will listen for requests on port 5050. You can access the root URL (http://localhost:5050/) to confirm the server is running.
- Connect Twilio: Use a tool like ngrok to expose the server to the public internet, then configure Twilio to route incoming calls to the /incoming-call endpoint at the public URL.
Once the server is running, you can connect a Twilio phone call to the AI assistant and test the RealtimeAgent’s capabilities!
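The public URL matters twice: Twilio calls /incoming-call through it, and the TwiML reply must point the media stream at a wss:// URL on the same public host. Below is a stdlib sketch of building that reply; the notebook may instead use Twilio's helper library, and incoming_call_twiml, the greeting text, and the ngrok hostname are placeholders.

```python
import xml.etree.ElementTree as ET


def incoming_call_twiml(public_host: str) -> str:
    """Hypothetical helper: TwiML that greets the caller, then opens a media stream."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = "Please wait while we connect your call to the AI voice assistant."
    # <Connect><Stream> hands the call audio to our WebSocket endpoint.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{public_host}/media-stream")
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")


# Placeholder tunnel hostname; use the one your ngrok session prints.
print(incoming_call_twiml("abc123.ngrok-free.app"))
```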