TL;DR:
- RealtimeAgent: interact with an AI agent by voice, in real time.
- WebSocketAudioAdapter: stream audio directly from your browser using WebSockets.

In our previous blog post, we introduced a way to interact with the RealtimeAgent using the TwilioAudioAdapter. While effective, that approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we’re excited to introduce the WebSocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser.
This post explores the features, benefits, and implementation of the WebSocketAudioAdapter, showing how it transforms the way we connect with real-time agents.
WebSocketAudioAdapter

The previously introduced TwilioAudioAdapter provides a robust way to connect to your RealtimeAgent, but it comes with challenges: it requires a Twilio account, phone number configuration and call forwarding, and it routes every conversation through an external telephony platform.
The WebSocketAudioAdapter eliminates these challenges by allowing direct audio streaming over WebSockets. It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms.
At its core, the WebSocketAudioAdapter leverages WebSockets to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a RealtimeAgent processes them.
Here’s a quick overview of its components and how they fit together:
- WebSocket connection: a persistent, two-way channel between the browser and the server carries the audio stream in both directions.
- Integration with FastAPI: the adapter plugs into a FastAPI server, which serves the chat page and exposes the WebSocket endpoint.
- Powered by Realtime Agents: incoming audio is handed to a RealtimeAgent, allowing the agent to process audio inputs and respond intelligently.

Unlike the TwilioAudioAdapter, the WebSocketAudioAdapter requires no phone numbers, no telephony configuration, and no external accounts. It’s a plug-and-play solution.
Beyond the simpler setup, it brings several practical benefits:
- Low latency: by streaming audio over WebSockets, the adapter keeps latency low, making conversations feel natural and seamless.
- Browser-based: everything happens within the user’s browser, so no additional software is required, which makes it ideal for web applications.
- Flexible integration: whether you’re building a chatbot, a voice assistant, or an interactive application, the adapter can integrate easily with existing frameworks and AI systems.
Let’s walk through a practical example where we use the WebSocketAudioAdapter to create a voice-enabled weather bot.
You can find the full example here.
To run the demo example, follow these steps:
Create an OAI_CONFIG_LIST file based on the provided OAI_CONFIG_LIST_sample:
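The snippet below only illustrates the expected shape of the file; mirror whatever entries OAI_CONFIG_LIST_sample actually contains, and note that the model name here is just an example of a realtime-capable model:

```json
[
    {
        "model": "gpt-4o-realtime-preview",
        "api_key": "YOUR_OPENAI_API_KEY"
    }
]
```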
In the OAI_CONFIG_LIST file, update the api_key to your OpenAI and/or Gemini API keys.
To avoid cluttering your global Python environment, you can create a virtual environment. On your command line, enter:
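For example, on macOS or Linux (the exact commands differ slightly on Windows):

```bash
python3 -m venv .venv
source .venv/bin/activate
```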
Install the required Python packages using pip:
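Assuming the example ships a requirements.txt (check the linked repository for the exact file name and dependencies):

```bash
pip install -r requirements.txt
```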
Run the application with Uvicorn:
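For instance, if the FastAPI app lives in main.py (the module name is an assumption; adjust it to the example’s entry point), run:

```bash
uvicorn main:app --port 5050
```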
After you start the server, you should see log output confirming that the application is running.
Now you can simply open localhost:5050/start-chat in your browser and dive into an interactive conversation with the RealtimeAgent! 🎤✨
To get started, simply speak into your microphone and ask a question. For example, you can say:
“What’s the weather like in Seattle?”
This initial question will activate the agent, and it will respond, showcasing its ability to understand and interact with you in real time.
Let’s dive in and break down how this example works—from setting up the server to handling real-time audio streaming with WebSockets.
We use FastAPI to serve the chat interface and handle WebSocket connections. A key part is configuring the server to load and render HTML templates dynamically for the user interface.
- Template rendering: we use Jinja2Templates to load chat.html from the templates directory. The template is dynamically rendered with variables like the server’s port.
- Static files: static assets are served from the static directory.
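A minimal sketch of this setup is shown below; it assumes the directory names (templates, static), the /start-chat route, and port 5050 used in this walkthrough, so treat it as an illustration rather than the exact code from the repository.

```python
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

PORT = 5050  # the port used throughout this walkthrough

app = FastAPI()

# Serve static assets (e.g. the browser-side audio JavaScript) from ./static.
app.mount("/static", StaticFiles(directory="static"), name="static")

# Load HTML templates from ./templates.
templates = Jinja2Templates(directory="templates")


@app.get("/start-chat", response_class=HTMLResponse)
async def start_chat(request: Request):
    # Render chat.html, passing the port so the page knows where to open the WebSocket.
    return templates.TemplateResponse("chat.html", {"request": request, "port": PORT})
```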
The /media-stream WebSocket route is where real-time audio interaction is processed and streamed to the AI assistant. Let’s break it down step-by-step:

1. Accept the WebSocket connection: clients connect to the endpoint at /media-stream. Using await websocket.accept(), we ensure the connection is live and ready for communication.
2. Initialize logging: a logger (getLogger("uvicorn.error")) is set up to monitor and debug the server’s activities, helping track events during the connection and interaction process.
3. Set up the WebSocketAudioAdapter: the WebSocketAudioAdapter bridges the client’s audio stream with the RealtimeAgent. It streams audio data over WebSockets in real time, ensuring seamless communication between the browser and the agent.
4. Configure the RealtimeAgent: the RealtimeAgent is the AI assistant driving the interaction. Key parameters include the name ("Weather Bot"), the LLM configuration (realtime_llm_config), and the audio adapter (the WebSocketAudioAdapter created above).
5. Define a custom realtime function: the get_weather function is registered as a realtime callable function. When the user asks about the weather, the agent can call it to get an accurate weather report and respond based on the provided information: it returns "The weather is cloudy." for "Seattle" and "The weather is sunny." for other locations.
6. Run the agent: the await realtime_agent.run() method starts the agent, handling incoming audio streams, processing user queries, and responding in real time.

Here is the code for the /media-stream endpoint:
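The snippet below is a reconstruction based on the steps above rather than a verbatim copy of the example repository; in particular, the import path for RealtimeAgent and WebSocketAudioAdapter, the register_realtime_function decorator, and the shape of realtime_llm_config may differ between AG2 versions, so check the linked example for the exact code.

```python
from logging import getLogger
from typing import Annotated

from fastapi import FastAPI, WebSocket

# NOTE: these AG2 import paths are assumptions and may differ between versions.
from autogen import config_list_from_json
from autogen.agentchat.realtime_agent import RealtimeAgent, WebSocketAudioAdapter

app = FastAPI()  # in the example this is the same app that serves /start-chat

# Build the realtime LLM config from OAI_CONFIG_LIST (helper available in autogen/AG2).
realtime_llm_config = {"config_list": config_list_from_json("OAI_CONFIG_LIST")}


@app.websocket("/media-stream")
async def handle_media_stream(websocket: WebSocket) -> None:
    """Stream browser audio to the RealtimeAgent and send its replies back."""
    await websocket.accept()

    # Logger used to track events during the connection and interaction.
    logger = getLogger("uvicorn.error")
    logger.info("Client connected to /media-stream")

    # Bridge the client's audio stream with the agent over this WebSocket.
    audio_adapter = WebSocketAudioAdapter(websocket)

    realtime_agent = RealtimeAgent(
        name="Weather Bot",
        system_message="You are a helpful voice assistant that can report the weather.",
        llm_config=realtime_llm_config,
        audio_adapter=audio_adapter,
    )

    # Register a realtime callable function the agent can invoke mid-conversation.
    @realtime_agent.register_realtime_function(
        name="get_weather", description="Get the current weather"
    )
    def get_weather(location: Annotated[str, "city"]) -> str:
        # Toy implementation from the walkthrough: Seattle is cloudy, everywhere else is sunny.
        return "The weather is cloudy." if location == "Seattle" else "The weather is sunny."

    # Start the agent: consume incoming audio, run the realtime model, stream responses back.
    await realtime_agent.run()
```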
The WebSocketAudioAdapter marks a shift toward simpler, more accessible real-time audio solutions. It empowers developers to build and deploy voice applications faster and more efficiently. Whether you’re creating an AI assistant, a voice-enabled app, or an experimental project, this adapter is your go-to tool for real-time audio streaming.
Try it out and bring your voice-enabled ideas to life!