Agent Chat with Multimodal Models: DALLE and GPT-4V
Multimodal agent chat with DALL-E and GPT-4V.
Requires: OpenAI V1.
Before everything starts, install AG2 with the `lmm` option:

```bash
pip install "ag2[lmm]>=0.2.3"
```
```python
import os
import re
from typing import Any, Optional, Union

import PIL
import matplotlib.pyplot as plt
from PIL import Image
from diskcache import Cache
from openai import OpenAI

from autogen import Agent, AssistantAgent, ConversableAgent, LLMConfig, UserProxyAgent
from autogen.agentchat.contrib.img_utils import _to_pil, get_image_data, get_pil_image
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent
```
```python
config_path = "OAI_CONFIG_LIST"

llm_config_4v = LLMConfig.from_json(
    path=config_path,
    max_tokens=1000,
).where(
    model=["gpt-4-vision-preview"],
)

llm_config_dalle = LLMConfig.from_json(
    path=config_path,
).where(model=["dalle"])

gpt4_llm_config = LLMConfig.from_json(path=config_path, cache_seed=42).where(
    model=["gpt-4", "gpt-4-0314", "gpt4", "gpt-4-32k", "gpt-4-32k-0314", "gpt-4-32k-v0314"],
)
```
The `llm_config_dalle` should look something like:

```python
LLMConfig(
    config_list=[
        {'api_type': 'openai', 'model': 'dalle', 'api_key': '**********', 'api_version': '2024-02-01', 'tags': []}
    ]
)
```
Helper Functions
We first create a wrapper for the DALL-E call that caches generated images, so repeated requests with identical parameters do not trigger new API calls.
```python
def dalle_call(client: OpenAI, model: str, prompt: str, size: str, quality: str, n: int) -> str:
    """Generate an image using OpenAI's DALL-E model and cache the result.

    This function takes a prompt and other parameters to generate an image using OpenAI's DALL-E model.
    It checks if the result is already cached; if so, it returns the cached image data. Otherwise,
    it calls the DALL-E API to generate the image, stores the result in the cache, and then returns it.

    Args:
        client (OpenAI): The OpenAI client instance for making API calls.
        model (str): The specific DALL-E model to use for image generation.
        prompt (str): The text prompt based on which the image is generated.
        size (str): The size specification of the image. TODO: This should allow specifying landscape, square, or portrait modes.
        quality (str): The quality setting for the image generation.
        n (int): The number of images to generate.

    Returns:
        str: The image data as a string, either retrieved from the cache or newly generated.

    Note:
        - The cache is stored in a directory named '.cache/'.
        - The function uses a tuple of (model, prompt, size, quality, n) as the key for caching.
        - The image data is obtained by making a secondary request to the URL provided by the DALL-E API response.
    """
    cache = Cache(".cache/")  # Create a cache directory
    key = (model, prompt, size, quality, n)
    if key in cache:
        return cache[key]

    # If not in cache, compute and store the result
    response = client.images.generate(
        model=model,
        prompt=prompt,
        size=size,
        quality=quality,
        n=n,
    )
    image_url = response.data[0].url
    img_data = get_image_data(image_url)
    cache[key] = img_data
    return img_data
```
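The caching scheme above can be sketched independently of `diskcache`: a mapping from the full argument tuple to the generated result, so identical requests are computed only once. This is a minimal in-memory sketch under that assumption; `fake_generate` is a hypothetical stand-in for the real API call.

```python
def make_cached_generator(generate):
    """Wrap an image-generation callable with a simple in-memory cache.

    `generate` takes (model, prompt, size, quality, n); identical argument
    tuples are computed once and then served from the cache, mirroring the
    (model, prompt, size, quality, n) key used by dalle_call above.
    """
    cache = {}

    def cached(model, prompt, size, quality, n):
        key = (model, prompt, size, quality, n)
        if key not in cache:
            cache[key] = generate(model, prompt, size, quality, n)
        return cache[key]

    return cached


calls = []

def fake_generate(model, prompt, size, quality, n):
    # Stand-in for the DALL-E API call; records each real invocation.
    calls.append(prompt)
    return f"image-bytes-for:{prompt}"

gen = make_cached_generator(fake_generate)
gen("dall-e-3", "a happy robot", "1024x1024", "standard", 1)
gen("dall-e-3", "a happy robot", "1024x1024", "standard", 1)  # served from cache
```

`dalle_call` uses `diskcache` instead of a dict so the cache survives between notebook runs, but the lookup logic is the same.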
Here is a helper function to extract an image from the DALLE agent's last message. We will define the DALLE agent itself later.
```python
def extract_img(agent: Agent) -> PIL.Image:
    """Extracts an image from the last message of an agent and converts it to a PIL image.

    This function searches the last message sent by the given agent for an image tag,
    extracts the image data, and then converts this data into a PIL (Python Imaging Library) image object.

    Parameters:
        agent (Agent): An instance of an agent from which the last message will be retrieved.

    Returns:
        PIL.Image: A PIL image object created from the extracted image data.

    Note:
        - The function assumes that the last message contains an <img> tag with image data.
        - The image data is extracted using a regular expression that searches for <img> tags.
        - It's important that the agent's last message contains properly formatted image data for successful extraction.
        - The `get_pil_image` function is used to convert the extracted image data into a PIL image.
        - If no <img> tag is found, or if the image data is not correctly formatted, the function may raise an error.
    """
    last_message = agent.last_message()["content"]

    if isinstance(last_message, str):
        img_data = re.findall("<img (.*)>", last_message)[0]
    elif isinstance(last_message, list):
        # The GPT-4V format, where the content is an array of data
        assert isinstance(last_message[0], dict)
        img_data = last_message[0]["image_url"]["url"]

    pil_img = get_pil_image(img_data)
    return pil_img
```
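The string branch of `extract_img` hinges on one regular expression over the `<img ...>` tag. A standalone sketch of just that extraction step, using a hypothetical truncated base64 payload:

```python
import re

def find_img_data(message: str) -> str:
    """Return the payload of the first <img ...> tag in a message string,
    the same pattern extract_img applies to string-typed message content."""
    matches = re.findall("<img (.*)>", message)
    if not matches:
        raise ValueError("no <img> tag found in message")
    return matches[0]

msg = "Here is your picture: <img data:image/png;base64,iVBORw0KGgo=>"
print(find_img_data(msg))  # data:image/png;base64,iVBORw0KGgo=
```

Note that `.*` is greedy, so a message containing a stray later `>` would extend the match; the notebook relies on the agent emitting exactly one well-formed tag.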
The DALLE Agent
```python
class DALLEAgent(ConversableAgent):
    def __init__(self, name, llm_config: dict[str, Any], **kwargs: Any):
        super().__init__(name, llm_config=llm_config, **kwargs)

        try:
            config_list = llm_config["config_list"]
            api_key = config_list[0]["api_key"]
        except Exception as e:
            print("Unable to fetch API Key, because", e)
            api_key = os.getenv("OPENAI_API_KEY")
        self._dalle_client = OpenAI(api_key=api_key)
        self.register_reply([Agent, None], DALLEAgent.generate_dalle_reply)

    def send(
        self,
        message: Union[dict[str, Any], str],
        recipient: Agent,
        request_reply: Optional[bool] = None,
        silent: Optional[bool] = False,
    ):
        # Override to always send silently;
        # otherwise, the printed log would be extremely long!
        super().send(message, recipient, request_reply, silent=True)

    def generate_dalle_reply(self, messages: Optional[list[dict[str, Any]]], sender: "Agent", config):
        """Generate a reply using an OpenAI DALL-E call."""
        client = self._dalle_client if config is None else config
        if client is None:
            return False, None
        if messages is None:
            messages = self._oai_messages[sender]

        prompt = messages[-1]["content"]
        # TODO: integrate with autogen.oai. For instance, with caching for the API call
        img_data = dalle_call(
            client=client,
            model="dall-e-3",
            prompt=prompt,
            size="1024x1024",  # TODO: the size should be flexible, deciding landscape, square, or portrait mode.
            quality="standard",
            n=1,
        )

        img_data = _to_pil(img_data)  # Convert to PIL image
        # Return the OpenAI message format
        return True, {"content": [{"type": "image_url", "image_url": {"url": img_data}}]}
```
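`register_reply` hooks `generate_dalle_reply` into the agent's chain of reply functions, each of which returns a `(final, reply)` tuple: `final=True` means "use this reply", while `(False, None)` defers to the next candidate. The dispatch pattern can be sketched on its own (this is an illustrative simplification, not AG2's actual internals):

```python
def run_reply_chain(reply_funcs, messages):
    """Try each reply function in order; the first one that declares
    itself final short-circuits the chain, mirroring the (final, reply)
    convention used by generate_dalle_reply above."""
    for func in reply_funcs:
        final, reply = func(messages)
        if final:
            return reply
    return None

def image_reply(messages):
    # Only handles prompts that ask to "draw"; otherwise defers.
    prompt = messages[-1]["content"]
    if "draw" in prompt:
        return True, {"content": [{"type": "image_url", "image_url": {"url": "<generated>"}}]}
    return False, None

def text_reply(messages):
    # Fallback handler that always produces a reply.
    return True, {"content": "I can only chat in text."}

reply = run_reply_chain([image_reply, text_reply], [{"content": "draw a robot"}])
```

Because `DALLEAgent` returns `(False, None)` when no client is available, other registered reply functions still get a chance to respond.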
Simple Example: Call directly from User
```python
dalle = DALLEAgent(name="Dalle", llm_config=llm_config_dalle)

user_proxy = UserProxyAgent(
    name="User_proxy", system_message="A human admin.", human_input_mode="NEVER", max_consecutive_auto_reply=0
)

# Ask the question with an image
user_proxy.initiate_chat(
    dalle,
    message="""Create an image with black background, a happy robot is showing a sign with "I Love AG2".""",
)
```
```python
img = extract_img(dalle)

plt.imshow(img)
plt.axis("off")  # Turn off axis numbers
plt.show()
```
Example With Critics: Iterate several times to improve
```python
class DalleCreator(AssistantAgent):
    def __init__(self, n_iters=2, **kwargs):
        """Initializes a DalleCreator instance.

        This agent facilitates the creation of visualizations through a collaborative effort among
        its child agents: dalle and critics.

        Parameters:
            - n_iters (int, optional): The number of "improvement" iterations to run. Defaults to 2.
            - **kwargs: keyword arguments for the parent AssistantAgent.
        """
        super().__init__(**kwargs)
        self.register_reply([Agent, None], reply_func=DalleCreator._reply_user, position=0)
        self._n_iters = n_iters

    def _reply_user(self, messages=None, sender=None, config=None):
        if all((messages is None, sender is None)):
            error_msg = f"Either {messages=} or {sender=} must be provided."
            raise AssertionError(error_msg)
        if messages is None:
            messages = self._oai_messages[sender]

        img_prompt = messages[-1]["content"]

        # Define the agents
        self.critics = MultimodalConversableAgent(
            name="Critics",
            system_message="""You need to improve the prompt of the figures you saw.
How to create a figure that is better in terms of color, shape, text (clarity), and other things.
Reply with the following format:

CRITICS: the image needs to improve...
PROMPT: here is the updated prompt!
""",
            llm_config=llm_config_4v,
            human_input_mode="NEVER",
            max_consecutive_auto_reply=3,
        )

        self.dalle = DALLEAgent(name="Dalle", llm_config=llm_config_dalle, max_consecutive_auto_reply=0)

        # Data flow begins
        self.send(message=img_prompt, recipient=self.dalle, request_reply=True)
        img = extract_img(self.dalle)
        plt.imshow(img)
        plt.axis("off")  # Turn off axis numbers
        plt.show()
        print("Image PLOTTED")

        for i in range(self._n_iters):
            # Downsample the image so that GPT-4V can take it as input
            img = extract_img(self.dalle)
            smaller_image = img.resize((128, 128), Image.Resampling.LANCZOS)
            smaller_image.save("result.png")

            self.msg_to_critics = f"""Here is the prompt: {img_prompt}.
Here is the figure <img result.png>.
Now, critique and create a prompt so that DALLE can give me a better image.
Show me both "CRITICS" and "PROMPT"!
"""
            self.send(message=self.msg_to_critics, recipient=self.critics, request_reply=True)
            feedback = self._oai_messages[self.critics][-1]["content"]
            img_prompt = re.findall("PROMPT: (.*)", feedback)[0]

            self.send(message=img_prompt, recipient=self.dalle, request_reply=True)
            img = extract_img(self.dalle)
            plt.imshow(img)
            plt.axis("off")  # Turn off axis numbers
            plt.show()
            print(f"Image {i} PLOTTED")

        return True, "result.png"
```
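The improvement loop depends on the critic honoring the `CRITICS:`/`PROMPT:` reply format prescribed in its system message; the updated prompt is then recovered with a regular expression. A standalone sketch of that parsing step, using hypothetical critic feedback:

```python
import re

def extract_updated_prompt(feedback: str) -> str:
    """Pull the text after 'PROMPT: ' out of a critic reply, as the loop
    above does; raises if the critic did not follow the expected format."""
    matches = re.findall("PROMPT: (.*)", feedback)
    if not matches:
        raise ValueError("critic reply did not contain a 'PROMPT:' line")
    return matches[0]

feedback = (
    "CRITICS: the image needs better contrast and more legible sign text.\n"
    "PROMPT: a happy robot on a black background holding a bright, high-contrast sign reading 'I Love AG2'"
)
print(extract_updated_prompt(feedback))
```

Since `.` does not match newlines by default, the regex captures only the remainder of the `PROMPT:` line; a critic that wraps its prompt across multiple lines would be truncated, which is why the system message pins down a single-line format.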
```python
creator = DalleCreator(
    name="DALLE Creator!",
    max_consecutive_auto_reply=0,
    system_message="Help me coordinate generating image",
    llm_config=gpt4_llm_config,
)

user_proxy = UserProxyAgent(name="User", human_input_mode="NEVER", max_consecutive_auto_reply=0)

user_proxy.initiate_chat(
    creator, message="""Create an image with black background, a happy robot is showing a sign with "I Love AG2"."""
)
```