Conversation Patterns
In the previous chapter we used two-agent conversation, which can be started by the `initiate_chat` method. Two-agent chat is a useful conversation pattern but AutoGen offers more. In this chapter, we will first dig a little deeper into the two-agent chat pattern and the chat result, then we will show you several conversation patterns that involve more than two agents.
An Overview
- Two-agent chat: the simplest form of conversation pattern where two agents chat with each other.
- Sequential chat: a sequence of chats between two agents, chained together by a carryover mechanism, which brings the summary of the previous chat to the context of the next chat.
- Group Chat: a single chat involving more than two agents. An important question in group chat is: which agent should speak next? To support different scenarios, we provide different ways to organize agents in a group chat:
  - We support several strategies to select the next agent: `round_robin`, `random`, `manual` (human selection), and `auto` (the default, using an LLM to decide).
  - We provide a way to constrain the selection of the next speaker (see examples below).
  - We allow you to pass in a function to customize the selection of the next speaker. With this feature, you can build a StateFlow model which allows a deterministic workflow among your agents. Please refer to this guide and this blog post on StateFlow for more details.
- Nested Chat: package a workflow into a single agent for reuse in a larger workflow.
Two-Agent Chat and Chat Result
Two-agent chat is the simplest form of conversation pattern. We start a two-agent chat using the `initiate_chat` method of every `ConversableAgent` agent. We have already seen multiple examples of two-agent chats in previous chapters but we haven't covered the details. The following figure illustrates how two-agent chat works.
A two-agent chat takes two inputs: a message, which is a string provided by the caller, and a context, which specifies various parameters of the chat. The sender agent uses its chat initializer method (i.e., the `generate_init_message` method of `ConversableAgent`) to generate an initial message from the inputs, and sends it to the recipient agent to start the chat. The sender agent is the agent whose `initiate_chat` method is called, and the recipient agent is the other agent.
Once the chat terminates, the history of the chat is processed by a chat summarizer. The summarizer summarizes the chat history and calculates the token usage of the chat. You can configure the type of summary using the `summary_method` parameter of the `initiate_chat` method. By default, the summary is the last message of the chat (i.e., `summary_method='last_msg'`).
The example below is a two-agent chat between a student agent and a teacher agent. Its summarizer uses an LLM-based summary.
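A minimal sketch of this setup might look like the following, assuming an OpenAI-style `llm_config`; the agent names match the example, while the system messages, the opening question, and `max_turns=2` are illustrative choices:

```python
import os

from autogen import ConversableAgent

# Illustrative LLM configuration; swap in your own model and API key.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]}

student_agent = ConversableAgent(
    name="Student_Agent",
    system_message="You are a student willing to learn.",
    llm_config=llm_config,
)
teacher_agent = ConversableAgent(
    name="Teacher_Agent",
    system_message="You are a math teacher.",
    llm_config=llm_config,
)

# The student initiates the chat; the summary is produced by an LLM reflection.
chat_result = student_agent.initiate_chat(
    teacher_agent,
    message="What is triangle inequality?",
    summary_method="reflection_with_llm",
    max_turns=2,
)
```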
Let’s see what the summary looks like. The summary is stored in the `chat_result` object of type `ChatResult` that was returned by the `initiate_chat` method.
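For example, assuming the `chat_result` from the sketch above:

```python
# The LLM-generated summary of the conversation.
print(chat_result.summary)
```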
In the above example, the summary method is set to `reflection_with_llm`, which takes a list of messages from the conversation and summarizes them using a call to an LLM. The summary method first tries to use the recipient's LLM; if it is not available, it falls back to the sender's LLM. In this case the recipient is "Teacher_Agent" and the sender is "Student_Agent". The input prompt for the LLM is a default prompt that asks the model to summarize the takeaway from the conversation.
You can also use a custom prompt by setting the `summary_prompt` argument of `initiate_chat`.
The `ChatResult` object contains some other useful information, including the conversation history, human input, and token cost.
Note that the chat messages in the chat result are from the recipient agent's perspective: the sender is the "assistant" and the recipient is the "user".
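Assuming the same `chat_result`, these fields can be inspected directly:

```python
import pprint

pprint.pprint(chat_result.chat_history)  # all messages exchanged in the chat
pprint.pprint(chat_result.cost)          # token usage and cost information
pprint.pprint(chat_result.human_input)   # any human input solicited during the chat
```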
Sequential Chats
The name of this pattern is self-explanatory – it is a sequence of chats between two agents, chained together by a mechanism called carryover, which brings the summary of the previous chat to the context of the next chat.
This pattern is useful for complex tasks that can be broken down into interdependent sub-tasks. The figure below illustrates how this pattern works.
In this pattern, a pair of agents first start a two-agent chat, then the summary of the conversation becomes a carryover for the next two-agent chat. The next chat passes the carryover to the `carryover` parameter of the context to generate its initial message.
Carryover accumulates as the conversation moves forward, so each subsequent chat starts with all the carryovers from previous chats.
The figure above shows distinct recipient agents for all the chats; however, the recipient agents in the sequence are allowed to repeat.
To illustrate this pattern, let's consider a simple example of arithmetic operator agents. One agent (called the "Number_Agent") is responsible for coming up with a number, and other agents are responsible for performing a specific arithmetic operation on the number, e.g., add 1, multiply by 2, etc.
The Number Agent chats with the first operator agent, then the second operator agent, and so on. After each chat, the last message in the conversation (i.e., the result of the arithmetic operation from the operator agent) is used as the summary of the chat. This is specified by the `summary_method` parameter. In the end we will have the result of the arithmetic operations.
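A sketch of this sequence is shown below; the agent system messages and the chat messages are illustrative, and only two of the four chats are spelled out:

```python
# Assumed agents; each operator agent performs one arithmetic operation on the numbers it receives.
number_agent = ConversableAgent(
    name="Number_Agent",
    system_message="You return me the numbers I give you, one number each line.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
adder_agent = ConversableAgent(
    name="Adder_Agent",
    system_message="You add 1 to each number I give you and return the new numbers, one number each line.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
multiplier_agent = ConversableAgent(
    name="Multiplier_Agent",
    system_message="You multiply each number I give you by 2 and return the new numbers, one number each line.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# A sequence of two-agent chats, all initiated by the Number Agent.
chat_results = number_agent.initiate_chats(
    [
        {
            "recipient": adder_agent,
            "message": "14",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
        {
            "recipient": multiplier_agent,
            "message": "These are my numbers",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
        # A subtracter and a divider agent would be added here in the same way
        # to complete the four-chat sequence described below.
    ]
)
```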
The first thing to note is that the `initiate_chats` method takes a list of dictionaries, each of which contains the arguments for the `initiate_chat` method.
Second, each chat in the sequence has a maximum of 2 turns, as specified with the setting `max_turns=2`, which means each arithmetic operation is performed twice. So you can see in the first chat the number 14 becomes 15 and then 16, in the second chat the number 16 becomes 32 and then 64, and so on.
Third, the carryover accumulates as the chats go on. In the second chat, the carryover is the summary of the first chat, “16”. In the third chat, the carryover is the summary of the first and second chats, which is the list “16” and “64”, and both numbers are operated upon. In the fourth and last chat, the carryover is the summary of all previous chats, which is the list “16”, “64”, “14” and “62”, and all of these numbers are operated upon.
The final note is that the `initiate_chats` method returns a list of `ChatResult` objects, one for each chat in the sequence.
Besides calling `initiate_chats` from the same sender agent, you can also call a high-level function `autogen.agentchat.initiate_chats` to start a sequence of two-agent chats with different sender agents. This function allows you to specify the sender agent for each chat in the sequence.
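A sketch of the module-level variant, reusing the agents above; the `sender` key is what distinguishes each entry from the method form:

```python
import autogen

chat_results = autogen.agentchat.initiate_chats(
    [
        {
            "sender": number_agent,
            "recipient": adder_agent,
            "message": "14",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
        {
            "sender": adder_agent,  # a different sender for the second chat
            "recipient": multiplier_agent,
            "message": "These are my numbers",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
    ]
)
```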
Group Chat
So far we have only seen conversation patterns that involve two agents or a sequence of two-agent chats. AutoGen provides a more general conversation pattern called group chat, which involves more than two agents. The core idea of group chat is that all agents contribute to a single conversation thread and share the same context. This is useful for tasks that require collaboration among multiple agents.
The figure below illustrates how group chat works.
A group chat is orchestrated by a special agent type, `GroupChatManager`. In the first step of the group chat, the Group Chat Manager selects an agent to speak. Then, the selected agent speaks and the message is sent back to the Group Chat Manager, who broadcasts the message to all other agents in the group. This process repeats until the conversation stops.
The Group Chat Manager can use several strategies to select the next agent. Currently, the following strategies are supported:
- `round_robin`: The Group Chat Manager selects agents in a round-robin fashion based on the order of the agents provided.
- `random`: The Group Chat Manager selects agents randomly.
- `manual`: The Group Chat Manager selects agents by asking for human input.
- `auto`: The default strategy, which selects agents using the Group Chat Manager's LLM.
To illustrate this pattern, let’s consider a simple example of a group chat among the same arithmetic operator agents as in the previous example, with the objective of turning a number into a specific target number using a sequence of arithmetic operations powered by the agents.
In this example, we use the `auto` strategy to select the next agent. To help the Group Chat Manager select the next agent, we also set the `description` of the agents. Without the `description`, the Group Chat Manager will use the agents' `system_message`, which may not be the best choice.
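Setting the descriptions might look like this sketch (the description strings are illustrative):

```python
# Short descriptions used by the Group Chat Manager for speaker selection.
adder_agent.description = "Add 1 to each input number."
multiplier_agent.description = "Multiply each input number by 2."
number_agent.description = "Return the numbers given."
```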
We first create a `GroupChat` object and provide the list of agents. If we were to use the `round_robin` strategy, this list would specify the order of the agents to be selected. We also initialize the group chat with an empty message list and a maximum of 6 rounds, which means there will be at most 6 iterations of selecting a speaker, having the agent speak, and broadcasting the message.
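A sketch of the constructor call, using the agents assumed above:

```python
from autogen import GroupChat

group_chat = GroupChat(
    agents=[adder_agent, multiplier_agent, number_agent],
    messages=[],  # start with an empty message list
    max_round=6,  # at most 6 rounds of select-speak-broadcast
)
```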
Now we create a `GroupChatManager` object and provide the `GroupChat` object as input. We also need to specify the `llm_config` of the Group Chat Manager so it can use the LLM to select the next agent (the `auto` strategy).
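A corresponding sketch, reusing the `llm_config` from earlier:

```python
from autogen import GroupChatManager

group_chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,  # used by the default `auto` speaker selection
)
```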
Finally, we have the Number Agent from before to start a two-agent chat with the Group Chat Manager, which runs the group chat internally and terminates the two-agent chat when the internal group chat is done. Because the Number Agent is selected to speak by us, it counts as the first round of the group chat.
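Starting the group chat from the Number Agent might look like this; the request message is illustrative:

```python
chat_result = number_agent.initiate_chat(
    group_chat_manager,
    message="My number is 3, I want to turn it into 13.",
    summary_method="reflection_with_llm",
)
```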
You can see that the Number Agent is selected to speak first, then the Group Chat Manager selects the Multiplier Agent to speak, then the Adder Agent, and so on. The number is operated upon by each agent in the group chat, and the final result is 13.
We can take a look at the summary of the group chat, provided by the `ChatResult` object returned by the `initiate_chat` method.
Send Introductions
In the previous example, we set the `description` of the agents to help the Group Chat Manager select the next agent. This helps the Group Chat Manager; however, it does not help the participating agents know about each other. Sometimes it is useful to have each agent introduce itself to the other agents in the group chat. This can be done by setting `send_introductions=True` in the `GroupChat` constructor.
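A sketch of the group chat with introductions enabled; the variable name `group_chat_manager_with_intros` is reused later in this chapter:

```python
group_chat_with_introductions = GroupChat(
    agents=[adder_agent, multiplier_agent, number_agent],
    messages=[],
    max_round=6,
    send_introductions=True,  # broadcast each agent's name and description up front
)
group_chat_manager_with_intros = GroupChatManager(
    groupchat=group_chat_with_introductions,
    llm_config=llm_config,
)
```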
Under the hood, the Group Chat Manager sends a message containing the agents’ names and descriptions to all agents in the group chat before the group chat starts.
Group Chat in a Sequential Chat
Group chat can also be used as a part of a sequential chat. In this case, the Group Chat Manager is treated as a regular agent in the sequence of two-agent chats.
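A sketch of this composition, assuming the `group_chat_manager_with_intros` defined above; the two request messages are illustrative:

```python
# The Group Chat Manager acts as a regular recipient in the sequence of chats;
# the summary of the first group chat is carried over into the second one.
chat_results = number_agent.initiate_chats(
    [
        {
            "recipient": group_chat_manager_with_intros,
            "message": "My number is 3, I want to turn it into 13.",
        },
        {
            "recipient": group_chat_manager_with_intros,
            "message": "Turn this number into 32.",
        },
    ]
)
```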
In the above example, the Group Chat Manager runs the group chat two times. The first time, the number 3 becomes 13, and the last message of that group chat is used as the carryover for the next group chat, which starts from 13.
You can also see from the warning message that the Group Chat Manager's history is cleared after the first group chat, which is the default behavior. To keep the history of the Group Chat Manager, you can set `clear_history=False` for the first chat.
Constrained Speaker Selection
Group chat is a powerful conversation pattern, but it can be hard to control if the number of participating agents is large. AutoGen provides a way to constrain the selection of the next speaker by using the `allowed_or_disallowed_speaker_transitions` argument of the `GroupChat` class.
The `allowed_or_disallowed_speaker_transitions` argument is a dictionary that maps a given agent to a list of agents that can (or cannot) be selected to speak next. The `speaker_transitions_type` argument specifies whether the transitions are allowed or disallowed.
Here is an example:
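The sketch below uses the operator agents assumed earlier; a `subtracter_agent` and a `divider_agent` would be created the same way as the adder and multiplier:

```python
# Allowed transitions: each key agent may only be followed by the agents in its list.
allowed_transitions = {
    number_agent: [adder_agent, number_agent],
    adder_agent: [multiplier_agent, number_agent],
    multiplier_agent: [subtracter_agent, number_agent],
    subtracter_agent: [divider_agent, number_agent],
    divider_agent: [adder_agent, number_agent],
}
```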
In this example, the allowed transitions are specified for each agent. The Number Agent can be followed by the Adder Agent and the Number Agent, the Adder Agent can be followed by the Multiplier Agent and the Number Agent, and so on. Let's put this into the group chat and see how it works. The `speaker_transitions_type` is set to `allowed`, so the transitions are positive constraints.
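A sketch of the constrained group chat; the variable names and the `max_round` value are illustrative:

```python
constrained_group_chat = GroupChat(
    agents=[adder_agent, multiplier_agent, subtracter_agent, divider_agent, number_agent],
    allowed_or_disallowed_speaker_transitions=allowed_transitions,
    speaker_transitions_type="allowed",  # the transitions above are positive constraints
    messages=[],
    max_round=12,
)
constrained_group_chat_manager = GroupChatManager(
    groupchat=constrained_group_chat,
    llm_config=llm_config,
)
```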
This time, the agents are selected following the constraints we have specified.
Changing the select speaker role name
As part of the group chat process, when the `speaker_selection_method` is set to `auto` (the default value), a select speaker message is sent to the LLM to determine the next speaker.
Each message in the chat sequence has a `role` attribute that is typically `user`, `assistant`, or `system`. The select speaker message is the last in the chat sequence when used and, by default, has a role of `system`.
When using some models, such as Mistral through Mistral.AI's API, the role of the last message in the chat sequence has to be `user`.
To change the default behaviour, AutoGen provides a way to set the value of the select speaker message's role to any string value by setting the `role_for_select_speaker_messages` parameter in the `GroupChat` constructor. The default value is `system`, and by setting it to `user` you can accommodate the last message role requirement of Mistral.AI's API.
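A sketch of a group chat configured for such a model, reusing the agents assumed above:

```python
group_chat_for_mistral = GroupChat(
    agents=[adder_agent, multiplier_agent, number_agent],
    messages=[],
    max_round=6,
    role_for_select_speaker_messages="user",  # the select speaker message uses the "user" role
)
```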
Nested Chats
The previous conversation patterns (two-agent chat, sequential chat, and group chat) are useful for building complex workflows; however, they do not expose a single conversational interface, which is often needed for scenarios like question-answering bots and personal assistants. In some other cases, it is also useful to package a workflow into a single agent for reuse in a larger workflow. AutoGen provides a way to achieve this by using nested chats.
Nested chats are powered by the nested chats handler, which is a pluggable component of `ConversableAgent`. The figure below illustrates how the nested chats handler triggers a sequence of nested chats when a message is received.
When a message comes in and passes the human-in-the-loop component, the nested chats handler checks if the message should trigger a nested chat based on conditions specified by the user. If the conditions are met, the nested chats handler starts a sequence of nested chats specified using the sequential chats pattern. In each of the nested chats, the sender agent is always the same agent that triggered the nested chats. In the end, the nested chat handler uses the results of the nested chats to produce a response to the original message. By default, the nested chat handler uses the summary of the last chat as the response.
Here is an example of using nested chats to build an arithmetic agent that packages arithmetic operations, code-based validation, and poetry into a single agent. This arithmetic agent takes a number transformation request like “turn number 3 into 13” and returns a poem that describes a transformation attempt.
First we define the agents. We reuse the `group_chat_manager_with_intros` from the previous example to orchestrate the arithmetic operations.
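A sketch of the remaining agents; the names, system messages, and human input settings here are illustrative assumptions (the code execution setup for the validation step is omitted):

```python
arithmetic_agent = ConversableAgent(
    name="Arithmetic_Agent",
    llm_config=False,           # this agent only triggers nested chats; it does not call an LLM
    human_input_mode="ALWAYS",  # ask for human confirmation in this sketch
)
code_writer_agent = ConversableAgent(
    name="Code_Writer_Agent",
    system_message="You write Python code to verify arithmetic results.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
poetry_agent = ConversableAgent(
    name="Poetry_Agent",
    system_message="You write poems about numbers and their transformations.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
```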
Now we define the nested chats using the sequential chats pattern. All the senders are always `arithmetic_agent`.
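A sketch of the chat queue; the messages, summary settings, and turn limits are illustrative. The sender is not specified because it defaults to the agent that registers the handler:

```python
nested_chats = [
    {
        "recipient": group_chat_manager_with_intros,
        "summary_method": "reflection_with_llm",
        "summary_prompt": "Summarize the sequence of operations used to turn the source number into the target number.",
    },
    {
        "recipient": code_writer_agent,
        "message": "Write a Python script to verify the arithmetic operations are correct.",
        "summary_method": "reflection_with_llm",
    },
    {
        "recipient": poetry_agent,
        "message": "Write a poem about it.",
        "max_turns": 1,
        "summary_method": "last_msg",
    },
]
```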
Now we register the nested chats handler to the `arithmetic_agent` and set the conditions for triggering the nested chats.
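A sketch of the registration; the `trigger` condition used here (skip messages coming from the inner agents, to avoid recursion) is an assumption:

```python
arithmetic_agent.register_nested_chats(
    nested_chats,
    # Trigger the nested chats only when the sender is not one of the inner agents;
    # otherwise the handler would start nested chats recursively.
    trigger=lambda sender: sender not in [group_chat_manager_with_intros, code_writer_agent, poetry_agent],
)
```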
Finally, we call `generate_reply` to get a response from the `arithmetic_agent`; this will trigger a sequence of nested chats and return the summary of the last nested chat as the response.
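For example, with an illustrative request:

```python
reply = arithmetic_agent.generate_reply(
    messages=[{"role": "user", "content": "I have a number 3 and I want to turn it into 7."}]
)
print(reply)
```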
A poem is returned as the response, which describes the transformation attempt from 3 to 7.
The implementation of the nested chats handler makes use of the `register_reply` method, which allows you to make extensive customizations to `ConversableAgent`. The `GroupChatManager` uses the same mechanism to implement the group chat.
Nested chat is a powerful conversation pattern that allows you to package complex workflows into a single agent. You can hide tool usage within a single agent by having the tool-caller agent start a nested chat with a tool-executor agent and then use the result of the nested chat to generate a response. See the nested chats for tool use notebook for an example.
Summary
In this chapter, we covered two-agent chat, sequential chat, group chat, and nested chat patterns. You can compose these patterns like LEGO blocks to create complex workflows. You can also use `register_reply` to create new patterns.
This is the last chapter on basic AutoGen concepts. In the next chapter, we will give you some tips on what to do next.