Code Executors
In the last chapter, we used two agents powered by a large language model (LLM) to play a game by exchanging messages. In this chapter, we introduce code executors, which enable agents to not just chat but also to interact with an environment and perform useful computations and take actions.
Overview
In AutoGen, a code executor is a component that takes input messages (e.g., those containing code blocks), performs execution, and outputs messages with the results. AutoGen provides two types of built-in code executors, one is command line code executor, which runs code in a command line environment such as a UNIX shell, and the other is Jupyter executor, which runs code in an interactive Jupyter kernel.
For each type of executor, AutoGen provides two ways to execute code: locally and in a Docker container. One way is to execute code directly in the same host platform where AutoGen is running, i.e., the local operating system. It is for development and testing, but it is not ideal for production as LLM can generate arbitrary code. The other way is to execute code in a Docker container. The table below shows the combinations of code executors and execution environments.
Code Executor (autogen.coding ) |
Environment | Platform |
---|---|---|
LocalCommandLineCodeExecutor |
Shell | Local |
DockerCommandLineCodeExecutor |
Shell | Docker |
jupyter.JupyterCodeExecutor |
Jupyter Kernel (e.g., python3) | Local/Docker |
In this chapter, we will focus on the command line code executors. For the Jupyter code executor, please refer to the topic page for Jupyter Code Executor.
Local Execution
The figure below shows the architecture of the local command line code
executor
(autogen.coding.LocalCommandLineCodeExecutor
).
Upon receiving a message with a code block, the local command line code executor first writes the code block to a code file, then starts a new subprocess to execute the code file. The executor reads the console output of the code execution and sends it back as a reply message.
Here is an example of using the code executor to run a Python code block
that prints a random number. First we create an agent with the code
executor that uses a temporary directory to store the code files. We
specify human_input_mode="ALWAYS"
to manually validate the safety of
the the code being executed.
Before running this example, we need to make sure the matplotlib
and
numpy
are installed.
Now we have the agent generate a reply given a message with a Python code block.
During the generation of response, a human input is requested to give an opportunity to intercept the code execution. In this case, we choose to continue the execution, and the agent’s reply contains the output of the code execution.
We can take a look at the generated plot in the temporary directory.
Clean up the working directory to avoid affecting future conversations.
Docker Execution
To mitigate the security risk of running LLM-generated code locally, we
can use the docker command line code executor
(autogen.coding.DockerCommandLineCodeExecutor
)
to execute code in a docker container. This way, the generated code can
only access resources that are explicitly given to it.
The figure below illustrates how docker execution works.
Similar to the local command line code executor, the docker executor extracts code blocks from input messages, writes them to code files. For each code file, it starts a docker container to execute the code file, and reads the console output of the code execution.
To use docker execution, you need to install Docker on your machine. Once you have Docker installed and running, you can set up your code executor agent as follow:
The work_dir
in the constructor points to a local file system
directory just like in the local execution case. The docker container
will mount this directory and the executor write code files and output
to it.
Use Code Execution in Conversation
Writing and executing code is necessary for many tasks such as data analysis, machine learning, and mathematical modeling. In AutoGen, coding can be a conversation between a code writer agent and a code executor agent, mirroring the interaction between a programmer and a code interpreter.
The code writer agent can be powered by an LLM such as GPT-4 with code-writing capability. And the code executor agent is powered by a code executor.
The following is an agent with a code writer role specified using
system_message
. The system message contains important instruction on
how to use the code executor in the code executor agent.
Here is an example of solving a math problem through a conversation between the code writer agent and the code executor agent (created above).
During the previous chat session, human input was requested each time the code executor agent responded to ensure that the code was safe to execute.
Now we can try a more complex example that involves querying the web. Let’s say we want to get the the stock price gains year-to-date for Tesla and Meta (formerly Facebook). We can also use the two agents with several iterations of conversation.
In the previous conversation, the code writer agent generated a code block to install necessary packages and another code block for a script to fetch the stock price and calculate gains year-to-date for Tesla and Meta. The code executor agent installed the packages, executed the script, and returned the results.
Let’s take a look at the chart that was generated.
Because code execution leave traces like code files and output in the file system, we may want to clean up the working directory after each conversation concludes.
Stop the docker command line executor to clean up the docker container.
Command Line or Jupyter Code Executor?
The command line code executor does not keep any state in memory between executions of different code blocks it receives, as it writes each code block to a separate file and executes the code block in a new process.
Contrast to the command line code executor, the Jupyter code executor runs all code blocks in the same Jupyter kernel, which keeps the state in memory between executions. See the topic page for Jupyter Code Executor.
The choice between command line and Jupyter code executor depends on the nature of the code blocks in agents’ conversation. If each code block is a “script” that does not use variables from previous code blocks, the command line code executor is a good choice. If some code blocks contain expensive computations (e.g., training a machine learning model and loading a large amount of data), and you want to keep the state in memory to avoid repeated computations, the Jupyter code executor is a better choice.
Note on User Proxy Agent and Assistant Agent
User Proxy Agent
In the previous examples, we create the code executor agent directly
using the
ConversableAgent
class. Existing AutoGen examples often create code executor agent using
the
UserProxyAgent
class, which is a subclass of
ConversableAgent
with human_input_mode=ALWAYS
and llm_config=False
– it always
requests human input for every message and does not use LLM. It also
comes with default description
field for each of the
human_input_mode
setting. This class is a convenient short-cut for
creating an agent that is intended to be used as a code executor.
Assistant Agent
In the previous examples, we created the code writer agent directly
using the
ConversableAgent
class. Existing AutoGen examples often create the code writer agent
using the
AssistantAgent
class, which is a subclass of
ConversableAgent
with human_input_mode=NEVER
and code_execution_config=False
– it
never requests human input and does not use code executor. It also comes
with default system_message
and description
fields. This class is a
convenient short-cut for creating an agent that is intended to be used
as a code writer and does not execute code.
In fact, in the previous example we use the default system_message
field of the
AssistantAgent
class to instruct the code writer agent how to use code executor.
Best Practice
It is very important to note that the
UserProxyAgent
and
AssistantAgent
are meant to be shortcuts to avoid writing the system_message
instructions for the
ConversableAgent
class. They are not suitable for all use cases. As we will show in the
next chapter, tuning the system_message
field is vital for agent to
work properly in more complex conversation patterns beyond two-agent
chat.
As a best practice, always tune your agent’s system_message
instructions for your specific use case and avoid subclassing
UserProxyAgent
and
AssistantAgent
.
Summary
In this chapter, we introduced code executors, how to set up Docker and local execution, and how to use code execution in a conversation to solve tasks. In the next chapter, we will introduce tool use, which is similar to code executors but restricts what code an agent can execute.