1. Use AG2’s OpenAIWrapper for cost estimation
The OpenAIWrapper from autogen tracks token counts and costs of your
API calls. Use the create() method to initiate requests and
print_usage_summary() to retrieve a detailed usage report, including
total cost and token usage for both cached and actual requests.
print_usage_summary() accepts a mode argument:
mode=["actual", "total"] (default): print the usage summary for both non-cached completions and all completions (including cache).
mode="actual": print only the non-cached usage.
mode="total": print only the total usage (including cache).
Reset your session’s usage data with clear_usage_summary() when
needed.
2. Track cost and token count for agents
We also support cost estimation for agents. Use
Agent.print_usage_summary() to print the cost summary for the agent.
You can retrieve the usage summary as a dict using
Agent.get_actual_usage() and Agent.get_total_usage(). Note that
Agent.reset() will also reset the usage summary.
To gather usage data for a list of agents, use the utility function
autogen.gather_usage_summary(agents), which takes a list of agents and
gathers their usage summary.
3. Custom token price for up-to-date cost estimation
AG2 tries to keep the token prices up to date. However, you can pass
a price field in config_list if the token price is not listed or is
out of date. Please create an issue or pull request to help us keep the
token prices up to date!
Note: in JSON files, the price should be a list of two floats.
Example Usage:
{
    "model": "gpt-3.5-turbo-xxxx",
    "api_key": "YOUR_API_KEY",
    "price": [0.0005, 0.0015]
}
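To make the meaning of the two numbers concrete, here is a minimal sketch of how such a price pair could be turned into a cost. It assumes that the pair is [prompt price, completion price] per 1,000 tokens, which is how the Azure note below quotes the built-in prices. `estimate_cost` is a hypothetical helper written for this example, not an AG2 function.

```python
import json

# Hypothetical config entry, mirroring the JSON example above.
config_json = '{"model": "gpt-3.5-turbo-xxxx", "api_key": "YOUR_API_KEY", "price": [0.0005, 0.0015]}'
config = json.loads(config_json)


def estimate_cost(price, prompt_tokens, completion_tokens):
    """Estimate the cost from a [prompt_price, completion_price] pair.

    Assumption: both prices are expressed per 1,000 tokens.
    """
    if len(price) != 2:
        raise ValueError("price must be a list of two numbers")
    prompt_price, completion_price = price
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1000


# Example: 2,000 prompt tokens and 1,000 completion tokens.
cost = estimate_cost(config["price"], prompt_tokens=2000, completion_tokens=1000)
print(f"Estimated cost: {cost:.4f}")
```

The length check mirrors the note above that the price must be a list of two values; a real config loader may validate this differently.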
Caution when using Azure OpenAI!
If you are using Azure OpenAI, the model name returned from a completion
doesn't include the version information. The returned model is either
'gpt-35-turbo' or 'gpt-4'. From there, the cost is calculated using the
prices of gpt-3.5-turbo-0125, (0.0005, 0.0015) per 1k prompt and completion
tokens, and gpt-4-0613, (0.03, 0.06). This means the cost can be wrong if
you are using a different version on Azure OpenAI.
This will be improved in the future. However, the token count summary is
accurate, so you can use the token counts to calculate the cost yourself.
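Because the token counts remain accurate, the cost can be recomputed by hand for the model version you actually deployed. The sketch below is illustrative only: the shape of `usage` (a per-model dict with `prompt_tokens` and `completion_tokens`) and the `MY_PRICES` table are assumptions made for this example, and the prices are the ones quoted above, taken as per 1,000 tokens.

```python
# Assumed prices per 1,000 (prompt, completion) tokens. Replace them with
# the prices of the model version you actually deployed.
MY_PRICES = {
    "gpt-35-turbo": (0.0005, 0.0015),
    "gpt-4": (0.03, 0.06),
}

# Hypothetical token counts, shaped like a per-model usage summary.
usage = {
    "gpt-35-turbo": {"prompt_tokens": 4000, "completion_tokens": 2000},
    "gpt-4": {"prompt_tokens": 1000, "completion_tokens": 500},
}


def recompute_cost(usage, prices):
    """Return {model: cost} computed from token counts and a price table."""
    costs = {}
    for model, counts in usage.items():
        prompt_price, completion_price = prices[model]
        costs[model] = (
            counts["prompt_tokens"] * prompt_price
            + counts["completion_tokens"] * completion_price
        ) / 1000
    return costs


costs = recompute_cost(usage, MY_PRICES)
total = sum(costs.values())
print(costs, total)
```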
Requirements
AG2 requires Python>=3.9.
Set your API Endpoint
The config_list_from_json function loads a list of configurations from
an environment variable or a JSON file.
import autogen
from autogen import AssistantAgent, OpenAIWrapper, UserProxyAgent, gather_usage_summary
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "tags": ["gpt-4o", "gpt-4o-mini"],  # comment out to get all
    },
)
It first looks for the environment variable "OAI_CONFIG_LIST", which needs to
be a valid JSON string. If that variable is not found, it then looks for
a JSON file named "OAI_CONFIG_LIST". It filters the configs by tags (you
can filter by other keys as well).
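The lookup order and tag filtering described above can be sketched in plain Python. This is a simplified illustration, not AG2's implementation: `load_config_list` is a hypothetical function, `DEMO_CONFIG_LIST` is a made-up variable name, and the matching rule (keep a config if any of its values for a key appears in the accepted list) is an assumption.

```python
import json
import os


def load_config_list(env_or_file, filter_dict=None):
    """Load configs from an environment variable, else from a file of the same name."""
    raw = os.environ.get(env_or_file)
    if raw is not None:
        configs = json.loads(raw)  # the variable holds a JSON string
    else:
        with open(env_or_file) as f:  # fall back to a JSON file
            configs = json.load(f)
    if not filter_dict:
        return configs

    def matches(config):
        # Every filter key must match at least one accepted value.
        for key, accepted in filter_dict.items():
            value = config.get(key)
            values = value if isinstance(value, list) else [value]
            if not any(v in accepted for v in values):
                return False
        return True

    return [c for c in configs if matches(c)]


# Demo with a made-up environment variable and placeholder entries.
os.environ["DEMO_CONFIG_LIST"] = json.dumps(
    [
        {"model": "gpt-4o", "tags": ["gpt-4o"]},
        {"model": "gpt-4o-mini", "tags": ["gpt-4o-mini", "20240718"]},
        {"model": "other-model", "tags": ["other"]},
    ]
)
selected = load_config_list("DEMO_CONFIG_LIST", {"tags": ["gpt-4o", "gpt-4o-mini"]})
print([c["model"] for c in selected])
```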
The config list looks like the following:
config_list = [
    {
        "model": "gpt-4o",
        "api_key": "<your OpenAI API key>",
        "tags": ["gpt-4o"],
    },  # OpenAI API endpoint for gpt-4o
    {
        "model": "gpt-4o-mini",
        "base_url": "<your Azure OpenAI API base>",
        "api_type": "azure",
        "api_version": "2024-07-18",
        "api_key": "<your Azure OpenAI API key>",
        "tags": ["gpt-4o-mini", "20240718"],
    },
]
You can set the value of config_list in any way you prefer. Please refer
to this User Guide for full code examples of the different methods.
OpenAIWrapper with cost estimation
client = OpenAIWrapper(config_list=config_list)
messages = [
    {"role": "user", "content": "Can you give me 3 useful tips on learning Python? Keep it simple and short."},
]
response = client.create(messages=messages, cache_seed=None)
print(response.cost)
OpenAIWrapper with custom token price
# Add a price to every entry in config_list.
# Note: this price is for demonstration purposes only. Replace it with the actual price of the model.
for config in config_list:
    config["price"] = [1, 1]
client = OpenAIWrapper(config_list=config_list)
messages = [
    {"role": "user", "content": "Can you give me 3 useful tips on learning Python? Keep it simple and short."},
]
response = client.create(messages=messages, cache_seed=None)
print("Price:", response.cost)
Usage Summary for OpenAIWrapper
When you create an instance of OpenAIWrapper, the cost of all completions
made through that instance is recorded. Call print_usage_summary() to
check your usage summary, and clear_usage_summary() to clear it.
client = OpenAIWrapper(config_list=config_list)
messages = [
    {"role": "user", "content": "Can you give me 3 useful tips on learning Python? Keep it simple and short."},
]
client.print_usage_summary()  # print usage summary (nothing recorded yet)
# First completion.
# By default, cache_seed is set to 41 and the cache is enabled. If you don't want to use the cache, set cache_seed to None.
response = client.create(messages=messages, cache_seed=41)
client.print_usage_summary()  # defaults to ["actual", "total"]
client.print_usage_summary(mode="actual")  # print actual usage summary
client.print_usage_summary(mode="total")  # print total usage summary
# Access the usage summaries directly
print(client.actual_usage_summary)
print(client.total_usage_summary)
# Since the cache is enabled, the same completion is returned from the cache, which does not incur any actual cost.
# So the actual cost doesn't change, but the total cost doubles.
response = client.create(messages=messages, cache_seed=41)
client.print_usage_summary()
# Clear the usage summary
client.clear_usage_summary()
client.print_usage_summary()
# All completions are returned from the cache, so no actual cost is incurred.
response = client.create(messages=messages, cache_seed=41)
client.print_usage_summary()
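The difference between the two summaries can be illustrated without calling any API. The toy class below is not AG2 code: `ToyUsageTracker` and its fixed per-call cost are hypothetical, and it only mimics the bookkeeping described in the comments above, where a cached reply adds to the total usage but not to the actual usage.

```python
class ToyUsageTracker:
    """Toy model of 'actual' vs 'total' usage accounting with a cache."""

    def __init__(self):
        self.cache = {}
        self.actual_cost = 0.0  # non-cached completions only
        self.total_cost = 0.0  # all completions, including cache hits

    def create(self, prompt, cost_per_call=0.25):
        if prompt in self.cache:
            cost = self.cache[prompt]  # cache hit: nothing is paid again
        else:
            cost = cost_per_call  # cache miss: a "real" request
            self.cache[prompt] = cost
            self.actual_cost += cost
        self.total_cost += cost

    def clear_usage_summary(self):
        # Resets the counters but keeps the cache, as in the example above.
        self.actual_cost = 0.0
        self.total_cost = 0.0


tracker = ToyUsageTracker()
tracker.create("tips on learning Python")  # first call: actual == total
tracker.create("tips on learning Python")  # cache hit: only total grows
after_two_calls = (tracker.actual_cost, tracker.total_cost)

tracker.clear_usage_summary()
tracker.create("tips on learning Python")  # still a cache hit after clearing
after_clear = (tracker.actual_cost, tracker.total_cost)
print(after_two_calls, after_clear)
```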
Usage Summary for Agents
Agent.print_usage_summary() prints the cost summary for the agent.
Agent.get_actual_usage() and Agent.get_total_usage() return the usage summary as a dict. When an agent doesn't use an LLM, they return None.
Agent.reset() resets the usage summary.
autogen.gather_usage_summary() gathers the usage summary for a list of agents.
assistant = AssistantAgent(
    "assistant",
    system_message="You are a helpful assistant.",
    llm_config={
        "timeout": 600,
        "cache_seed": None,
        "config_list": config_list,
    },
)
ai_user_proxy = UserProxyAgent(
    name="ai_user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
    code_execution_config=False,
    llm_config={
        "config_list": config_list,
    },
    # In the system message the "user" always refers to the other agent.
    system_message="You ask a user for help. You check the answer from the user and provide feedback.",
)
assistant.reset()
math_problem = "$x^3=125$. What is x?"
ai_user_proxy.initiate_chat(
    assistant,
    message=math_problem,
)
ai_user_proxy.print_usage_summary()
print()
assistant.print_usage_summary()
user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=2,
    code_execution_config=False,
    default_auto_reply="That's all. Thank you.",
)
user_proxy.print_usage_summary()
print("Actual usage summary for assistant (excluding completion from cache):", assistant.get_actual_usage())
print("Total usage summary for assistant (including completion from cache):", assistant.get_total_usage())
print("Actual usage summary for ai_user_proxy:", ai_user_proxy.get_actual_usage())
print("Total usage summary for ai_user_proxy:", ai_user_proxy.get_total_usage())
print("Actual usage summary for user_proxy:", user_proxy.get_actual_usage())
print("Total usage summary for user_proxy:", user_proxy.get_total_usage())
usage_summary = gather_usage_summary([assistant, ai_user_proxy, user_proxy])
usage_summary["usage_including_cached_inference"]
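As a rough picture of what gathering usage across agents involves, the sketch below merges per-agent usage dicts in plain Python. It is not the implementation of gather_usage_summary: `merge_usage` is a hypothetical helper, the dict shape (a `total_cost` entry plus one entry per model) is an assumption about what the agent usage methods return, and all numbers are made up. Agents without an LLM contribute None, as noted above.

```python
def merge_usage(summaries):
    """Merge per-agent usage dicts, skipping agents that report None."""
    merged = {"total_cost": 0.0}
    for summary in summaries:
        if summary is None:  # agent without an LLM
            continue
        for key, value in summary.items():
            if key == "total_cost":
                merged["total_cost"] += value
            else:
                # One entry per model name, summed field by field.
                entry = merged.setdefault(
                    key,
                    {"cost": 0.0, "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                )
                for field in entry:
                    entry[field] += value.get(field, 0)
    return merged


# Hypothetical per-agent summaries (assumed shape, made-up values).
assistant_usage = {
    "total_cost": 0.5,
    "gpt-4o": {"cost": 0.5, "prompt_tokens": 100, "completion_tokens": 40, "total_tokens": 140},
}
ai_user_usage = {
    "total_cost": 0.25,
    "gpt-4o": {"cost": 0.25, "prompt_tokens": 60, "completion_tokens": 10, "total_tokens": 70},
}
user_usage = None  # no llm_config, so no usage

merged = merge_usage([assistant_usage, ai_user_usage, user_usage])
print(merged)
```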