Installation

Install AG2 with Gemini features:

pip install ag2[openai,gemini]

Dependencies of This Notebook

In this notebook, we will explore how to use Gemini in AG2 alongside other tools. Install the necessary dependencies with the following command:

pip install ag2[openai,gemini,retrievechat,lmm]

If you have been using autogen or pyautogen, all you need to do is upgrade it using:

pip install -U autogen[openai,gemini,retrievechat,lmm]

or

pip install -U pyautogen[openai,gemini,retrievechat,lmm]

as pyautogen, autogen, and ag2 are aliases for the same PyPI package.

Features

There’s no need to handle OpenAI or Google’s GenAI packages separately; AG2 manages all of these for you. You can easily create different agents with various backend LLMs using the assistant agent. All models and agents are readily accessible at your fingertips.

Support features:

  • Function/tool calling
  • Structured Outputs (Notebook example)
  • Token usage and cost correctly as per Google’s API costs (as of December 2024)

Main Distinctions

  • Gemini accepts system instructions to guide model behavior. If provided, a system message is passed to Gemini’s system_instruction field.
  • If no API key is specified for Gemini, then authentication will happen using the default google auth mechanism for Google Cloud. Service accounts are also supported, where the JSON key file has to be provided.

Sample OAI_CONFIG_LIST

[
    {
        "model": "gemini-2.0-flash-lite",
        "api_key": "your Google's GenAI Key goes here",
        "api_type": "google"
    },
    {
        "model": "gemini-2.0-flash",
        "api_type": "google"
    },
    {
        "model": "gemini-1.5-pro",
        "project_id": "your-awesome-google-cloud-project-id",
        "location": "us-west1",
        "google_application_credentials": "your-google-service-account-key.json",
        "api_type": "google"
    }
]

Gemini coding example

import autogen
from autogen import AssistantAgent, UserProxyAgent
from autogen.code_utils import content_str

config_list_gemini = autogen.config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gemini-2.0-flash-lite"],
    },
)

seed = 42  # for caching, use None for no caching

assistant = AssistantAgent(
    name="assistant",
    system_message=(
      "You are a helpful coding assistant. "
      "After your code has been executed and you have a code execution result, say 'ALL DONE'. "
      "Do not say 'ALL DONE' in the same response as code."
    ),
    llm_config={"config_list": config_list_gemini, "seed": seed}, max_consecutive_auto_reply=3
)

user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    human_input_mode="NEVER",
    is_termination_msg=lambda x: content_str(x.get("content")).find("ALL DONE") >= 0,
)

result = user_proxy.initiate_chat(
  recipient=assistant,
  message="Sort the array with Bubble Sort: [4, 1, 5, 2, 3]"
)
user_proxy (to assistant):

Sort the array with Bubble Sort: [4, 1, 5, 2, 3]

--------------------------------------------------------------------------------
/app/ag2/autogen/oai/gemini.py:880: UserWarning: Cost calculation is not implemented for model gemini-2.0-flash-lite. Cost will be calculated zero.
  warnings.warn(
assistant (to user_proxy):

'''python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

arr = [4, 1, 5, 2, 3]
bubble_sort(arr)
print(arr)
'''


--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):

exitcode: 0 (execution succeeded)
Code output:
[1, 2, 3, 4, 5]

--------------------------------------------------------------------------------
assistant (to user_proxy):

ALL DONE

--------------------------------------------------------------------------------

Gemini vision example

Gemini’s models have vision capabilitiesso we can create multimodal agents.

Here, we ask a question about

import autogen
from autogen import UserProxyAgent
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

config_list_gemini_vision = autogen.config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gemini-2.0-flash"],
    },
)

seed = None # for caching

image_agent = MultimodalConversableAgent(
    "Gemini Vision", llm_config={"config_list": config_list_gemini_vision, "seed": seed}, max_consecutive_auto_reply=1
)

user_proxy = UserProxyAgent("user_proxy", human_input_mode="NEVER", max_consecutive_auto_reply=0)

user_proxy.initiate_chat(
    image_agent,
    message="""Describe what is in this image?
<img https://github.com/ag2ai/ag2/blob/main/website/docs/user-guide/models/assets/chat-example.png>.""",
)

user_proxy (to Gemini Vision):

Describe what is in this image?
<img https://github.com/ag2ai/ag2/blob/main/website/docs/user-guide/models/assets/chat-example.png>.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Gemini Vision (to user_proxy):

The image appears to be a screenshot of a chat interface, likely from a customer service or support application. Here's a breakdown of what I can see:

*   **Chat Bubbles:** The primary visual element is a series of chat bubbles representing a conversation between two parties, presumably a user and a support agent or AI chatbot.

*   **User and Agent Avatars/Names:** There are visual cues to distinguish between the user's messages and the agent's messages. This might be through different colored chat bubbles, different alignment (left vs. right), or the presence of avatars or names associated with each message.

*   **Input Field:** There's a text input field at the bottom where the user can type their messages. It likely includes a send button or an enter-to-send functionality.

*   **Interface Elements:** It appears to be a part of a customer support/model interaction, and include buttons like thumbs up and down, and the option to report the bot.

In summary, the image depicts a typical chat interface used for customer support or interaction with AI models, focusing on clear communication and ease of use.

--------------------------------------------------------------------------------