Google Gemini

Installation

Install AG2 with Gemini features:

pip install ag2[gemini]

Dependencies of This Notebook

In this notebook, we will explore how to use Gemini in AG2 alongside other tools. Install the necessary dependencies with the following command:

pip install ag2[gemini,retrievechat,lmm]

If you have been using autogen or ag2, all you need to do is upgrade it using:

pip install -U autogen[gemini,retrievechat,lmm]

pip install -U ag2[gemini,retrievechat,lmm]

as autogen and ag2 are aliases for the same PyPI package.

Features

There’s no need to handle OpenAI or Google’s GenAI packages separately; AG2 manages all of these for you. You can easily create different agents with various backend LLMs using the assistant agent. All models and agents are readily accessible at your fingertips.

Support features:

Function/tool calling
Structured Outputs (Notebook example)
Token usage and cost correctly as per Google’s API costs (as of December 2024)

Main Distinctions

Gemini accepts system instructions to guide model behavior. If provided, a system message is passed to Gemini’s system_instruction field.
If no API key is specified for Gemini, then authentication will happen using the default google auth mechanism for Google Cloud. Service accounts are also supported, where the JSON key file has to be provided.

Sample OAI_CONFIG_LIST

[
    {
        "model": "gemini-2.0-flash-lite",
        "api_key": "your Google's GenAI Key goes here",
        "api_type": "google"
    },
    {
        "model": "gemini-2.0-flash",
        "api_type": "google"
    },
    {
        "model": "gemini-1.5-pro",
        "project_id": "your-awesome-google-cloud-project-id",
        "location": "us-west1",
        "google_application_credentials": "your-google-service-account-key.json",
        "api_type": "google"
    }
]

You can put your Google Gemini API key in an environment variable named GOOGLE_GEMINI_API_KEY instead of using the api_key in your LLM configuration.

For guidance on using environment variables, see our LLMs documentation.

Gemini coding example

import autogen
from autogen import AssistantAgent, UserProxyAgent, LLMConfig
from autogen.code_utils import content_str

seed = 42  # for caching, use None for no caching
llm_config_gemini = LLMConfig.from_json(
    path="OAI_CONFIG_LIST",
    seed=seed,
).where(model="gemini-2.0-flash-lite")

with llm_config_gemini:
    assistant = AssistantAgent(
        name="assistant",
        system_message=(
        "You are a helpful coding assistant. "
        "After your code has been executed and you have a code execution result, say 'ALL DONE'. "
        "Do not say 'ALL DONE' in the same response as code."
        ),
        max_consecutive_auto_reply=3
    )

user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    human_input_mode="NEVER",
    is_termination_msg=lambda x: content_str(x.get("content")).find("ALL DONE") >= 0,
)

result = user_proxy.initiate_chat(
  recipient=assistant,
  message="Sort the array with Bubble Sort: [4, 1, 5, 2, 3]"
)

user_proxy (to assistant):

Sort the array with Bubble Sort: [4, 1, 5, 2, 3]

--------------------------------------------------------------------------------
/app/ag2/autogen/oai/gemini.py:880: UserWarning: Cost calculation is not implemented for model gemini-2.0-flash-lite. Cost will be calculated zero.
  warnings.warn(
assistant (to user_proxy):

'''python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

arr = [4, 1, 5, 2, 3]
bubble_sort(arr)
print(arr)
'''

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):

exitcode: 0 (execution succeeded)
Code output:
[1, 2, 3, 4, 5]

--------------------------------------------------------------------------------
assistant (to user_proxy):

ALL DONE

--------------------------------------------------------------------------------

Gemini vision example

Gemini’s models have vision capabilitiesso we can create multimodal agents.

Here, we ask a question about

import autogen
from autogen import UserProxyAgent, LLMConfig
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

seed = None # for caching
llm_config_gemini_vision = LLMConfig.from_json(
    path="OAI_CONFIG_LIST",
    seed=seed,
).where(model="gemini-2.0-flash")

with llm_config_gemini_vision:
    image_agent = MultimodalConversableAgent(
        "Gemini Vision",
        max_consecutive_auto_reply=1
    )

user_proxy = UserProxyAgent("user_proxy", human_input_mode="NEVER", max_consecutive_auto_reply=0)

user_proxy.initiate_chat(
    image_agent,
    message="""Describe what is in this image?
<img https://github.com/ag2ai/ag2/blob/main/website/docs/user-guide/models/assets/chat-example.png>.""",
)

user_proxy (to Gemini Vision):

Describe what is in this image?
<img https://github.com/ag2ai/ag2/blob/main/website/docs/user-guide/models/assets/chat-example.png>.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Gemini Vision (to user_proxy):

The image appears to be a screenshot of a chat interface, likely from a customer service or support application. Here's a breakdown of what I can see:

*   **Chat Bubbles:** The primary visual element is a series of chat bubbles representing a conversation between two parties, presumably a user and a support agent or AI chatbot.

*   **User and Agent Avatars/Names:** There are visual cues to distinguish between the user's messages and the agent's messages. This might be through different colored chat bubbles, different alignment (left vs. right), or the presence of avatars or names associated with each message.

*   **Input Field:** There's a text input field at the bottom where the user can type their messages. It likely includes a send button or an enter-to-send functionality.

*   **Interface Elements:** It appears to be a part of a customer support/model interaction, and include buttons like thumbs up and down, and the option to report the bot.

In summary, the image depicts a typical chat interface used for customer support or interaction with AI models, focusing on clear communication and ease of use.

--------------------------------------------------------------------------------

Edit this page

User Guide

​Installation

​Dependencies of This Notebook

​Features

​Main Distinctions

​Gemini coding example

​Gemini vision example

Installation

Dependencies of This Notebook

Features

Main Distinctions

Gemini coding example

Gemini vision example