agentchat.contrib.capabilities.generate_images
ImageGenerator
This class defines an interface for image generators.
Concrete implementations of this protocol must provide a generate_image method that takes a string prompt as input and returns a PIL Image object, and a cache_key method that maps a prompt to a cache key.
NOTE: The current implementation does not support editing an existing image.
generate_image
Generates an image based on the provided prompt.
Arguments:
prompt
str - A description of the desired image.
Returns:
A PIL Image object representing the generated image.
Raises:
ValueError - If image generation fails.
cache_key
Generates a unique cache key for the given prompt.
This key can be used to store and retrieve generated images based on the prompt.
Arguments:
prompt
str - A description of the desired image.
Returns:
A unique string that can be used as a cache key.
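A minimal sketch of a class that satisfies this interface, assuming it behaves like a standard typing.Protocol (structural typing, so no subclassing is needed). ImageGeneratorLike and PlaceholderImageGenerator are hypothetical names. To avoid a Pillow dependency, the stub returns a placeholder dict rather than a PIL Image, and its cache_key simply hashes the prompt with SHA-256.

```python
import hashlib
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ImageGeneratorLike(Protocol):
    """Hypothetical mirror of the ImageGenerator interface described above."""

    def generate_image(self, prompt: str) -> Any: ...  # real implementations return a PIL Image

    def cache_key(self, prompt: str) -> str: ...


class PlaceholderImageGenerator:
    """Hypothetical stub; conforms structurally, without subclassing the protocol."""

    def generate_image(self, prompt: str) -> Any:
        if not prompt.strip():
            # Follows the documented contract: signal failure with ValueError.
            raise ValueError("image generation failed: empty prompt")
        # Placeholder object instead of a PIL Image, to keep the sketch dependency-free.
        return {"prompt": prompt, "size": (256, 256)}

    def cache_key(self, prompt: str) -> str:
        # Same prompt -> same key, so a cache can return the earlier image.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


generator = PlaceholderImageGenerator()
print(isinstance(generator, ImageGeneratorLike))  # True: both methods are present
```

Note that a runtime isinstance check against a Protocol only verifies that the methods exist, not their signatures or return types.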
DalleImageGenerator
Generates images using OpenAI’s DALL-E models.
This class provides a convenient interface for generating images based on textual prompts using OpenAI’s DALL-E models. It allows you to specify the DALL-E model, resolution, quality, and the number of images to generate.
NOTE: The current implementation does not support editing an existing image.
__init__
Arguments:
llm_config
dict - The LLM config. Its config_list must contain a valid DALL-E model and an OpenAI API key.
resolution
str - The resolution of the generated images. Must be one of “256x256”, “512x512”, “1024x1024”, “1792x1024”, “1024x1792”.
quality
str - The quality of the generated images. Must be one of “standard”, “hd”.
num_images
int - The number of images to generate.
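The allowed values above can be checked up front. The following is a hypothetical sketch, not the library's class: DalleSettings and its default values are assumptions, and the allowed sets are copied from the argument descriptions. It also assumes that a cache key for such a generator should include the settings, since the same prompt at a different resolution or quality yields a different image.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Allowed values copied from the argument descriptions above.
RESOLUTIONS = {"256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"}
QUALITIES = {"standard", "hd"}


@dataclass(frozen=True)
class DalleSettings:
    """Hypothetical container for the __init__ options; defaults are assumptions."""

    model: str
    resolution: str = "1024x1024"
    quality: str = "standard"
    num_images: int = 1

    def __post_init__(self) -> None:
        # Reject values outside the documented options before any API call is made.
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
        if self.quality not in QUALITIES:
            raise ValueError(f"quality must be one of {sorted(QUALITIES)}")
        if self.num_images < 1:
            raise ValueError("num_images must be at least 1")

    def cache_key(self, prompt: str) -> str:
        # Assumption: settings are part of the key, so the same prompt at a
        # different resolution or quality does not hit a stale cache entry.
        payload = json.dumps({"prompt": prompt, **asdict(self)}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


standard = DalleSettings(model="dall-e-3")
hd = DalleSettings(model="dall-e-3", quality="hd")
print(standard.cache_key("a red fox") == hd.cache_key("a red fox"))  # False
```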
ImageGeneration
This capability allows a ConversableAgent to generate images based on messages received from other agents.
- Utilizes a TextAnalyzerAgent to analyze incoming messages to identify requests for image generation and extract relevant details.
- Leverages the provided ImageGenerator (e.g., DalleImageGenerator) to create the image.
- Optionally caches generated images for faster retrieval in future conversations.
NOTE: This capability increases the agent’s token usage, because it uses a TextAnalyzerAgent to analyze every message the agent receives.
Example:
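A minimal, dependency-free sketch of the flow described above. Everything in it is hypothetical: a keyword check stands in for the TextAnalyzerAgent (no LLM call), StubGenerator stands in for an ImageGenerator, and a plain dict stands in for the cache. The reply function returns a (final, reply) pair, modeled on the way ConversableAgent reply functions report whether they handled a message.

```python
import hashlib
from typing import Any, Dict, Optional, Tuple


class StubGenerator:
    """Hypothetical generator: returns a placeholder string instead of a PIL Image."""

    def __init__(self) -> None:
        self.calls = 0  # counts real generations, to show cache hits

    def generate_image(self, prompt: str) -> Any:
        self.calls += 1
        return f"<image for {prompt!r}>"

    def cache_key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def analyze(message: str) -> Optional[str]:
    """Hypothetical stand-in for the TextAnalyzerAgent: extract a prompt or return None."""
    lowered = message.lower()
    for trigger in ("draw ", "generate an image of "):
        idx = lowered.find(trigger)
        if idx != -1:
            return message[idx + len(trigger):].strip(" .")
    return None


def image_reply(message: str, generator: StubGenerator, cache: Dict[str, Any]) -> Tuple[bool, Any]:
    """Return (final, reply); final=False lets later reply functions handle the message."""
    prompt = analyze(message)
    if prompt is None:
        return False, None  # not an image request
    key = generator.cache_key(prompt)
    if key not in cache:
        cache[key] = generator.generate_image(prompt)  # generate only on a cache miss
    return True, cache[key]


gen, cache = StubGenerator(), {}
print(image_reply("Please draw a red fox.", gen, cache))  # (True, "<image for 'a red fox'>")
image_reply("Please draw a red fox.", gen, cache)          # served from the cache
print(gen.calls)                                           # 1
print(image_reply("Hello there", gen, cache))              # (False, None)
```

In the real capability the analysis step is an extra LLM call per incoming message, which is where the additional token usage noted above comes from.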
__init__
Arguments:
image_generator
ImageGenerator - The image generator used to create images.
cache
None or AbstractCache - The cache client used to store and retrieve generated images. If None, no caching is used.
text_analyzer_llm_config
Dict or None - The LLM config for the text analyzer. If None, the LLM config of the agent the capability is added to is used.
text_analyzer_instructions
str - Instructions given to the TextAnalyzerAgent, which analyzes incoming messages and extracts the prompt for image generation. The default instructions focus on summarizing the prompt; customize them for more granular control over prompt extraction. Example: ‘Extract specific details from the message, like desired objects, styles, or backgrounds.’
verbosity
int - The verbosity level. Must be greater than or equal to 0; defaults to 0. The text analyzer’s LLM calls are silent when verbosity is less than 2.
register_reply_position
int - The position of the reply function in the agent’s list of reply functions. This capability registers a new reply function to handle messages with image generation requests. Defaults to 2, which places it after the termination check and human reply functions of a ConversableAgent.
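The effect of register_reply_position amounts to list insertion, where functions earlier in the list are consulted first. This sketch uses hypothetical function names and a hypothetical helper, register_at, and assumes for illustration that the termination and human reply checks occupy the first two slots.

```python
from typing import List


def register_at(reply_funcs: List[str], name: str, position: int) -> List[str]:
    """Hypothetical helper: insert a reply function name at `position` (earlier = tried first)."""
    updated = list(reply_funcs)  # copy so the caller's list is left untouched
    updated.insert(position, name)
    return updated


# Hypothetical ordering, assumed for illustration only.
existing = [
    "check_termination_and_human_reply",
    "a_check_termination_and_human_reply",
    "generate_oai_reply",
]
print(register_at(existing, "image_generation_reply", 2))
# ['check_termination_and_human_reply', 'a_check_termination_and_human_reply',
#  'image_generation_reply', 'generate_oai_reply']
```

With position 0 instead, the image reply would run before the termination check, so a termination message could trigger image generation.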
add_to_agent
Adds the Image Generation capability to the specified ConversableAgent.
This function performs the following modifications to the agent:
- Registers a reply function: A new reply function is registered with the agent to handle messages that potentially request image generation. This function analyzes the message and triggers image generation if necessary.
- Creates a TextAnalyzerAgent: This agent analyzes messages for image generation requests.
- Updates System Message: The agent’s system message is updated to state that the agent can now generate images.
- Updates Description: The agent’s description is updated to reflect the Image Generation capability. This can be helpful in use cases such as group chats.
Arguments:
agent
ConversableAgent - The ConversableAgent to add the capability to.
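The four modifications listed above can be sketched against a stand-in agent. MockAgent, MockTextAnalyzer, and SketchImageCapability are hypothetical names, and the appended system-message and description strings are illustrative assumptions rather than the library’s actual wording.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class MockAgent:
    """Hypothetical stand-in for a ConversableAgent; only the fields the sketch touches."""

    system_message: str
    description: str
    reply_funcs: List[Callable] = field(default_factory=list)


class MockTextAnalyzer:
    """Hypothetical placeholder for the TextAnalyzerAgent the capability creates."""

    def __init__(self, instructions: str) -> None:
        self.instructions = instructions


class SketchImageCapability:
    """Mirrors the four documented modifications; not the library's implementation."""

    def __init__(self, register_reply_position: int = 2) -> None:
        self.position = register_reply_position
        self.analyzer: Optional[MockTextAnalyzer] = None

    def _image_reply(self, message: str) -> Tuple[bool, Any]:
        # A real reply function would analyze `message` and possibly generate an image.
        return False, None

    def add_to_agent(self, agent: MockAgent) -> None:
        # 1. Register the reply function at the configured position.
        agent.reply_funcs.insert(self.position, self._image_reply)
        # 2. Create the analyzer used to detect image generation requests.
        self.analyzer = MockTextAnalyzer("Summarize the image the user is asking for.")
        # 3. Tell the agent, via its system message, that it can generate images.
        agent.system_message += "\nYou can now generate images."
        # 4. Reflect the capability in the description (useful when others read it, e.g. group chats).
        agent.description += "\nThis agent can generate images."


agent = MockAgent("You are helpful.", "A helpful assistant.", [print, len, str])
capability = SketchImageCapability()
capability.add_to_agent(agent)
print(agent.reply_funcs[2] == capability._image_reply)  # True
```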