img_utils
autogen.agentchat.contrib.img_utils.num_tokens_from_gpt_image
num_tokens_from_gpt_image
Calculate the number of tokens required to process an image, based on its dimensions
after scaling. Supports “gpt-4-vision”, “gpt-4o”, and “gpt-4o-mini”.
The function first scales the image so that its longest edge is at most 2048 pixels and its shortest
edge is at most 768 pixels (the limits for “gpt-4-vision”). It then counts the 512x512 tiles
needed to cover the scaled image and computes the total token count from that number of tiles.
Reference: https://openai.com/api/pricing/
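The scale-then-tile procedure described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `estimate_image_tokens` is a hypothetical helper, and the cost constants (85 base tokens plus 170 tokens per tile) are assumed from OpenAI's published high-detail image pricing for GPT-4 vision models. The constants the library uses for each model may differ, and treating `low_quality` as a flat base charge is likewise an assumption.

```python
import math

# Edge limits and tile size come from the description above.
MAX_EDGE = 2048
MIN_EDGE = 768
TILE_SIZE = 512
# Assumed cost constants (OpenAI's published high-detail values);
# the library's per-model values may differ.
BASE_TOKENS = 85
TOKENS_PER_TILE = 170


def estimate_image_tokens(width: int, height: int, low_quality: bool = False) -> int:
    """Hypothetical helper: estimate the token cost of a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if low_quality:
        # Assumption: low-quality mode costs only the flat base charge.
        return BASE_TOKENS

    # Step 1: shrink so the longest edge is at most MAX_EDGE (aspect ratio kept).
    longest = max(width, height)
    if longest > MAX_EDGE:
        width = max(1, width * MAX_EDGE // longest)
        height = max(1, height * MAX_EDGE // longest)

    # Step 2: shrink so the shortest edge is at most MIN_EDGE.
    shortest = min(width, height)
    if shortest > MIN_EDGE:
        width = max(1, width * MIN_EDGE // shortest)
        height = max(1, height * MIN_EDGE // shortest)

    # Step 3: count the TILE_SIZE x TILE_SIZE tiles needed to cover the image.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


print(estimate_image_tokens(1024, 1024))  # 768x768 after scaling, 4 tiles -> 765
print(estimate_image_tokens(2048, 4096))  # 768x1536 after scaling, 6 tiles -> 1105
```

Integer arithmetic is used for the scaling steps so that an edge that should land exactly on 2048 or 768 is not pushed off by floating-point rounding.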
Name | Description |
---|---|
image_data | The image data: a base64-encoded string, a URL, a file path, or a PIL Image object. Type: Union[str, Image.Image] |
model | The model used for image processing: “gpt-4-vision”, “gpt-4o”, or “gpt-4o-mini”. Type: str. Default: ‘gpt-4-vision’ |
low_quality | Type: bool. Default: False |
Type | Description |
---|---|
int | The total number of tokens required to process the image. |