DeepSeek: Adding Browsing Capabilities to AG2
This guide shows how to integrate AG2 with the browser-use framework. With browser-use, your agents can navigate websites, gather dynamic content, and interact with web pages. This opens up new possibilities for tasks like data collection, web automation, and more.
Browser Use requires Python 3.11 or higher.
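A quick way to confirm the interpreter meets this requirement before installing anything is a small version check. The sketch below uses only the standard library; the helper name python_is_supported is illustrative and not part of AG2 or Browser Use.

```python
import sys

# Minimum interpreter version stated in the requirement above.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version tuple is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```

Comparing only the major and minor components keeps the check independent of patch and pre-release fields.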
To get started with the browser-use integration in AG2, follow these steps:
Install AG2 with the browser-use extra.
Note: If you have been using autogen or ag2, all you need to do is upgrade it with the browser-use extra, as autogen and ag2 are aliases for the same PyPI package.
- config_list defines the LLM configurations, including the model and API key.
- UserProxyAgent simulates user inputs without requiring actual human interaction (human_input_mode set to NEVER).
- AssistantAgent represents the AI agent, configured with the LLM settings.

Note: Browser Use supports a range of models (see its Supported Models list). We had great experience with OpenAI, Anthropic, and Gemini. However, DeepSeek and Ollama haven't performed as well.
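The three pieces described above might be wired together roughly as follows. This is a minimal sketch, not the original notebook's code: the helper names build_config_list and build_agents, the DEEPSEEK_API_KEY environment variable, the deepseek-chat model name, the deepseek api_type, and the base URL are all assumptions chosen for illustration. Substitute whichever provider and model you actually use.

```python
import os


def build_config_list(model="deepseek-chat", api_key_env="DEEPSEEK_API_KEY"):
    """Build a config_list entry holding the model and API key.

    The api_type, model name, env var, and base_url are assumptions.
    """
    return [
        {
            "api_type": "deepseek",
            "model": model,
            "api_key": os.environ.get(api_key_env, ""),
            "base_url": "https://api.deepseek.com/v1",
        }
    ]


def build_agents(config_list):
    """Create the user proxy and assistant agents from a config_list."""
    # Imported lazily so build_config_list works without AG2 installed.
    from autogen import AssistantAgent, UserProxyAgent

    llm_config = {"config_list": config_list}
    user_proxy = UserProxyAgent(
        name="user_proxy",
        human_input_mode="NEVER",  # never pause for human input
        code_execution_config=False,  # no local code execution in this sketch
    )
    assistant = AssistantAgent(name="assistant", llm_config=llm_config)
    return user_proxy, assistant
```

Reading the key from the environment keeps credentials out of the notebook itself.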
The BrowserUseTool enables agents to interact with web browsers, allowing them to access, navigate, and perform actions on websites as part of their tasks. It acts as a bridge between the language model and the browser, empowering the agent to browse the web, search for information, and interact with dynamic web content.
To see what the agents are doing in real time, set the headless option within the browser_config to False. This ensures that the browser runs in a visible window, allowing you to observe the agents' interactions with the websites. With headless=True, the browser runs in the background without a GUI, which is useful for automated tasks where visibility is not necessary.
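As an illustration of the tool setup and the headless option, the sketch below builds a browser_config and registers the tool with both agents. It assumes BrowserUseTool is importable from autogen.tools.experimental and accepts llm_config and browser_config keyword arguments; the helper names make_browser_config and attach_browser_tool are hypothetical.

```python
def make_browser_config(headless: bool = False) -> dict:
    """Return a browser_config dict; headless=False shows a visible window."""
    return {"headless": headless}


def attach_browser_tool(user_proxy, assistant, llm_config, headless=False):
    """Create a BrowserUseTool and register it with both agents."""
    # Imported lazily; requires AG2 installed with the browser-use extra.
    from autogen.tools.experimental import BrowserUseTool

    tool = BrowserUseTool(
        llm_config=llm_config,
        browser_config=make_browser_config(headless),
    )
    # The user proxy executes the tool; the assistant decides when to call it.
    tool.register_for_execution(user_proxy)
    tool.register_for_llm(assistant)
    return tool
```

Passing headless=True to the same helper would be the choice for unattended runs.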
When running in an environment that already has an active event loop, such as Jupyter, use nest_asyncio to allow nested event loops.
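A minimal sketch of applying nest_asyncio is shown below. The wrapper function enable_nested_event_loops is illustrative; it simply reports whether the package was available, so the same cell can run whether or not nest_asyncio is installed.

```python
def enable_nested_event_loops() -> bool:
    """Patch asyncio to allow nested event loops, if nest_asyncio is installed."""
    try:
        import nest_asyncio
    except ImportError:
        # Package not installed; nothing is patched.
        return False
    nest_asyncio.apply()
    return True


patched = enable_nested_event_loops()
print("nest_asyncio applied" if patched else "nest_asyncio not installed")
```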
The user_proxy.initiate_chat() method triggers the assistant to perform a web browsing task, such as searching for “AG2” on Reddit, clicking the first post, and extracting the first comment. The assistant then executes the task using the BrowserUseTool and returns the extracted content to the user.
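The call described above could look roughly like this. The task wording, the max_turns value, and the wrapper run_browsing_task are assumptions for illustration; user_proxy and assistant are expected to be the agents configured earlier, with the browser tool already registered.

```python
# Example task text; the exact wording is an assumption.
TASK = (
    "Go to Reddit, search for 'AG2' in the search bar, "
    "click on the first post and return the first comment."
)


def run_browsing_task(user_proxy, assistant, message=TASK, max_turns=2):
    """Start the chat that asks the assistant to carry out the browsing task."""
    return user_proxy.initiate_chat(
        recipient=assistant,
        message=message,
        max_turns=max_turns,  # cap the number of conversation turns
    )
```

Keeping max_turns small bounds the conversation, which helps when each turn drives a real browser session.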