Installation
To integrate Crawl4AI with AG2, follow these steps:
- Install AG2 with the crawl4ai extra:
pip install ag2[openai,crawl4ai]
If you have been using autogen or ag2, all you need to do is upgrade it using:
pip install -U autogen[openai,crawl4ai]
or
pip install -U ag2[openai,crawl4ai]
as autogen and ag2 are aliases for the same PyPI package.
- Set up Playwright:
# Installs Playwright and browsers for all OS
playwright install
# Additional command, mandatory for Linux only
playwright install-deps
- For running the code in Jupyter, use nest_asyncio to allow nested event loops:
pip install nest_asyncio
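Before moving on, it can help to confirm that the packages from the steps above are importable. The following is a minimal sketch using only the standard library; the helper names installed_version and environment_report are hypothetical, and the check does not verify that the Playwright browsers themselves were downloaded.

```python
from importlib import metadata, util
from typing import Dict, Optional


def installed_version(*dist_names: str) -> Optional[str]:
    """Return the version of the first installed distribution, or None."""
    for name in dist_names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def environment_report() -> Dict[str, bool]:
    """Map each prerequisite from the installation steps to a found/missing flag."""
    return {
        # ag2 and autogen are aliases, so either distribution counts
        "ag2": installed_version("ag2", "autogen") is not None,
        "crawl4ai": util.find_spec("crawl4ai") is not None,
        "playwright": util.find_spec("playwright") is not None,
        "nest_asyncio": util.find_spec("nest_asyncio") is not None,
    }


for package, found in environment_report().items():
    print(f"{package}: {'ok' if found else 'missing'}")
```

Any entry reported as missing points back to the corresponding installation step.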
Imports
import os
import nest_asyncio
from pydantic import BaseModel
from autogen import AssistantAgent, UserProxyAgent, LLMConfig
from autogen.tools.experimental import Crawl4AITool
# Allow nested event loops, which is needed when running inside Jupyter
nest_asyncio.apply()
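Jupyter already runs an event loop, so code that tries to start a second one fails. That is the situation nest_asyncio.apply() works around. The sketch below reproduces the underlying error with the standard library only; fetch_page and notebook_cell are hypothetical names, and notebook_cell merely stands in for code running inside an already active loop.

```python
import asyncio


async def fetch_page() -> str:
    """Hypothetical stand-in for an asynchronous crawl."""
    return "page content"


async def notebook_cell() -> str:
    """Simulates code that runs while an event loop is already active."""
    coro = fetch_page()
    try:
        # Starting a second loop from inside a running one is rejected
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```

With nest_asyncio applied, nested calls of this kind are permitted, which is why the imports above call nest_asyncio.apply() once at the top.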
Agent Configuration
Configure the agents for the interaction.
- llm_config defines the LLM configuration, including the model and API key.
- UserProxyAgent simulates user inputs without requiring actual human interaction (human_input_mode is set to NEVER).
- AssistantAgent represents the AI agent, configured with the LLM settings.
llm_config = LLMConfig(api_type="openai", model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
# Agents created inside this block pick up llm_config
with llm_config:
    assistant = AssistantAgent(name="assistant")
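The configuration reads the API key with os.environ["OPENAI_API_KEY"], which raises a bare KeyError when the variable is missing. A small helper can turn that into a clearer message. This is a sketch only; require_env is a hypothetical name and not part of AG2.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before creating the LLM configuration."
        )
    return value


# Intended use (not executed here, since the key may be absent):
# api_key = require_env("OPENAI_API_KEY")
```

The returned value can then be passed as api_key in place of the direct os.environ lookup.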
Web Browsing with Crawl4AI
Crawl4AITool offers three integration modes:
- Basic Web Scraping (No LLM):
crawlai_tool = Crawl4AITool()
- Web Scraping with LLM Processing:
crawlai_tool = Crawl4AITool(llm_config=llm_config)
- LLM Processing with Schema for Structured Data:
class Blog(BaseModel):
    title: str
    url: str

crawlai_tool = Crawl4AITool(llm_config=llm_config, extraction_model=Blog)
Register the tool with the agents: user_proxy executes it, and assistant can propose calls to it.
crawlai_tool.register_for_execution(user_proxy)
crawlai_tool.register_for_llm(assistant)
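In the schema mode, the Blog model describes the shape each extracted item should have. The same model can also be used on your side to check items after the fact. The sketch below assumes Pydantic v2 (model_validate) and uses hand-written sample items; the second item is a hypothetical malformed record, not real tool output.

```python
from pydantic import BaseModel, ValidationError


class Blog(BaseModel):
    title: str
    url: str


# Sample items: the first mirrors the shape of an extracted post,
# the second is a hypothetical record with a missing field.
raw_items = [
    {
        "title": "RealtimeAgent with Gemini API",
        "url": "https://docs.ag2.ai/docs/blog/2025-01-29-RealtimeAgent-with-gemini/index",
        "error": False,
    },
    {"title": "Post without a URL"},
]

valid, rejected = [], []
for item in raw_items:
    try:
        # Extra keys such as "error" are ignored by default
        valid.append(Blog.model_validate(item))
    except ValidationError:
        rejected.append(item)

print(f"{len(valid)} valid, {len(rejected)} rejected")
```

Keeping rejected items separate makes it easy to see when an extraction drifts from the expected schema.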
Initiate Chat
message = "Extract all blog posts from https://docs.ag2.ai/docs/blog"
result = user_proxy.initiate_chat(
    recipient=assistant,
    message=message,
    max_turns=2,
)
The chat produces output similar to the following:
user_proxy (to assistant):
Extract all blog posts from https://docs.ag2.ai/docs/blog
--------------------------------------------------------------------------------
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
assistant (to user_proxy):
***** Suggested tool call (call_R1DUz9LKPxE3t0wiy94OotAR): crawl4ai *****
Arguments:
{"url":"https://docs.ag2.ai/docs/blog","instruction":"Extract all blog posts including titles, dates, and links."}
*************************************************************************
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING FUNCTION crawl4ai...
Call ID: call_R1DUz9LKPxE3t0wiy94OotAR
Input arguments: {'url': 'https://docs.ag2.ai/docs/blog', 'instruction': 'Extract all blog posts including titles, dates, and links.'}
[INIT].... → Crawl4AI 0.4.247
[FETCH]... ↓ https://docs.ag2.ai/docs/blog... | Status: True | Time: 8.28s
[SCRAPE].. ◆ Processed https://docs.ag2.ai/docs/blog... | Time: 23ms
12:16:26 - LiteLLM:INFO: utils.py:2825 -
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:LiteLLM:
LiteLLM completion() model= gpt-4o-mini; provider = openai
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
12:17:04 - LiteLLM:INFO: utils.py:1030 - Wrapper: Completed Call, calling success_handler
INFO:LiteLLM:Wrapper: Completed Call, calling success_handler
[EXTRACT]. ■ Completed for https://docs.ag2.ai/docs/blog... | Time: 38.029349499964155s
[COMPLETE] ● https://docs.ag2.ai/docs/blog... | Status: True | Total: 46.34s
user_proxy (to assistant):
***** Response from calling tool (call_R1DUz9LKPxE3t0wiy94OotAR) *****
[
{
"title": "RealtimeAgent with Gemini API",
"url": "https://docs.ag2.ai/docs/blog/2025-01-29-RealtimeAgent-with-gemini/index",
"error": false
},
{
"title": "Tools with ChatContext Dependency Injection",
"url": "https://docs.ag2.ai/docs/blog/2025-01-22-Tools-ChatContext-Dependency-Injection/index",
"error": false
},
{
"title": "Streaming input and output using WebSockets",
"url": "https://docs.ag2.ai/docs/blog/2025-01-10-WebSockets/index",
"error": false
},
{
"title": "Real-Time Voice Interactions over WebRTC",
"url": "https://docs.ag2.ai/docs/blog/2025-01-09-RealtimeAgent-over-WebRTC/index",
"error": false
},
{
"title": "Real-Time Voice Interactions with the WebSocket Audio Adapter",
"url": "https://docs.ag2.ai/docs/blog/2025-01-08-RealtimeAgent-over-websocket/index",
"error": false
},
{
"title": "Tools Dependency Injection",
"url": "https://docs.ag2.ai/docs/blog/2025-01-07-Tools-Dependency-Injection/index",
"error": false
},
{
"title": "Cross-Framework LLM Tool Integration with AG2",
"url": "https://docs.ag2.ai/docs/blog/2024-12-20-Tools-interoperability/index",
"error": false
},
{
"title": "ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning",
"url": "https://docs.ag2.ai/docs/blog/2024-12-20-Reasoning-Update/index",
"error": false
},
{
"title": "Introducing RealtimeAgent Capabilities in AG2",
"url": "https://docs.ag2.ai/docs/blog/2024-12-20-RealtimeAgent/index",
"error": false
},
{
"title": "Knowledgeable Agents with FalkorDB Graph RAG",
"url": "https://docs.ag2.ai/docs/blog/2024-12-06-FalkorDB-Structured/index",
"error": false
},
{
"title": "ReasoningAgent - Tree of Thoughts with Beam Search in AG2",
"url": "https://docs.ag2.ai/docs/blog/2024-12-02-ReasoningAgent2/index",
"error": false
},
{
"title": "Agentic testing for prompt leakage security",
"url": "https://docs.ag2.ai/docs/blog/2024-11-27-Prompt-Leakage-Probing/index",
"error": false
},
{
"title": "Building Swarm-based agents with AG2",
"url": "https://docs.ag2.ai/docs/blog/2024-11-17-Swarm/index",
"error": false
},
{
"title": "Introducing CaptainAgent for Adaptive Team Building",
"url": "https://docs.ag2.ai/docs/blog/2024-11-15-CaptainAgent/index",
"error": false
},
{
"title": "Unlocking the Power of Agentic Workflows at Nexla with Autogen",
"url": "https://docs.ag2.ai/docs/blog/2024-10-23-NOVA/index",
"error": false
},
{
"title": "AgentOps, the Best Tool for AutoGen Agent Observability",
"url": "https://docs.ag2.ai/docs/blog/2024-07-25-AgentOps/index",
"error": false
},
{
"title": "Enhanced Support for Non-OpenAI Models",
"url": "https://docs.ag2.ai/docs/blog/2024-06-24-AltModels-Classes/index",
"error": false
},
{
"title": "AgentEval: A Developer Tool to Assess Utility of LLM-powered Applications",
"url": "https://docs.ag2.ai/docs/blog/2024-06-21-AgentEval/index",
"error": false
},
{
"title": "Agents in AutoGen",
"url": "https://docs.ag2.ai/docs/blog/2024-05-24-Agent/index",
"error": false
},
{
"title": "AutoDefense - Defend against jailbreak attacks with AutoGen",
"url": "https://docs.ag2.ai/docs/blog/2024-03-11-AutoDefense/index",
"error": false
},
{
"title": "What's New in AutoGen?",
"url": "https://docs.ag2.ai/docs/blog/2024-03-03-AutoGen-Update/index",
"error": false
},
{
"title": "StateFlow - Build State-Driven Workflows with Customized Speaker Selection in GroupChat",
"url": "https://docs.ag2.ai/docs/blog/2024-02-29-StateFlow/index",
"error": false
},
{
"title": "FSM Group Chat -- User-specified agent transitions",
"url": "https://docs.ag2.ai/docs/blog/2024-02-11-FSM-GroupChat/index",
"error": false
},
{
"title": "Anny: Assisting AutoGen Devs Via AutoGen",
"url": "https://docs.ag2.ai/docs/blog/2024-02-02-AutoAnny/index",
"error": false
},
{
"title": "AutoGen with Custom Models: Empowering Users to Use Their Own Inference Mechanism",
"url": "https://docs.ag2.ai/latest/docs/blog/2024/01/26/Custom-Models/",
"error": false
},
{
"title": "AutoGenBench -- A Tool for Measuring and Evaluating AutoGen Agents",
"url": "https://docs.ag2.ai/docs/blog/2024-01-25-AutoGenBench/index",
"error": false
},
{
"title": "Code execution is now by default inside docker container",
"url": "https://docs.ag2.ai/docs/blog/2024-01-23-Code-execution-in-docker/index",
"error": false
},
{
"title": "All About Agent Descriptions",
"url": "https://docs.ag2.ai/latest/docs/blog/2023/12/29/AgentDescriptions/",
"error": false
},
{
"title": "AgentOptimizer - An Agentic Way to Train Your LLM Agent",
"url": "https://docs.ag2.ai/docs/blog/2023-12-23-AgentOptimizer/index",
"error": false
},
{
"title": "AutoGen Studio: Interactively Explore Multi-Agent Workflows",
"url": "https://docs.ag2.ai/docs/blog/2023-12-01-AutoGenStudio/index",
"error": false
},
{
"title": "Agent AutoBuild - Automatically Building Multi-agent Systems",
"url": "https://docs.ag2.ai/docs/blog/2023-11-26-Agent-AutoBuild/index",
"error": false
},
{
"title": "How to Assess Utility of LLM-powered Applications?",
"url": "https://docs.ag2.ai/docs/blog/2023-11-20-AgentEval/index",
"error": false
},
{
"title": "AutoGen Meets GPTs",
"url": "https://docs.ag2.ai/docs/blog/2023-11-13-OAI-assistants/index",
"error": false
},
{
"title": "EcoAssistant - Using LLM Assistants More Accurately and Affordably",
"url": "https://docs.ag2.ai/docs/blog/2023-11-09-EcoAssistant/index",
"error": false
},
{
"title": "Multimodal with GPT-4V and LLaVA",
"url": "https://docs.ag2.ai/docs/blog/2023-11-06-LMM-Agent/index",
"error": false
},
{
"title": "AutoGen's Teachable Agents",
"url": "https://docs.ag2.ai/docs/blog/2023-10-26-TeachableAgent/index",
"error": false
},
{
"title": "Retrieval-Augmented Generation (RAG) Applications with AutoGen",
"url": "https://docs.ag2.ai/docs/blog/2023-10-18-RetrieveChat/index",
"error": false
},
{
"title": "Use AutoGen for Local LLMs",
"url": "https://docs.ag2.ai/docs/blog/2023-07-14-Local-LLMs/index",
"error": false
},
{
"title": "MathChat - An Conversational Framework to Solve Math Problems",
"url": "https://docs.ag2.ai/docs/blog/2023-06-28-MathChat/index",
"error": false
},
{
"title": "Achieve More, Pay Less - Use GPT-4 Smartly",
"url": "https://docs.ag2.ai/docs/blog/2023-05-18-GPT-adaptive-humaneval/index",
"error": false
},
{
"title": "Does Model and Inference Parameter Matter in LLM Applications? - A Case Study for MATH",
"url": "https://docs.ag2.ai/latest/docs/blog/2023/04/21/LLM-tuning-math",
"error": false
}
]
**********************************************************************
--------------------------------------------------------------------------------
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
assistant (to user_proxy):
I have extracted all the blog posts from the specified URL. Below is the list of titles along with their corresponding links:
1. [RealtimeAgent with Gemini API](https://docs.ag2.ai/docs/blog/2025-01-29-RealtimeAgent-with-gemini/index)
2. [Tools with ChatContext Dependency Injection](https://docs.ag2.ai/docs/blog/2025-01-22-Tools-ChatContext-Dependency-Injection/index)
3. [Streaming input and output using WebSockets](https://docs.ag2.ai/docs/blog/2025-01-10-WebSockets/index)
4. [Real-Time Voice Interactions over WebRTC](https://docs.ag2.ai/docs/blog/2025-01-09-RealtimeAgent-over-WebRTC/index)
5. [Real-Time Voice Interactions with the WebSocket Audio Adapter](https://docs.ag2.ai/docs/blog/2025-01-08-RealtimeAgent-over-websocket/index)
6. [Tools Dependency Injection](https://docs.ag2.ai/docs/blog/2025-01-07-Tools-Dependency-Injection/index)
7. [Cross-Framework LLM Tool Integration with AG2](https://docs.ag2.ai/docs/blog/2024-12-20-Tools-interoperability/index)
8. [ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning](https://docs.ag2.ai/docs/blog/2024-12-20-Reasoning-Update/index)
9. [Introducing RealtimeAgent Capabilities in AG2](https://docs.ag2.ai/docs/blog/2024-12-20-RealtimeAgent/index)
10. [Knowledgeable Agents with FalkorDB Graph RAG](https://docs.ag2.ai/docs/blog/2024-12-06-FalkorDB-Structured/index)
11. [ReasoningAgent - Tree of Thoughts with Beam Search in AG2](https://docs.ag2.ai/docs/blog/2024-12-02-ReasoningAgent2/index)
12. [Agentic testing for prompt leakage security](https://docs.ag2.ai/docs/blog/2024-11-27-Prompt-Leakage-Probing/index)
13. [Building Swarm-based agents with AG2](https://docs.ag2.ai/docs/blog/2024-11-17-Swarm/index)
14. [Introducing CaptainAgent for Adaptive Team Building](https://docs.ag2.ai/docs/blog/2024-11-15-CaptainAgent/index)
15. [Unlocking the Power of Agentic Workflows at Nexla with Autogen](https://docs.ag2.ai/docs/blog/2024-10-23-NOVA/index)
16. [AgentOps, the Best Tool for AutoGen Agent Observability](https://docs.ag2.ai/docs/blog/2024-07-25-AgentOps/index)
17. [Enhanced Support for Non-OpenAI Models](https://docs.ag2.ai/docs/blog/2024-06-24-AltModels-Classes/index)
18. [AgentEval: A Developer Tool to Assess Utility of LLM-powered Applications](https://docs.ag2.ai/docs/blog/2024-06-21-AgentEval/index)
19. [Agents in AutoGen](https://docs.ag2.ai/docs/blog/2024-05-24-Agent/index)
20. [AutoDefense - Defend against jailbreak attacks with AutoGen](https://docs.ag2.ai/docs/blog/2024-03-11-AutoDefense/index)
21. [What's New in AutoGen?](https://docs.ag2.ai/docs/blog/2024-03-03-AutoGen-Update/index)
...
TERMINATE
--------------------------------------------------------------------------------
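The tool response in the log above is a JSON array, and most URLs embed the publication date in one of two layouts (2025-01-29-... or 2024/01/26/...). A minimal post-processing sketch follows, assuming the response text has already been copied out of the chat; how to retrieve it from the result object is not covered here, and published_on is a hypothetical helper name.

```python
import json
import re
from datetime import date
from typing import Optional

# Matches both /blog/2025-01-29-... and /blog/2024/01/26/... layouts
DATE_IN_URL = re.compile(r"/blog/(\d{4})[-/](\d{2})[-/](\d{2})[-/]")


def published_on(url: str) -> Optional[date]:
    """Extract the publication date embedded in a blog URL, if any."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


# Two entries copied from the tool response shown above
tool_response = json.dumps([
    {
        "title": "RealtimeAgent with Gemini API",
        "url": "https://docs.ag2.ai/docs/blog/2025-01-29-RealtimeAgent-with-gemini/index",
        "error": False,
    },
    {
        "title": "AutoGen with Custom Models: Empowering Users to Use Their Own Inference Mechanism",
        "url": "https://docs.ag2.ai/latest/docs/blog/2024/01/26/Custom-Models/",
        "error": False,
    },
])

# Keep only items that the extraction did not flag as errors
posts = [post for post in json.loads(tool_response) if not post.get("error")]
for post in posts:
    print(published_on(post["url"]), post["title"])
```

Working from the raw tool response rather than the assistant's summary avoids losing entries when the summary is truncated.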