autogen.agentchat.contrib.qdrant_retrieve_user_proxy_agent.create_qdrant_from_dir

create_qdrant_from_dir

create_qdrant_from_dir(
    dir_path: str,
    max_tokens: int = 4000,
    client: QdrantClient = None,
    collection_name: str = 'all-my-documents',
    chunk_mode: str = 'multi_lines',
    must_break_at_empty_line: bool = True,
    embedding_model: str = 'BAAI/bge-small-en-v1.5',
    custom_text_split_function: Callable = None,
    custom_text_types: list[str] = ['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf', 'mdx'],
    recursive: bool = True,
    extra_docs: bool = False,
    parallel: int = 0,
    on_disk: bool = False,
    quantization_config: models.QuantizationConfig | None = None,
    hnsw_config: models.HnswConfigDiff | None = None,
    payload_indexing: bool = False,
    qdrant_client_options: dict[str, Any] | None = {}
)

Create a Qdrant collection from all the files in a given directory. `dir_path` can also point to a single file or to a URL of a single file.
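The three accepted forms of `dir_path` (directory, single file, URL), together with the `custom_text_types` and `recursive` parameters, can be illustrated with a small standard-library sketch. `collect_inputs` is a hypothetical helper, not part of autogen, and it is not how the library resolves inputs internally. It only mirrors the documented meaning of those parameters, and it assumes that a "URL" is any string with an `http` or `https` scheme.

```python
from collections.abc import Iterable
from pathlib import Path
from urllib.parse import urlparse

# Same list as the custom_text_types default in the signature above.
DEFAULT_TYPES = (
    "txt", "json", "csv", "tsv", "md", "html", "htm", "rtf",
    "rst", "jsonl", "log", "xml", "yaml", "yml", "pdf", "mdx",
)


def collect_inputs(
    dir_path: str,
    types: Iterable[str] = DEFAULT_TYPES,
    recursive: bool = True,
) -> list[str]:
    """Hypothetical helper: resolve dir_path into the inputs to be indexed."""
    # A URL to a single file is passed through unchanged (assumption: http/https only).
    if urlparse(dir_path).scheme in ("http", "https"):
        return [dir_path]
    path = Path(dir_path)
    # A single file is indexed on its own.
    if path.is_file():
        return [str(path)]
    if not path.is_dir():
        raise ValueError(f"{dir_path!r} is not a file, directory, or URL")
    # Compare extensions case-insensitively and without the leading dot.
    allowed = {t.lower().lstrip(".") for t in types}
    # "**/*" descends into subdirectories; "*" only looks at the top level.
    pattern = "**/*" if recursive else "*"
    return sorted(
        str(p)
        for p in path.glob(pattern)
        if p.is_file() and p.suffix.lower().lstrip(".") in allowed
    )
```

With `recursive=False`, files in subdirectories are skipped; files whose extension is not listed in `types` are ignored in either case.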

Parameters:

Name	Description
`dir_path`	The path to a directory, a single file, or a URL to a single file. Type: str
`max_tokens`	The maximum number of tokens per chunk. Type: int. Default: 4000
`client`	The QdrantClient instance to use. Type: QdrantClient. Default: None
`collection_name`	The name of the collection. Type: str. Default: 'all-my-documents'
`chunk_mode`	The chunking mode, either "multi_lines" or "one_line". Type: str. Default: 'multi_lines'
`must_break_at_empty_line`	Whether chunks must break at an empty line. Type: bool. Default: True
`embedding_model`	The embedding model to use. The list of available models can be found at https://qdrant.github.io/fastembed/examples/Supported_Models/. Type: str. Default: 'BAAI/bge-small-en-v1.5'
`custom_text_split_function`	A custom function that splits a string into a list of strings. If None, the default function `autogen.retrieve_utils.split_text_to_chunks` is used. Type: Callable. Default: None
`custom_text_types`	A list of file types to process. Defaults to `TEXT_FORMATS`. Type: list[str]. Default: ['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf', 'mdx']
`recursive`	Whether to search for documents recursively in `dir_path`. Type: bool. Default: True
`extra_docs`	Whether to add more documents to the collection. Type: bool. Default: False
`parallel`	How many parallel workers to use for embedding. The default, 0, uses all available CPU cores. Type: int. Default: 0
`on_disk`	Whether to store the collection on disk. Type: bool. Default: False
`quantization_config`	Quantization configuration. If None, quantization is disabled. Ref: https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection Type: models.QuantizationConfig | None. Default: None
`hnsw_config`	HNSW configuration. If None, the default configuration is used. Ref: https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection Type: models.HnswConfigDiff | None. Default: None
`payload_indexing`	Whether to create a payload index for the document field. Type: bool. Default: False
`qdrant_client_options`	Options for instantiating the Qdrant client. Ref: https://github.com/qdrant/qdrant-client/blob/master/qdrant_client/qdrant_client.py#L36-L58. Type: dict[str, Any] | None. Default: {}
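`custom_text_split_function` only needs to turn a string into a list of strings. The sketch below is one hypothetical splitter, not part of autogen, that packs whole paragraphs (separated by blank lines) into chunks. It approximates tokens by whitespace-separated words, which is an assumption for illustration and not how the library counts `max_tokens`. It also assumes the function is called with the text as its only argument, so the budget is given a default value.

```python
def split_by_paragraph(text: str, max_words: int = 200) -> list[str]:
    """Hypothetical splitter: pack whole paragraphs into chunks of at most max_words words.

    Words stand in for tokens here (an assumption). A single paragraph that is
    longer than max_words is kept whole as its own chunk rather than being cut.
    """
    # Paragraphs are separated by blank lines; empty ones are dropped.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += n_words
    # Flush whatever is left after the last paragraph.
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Passing `custom_text_split_function=split_by_paragraph` would replace the default `split_text_to_chunks`. To use a different budget while still handing over a one-argument callable, it can be wrapped, for example with `functools.partial(split_by_paragraph, max_words=500)`.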
