autogen.agentchat.contrib.qdrant_retrieve_user_proxy_agent.create_qdrant_from_dir
create_qdrant_from_dir
Create a Qdrant collection from all the files in a given directory. The directory path can also point to a single file or be a URL to a single file.
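The file-selection behavior described above (a directory, optionally searched recursively, or a single file) can be approximated with a short standard-library sketch. `preview_files` is a hypothetical helper, not part of autogen. It assumes that the `custom_text_types` entries are matched against file extensions (case-insensitively here) and that `recursive` controls whether subdirectories are searched. URLs are not handled, and the real function may differ in details.

```python
from pathlib import Path
import tempfile

# Illustrative subset of the documented default custom_text_types.
DEFAULT_TYPES = ["txt", "md", "pdf"]


def preview_files(dir_path: str, types: list[str], recursive: bool = True) -> list[str]:
    """Hypothetical helper: list files under dir_path whose extension is in types.

    Assumption: types are matched against file extensions, ignoring case and a
    leading dot. URLs are not supported by this sketch.
    """
    path = Path(dir_path)
    # A single file is returned as-is, like "the directory can also be a single file".
    if path.is_file():
        return [str(path)]
    wanted = {t.lstrip(".").lower() for t in types}
    # rglob descends into subdirectories; glob only inspects the top level.
    candidates = path.rglob("*") if recursive else path.glob("*")
    return sorted(
        str(p)
        for p in candidates
        if p.is_file() and p.suffix.lstrip(".").lower() in wanted
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        for name in ["a.txt", "b.md", "notes.py", "sub/c.pdf"]:
            (root / name).write_text("example")
        # Recursive: includes a.txt, b.md and sub/c.pdf, but not notes.py.
        print(preview_files(tmp, DEFAULT_TYPES, recursive=True))
        # Non-recursive: only a.txt and b.md.
        print(preview_files(tmp, DEFAULT_TYPES, recursive=False))
```

A preview like this can help check which files a given `dir_path`, `custom_text_types` and `recursive` combination would likely ingest before any embedding work starts.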
Name | Description | Type | Default |
---|---|---|---|
dir_path | The path to the directory, file, or URL. | str | required |
max_tokens | The maximum number of tokens per chunk. | int | 4000 |
client | The QdrantClient instance. | QdrantClient | None |
collection_name | The name of the collection. | str | 'all-my-documents' |
chunk_mode | The chunk mode. | str | 'multi_lines' |
must_break_at_empty_line | Whether to break chunks at empty lines. | bool | True |
embedding_model | The embedding model to use. The list of available models can be found at https://qdrant.github.io/fastembed/examples/Supported_Models/. | str | 'BAAI/bge-small-en-v1.5' |
custom_text_split_function | A custom function that splits a string into a list of strings. If None, the default function autogen.retrieve_utils.split_text_to_chunks is used. | Callable | None |
custom_text_types | A list of file types to process. Defaults to TEXT_FORMATS. | list[str] | ['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf', 'mdx'] |
recursive | Whether to search for documents recursively in dir_path. | bool | True |
extra_docs | Whether to add more documents to the collection. | bool | False |
parallel | How many parallel workers to use for embedding. The default value 0 uses as many workers as there are CPU cores. | int | 0 |
on_disk | Whether to store the collection on disk. | bool | False |
quantization_config | Quantization configuration. If None, quantization is disabled. Ref: https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection | Optional[models.QuantizationConfig] | None |
hnsw_config | HNSW configuration. If None, the default configuration is used. Ref: https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection | Optional[models.HnswConfigDiff] | None |
payload_indexing | Whether to create a payload index for the document field. | bool | False |
qdrant_client_options | Options for instantiating the Qdrant client. Ref: https://github.com/qdrant/qdrant-client/blob/master/qdrant_client/qdrant_client.py#L36-L58 | Optional[dict[str, Any]] | {} |
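The custom_text_split_function parameter takes a function that splits a string into a list of strings. The sketch below is one possible such function, assuming it is called with each document's text as its only argument. `split_by_paragraphs` and its `max_chars` budget are hypothetical names chosen for illustration. The function splits at blank lines and packs consecutive paragraphs into chunks of at most `max_chars` characters, keeping an oversized paragraph whole rather than cutting it.

```python
import re


def split_by_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    """Hypothetical splitter: pack blank-line-separated paragraphs into chunks.

    Each chunk holds whole paragraphs joined by a blank line and stays within
    max_chars characters, except when a single paragraph is longer than that.
    """
    # Split on one or more blank lines (lines that are empty or whitespace-only).
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Fits in the current chunk, or starts a new (possibly oversized) chunk.
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "Intro paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for chunk in split_by_paragraphs(doc, max_chars=40):
        print(repr(chunk))
```

It would be supplied as create_qdrant_from_dir(dir_path, custom_text_split_function=split_by_paragraphs). Since a custom function presumably replaces split_text_to_chunks, chunk size in this sketch is governed by max_chars (characters, not tokens) rather than by max_tokens.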