autogen.agentchat.contrib.rag.ChromaDBQueryEngine
ChromaDBQueryEngine
This engine uses ChromaDB to persist document embeddings in a named collection
and LlamaIndex’s VectorStoreIndex to index and retrieve documents efficiently and to generate answers
to natural language queries. A collection can be regarded as an abstraction over a group of documents in the database.
It expects a ChromaDB server to be running and accessible at the specified host and port.
Refer to this link for running ChromaDB in a Docker container.
If the host and port are not provided, the engine will create an in-memory ChromaDB client.
Initializes the ChromaDBQueryEngine with the connection details (host, port, settings, tenant, database), metadata, embedding function, LLM, and collection name.
Name | Description |
---|---|
host | Type: str | None Default: 'localhost' |
port | Type: int | None Default: 8000 |
settings | Type: ForwardRef('Settings') | None Default: None |
tenant | Type: str | None Default: None |
database | Type: str | None Default: None |
embedding_function | Type: Optional[EmbeddingFunction[Any]] Default: None |
metadata | Type: dict[str, typing.Any] | None Default: None |
llm | Type: ForwardRef('LLM') | None Default: None |
collection_name | Type: str | None Default: None |
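A minimal sketch of constructing the engine with the parameters listed above. It assumes the package is installed with its RAG dependencies and that the class can be imported from the module named at the top of this page. The helper `make_engine` and the collection name in the usage note are hypothetical and serve only as illustration.

```python
from typing import Any


def make_engine(collection_name: str, host: str = "localhost", port: int = 8000) -> Any:
    """Build a ChromaDBQueryEngine bound to one named collection (hypothetical helper)."""
    # Imported lazily so this module still loads where the RAG dependencies are missing.
    # Assumption: the import path mirrors the module name shown in the page title.
    from autogen.agentchat.contrib.rag import ChromaDBQueryEngine

    # All other constructor arguments stay at their documented default of None.
    return ChromaDBQueryEngine(host=host, port=port, collection_name=collection_name)
```

Calling `make_engine("my_docs")` would then target a server at `localhost:8000`, matching the documented defaults.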
Instance Methods
add_docs
Add new documents to the underlying database and add to the index.
Name | Description |
---|---|
new_doc_dir | A directory of input documents used to create the records in the database. Type: pathlib.Path | str | None Default: None |
new_doc_paths_or_urls | A sequence of input documents used to create the records in the database. Each document can be a path to a file or a URL. Type: Sequence[pathlib.Path | str] | None Default: None |
*args | Any additional arguments Type: Any |
**kwargs | Any additional keyword arguments Type: Any |
connect_db
Connect to the database.
It does not overwrite the existing collection in the database.
It takes the following steps:
1. Set up ChromaDB and LlamaIndex storage.
2. Create the LlamaIndex vector store index for querying or inserting docs later.
Name | Description |
---|---|
*args | Any additional arguments Type: Any |
**kwargs | Any additional keyword arguments Type: Any |
Type | Description |
---|---|
bool | True if connection is successful |
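Because `connect_db` leaves the existing collection intact, it pairs naturally with `add_docs` when a collection should grow rather than be rebuilt. The sketch below illustrates that flow under stated assumptions: `attach_and_extend` is a hypothetical helper, and `engine` is any object exposing the methods documented on this page.

```python
from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path
from typing import Any


def attach_and_extend(engine: Any, new_paths: Sequence[Path | str] = ()) -> str:
    """Connect to an existing collection, then append documents (hypothetical helper)."""
    # connect_db() reuses the existing collection and reports success as a bool.
    if not engine.connect_db():
        raise ConnectionError("could not connect to the ChromaDB collection")
    if new_paths:
        # add_docs() adds the documents to the database and to the index.
        engine.add_docs(new_doc_paths_or_urls=list(new_paths))
    # Report which collection was extended.
    return engine.get_collection_name()
```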
get_collection_name
Get the name of the collection used by the query engine.
Type | Description |
---|---|
str | The name of the collection. |
init_db
Initialize the database with the input documents or records.
It overwrites the existing collection in the database.
It takes the following steps:
1. Set up ChromaDB and LlamaIndex storage.
2. Insert documents and build indexes upon them.
Name | Description |
---|---|
new_doc_dir | A directory of input documents used to create the records in the database. Type: pathlib.Path | str | None Default: None |
new_doc_paths_or_urls | A sequence of input documents used to create the records in the database. Each document can be a path to a file or a URL. Type: Sequence[pathlib.Path | str] | None Default: None |
*args | Any additional arguments Type: Any |
**kwargs | Any additional keyword arguments Type: Any |
Type | Description |
---|---|
bool | True if initialization is successful |
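Since `init_db` overwrites the existing collection, it can be worth validating the input before calling it. The following is a sketch under that assumption: `rebuild_collection` is a hypothetical helper, and the directory check is an added precaution rather than library behavior.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def rebuild_collection(engine: Any, doc_dir: Path | str) -> None:
    """Rebuild the collection from a directory of documents (hypothetical helper)."""
    doc_dir = Path(doc_dir)
    # Fail before the destructive call if the source directory is missing.
    if not doc_dir.is_dir():
        raise NotADirectoryError(f"{doc_dir} is not a directory")
    # init_db() replaces the collection contents and returns True on success.
    if not engine.init_db(new_doc_dir=doc_dir):
        raise RuntimeError("init_db did not report success")
```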
query
Retrieve information from indexed documents by processing a query using the engine’s LLM.
Name | Description |
---|---|
question | A natural language query string used to search the indexed documents. Type: str |
Type | Description |
---|---|
str | A string containing the response generated by LLM. |
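As an illustration of `query`, the sketch below runs several questions against one engine and collects the answers. It assumes the engine has already been set up with `init_db` or `connect_db`. `ask_all` is a hypothetical helper, and each call presumably triggers one LLM request.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def ask_all(engine: Any, questions: Iterable[str]) -> dict[str, str]:
    """Map each question to the engine's generated answer (hypothetical helper)."""
    # query() takes a natural language string and returns the response as a string.
    return {question: engine.query(question) for question in questions}
```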