agentchat.contrib.vectordb.mongodb
with_id_rename
Utility that renames the _id field of a Collection record to id for a Document.
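A minimal sketch of the renaming described above, assuming records are plain dicts. The name rename_id is a hypothetical stand-in for illustration and is not the library function.

```python
from typing import Any, Dict, List


def rename_id(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the records with the `_id` key renamed to `id`."""
    renamed = []
    for record in records:
        doc = dict(record)  # shallow copy so the input record is left untouched
        if "_id" in doc:
            doc["id"] = doc.pop("_id")
        renamed.append(doc)
    return renamed


records = [{"_id": 1, "content": "hello"}, {"_id": 2, "content": "world"}]
docs = rename_id(records)
```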
MongoDBAtlasVectorDB
A Collection object for MongoDB.
__init__
Initialize the vector database.
Arguments:
connection_string
- str | The MongoDB connection string to connect to. Default is an empty string.
database_name
- str | The name of the database. Default is ‘vector_db’.
embedding_function
- Callable | The embedding function used to generate the vector representation.
collection_name
- str | The name of the collection to create for this vector database. Defaults to None.
index_name
- str | Index name for the vector database. Defaults to ‘vector_index’.
overwrite
- bool | Default is False.
wait_until_index_ready
- float | None | Blocking call to wait until the database indexes are ready. None, the default, means no wait.
wait_until_document_ready
- float | None | Blocking call to wait until the database indexes are ready. None, the default, means no wait.
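The two wait_until_* arguments describe a blocking wait with an optional timeout. The following is a generic sketch of that pattern, not the library's implementation. The helper wait_until, its polling interval, and the readiness callback are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def wait_until(is_ready: Callable[[], bool], timeout: Optional[float], interval: float = 0.01) -> bool:
    """Poll is_ready until it returns True or `timeout` seconds elapse.

    A timeout of None means "do not wait": the check runs once and returns.
    """
    if timeout is None:
        return is_ready()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval)
    return is_ready()  # one final check at the deadline


calls = {"n": 0}


def ready_after_three_checks() -> bool:
    # Hypothetical readiness probe that succeeds on the third call.
    calls["n"] += 1
    return calls["n"] >= 3


no_wait = wait_until(lambda: False, timeout=None)
waited = wait_until(ready_after_three_checks, timeout=1.0)
```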
list_collections
List the collections in the vector database.
Returns:
List[str] | The list of collections.
create_collection
Create a collection in the vector database and create a vector search index in the collection.
Arguments:
collection_name
- str | The name of the collection.
overwrite
- bool | Whether to overwrite the collection if it exists. Default is False.
get_or_create
- bool | Whether to get or create the collection. Default is True.
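One plausible reading of how overwrite and get_or_create interact is sketched below with an in-memory set standing in for the database. The function resolve_collection, its return labels, and the choice to raise ValueError when the collection exists and neither flag applies are assumptions, not behavior confirmed by this reference.

```python
from typing import Set


def resolve_collection(existing: Set[str], name: str, overwrite: bool = False, get_or_create: bool = True) -> str:
    """Return the action a create call would take: 'created', 'recreated', or 'reused'."""
    if name not in existing:
        existing.add(name)
        return "created"
    if overwrite:
        # Drop and recreate: the previous contents would be discarded.
        return "recreated"
    if get_or_create:
        # Hand back the existing collection unchanged.
        return "reused"
    raise ValueError(f"Collection {name} already exists.")


existing = {"docs"}
first = resolve_collection(existing, "new")
second = resolve_collection(existing, "docs")
third = resolve_collection(existing, "docs", overwrite=True)
```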
create_index_if_not_exists
Creates a vector search index on the specified collection in MongoDB.
Arguments:
MONGODB_INDEX
str, optional - The name of the vector search index to create. Defaults to “vector_search_index”.
collection
Collection, optional - The MongoDB collection to create the index on. Defaults to None.
get_collection
Get the collection from the vector database.
Arguments:
collection_name
- str | The name of the collection. Default is None. If None, return the current active collection.
Returns:
Collection | The collection object.
delete_collection
Delete the collection from the vector database.
Arguments:
collection_name
- str | The name of the collection.
create_vector_search_index
Create a vector search index in the collection.
Arguments:
collection
- An existing Collection in the Atlas Database.
index_name
- Vector Search Index name.
similarity
- Algorithm used for measuring vector similarity.
kwargs
- Additional keyword arguments.
Returns:
None
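For orientation, an Atlas Vector Search index definition lists each vector field with its type, path, number of dimensions, and similarity function. The sketch below only builds such a definition as a dict. The helper name, the ‘embedding’ path, and the 384-dimension figure are assumptions, and how this method passes the definition to MongoDB is not shown here.

```python
from typing import Any, Dict

# Similarity functions accepted by Atlas Vector Search.
ALLOWED_SIMILARITIES = ("euclidean", "cosine", "dotProduct")


def vector_index_definition(dimensions: int, similarity: str = "cosine", path: str = "embedding") -> Dict[str, Any]:
    """Build an Atlas Vector Search index definition for a single vector field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {ALLOWED_SIMILARITIES}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,  # assumed name of the field holding the embedding
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition(dimensions=384)
```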
insert_docs
Insert Documents and Vector Embeddings into the collection of the vector database.
For large numbers of Documents, insertion is performed in batches.
Arguments:
docs
- List[Document] | A list of documents. Each document is a TypedDict Document.
collection_name
- str | The name of the collection. Default is None.
upsert
- bool | Whether to update the document if it exists. Default is False.
batch_size
- Number of documents to be inserted in each batch.
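Batched insertion amounts to slicing the document list into chunks of at most batch_size. The following is a minimal sketch of that slicing only. The helper batched is hypothetical, and the actual insert call for each batch is left out.

```python
from typing import Any, Dict, Iterator, List


def batched(docs: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of docs, each with at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(docs), batch_size):
        yield docs[start : start + batch_size]


docs = [{"id": i, "content": f"doc {i}"} for i in range(7)]
batch_lengths = [len(batch) for batch in batched(docs, batch_size=3)]
```

With seven documents and a batch size of three, the last batch holds the single leftover document.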
update_docs
Update documents, including their embeddings, in the Collection.
Optionally allow upsert as kwarg.
Uses deepcopy to avoid changing docs.
Arguments:
docs
- List[Document] | A list of documents.
collection_name
- str | The name of the collection. Default is None.
kwargs
- Any | Use upsert=True to insert documents whose ids are not present in collection.
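The note about deepcopy matters because preparing a document for storage mutates it (renaming the id and attaching an embedding). The sketch below shows the idea, assuming dict-shaped documents. The helper prepare_updates, the toy embedding function, and the ‘embedding’ field name are all hypothetical.

```python
import copy
from typing import Any, Callable, Dict, List


def prepare_updates(docs: List[Dict[str, Any]], embed: Callable[[str], List[float]]) -> List[Dict[str, Any]]:
    """Return storage-ready copies of docs without modifying the caller's list."""
    prepared = copy.deepcopy(docs)  # nested metadata dicts are copied too
    for doc in prepared:
        doc["_id"] = doc.pop("id")
        doc["embedding"] = embed(doc["content"])
    return prepared


def toy_embed(text: str) -> List[float]:
    # Stand-in for a real embedding function: one number per text.
    return [float(len(text))]


docs = [{"id": "a", "content": "hello", "metadata": {"lang": "en"}}]
updates = prepare_updates(docs, toy_embed)
updates[0]["metadata"]["lang"] = "fr"  # does not leak back into docs
```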
delete_docs
Delete documents from the collection of the vector database.
Arguments:
ids
- List[ItemID] | A list of document ids. Each id is a typed ItemID.
collection_name
- str | The name of the collection. Default is None.
get_docs_by_ids
Retrieve documents from the collection of the vector database based on the ids.
Arguments:
ids
- List[ItemID] | A list of document ids. If None, will return all the documents. Default is None.
collection_name
- str | The name of the collection. Default is None.
include
- List[str] | The fields to include. If None, [“metadata”, “content”] are included. Ids are always included. Use include to choose whether embeddings and metadata are returned.
kwargs
- dict | Additional keyword arguments.
Returns:
List[Document] | The results.
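The include argument maps naturally onto a MongoDB projection in which the id is always present. The following is a minimal sketch of that mapping. The helper build_projection and the ‘embedding’ field name are assumptions for illustration, not the library's code.

```python
from typing import Dict, List, Optional


def build_projection(include: Optional[List[str]] = None) -> Dict[str, int]:
    """Turn an include list into a MongoDB projection that always keeps _id."""
    fields = ["metadata", "content"] if include is None else include
    projection = {"_id": 1}  # ids are always returned
    for field in fields:
        projection[field] = 1
    return projection


default_projection = build_projection()
with_embedding = build_projection(["content", "embedding"])
```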
retrieve_docs
Retrieve documents from the collection of the vector database based on the queries.
Arguments:
queries
- List[str] | A list of queries. Each query is a string.
collection_name
- str | The name of the collection. Default is None.
n_results
- int | The number of relevant documents to return. Default is 10.
distance_threshold
- float | The threshold for the distance score. Only documents with a distance smaller than the threshold are returned. If the threshold is < 0, no filtering is applied. Default is -1.
kwargs
- Dict | Additional keyword arguments. Ones of importance follow:
oversampling_factor
- int | This value multiplied by n_results is ‘ef’ in the HNSW algorithm. It determines the number of nearest neighbor candidates to consider during the search phase. A higher value leads to more accuracy, but is slower. Default is 10.
Returns:
QueryResults | For each query string, a list of nearest documents and their scores.
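To make the relationship between n_results, oversampling_factor, and distance_threshold concrete, the sketch below builds an Atlas $vectorSearch aggregation pipeline as plain dicts. The helper name, the ‘embedding’ path, and the conversion distance = 1 - score used for the threshold filter are assumptions, so treat this as an illustration and not the library's query.

```python
from typing import Any, Dict, List


def build_search_pipeline(
    query_vector: List[float],
    index_name: str = "vector_index",
    n_results: int = 10,
    distance_threshold: float = -1.0,
    oversampling_factor: int = 10,
    path: str = "embedding",  # assumed name of the embedding field
) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline with an optional score filter."""
    pipeline: List[Dict[str, Any]] = [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Candidates considered during the search phase.
                "numCandidates": n_results * oversampling_factor,
                "limit": n_results,
            }
        },
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]
    if distance_threshold >= 0:
        # Assumption: distance = 1 - score, so keep scores above 1 - threshold.
        pipeline.append({"$match": {"score": {"$gte": 1.0 - distance_threshold}}})
    return pipeline


unfiltered = build_search_pipeline([0.1, 0.2, 0.3])
filtered = build_search_pipeline([0.1, 0.2, 0.3], n_results=5, distance_threshold=0.25)
```

With the defaults, 10 results and a factor of 10 give 100 candidates. A negative threshold leaves the pipeline without a $match stage.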