MongoDBAtlasVectorDB

MongoDBAtlasVectorDB(
    connection_string: str = '',
    database_name: str = 'vector_db',
    embedding_function: Callable[..., Any] | None = None,
    collection_name: str = None,
    index_name: str = 'vector_index',
    overwrite: bool = False,
    wait_until_index_ready: float | None = None,
    wait_until_document_ready: float | None = None
)

A Collection object for MongoDB.
Initialize the vector database.

Parameters:
connection_string
    Type: str
    Default: ''
database_name
    Type: str
    Default: 'vector_db'
embedding_function
    Type: Callable[..., Any] | None
    Default: None
collection_name
    Type: str
    Default: None
index_name
    Type: str
    Default: 'vector_index'
overwrite
    Type: bool
    Default: False
wait_until_index_ready
    Type: float | None
    Default: None
wait_until_document_ready
    Type: float | None
    Default: None
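
A minimal sketch of passing the constructor parameters listed above. The import path `autogen.agentchat.contrib.vectordb.mongodb`, the placeholder connection string, the collection name `my_docs`, and the `toy_embedding` function are assumptions for illustration. The import is deferred into `connect` so the rest of the snippet runs without the package or a cluster.

```python
from typing import Any


def toy_embedding(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a real embedding model.

    Assumes the callable receives a list of strings and returns one
    vector per string; the reference only types it as Callable[..., Any].
    """
    return [[float(len(t)), float(t.count(" ") + 1)] for t in texts]


def build_db_kwargs(connection_string: str) -> dict[str, Any]:
    """Collect constructor arguments, spelling out the documented defaults."""
    return {
        "connection_string": connection_string,
        "database_name": "vector_db",
        "embedding_function": toy_embedding,
        "collection_name": "my_docs",  # hypothetical collection name
        "index_name": "vector_index",
        "overwrite": False,
    }


def connect(connection_string: str):
    """Create the database object; needs the package and a reachable cluster."""
    # Assumed import path; adjust to wherever the class lives in your install.
    from autogen.agentchat.contrib.vectordb.mongodb import MongoDBAtlasVectorDB

    return MongoDBAtlasVectorDB(**build_db_kwargs(connection_string))


# Placeholder URI: replace the bracketed parts with real credentials.
kwargs = build_db_kwargs("mongodb+srv://<user>:<password>@<cluster>/")
print(sorted(kwargs))
```

Keeping the arguments in a plain dictionary makes it easy to log or test the configuration before any network connection is attempted.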

Class Attributes

active_collection

embedding_function

type

Instance Methods

create_collection

create_collection(
    self,
    collection_name: str,
    overwrite: bool = False,
    get_or_create: bool = True
) -> Collection

Create a collection in the vector database and create a vector search index in the collection.

Parameters:
collection_name
    The name of the collection.
    Type: str
overwrite
    Whether to overwrite the collection if it exists.
    Type: bool
    Default: False
get_or_create
    Whether to get or create the collection.
    Type: bool
    Default: True
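
The interaction between `overwrite` and `get_or_create` can be sketched with an in-memory stand-in. The precedence shown here (create if missing, then honor `overwrite`, then `get_or_create`, otherwise raise) is an assumption about how the two flags combine, not a statement of the library's implementation. `FakeStore` is hypothetical.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a database of named collections."""

    def __init__(self) -> None:
        self.collections: dict[str, list] = {}

    def create_collection(
        self, name: str, overwrite: bool = False, get_or_create: bool = True
    ) -> list:
        if name not in self.collections:
            # Nothing exists yet: always create.
            self.collections[name] = []
        elif overwrite:
            # Replace the existing collection with an empty one.
            self.collections[name] = []
        elif not get_or_create:
            # Exists, and the caller asked for neither overwrite nor reuse.
            raise ValueError(f"Collection {name} already exists.")
        return self.collections[name]


store = FakeStore()
first = store.create_collection("docs")
first.append("some document")
reused = store.create_collection("docs")                  # same object, data kept
fresh = store.create_collection("docs", overwrite=True)   # emptied
```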

create_index_if_not_exists

create_index_if_not_exists(
    self,
    index_name: str = 'vector_index',
    collection: Collection = None
) -> 

Creates a vector search index on the specified collection in MongoDB.

Parameters:
index_name
    Type: str
    Default: 'vector_index'
collection
    The MongoDB collection to create the index on.
    Type: Collection
    Default: None

create_vector_search_index

create_vector_search_index(
    self,
    collection: Collection,
    index_name: str | None = 'vector_index',
    similarity: Literal['euclidean', 'cosine', 'dotProduct'] = 'cosine'
) -> 

Create a vector search index in the collection.

Parameters:
collection
    An existing Collection in the Atlas Database.
    Type: Collection
index_name
    Vector Search Index name.
    Type: str | None
    Default: 'vector_index'
similarity
    Algorithm used for measuring vector similarity.
    Type: Literal['euclidean', 'cosine', 'dotProduct']
    Default: 'cosine'
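
For orientation, the sketch below builds an index definition in the shape Atlas uses for `vectorSearch` indexes, with the `similarity` value validated against the three options above. Whether this method produces exactly this definition is not confirmed here. The helper name `vector_index_definition`, the field path `embedding`, and the dimension count 384 are assumptions.

```python
from typing import Any, Literal, get_args

Similarity = Literal["euclidean", "cosine", "dotProduct"]


def vector_index_definition(
    dimensions: int,
    similarity: Similarity = "cosine",
    path: str = "embedding",  # assumed name of the field holding the vectors
) -> dict[str, Any]:
    """Return an Atlas-style vector search index definition as a plain dict."""
    if similarity not in get_args(Similarity):
        raise ValueError(f"Unsupported similarity: {similarity!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition(384, similarity="dotProduct")
print(definition)
```

The number of dimensions must match the length of the vectors produced by the embedding function, so it is worth deriving it from a sample embedding rather than hard-coding it.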

delete_collection

delete_collection(self, collection_name: str) -> None

Delete the collection from the vector database.

Parameters:
collection_name
    The name of the collection.
    Type: str

delete_docs

delete_docs(
    self,
    ids: list[str | int],
    collection_name: str = None,
    **kwargs
) -> 

Delete documents from the collection of the vector database.

Parameters:
ids
    A list of document ids. Each id is a typed ItemID.
    Type: list[str | int]
collection_name
    The name of the collection.
    Type: str
    Default: None
**kwargs

get_collection

get_collection(self, collection_name: str = None) -> Collection

Get the collection from the vector database.

Parameters:
collection_name
    The name of the collection. If None, return the current active collection.
    Type: str
    Default: None
Returns:
Collection
    The collection object.

get_docs_by_ids

get_docs_by_ids(
    self,
    ids: list[str | int] = None,
    collection_name: str = None,
    include: list[str] = None,
    **kwargs
) -> list[Document]

Retrieve documents from the collection of the vector database based on the ids.

Parameters:
ids
    A list of document ids (ItemID). If None, all documents are returned.
    Type: list[str | int]
    Default: None
collection_name
    The name of the collection.
    Type: str
    Default: None
include
    The fields to include. If None, ["metadata", "content"] are included. Ids are always included. Use include to choose whether embeddings and metadata are returned.
    Type: list[str]
    Default: None
**kwargs
Returns:
list[Document]
    The results.
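
The effect of `include` described above can be illustrated with a small field selector. This is a sketch of the documented semantics, not the library's code. The helper `select_fields` and the sample document are made up for the example.

```python
from typing import Any, Optional


def select_fields(
    doc: dict[str, Any], include: Optional[list[str]] = None
) -> dict[str, Any]:
    """Keep the id plus the requested fields of one stored document."""
    fields = ["metadata", "content"] if include is None else include
    selected = {"id": doc["id"]}  # ids are always included
    for name in fields:
        if name in doc:
            selected[name] = doc[name]
    return selected


# Made-up document with the fields named in the description above.
stored = {
    "id": 1,
    "content": "hello world",
    "metadata": {"source": "example"},
    "embedding": [0.1, 0.2, 0.3],
}

default_view = select_fields(stored)                  # id, metadata, content
with_vectors = select_fields(stored, ["embedding"])   # id, embedding
```

Leaving embeddings out by default keeps responses small, since vectors are usually far larger than the text they represent.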

insert_docs

insert_docs(
    self,
    docs: list[Document],
    collection_name: str = None,
    upsert: bool = False,
    batch_size: int = 100000,
    **kwargs: Any
) -> None

Insert Documents and Vector Embeddings into the collection of the vector database.
For large numbers of Documents, insertion is performed in batches.

Parameters:
docs
    A list of documents. Each document is a Document TypedDict.
    Type: list[Document]
collection_name
    The name of the collection.
    Type: str
    Default: None
upsert
    Whether to update the document if it exists.
    Type: bool
    Default: False
batch_size
    Number of documents to be inserted in each batch.
    Type: int
    Default: 100000
**kwargs
    Type: Any
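
The batching controlled by `batch_size` can be pictured as slicing the document list into consecutive chunks. The generator below is a generic sketch of that idea, not the library's implementation. The name `batched` and the sample documents are made up for the example.

```python
from collections.abc import Iterator
from typing import Any


def batched(
    docs: list[dict[str, Any]], batch_size: int = 100000
) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of docs, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


docs = [{"id": i, "content": f"document {i}"} for i in range(7)]
sizes = [len(batch) for batch in batched(docs, batch_size=3)]
print(sizes)  # [3, 3, 1]
```

A smaller `batch_size` lowers peak memory use and request size at the cost of more round trips.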

list_collections

list_collections(self) -> 

List the collections in the vector database.
Returns:
List[str]
    The list of collections.


retrieve_docs

retrieve_docs(
    self,
    queries: list[str],
    collection_name: str = None,
    n_results: int = 10,
    distance_threshold: float = -1,
    **kwargs: Any
) -> list[list[tuple[Document, float]]]

Retrieve documents from the collection of the vector database based on the queries.

Parameters:
queries
    A list of queries. Each query is a string.
    Type: list[str]
collection_name
    The name of the collection.
    Type: str
    Default: None
n_results
    The number of relevant documents to return.
    Type: int
    Default: 10
distance_threshold
    The threshold for the distance score. Only documents whose distance is smaller than the threshold are returned. No filtering is applied if it is 0.
    Type: float
    Default: -1
**kwargs
    Type: Any
Returns:
list[list[tuple[Document, float]]] (QueryResults)
    For each query string, a list of the nearest documents and their scores.
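
The nested return type is easier to read with an example of walking it: one inner list per query, each holding (document, score) pairs. The sketch below uses hand-written sample results rather than real output. The helper `summarize` and the `Document` alias are stand-ins introduced for the example.

```python
from typing import Any

Document = dict[str, Any]  # stand-in for the library's Document TypedDict
QueryResults = list[list[tuple[Document, float]]]


def summarize(
    queries: list[str], results: QueryResults
) -> dict[str, list[tuple[Any, float]]]:
    """Map each query to the (id, score) pairs returned for it, in order."""
    if len(queries) != len(results):
        raise ValueError("expected one result list per query")
    return {
        query: [(doc["id"], score) for doc, score in hits]
        for query, hits in zip(queries, results)
    }


# Hand-written sample in the documented shape; not real search output.
queries = ["what is atlas?", "how to index?"]
results: QueryResults = [
    [({"id": "a", "content": "Atlas overview"}, 0.9)],
    [
        ({"id": "b", "content": "Index guide"}, 0.8),
        ({"id": "c", "content": "Index FAQ"}, 0.7),
    ],
]

summary = summarize(queries, results)
```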

update_docs

update_docs(
    self,
    docs: list[Document],
    collection_name: str = None,
    **kwargs: Any
) -> None

Update documents, including their embeddings, in the Collection.
Upsert can optionally be enabled through a keyword argument.
The input docs are deep-copied, so the caller's list is not modified.

Parameters:
docs
    A list of documents.
    Type: list[Document]
collection_name
    The name of the collection.
    Type: str
    Default: None
**kwargs
    Type: Any
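
Why a deep copy matters here can be shown in a few lines: documents contain nested metadata, so a shallow copy would still share those inner objects with the caller. The sketch below is a generic illustration of the pattern, not the library's code. `prepare_updates` and `toy_embed` are hypothetical.

```python
import copy
from typing import Any, Callable


def prepare_updates(
    docs: list[dict[str, Any]], embed: Callable[[str], list[float]]
) -> list[dict[str, Any]]:
    """Return copies of docs with embeddings attached; inputs stay untouched."""
    prepared = copy.deepcopy(docs)  # nested metadata dicts are copied too
    for doc in prepared:
        doc["embedding"] = embed(doc["content"])
        doc.setdefault("metadata", {})["embedded"] = True
    return prepared


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: just the text length as a one-element vector."""
    return [float(len(text))]


original = [{"id": 1, "content": "hello", "metadata": {"source": "example"}}]
updated = prepare_updates(original, toy_embed)
```

After the call, `original` has neither an `embedding` key nor the extra metadata flag, while `updated` carries both.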