ChromaVectorDB

ChromaVectorDB(
    *,
    client=None,
    path: str = 'tmp/db',
    embedding_function: Callable = None,
    metadata: dict = None,
    **kwargs
)

A vector database that uses ChromaDB as the backend.
Initialize the vector database.

Parameters:
client (default: None)
path (str, default: 'tmp/db')
embedding_function (Callable, default: None)
metadata (dict, default: None)
**kwargs

Class Attributes

active_collection
embedding_function
type

Instance Methods

create_collection

create_collection(
    self,
    collection_name: str,
    overwrite: bool = False,
    get_or_create: bool = True
) -> Collection

Create a collection in the vector database.
Case 1. If the collection does not exist, it is created.
Case 2. If the collection exists and overwrite is True, it is overwritten.
Case 3. If the collection exists and overwrite is False, the existing collection is returned when get_or_create is True; otherwise a ValueError is raised.
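
The order in which the cases are checked can be modeled with a plain dictionary standing in for the backend. This is only a sketch of the documented decision logic, not the class's or ChromaDB's actual implementation; `resolve_collection` and the `store` dictionary are hypothetical names introduced for illustration.

```python
def resolve_collection(store, name, overwrite=False, get_or_create=True):
    """Model the documented create_collection cases on a dict of name -> list."""
    if name not in store:
        # Case 1: the collection does not exist, so create it.
        store[name] = []
    elif overwrite:
        # Case 2: the collection exists and overwrite is True, so replace it.
        store[name] = []
    elif not get_or_create:
        # Case 3, failure branch: exists, overwrite is False, get_or_create is False.
        raise ValueError(f"Collection {name} already exists.")
    # Case 3, success branch: fall through and return the existing collection.
    return store[name]
```

Note that with the default arguments (overwrite=False, get_or_create=True) the call never raises: it either creates the collection or returns the existing one.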

Parameters:
collection_name (str): The name of the collection.
overwrite (bool, default: False): Whether to overwrite the collection if it exists.
get_or_create (bool, default: True): Whether to get the collection if it exists.
Returns:
Collection: The collection object.

delete_collection

delete_collection(self, collection_name: str) -> None

Delete the collection from the vector database.

Parameters:
collection_name (str): The name of the collection.
Returns:
None

delete_docs

delete_docs(
    self,
    ids: list[str | int],
    collection_name: str = None,
    **kwargs
) -> None

Delete documents from the collection of the vector database.

Parameters:
ids (list[str | int]): A list of document ids. Each id is an ItemID (a str or an int).
collection_name (str, default: None): The name of the collection.
**kwargs
Returns:
None

get_collection

get_collection(self, collection_name: str = None) -> Collection

Get the collection from the vector database.

Parameters:
collection_name (str, default: None): The name of the collection. If None, the current active collection is returned.
Returns:
Collection: The collection object.

get_docs_by_ids

get_docs_by_ids(
    self,
    ids: list[str | int] = None,
    collection_name: str = None,
    include=None,
    **kwargs
) -> list[Document]

Retrieve documents from the collection of the vector database based on the ids.

Parameters:
ids (list[str | int], default: None): A list of document ids. If None, all documents are returned.
collection_name (str, default: None): The name of the collection.
include (default: None)
**kwargs
Returns:
list[Document]: The matching documents.

insert_docs

insert_docs(
    self,
    docs: list[Document],
    collection_name: str = None,
    upsert: bool = False
) -> None

Insert documents into the collection of the vector database.

Parameters:
docs (list[Document]): A list of documents. Each document is a TypedDict Document.
collection_name (str, default: None): The name of the collection.
upsert (bool, default: False): Whether to update the document if it exists.
Returns:
None
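
The description above says each document is a TypedDict named Document but does not list its keys. The sketch below assumes the keys `id`, `content`, `metadata`, and `embedding`. Treat those key names, the `ItemID` alias, and the helper `make_docs` as assumptions to check against the installed library, not as its definition.

```python
from typing import Any, Optional, TypedDict, Union

ItemID = Union[str, int]


class Document(TypedDict):
    # Assumed keys; the reference above does not list them.
    id: ItemID
    content: str
    metadata: Optional[dict[str, Any]]
    embedding: Optional[list[float]]


def make_docs(texts: list[str], source: str = "example") -> list[Document]:
    """Wrap raw strings as documents with sequential string ids (hypothetical helper)."""
    return [
        Document(id=str(i), content=text, metadata={"source": source}, embedding=None)
        for i, text in enumerate(texts)
    ]


docs = make_docs(["Chroma stores embeddings.", "Collections group documents."])
```

A list built this way has the shape that the `docs` parameter of insert_docs and update_docs is annotated with, provided the assumed keys match the library's Document type.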

retrieve_docs

retrieve_docs(
    self,
    queries: list[str],
    collection_name: str = None,
    n_results: int = 10,
    distance_threshold: float = -1,
    **kwargs: Any
) -> list[list[tuple[Document, float]]]

Retrieve documents from the collection of the vector database based on the queries.

Parameters:
queries (list[str]): A list of queries. Each query is a string.
collection_name (str, default: None): The name of the collection.
n_results (int, default: 10): The number of relevant documents to return.
distance_threshold (float, default: -1): The threshold for the distance score; only results with a distance smaller than it are returned. No filtering is applied if it is negative.
**kwargs (Any)
Returns:
list[list[tuple[Document, float]]] (QueryResults): The query results, one inner list per query. Each inner list holds (document, distance) tuples.
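
The nested return shape and the effect of distance_threshold are easier to see on a small hand-written example. The sketch below assumes, in line with the default of -1, that a negative threshold disables filtering. `filter_by_distance` is a hypothetical stand-in for whatever the class does internally, and the `id` and `content` keys of the sample documents are assumed.

```python
def filter_by_distance(results, distance_threshold=-1):
    """Keep only (document, distance) pairs whose distance is below the threshold."""
    if distance_threshold < 0:
        # Assumption: a negative threshold (the default) disables filtering.
        return results
    return [
        [(doc, dist) for doc, dist in per_query if dist < distance_threshold]
        for per_query in results
    ]


# One inner list per query; each entry pairs a document with its distance.
results = [
    [({"id": "a", "content": "apples"}, 0.12), ({"id": "b", "content": "bananas"}, 0.57)],
    [({"id": "c", "content": "cherries"}, 0.91)],
]
close_matches = filter_by_distance(results, distance_threshold=0.5)
```

The outer list keeps one entry per query even when every hit for that query is filtered out, so results can still be matched to queries by position.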

update_docs

update_docs(
    self,
    docs: list[Document],
    collection_name: str = None
) -> None

Update documents in the collection of the vector database.

Parameters:
docs (list[Document]): A list of documents.
collection_name (str, default: None): The name of the collection.
Returns:
None
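
The methods documented above compose into a typical index-then-query flow. The sketch below uses only the method names and keyword arguments shown in the signatures on this page; `index_and_search` is a hypothetical helper, and `db` is assumed to be a ChromaVectorDB instance or any object exposing the same methods.

```python
def index_and_search(db, docs, queries, collection_name="demo", n_results=3):
    """Create a fresh collection, insert docs, and return the query results."""
    # overwrite=True replaces any existing collection with the same name.
    db.create_collection(collection_name, overwrite=True)
    db.insert_docs(docs, collection_name=collection_name)
    # Returns one list of (document, distance) tuples per query.
    return db.retrieve_docs(
        queries, collection_name=collection_name, n_results=n_results
    )
```

Given the keyword-only constructor above, a call might look like `index_and_search(ChromaVectorDB(path="tmp/db"), docs, ["what stores embeddings?"])`, where `docs` is a list of Document items.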