ChromaVectorDB

class ChromaVectorDB(VectorDB)

A vector database that uses ChromaDB as the backend.

__init__

def __init__(*,
             client=None,
             path: str = "tmp/db",
             embedding_function: Callable = None,
             metadata: dict = None,
             **kwargs) -> None

Initialize the vector database.

Arguments:

  • client - chromadb.Client | The client object of the vector database. Default is None. If provided, it will use the client object directly and ignore other arguments.
  • path - str | The path to the vector database. Default is tmp/db. The default was None for version <=0.2.24.
  • embedding_function - Callable | The embedding function used to generate the vector representation of the documents. Default is None, in which case SentenceTransformerEmbeddingFunction("all-MiniLM-L6-v2") is used.
  • metadata - dict | The metadata of the vector database. Default is None, in which case the setting {"hnsw:space": "ip", "hnsw:construction_ef": 30, "hnsw:M": 32} is used. For more details on the metadata, please refer to distances, hnsw, and ALGO_PARAMS.
  • kwargs - dict | Additional keyword arguments.

Returns:

None
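
The fallback described for the metadata argument can be sketched in plain Python. This is an illustration only: resolve_metadata is a hypothetical helper, not part of ChromaVectorDB, and the sketch assumes that caller-supplied metadata replaces the default entirely rather than being merged with it.

```python
from typing import Optional

# Default index settings quoted from the metadata argument description.
DEFAULT_METADATA = {"hnsw:space": "ip", "hnsw:construction_ef": 30, "hnsw:M": 32}


def resolve_metadata(metadata: Optional[dict] = None) -> dict:
    """Hypothetical helper: return the caller's metadata, or a copy of the default when None."""
    if metadata is None:
        # Copy so that later mutation does not change the shared default.
        return dict(DEFAULT_METADATA)
    # Assumption: a supplied dict is used as-is, with no merging of defaults.
    return metadata


default_settings = resolve_metadata()
custom_settings = resolve_metadata({"hnsw:space": "cosine"})
print(default_settings)
print(custom_settings)
```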

create_collection

def create_collection(collection_name: str,
                      overwrite: bool = False,
                      get_or_create: bool = True) -> Collection

Create a collection in the vector database. Case 1: if the collection does not exist, it is created. Case 2: if the collection exists and overwrite is True, it is overwritten. Case 3: if the collection exists and overwrite is False, the existing collection is returned when get_or_create is True; otherwise a ValueError is raised.

Arguments:

  • collection_name - str | The name of the collection.
  • overwrite - bool | Whether to overwrite the collection if it exists. Default is False.
  • get_or_create - bool | Whether to get the collection if it exists. Default is True.

Returns:

Collection | The collection object.
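
The three cases of create_collection can be sketched with an in-memory stand-in. InMemoryStore is hypothetical and stores collections as plain dicts; it mirrors only the decision logic described above, not the ChromaDB-backed implementation.

```python
class InMemoryStore:
    """Hypothetical stand-in for a vector database; collections are plain dicts."""

    def __init__(self) -> None:
        self.collections = {}

    def create_collection(self, collection_name: str,
                          overwrite: bool = False,
                          get_or_create: bool = True) -> dict:
        if collection_name not in self.collections:
            # Case 1: the collection does not exist, so create it.
            self.collections[collection_name] = {}
        elif overwrite:
            # Case 2: the collection exists and overwrite is True, so replace it.
            self.collections[collection_name] = {}
        elif not get_or_create:
            # Case 3, get_or_create False: refuse to reuse the existing collection.
            raise ValueError(f"Collection {collection_name} already exists.")
        # Case 3, get_or_create True: fall through and return the existing collection.
        return self.collections[collection_name]


store = InMemoryStore()
first = store.create_collection("demo")                  # Case 1
first["doc1"] = "hello"
same = store.create_collection("demo")                   # Case 3, existing collection returned
fresh = store.create_collection("demo", overwrite=True)  # Case 2, contents discarded
```

Calling store.create_collection("demo", get_or_create=False) at this point would raise a ValueError in the sketch, since the collection exists and overwrite is False.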

get_collection

def get_collection(collection_name: str = None) -> Collection

Get the collection from the vector database.

Arguments:

  • collection_name - str | The name of the collection. Default is None. If None, return the current active collection.

Returns:

Collection | The collection object.

delete_collection

def delete_collection(collection_name: str) -> None

Delete the collection from the vector database.

Arguments:

  • collection_name - str | The name of the collection.

Returns:

None

insert_docs

def insert_docs(docs: list[Document],
                collection_name: str = None,
                upsert: bool = False) -> None

Insert documents into the collection of the vector database.

Arguments:

  • docs - List[Document] | A list of documents. Each document is a TypedDict Document.
  • collection_name - str | The name of the collection. Default is None.
  • upsert - bool | Whether to update the document if it exists. Default is False.
  • kwargs - Dict | Additional keyword arguments.

Returns:

None
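
A minimal sketch of how the upsert flag changes the outcome when a document id already exists. The Document fields shown here and the choice to leave an existing document untouched when upsert is False are assumptions made for illustration; the backend may behave differently for duplicate ids.

```python
from typing import Optional, TypedDict, Union

ItemID = Union[str, int]  # assumed alias for document ids


class Document(TypedDict):
    """Assumed document shape; the real TypedDict may carry other fields."""
    id: ItemID
    content: str
    metadata: Optional[dict]


def insert_docs(store: dict, docs: list, upsert: bool = False) -> None:
    """Hypothetical stand-in that writes documents into a dict keyed by id."""
    for doc in docs:
        if doc["id"] in store and not upsert:
            # Assumption: without upsert, an existing document is left untouched.
            continue
        store[doc["id"]] = doc


store: dict = {}
insert_docs(store, [Document(id="a", content="first draft", metadata=None)])
insert_docs(store, [Document(id="a", content="second draft", metadata=None)])
after_plain_insert = store["a"]["content"]

insert_docs(store, [Document(id="a", content="second draft", metadata=None)], upsert=True)
after_upsert = store["a"]["content"]
```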

update_docs

def update_docs(docs: list[Document], collection_name: str = None) -> None

Update documents in the collection of the vector database.

Arguments:

  • docs - List[Document] | A list of documents.
  • collection_name - str | The name of the collection. Default is None.

Returns:

None

delete_docs

def delete_docs(ids: list[ItemID],
                collection_name: str = None,
                **kwargs) -> None

Delete documents from the collection of the vector database.

Arguments:

  • ids - List[ItemID] | A list of document ids. Each id is a typed ItemID.
  • collection_name - str | The name of the collection. Default is None.
  • kwargs - Dict | Additional keyword arguments.

Returns:

None

retrieve_docs

def retrieve_docs(queries: list[str],
                  collection_name: str = None,
                  n_results: int = 10,
                  distance_threshold: float = -1,
                  **kwargs) -> QueryResults

Retrieve documents from the collection of the vector database based on the queries.

Arguments:

  • queries - List[str] | A list of queries. Each query is a string.
  • collection_name - str | The name of the collection. Default is None.
  • n_results - int | The number of relevant documents to return. Default is 10.
  • distance_threshold - float | The threshold for the distance score. Only results with a distance smaller than the threshold are returned. No filtering is applied if the threshold is < 0. Default is -1.
  • kwargs - Dict | Additional keyword arguments.

Returns:

QueryResults | The query results: a list with one entry per query, where each entry is a list of tuples containing a document and its distance.
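
The effect of distance_threshold on the result shape can be sketched as follows. filter_by_distance is a hypothetical helper and the sample documents and distances are made up for illustration; the sketch assumes the nested list-of-tuples layout described above.

```python
def filter_by_distance(results: list, distance_threshold: float = -1) -> list:
    """Hypothetical helper: keep only (document, distance) pairs below the threshold."""
    if distance_threshold < 0:
        # A negative threshold disables filtering.
        return results
    return [
        [(doc, dist) for doc, dist in per_query if dist < distance_threshold]
        for per_query in results
    ]


# Made-up results for two queries: one inner list per query.
results = [
    [({"id": "a", "content": "alpha"}, 0.2), ({"id": "b", "content": "beta"}, 0.8)],
    [({"id": "c", "content": "gamma"}, 0.5)],
]

unfiltered = filter_by_distance(results)
filtered = filter_by_distance(results, distance_threshold=0.5)
```

In this sketch the comparison is strict, so the document with distance exactly 0.5 is dropped and the second query ends up with an empty list.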

get_docs_by_ids

def get_docs_by_ids(ids: list[ItemID] = None,
                    collection_name: str = None,
                    include=None,
                    **kwargs) -> list[Document]

Retrieve documents from the collection of the vector database based on the ids.

Arguments:

  • ids - List[ItemID] | A list of document ids. If None, all documents are returned. Default is None.
  • collection_name - str | The name of the collection. Default is None.
  • include - List[str] | The fields to include. Default is None, in which case ["metadatas", "documents"] are included. The ids are always included.
  • kwargs - dict | Additional keyword arguments.

Returns:

List[Document] | The results.
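
The ids and include defaults can be sketched against a dict-backed stand-in. The function below is hypothetical: the mapping from the include names to the content and metadata keys, and the choice to skip ids that are not stored, are assumptions made for illustration.

```python
from typing import Optional


def get_docs_by_ids(store: dict, ids: Optional[list] = None,
                    include: Optional[list] = None) -> list:
    """Hypothetical stand-in that reads documents from a dict keyed by id."""
    # Default fields, as stated in the include argument description.
    fields = ["metadatas", "documents"] if include is None else include
    # ids=None means every stored document; unknown ids are skipped (assumption).
    selected = list(store) if ids is None else [i for i in ids if i in store]
    docs = []
    for doc_id in selected:
        doc = {"id": doc_id}  # the id is always included
        if "documents" in fields:
            doc["content"] = store[doc_id]["content"]
        if "metadatas" in fields:
            doc["metadata"] = store[doc_id]["metadata"]
        docs.append(doc)
    return docs


store = {
    "a": {"content": "alpha", "metadata": {"source": "demo"}},
    "b": {"content": "beta", "metadata": None},
}

everything = get_docs_by_ids(store)
only_text = get_docs_by_ids(store, ids=["b"], include=["documents"])
```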