Embeddings are the AI-native way to represent any kind of data, making them the perfect fit for working with all kinds of AI-powered tools and algorithms. They can represent text, images, and soon audio and video. There are many options for creating embeddings, whether locally using an installed library or by calling an API.
Chroma provides lightweight wrappers around popular embedding providers, making it easy to use them in your apps. You can set an embedding function when you create a Chroma collection, in which case it is used automatically, or you can call an embedding function directly yourself.
To get Chroma's embedding functions, import the chromadb.utils.embedding_functions module.
Default: Sentence Transformers
By default, Chroma uses Sentence Transformers to create embeddings. Sentence Transformers is a library for creating sentence and document embeddings that can be used for a wide variety of tasks. It is based on the Transformers library from Hugging Face. This embedding function runs locally on your machine and may require you to download the model files (this happens automatically).
You can pass in an optional model_name argument, which lets you choose which Sentence Transformers model to use. By default, Chroma uses all-MiniLM-L6-v2. You can see a list of all available models here.
OpenAI
Chroma provides a convenient wrapper around OpenAI's embedding API. This embedding function runs remotely on OpenAI's servers, and requires an API key. You can get an API key by signing up for an account at OpenAI.
This embedding function relies on the openai Python package, which you can install with pip install openai.
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="YOUR_API_KEY",
    model_name="text-embedding-ada-002"
)
You can pass in an optional model_name argument, which lets you choose which OpenAI embeddings model to use. By default, Chroma uses text-embedding-ada-002. You can see a list of all available models here.
Custom Embedding Functions
You can create your own embedding function to use with Chroma; it just needs to implement the EmbeddingFunction protocol.
from chromadb.api.types import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, texts: Documents) -> Embeddings:
        # embed the documents somehow
        return embeddings
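To make the shape of the protocol concrete, here is a minimal sketch of a toy embedding function that maps each document to a fixed-length vector using a hashed bag of words. The class name HashingEmbeddingFunction and its dim parameter are hypothetical and not part of Chroma. The sketch deliberately avoids importing chromadb so that it runs standalone; to use it with Chroma it would presumably subclass EmbeddingFunction as in the template above, and the exact signature expected may depend on your installed Chroma version. It is meant only to illustrate the input and output types, not to produce useful embeddings.

```python
import hashlib
import math
from typing import List


class HashingEmbeddingFunction:
    """Toy embedder (hypothetical, for illustration only).

    Each whitespace-separated token is hashed into one of `dim` buckets,
    and the resulting count vector is L2-normalized.
    """

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def __call__(self, texts: List[str]) -> List[List[float]]:
        embeddings: List[List[float]] = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # hashlib gives a stable hash across runs, unlike built-in hash()
                digest = hashlib.md5(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dim
                vec[bucket] += 1.0
            norm = math.sqrt(sum(v * v for v in vec))
            if norm > 0:
                vec = [v / norm for v in vec]
            embeddings.append(vec)
        # One vector per input document, in the same order
        return embeddings


# Calling the embedding function directly on a list of documents
hashing_ef = HashingEmbeddingFunction(dim=16)
vectors = hashing_ef(["hello world", "hello world", ""])
```

The key points are that the callable takes a list of documents and returns one list of floats per document, in the same order, with every vector having the same length. An empty document here yields an all-zero vector, which a real embedding function may want to handle differently.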
We welcome contributions! If you create an embedding function that you think would be useful to others, please consider submitting a pull request to add it to Chroma's embedding_functions module.