---
title: Adding a Vectorizer embedding integration | Tiger Data Docs
description: We welcome contributions to add new vectorizer embedding integrations.
---

# Adding a Vectorizer embedding integration

We welcome contributions to add new vectorizer embedding integrations.

The vectorizer consists of two components: the configuration and the vectorizer worker. A new integration needs changes to both.

## Configuration

The vectorizer configuration lives in the database, in the `ai.vectorizer` table. The `ai.create_vectorizer` function creates and inserts this configuration into the table. When adding a new integration, only the argument passed to the `embedding` parameter of `ai.create_vectorizer` is relevant. This value is `jsonb` generated by the `ai.embedding_*` family of functions.

To add a new integration, add a new integration-specific function to the pgai extension. This function generates the `jsonb` configuration for the new integration. Refer to the existing `ai.embedding_openai` and `ai.embedding_ollama` functions for examples.

The configuration function should minimise mandatory arguments, while allowing as many optional arguments as needed. Avoid non-null default values for optional arguments: leaving a value unset in the stored configuration allows it to be set in the vectorizer worker instead, which may be preferable.
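The shape of such a function's output can be sketched in Python, purely for illustration; the real function is written in SQL inside the extension. The name `embedding_myprovider`, its parameters, and the `implementation` and `config_type` keys are assumptions made for this sketch, so check `ai.embedding_openai` for the keys actually in use.

```python
import json
from typing import Any, Optional


def embedding_myprovider(
    model: str,
    dimensions: int,
    base_url: Optional[str] = None,
    api_key_name: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical Python mirror of a SQL `ai.embedding_myprovider` function."""
    config = {
        "implementation": "myprovider",  # assumed key identifying the integration
        "config_type": "embedding",      # assumed key identifying the config kind
        "model": model,                  # mandatory
        "dimensions": dimensions,        # mandatory
        "base_url": base_url,            # optional, defaults to None
        "api_key_name": api_key_name,    # optional, defaults to None
    }
    # Drop unset options instead of storing a default value for them.
    return {key: value for key, value in config.items() if value is not None}


print(json.dumps(embedding_myprovider("my-model-v1", 768)))
```

Because `base_url` and `api_key_name` were not passed, they are absent from the result rather than stored as defaults, which leaves the worker free to supply them.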

Update the implementation of `ai._validate_embedding` to account for the new integration. Update the tests to account for the new function.

## Vectorizer worker

The vectorizer worker reads the database’s vectorizer configuration at runtime and turns it into a `pgai.vectorizer.Config`.

To add a new integration, add a new file to the [embedders directory](https://github.com/timescale/pgai/tree/main/projects/pgai/pgai/vectorizer/embedders). The file contains the embedding class, whose fields correspond to the database’s jsonb configuration. See the existing implementations for examples of how to do this. Implement the abstract methods of the `Embedder` class. Use first-party Python libraries for the integration if they are available; otherwise, use direct HTTP requests.
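As a rough sketch of the pattern, the example below maps a configuration dictionary onto a class whose fields match its keys. `EmbedderBase` is a stand-in defined here so the example runs on its own; it is not pgai's `Embedder`, whose abstract methods may differ. `MyProvider`, the `MYPROVIDER_BASE_URL` environment variable, the fallback URL, and the configuration keys are likewise hypothetical.

```python
import asyncio
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class EmbedderBase(ABC):
    """Stand-in for pgai's `Embedder`; the real abstract methods may differ."""

    @abstractmethod
    async def embed(self, documents: list[str]) -> list[list[float]]:
        ...


@dataclass
class MyProvider(EmbedderBase):
    # Fields mirror the keys of the jsonb configuration stored in the database.
    model: str
    dimensions: int
    base_url: Optional[str] = None
    api_key_name: Optional[str] = None

    def resolved_base_url(self) -> str:
        # Unset in the database config: fall back to the worker's environment,
        # then to a placeholder default (both names are hypothetical).
        return self.base_url or os.environ.get(
            "MYPROVIDER_BASE_URL", "https://api.example.invalid/v1"
        )

    async def embed(self, documents: list[str]) -> list[list[float]]:
        # Placeholder body: a real integration would call the provider's
        # first-party client, or send HTTP requests, and return its vectors.
        return [[0.0] * self.dimensions for _ in documents]


# Assumed shape of the stored configuration for this hypothetical integration.
config = {
    "implementation": "myprovider",
    "config_type": "embedding",
    "model": "my-model-v1",
    "dimensions": 4,
}
fields = {k: v for k, v in config.items() if k not in ("implementation", "config_type")}
embedder = MyProvider(**fields)
vectors = asyncio.run(embedder.embed(["hello", "world"]))
print(len(vectors), len(vectors[0]))
```

Keeping optional fields as `None` in the class lets the worker decide the effective value at runtime, here through `resolved_base_url`.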

Remember to import your new class in the [embedders \_\_init\_\_.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/embedders/__init__.py).

Add end-to-end tests for the new integration. There are two options for handling API calls to the integration API:

1. Use [vcr.py](https://vcrpy.readthedocs.io/en/latest/) to cache real requests to the API
2. Run against the real API

At a minimum, the integration should use option 1: vcr.py. Use option 2 sparingly. We will decide on a case-by-case basis what level of testing is required.

## pgai library

The pgai library exposes helpers for creating a vectorizer in pure Python. The classes for this are produced by a code generator. To update the classes for a new integration, see the code generator docs in [/projects/pgai/pgai/vectorizer/generate](https://github.com/timescale/pgai/tree/main/projects/pgai/pgai/vectorizer/generate/README).

## Documentation

Ensure that the new integration is documented:

- Document the new database function in [/docs/vectorizer/api-reference.md](/reference/pgai/vectorizer/api-reference/index.md).
- Document any changes to the vectorizer worker in [/docs/vectorizer/worker.md](/reference/pgai/vectorizer/worker/index.md).
- Add a new row in [Supported features in each model](/reference/pgai/index.md) for your integration.
