Overview
This document describes how to create and run vectorizers from Python.
Installation
First, install the pgai library:

    pip install pgai

Then, you need to install the necessary database tables and functions. All database objects will be created in the ai schema. This is done by running the following Python code:

    import pgai

    # DB_URL is your PostgreSQL connection string
    pgai.install(DB_URL)

Creating vectorizers from Python
To help you create vectorizers from Python, pgai provides the CreateVectorizer helper class. It gives you a Pythonic interface for generating the create_vectorizer SQL statement.
It accepts all the options listed in the SQL API and exposes the to_sql method, which generates a SQL query that you can then run through the SQL library of your choice:

    from pgai.vectorizer import CreateVectorizer
    from pgai.vectorizer.configuration import (
        EmbeddingOpenaiConfig,
        ChunkingCharacterTextSplitterConfig,
        FormattingPythonTemplateConfig,
        LoadingColumnConfig,
        DestinationTableConfig,
    )

    vectorizer_statement = CreateVectorizer(
        source="blog",
        name="blog_content_embedder",  # Optional custom name for easier reference
        destination=DestinationTableConfig(
            destination='blog_embeddings'
        ),
        loading=LoadingColumnConfig(column_name='content'),
        embedding=EmbeddingOpenaiConfig(
            model='text-embedding-3-small',
            dimensions=768
        ),
        chunking=ChunkingCharacterTextSplitterConfig(
            chunk_size=800,
            chunk_overlap=400,
            separator='.',
            is_separator_regex=False
        ),
        formatting=FormattingPythonTemplateConfig(template='$title - $chunk')
    ).to_sql()

Then, you can run this statement using the PostgreSQL library of your choice. For example, using the psycopg library:

    import psycopg

    with psycopg.connect(conn_string) as conn:
        with conn.cursor() as cursor:
            cursor.execute(vectorizer_statement)

Running the vectorizer worker
You can then run the vectorizer worker using the CLI tool or the Worker class discussed in the vectorizer worker documentation.
Note that you will need to install pgai with the vectorizer-worker extra to run the worker yourself:

    pip install "pgai[vectorizer-worker]"
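
As a rough illustration of the Worker class route mentioned above, the sketch below starts a worker from a Python script. It assumes that `Worker` can be imported from `pgai`, accepts a `db_url` argument, and exposes an async `run()` method; the vectorizer worker documentation is the authority on the exact signature. The `PGAI_DB_URL` environment variable and the helper names `resolve_db_url`, `run_worker`, and `main` are hypothetical and exist only for this example.

```python
import asyncio
import os


def resolve_db_url(env=None, var="PGAI_DB_URL"):
    """Read the connection string from the environment.

    The variable name PGAI_DB_URL is hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    url = env.get(var, "").strip()
    if not url:
        raise RuntimeError(f"Set {var} to your PostgreSQL connection string")
    return url


async def run_worker(db_url):
    # Imported lazily so this module still loads when the
    # vectorizer-worker extra is not installed.
    # Assumption: Worker(db_url=...) with an async run() method.
    from pgai import Worker

    worker = Worker(db_url=db_url)
    await worker.run()


def main():
    # Blocks until the worker stops.
    asyncio.run(run_worker(resolve_db_url()))
```

Calling `main()` would start the worker in the current process. Keeping the URL lookup in its own function makes it easy to fail early with a clear message before any database connection is attempted.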