Ingest data
Ingest data into your services running in Managed Service for TimescaleDB
There are several different ways of ingesting your data into Managed Service for TimescaleDB. This section contains instructions to:
- Bulk upload from a
.csvfile - Insert data directly using a client driver, such as JDBC, ODBC, or Node.js
- Insert data directly using a message queue, such as Kafka
Before you begin, make sure you have
created your Managed Service for TimescaleDB service,
and can connect to it using psql.
Preparing your new database
Section titled “Preparing your new database”- Connect to your MST service
Use
psqlto connect to your MST service.psql -h <HOSTNAME> -p <PORT> -U <USERNAME> -W -d <DATABASE_NAME>You retrieve the MST service URL, port, and login credentials from the MST service overview in the MST dashboard.
- Create a new database
Create a new database for your data. In this example, the new database is called
new_db:CREATE DATABASE new_db;\c new_db; - Create a table
Create a new SQL table in your database. The columns you create for the table must match the columns in your source data. In this example, the table is storing weather condition data, and has columns for the timestamp, location, and temperature:
CREATE TABLE conditions (time TIMESTAMPTZ NOT NULL,location text NOT NULL,temperature DOUBLE PRECISION NULL); - Load the TimescaleDB extension
Load the
timescaledbPostgreSQL extension:CREATE EXTENSION timescaledb;\dx - Convert to a hypertable
Convert the SQL table into a hypertable:
SELECT create_hypertable('conditions', by_range('time'));NoteThe
by_rangedimension builder is an addition to TimescaleDB 2.13.
When you have successfully set up your new database, you can ingest data using one of these methods.
Bulk upload from CSV files
Section titled “Bulk upload from CSV files”If you have a dataset stored in a .csv file, you can import it into an empty
hypertable. You need to begin by creating the new table, before you
import the data.
Before you begin, make sure you have prepared your new database.
- Insert data using timescaledb-parallel-copy
Insert data into the new hypertable using the
timescaledb-parallel-copytool. You should already have the tool installed, but you can install it manually from our GitHub repository if you need to. In this example, we are inserting the data using four workers:timescaledb-parallel-copy \--connection '<service_url>' \--table conditions \--file ~/Downloads/example.csv \--workers 4 \--copy-options "CSV" \--skip-headerWe recommend that you set the number of workers lower than the number of available CPU cores on your client machine or server, to prevent the workers having to compete for resources. This helps your ingest go faster.
- (Optional) Use PostgreSQL COPY instead
If you don’t want to use the
timescaledb-parallel-copytool, or if you have a very small dataset, you can use the PostgreSQLCOPYcommand instead:psql '<service_url>/new_db?sslmode=require' -c "\copy conditions FROM <example.csv> WITH (FORMAT CSV, HEADER)"
Insert data directly using a client driver
Section titled “Insert data directly using a client driver”You can use a client driver such as JDBC, Python, or Node.js, to insert data directly into your new database.
See the PostgreSQL instructions for using the ODBC driver.
See the Code Quick Starts for using various languages, including Python and node.js.
Insert data directly using a message queue
Section titled “Insert data directly using a message queue”If you have data stored in a message queue, you can import it into your MST service. This section provides instructions on using the Kafka Connect PostgreSQL connector.
This connector deploys PostgreSQL change events from Kafka Connect to a runtime service. It monitors one or more schemas in a Managed Service for TimescaleDB service, and writes all change events to Kafka topics, which can then be independently consumed by one or more clients. Kafka Connect can be distributed to provide fault tolerance, which ensures the connectors are running and continually keeping up with changes in the database.
You can also use the PostgreSQL connector as a library without Kafka or Kafka Connect. This allows applications and services to directly connect to MST and obtain the ordered change events. In this environment, the application must record the progress of the connector so that when it is restarted, the connect can continue where it left off. This approach can be useful for less critical use cases. However, for production use cases, we recommend that you use the connector with Kafka and Kafka Connect.
See these instructions for using the Kafka connector.