Migrate schema and data separately
Migrate your data and schema to self-hosted TimescaleDB separately for easier failure recovery
Migrate larger databases by migrating your schema first, then migrating the data. This method copies each table or chunk separately, which allows you to restart midway if one copy operation fails.
For smaller databases, it may be more convenient to migrate your entire database at once. For more information, see the section on choosing a migration method.
This method does not retain continuous aggregates calculated using already-deleted data. For example, if you delete raw data after a month but retain downsampled data in a continuous aggregate for a year, the continuous aggregate loses any data older than a month upon migration. If you must keep continuous aggregates calculated using deleted data, migrate your entire database at once. For more information, see the section on choosing a migration method.
The procedure to migrate your database requires these steps:
- Migrate schema pre-data
- Restore hypertables in TimescaleDB
- Copy data from the source database
- Restore data into TimescaleDB
- Migrate schema post-data
- Recreate continuous aggregates (optional)
- Recreate policies (optional)
- Update table statistics
Depending on your database size and network speed, steps that involve copying data can take a very long time. You can continue reading from your source database during this time, though performance could be slower. To avoid this problem, fork your database and migrate your data from the fork. If you write to the tables in your source database during the migration, the new writes might not be transferred to TimescaleDB. To avoid this problem, see the section on migrating an active database.
Prerequisites
Section titled “Prerequisites”Before you begin, check that you have:
- Installed the PostgreSQL
pg_dumpandpg_restoreutilities. - Installed a client for connecting to PostgreSQL. These instructions use
psql, but any client works. - Created a new empty database in a self-hosted TimescaleDB instance. Provision your database with enough space for all your data.
- Checked that any other PostgreSQL extensions you use are compatible with TimescaleDB.
- Checked that you’re running the same major version of PostgreSQL on both your self-hosted TimescaleDB instance and your source database. For information about upgrading PostgreSQL on your source database, see the upgrade instructions for self-hosted TimescaleDB and Managed Service for TimescaleDB.
- Checked that you’re running the same major version of TimescaleDB on both your target and source database. For more information, see upgrading TimescaleDB.
Migrate schema pre-data
Section titled “Migrate schema pre-data”Migrate your pre-data from your source database to self-hosted TimescaleDB. This includes table and schema definitions, as well as information on sequences, owners, and settings. This doesn’t include TimescaleDB-specific schemas.
- Dump the schema pre-data from your source database into a
dump_pre_data.bakfileExclude TimescaleDB-specific schemas. If you are prompted for a password, use your source database credentials:
Terminal window pg_dump -U <SOURCE_DB_USERNAME> -W \-h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> -Fc -v \--section=pre-data --exclude-schema="_timescaledb*" \-f dump_pre_data.bak <DATABASE_NAME> - Restore the dumped data from the dump_pre_data.bak file into your self-hosted TimescaleDB instance
To avoid permissions errors, include the
--no-ownerflag:Terminal window pg_restore -U tsdbadmin -W \-h <HOST> -p <PORT> --no-owner -Fc \-v -d tsdb dump_pre_data.bak
Restore hypertables in your self-hosted TimescaleDB instance
Section titled “Restore hypertables in your self-hosted TimescaleDB instance”After pre-data migration, your hypertables from your source database become regular PostgreSQL tables in TimescaleDB. Recreate your hypertables in your self-hosted TimescaleDB instance to restore them.
- Connect to your self-hosted TimescaleDB instancepsql "postgres://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?sslmode=require"
- Restore the hypertableSELECT create_hypertable('<TABLE_NAME>',by_range('<COLUMN_NAME>', INTERVAL '<CHUNK_INTERVAL>'));
The by_range dimension builder is an addition to TimescaleDB 2.13.
Copy data from the source database
Section titled “Copy data from the source database”After restoring your hypertables, return to your source database to copy your data, table by table.
- Connect to your source database
Terminal window psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require" - Dump the data from the first table into a .csv file\COPY (SELECT * FROM <TABLE_NAME>) TO <TABLE_NAME>.csv CSV
Repeat for each table and hypertable you want to migrate.
If your tables are very large, you can migrate each table in multiple pieces. Split each table by time range, and copy each range individually. For example:
\COPY (SELECT * FROM <TABLE_NAME> WHERE time > '2021-11-01' AND time < '2011-11-02') TO <TABLE_NAME_DATE_RANGE>.csv CSVRestore data into TimescaleDB
Section titled “Restore data into TimescaleDB”When you have copied your data into .csv files, you can restore it to
self-hosted TimescaleDB by copying from the .csv files. There are two methods: using
regular PostgreSQL COPY, or using the TimescaleDB
timescaledb-parallel-copy function. In tests,
timescaledb-parallel-copy is 16% faster. The timescaledb-parallel-copy tool
is not included by default. You must install the function.
Because COPY decompresses data, any compressed data in your source
database is now stored uncompressed in your .csv files. If you
provisioned your self-hosted TimescaleDB storage for your compressed data, the
uncompressed data may take too much storage. To avoid this problem, periodically
recompress your data as you copy it in.
Restoring data with timescaledb-parallel-copy
Section titled “Restoring data with timescaledb-parallel-copy”- Install timescaledb-parallel-copy
Terminal window go get github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy - Import data into your self-hosted TimescaleDB instance
Set
<NUM_WORKERS>to twice the number of CPUs in your database. For example, if you have 4 CPUs,<NUM_WORKERS>should be8.Terminal window timescaledb-parallel-copy \--connection "host=<HOST> \user=tsdbadmin password=<PASSWORD> \port=<PORT> \dbname=tsdb \sslmode=require" \--table <TABLE_NAME> \--file <FILE_NAME>.csv \--workers <NUM_WORKERS> \--reporting-period 30sRepeat for each table and hypertable you want to migrate.
Restoring data into self-hosted TimescaleDB with COPY
Section titled “Restoring data into self-hosted TimescaleDB with COPY”- Connect to your self-hosted TimescaleDB instancepsql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
- Restore the data to your self-hosted TimescaleDB instance\copy <TABLE_NAME> FROM '<TABLE_NAME>.csv' WITH (FORMAT CSV);
Repeat for each table and hypertable you want to migrate.
Migrate schema post-data
Section titled “Migrate schema post-data”When you have migrated your table and hypertable data, migrate your PostgreSQL schema post-data. This includes information about constraints.
- Dump the schema post-data from your source database into a
dump_post_data.dumpfileExclude Timescale-specific schemas. If you are prompted for a password, use your source database credentials:
Terminal window pg_dump -U <SOURCE_DB_USERNAME> -W \-h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> -Fc -v \--section=post-data --exclude-schema="_timescaledb*" \-f dump_post_data.dump <DATABASE_NAME> - Restore the dumped schema post-data from the dump_post_data.dump file into your self-hosted TimescaleDB instance
Using your connection details. To avoid permissions errors, include the
--no-ownerflag:Terminal window pg_restore -U tsdbadmin -W \-h <HOST> -p <PORT> --no-owner -Fc \-v -d tsdb dump_post_data.dump
Troubleshooting
Section titled “Troubleshooting”If you see these errors during the migration process, you can safely ignore them. The migration still occurs successfully.
pg_restore: error: could not execute query: ERROR: relation "<relation_name>" already existspg_restore: error: could not execute query: ERROR: trigger "ts_insert_blocker" for relation "<relation_name>" already existsRecreate continuous aggregates
Section titled “Recreate continuous aggregates”Continuous aggregates aren’t migrated by default when you transfer your schema and data separately. You can restore them by recreating the continuous aggregate definitions and recomputing the results on your self-hosted TimescaleDB instance. The recomputed continuous aggregates only aggregate existing data in your self-hosted TimescaleDB instance. They don’t include deleted raw data.
- Connect to your source database
Terminal window psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require" - Get a list of your existing continuous aggregate definitionsSELECT view_name, view_definition FROM timescaledb_information.continuous_aggregates;
This query returns the names and definitions for all your continuous aggregates. For example:
view_name | view_definition----------------+--------------------------------------------------------------------------------------------------------avg_fill_levels | SELECT round(avg(fill_measurements.fill_level), 2) AS avg_fill_level, +| time_bucket('01:00:00'::interval, fill_measurements."time") AS bucket, +| fill_measurements.sensor_id +| FROM fill_measurements +| GROUP BY (time_bucket('01:00:00'::interval, fill_measurements."time")), fill_measurements.sensor_id;(1 row) - Connect to your self-hosted TimescaleDB instance
Terminal window psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require" - Recreate each continuous aggregate definitionCREATE MATERIALIZED VIEW <VIEW_NAME>WITH (timescaledb.continuous) AS<VIEW_DEFINITION>
Recreate policies
Section titled “Recreate policies”By default, policies aren’t migrated when you transfer your schema and data separately. Recreate them on your self-hosted TimescaleDB instance.
- Connect to your source database
Terminal window psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require" - Get a list of your existing policies
This query returns a list of all your policies, including continuous aggregate refresh policies, retention policies, compression policies, and reorder policies:
SELECT application_name, schedule_interval, retry_period,config, hypertable_nameFROM timescaledb_information.jobs WHERE owner = '<SOURCE_DB_USERNAME>'; - Connect to your self-hosted TimescaleDB instancepsql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
- Recreate each policy
For more information about recreating policies, see the sections on continuous-aggregate refresh policies, retention policies, and reorder policies.
Update table statistics
Section titled “Update table statistics”Update your table statistics by running ANALYZE on your entire
dataset. This might take some time depending on the size of your database:
ANALYZE;Troubleshooting
Section titled “Troubleshooting”If you see errors of the following form when you run ANALYZE, you can safely
ignore them:
WARNING: skipping "<TABLE OR INDEX>" --- only superuser can analyze itThe skipped tables and indexes correspond to system catalogs that can’t be accessed. Skipping them does not affect statistics on your data.