Troubleshoot import and ingest
Solutions to common errors when importing data and migrating databases
Imports and migrations often fail on version skew, encoding edge cases, or bulk-load limits. Start with these answers before you spend time debugging alone.
Live import from a database
Unsupported in live migration
Live migration tooling is currently experimental. You may run into the following shortcomings:
- Live migration does not yet support mutable columnstore compression (
INSERT,UPDATE,DELETEon data in the columnstore). - By default, numeric fields containing
NaN/+Inf/-Infvalues are not correctly replicated, and will be converted toNULL. A workaround is available, but is not enabled by default.
Should you run into any problems, please open a support request before losing any time debugging issues. You can open a support request directly from Tiger Console, or by email to support@tigerdata.com.
Where can I find logs for processes running during live migration?
Live migration involves several background processes to manage different stages of the migration. The logs of these processes can be helpful for troubleshooting unexpected behavior. You can find these logs in the <volume_mount>/logs directory.
Source and target databases have different TimescaleDB versions
When you migrate a self-hosted or Managed Service for TimescaleDB database to Tiger Cloud, the source database and the destination Tiger Cloud service must run the same version of TimescaleDB. Before you start live migration:
- Check the version of TimescaleDB running on the source database and the target Tiger Cloud service:
- If the version of TimescaleDB on the source database is lower than your Tiger Cloud service, either:
-
Downgrade: reinstall an older version of TimescaleDB on your Tiger Cloud service that matches the source database:
-
Connect to your Tiger Cloud service and check the versions of TimescaleDB available:
-
If an available TimescaleDB release matches your source database:
- Uninstall TimescaleDB from your Tiger Cloud service:
- Reinstall the correct version of TimescaleDB:
You may need to reconnect to your Tiger Cloud service using
psql -Xwhen you’re creating the TimescaleDB extension.
-
-
Upgrade: for self-hosted databases, upgrade TimescaleDB to match your Tiger Cloud service.
-
Why does live migration log “no tuple identifier” warning?
Live migration logs a warning WARNING: no tuple identifier for UPDATE in table when it cannot determine which specific rows should be updated after receiving an UPDATE statement from the source database during replication. This occurs when tables in the source database that receive UPDATE statements lack either a PRIMARY KEY or a REPLICA IDENTITY setting. For live migration to successfully replicate UPDATE and DELETE statements, tables must have either a PRIMARY KEY or REPLICA IDENTITY set as a prerequisite.
Set REPLICA IDENTITY on PostgreSQL partitioned tables
If your PostgreSQL tables use native partitioning, setting REPLICA IDENTITY on the root (parent) table will not automatically apply it to the partitioned child tables. You must manually set REPLICA IDENTITY on each partitioned child table.
Can I use read/failover replicas as source database for live migration?
Live migration does not support replication from read or failover replicas. You must provide a connection string that points directly to your source database for live migration.
Can I use live migration with a PostgreSQL connection pooler like PgBouncer?
Live migration does not support connection poolers. You must provide a connection string that points directly to your source and target databases for live migration to work smoothly.
Can I use Tiger Cloud instance as source for live migration?
No, Tiger Cloud cannot be used as a source database for live migration.
How can I exclude a schema/table from being replicated in live migration?
At present, live migration does not allow for excluding schemas or tables from replication, but this feature is expected to be added in future releases. However, a workaround is available for skipping table data using the --skip-table-data flag. For more information, please refer to the help text under the migrate subcommand.
Large migrations blocked
Tiger Cloud automatically manages the underlying disk volume. Due to platform limitations, it is only possible to resize the disk once every six hours. Depending on the rate at which you’re able to copy data, you may be affected by this restriction. Affected instances are unable to accept new data and error with: FATAL: terminating connection due to administrator command. If you intend on migrating more than 400 GB of data to Tiger Cloud, open a support request requesting the required storage to be pre-allocated in your Tiger Cloud service. You can open a support request directly from Tiger Console, or by email to support@tigerdata.com.
Import from a database with downtime
Dumping and locks
When pg_dump starts, it takes an ACCESS SHARE lock on all tables which it dumps. This ensures that tables aren’t dropped before pg_dump is able to drop them. A side effect of this is that any query which tries to take an ACCESS EXCLUSIVE lock on a table is be blocked by the ACCESS SHARE lock. A number of Tiger Cloud-internal processes require taking ACCESS EXCLUSIVE locks to ensure consistency of the data. The following is a non-exhaustive list of potentially affected operations:
- converting a chunk into the columnstore/rowstore and back
- continuous aggregate refresh (before 2.12)
- create hypertable with foreign keys, truncate hypertable
- enable hypercore on a hypertable
- drop chunks
The most likely impact of the above is that background jobs for retention policies, columnstore compression policies, and continuous aggregate refresh policies are blocked for the duration of the pg_dump command. This may have unintended consequences for your database performance.
Dumping with concurrency
When using the pg_dump directory format, it is possible to use concurrency to use multiple connections to the source database to dump data. This speeds up the dump process. Due to the fact that there are multiple connections, it is possible for pg_dump to end up in a deadlock situation. When it detects a deadlock it aborts the dump. In principle, any query which takes an ACCESS EXCLUSIVE lock on a table causes such a deadlock. As mentioned above, some common operations which take an ACCESS EXCLUSIVE lock are:
- retention policies
- columnstore compression policies
- continuous aggregate refresh policies
If you would like to use concurrency nonetheless, turn off all background jobs in the source database before running pg_dump, and turn them on once the dump is complete. If the dump procedure takes longer than the continuous aggregate refresh policy’s window, you must manually refresh the continuous aggregate in the correct time range. For more information, consult the refresh policies documentation. To turn off the jobs: To turn on the jobs:
Restoring with concurrency
If the directory format is used for pg_dump and pg_restore, concurrency can be employed to speed up the process. Unfortunately, loading the tables in the timescaledb_catalog schema concurrently causes errors. Furthermore, the tsdbadmin user does not have sufficient privileges to turn off triggers in this schema. To get around this limitation, load this schema serially, and then load the rest of the database concurrently.
Ownership of background jobs
The _timescaledb_config schema is an internal schema not part of the public API. For normal job management, use the timescaledb_information.jobs informational view and job management functions. Direct modification of internal tables should only be done in migration scenarios as described below.
The _timescaledb_config.bgw_jobs table is used to manage background jobs. This includes custom jobs, columnstore compression policies, retention policies, and continuous aggregate refresh policies. On Tiger Cloud, this table has a trigger which ensures that no database user can create or modify jobs owned by another database user. This trigger can provide an obstacle for migrations. If the --no-owner flag is used with pg_dump and pg_restore, all objects in the target database are owned by the user that ran pg_restore, likely tsdbadmin. If all the background jobs in the source database were owned by a user of the same name as the user running the restore (again likely tsdbadmin), then loading the _timescaledb_config.bgw_jobs table should work. If the background jobs in the source were owned by the postgres user, they are be automatically changed to be owned by the tsdbadmin user. In this case, one just needs to verify that the jobs do not make use of privileges that the tsdbadmin user does not possess. If background jobs are owned by one or more users other than the user employed in restoring, then there could be issues. To work around this issue, do not dump this table with pg_dump. Provide either --exclude-table-data='_timescaledb_config.bgw_job' or --exclude-table='_timescaledb_config.bgw_job' to pg_dump to skip this table. Then, use psql and the COPY command to dump and restore this table with modified values for the owner column. Once the table has been loaded and the restore completed, you may then use SQL to adjust the ownership of the jobs and/or the associated stored procedures and functions as you wish.
Common issues
Extension availability
There are a vast number of PostgreSQL extensions available in the wild. Tiger Cloud supports many of the most popular extensions, but not all extensions. Before migrating, check that the extensions you are using are supported on Tiger Cloud. Consult the list of supported extensions.
TimescaleDB extension in the public schema
When self-hosting, the TimescaleDB extension may be installed in an arbitrary schema. Tiger Cloud only supports installing the TimescaleDB extension in the public schema. How to go about resolving this depends heavily on the particular details of the source schema and the migration approach chosen.
Tablespaces
Tiger Cloud does not support using custom tablespaces. Providing the --no-tablespaces flag to pg_dump and pg_restore when dumping/restoring the schema results in all objects being in the default tablespace as desired.
Only one database per instance
While PostgreSQL clusters can contain many databases, Tiger Cloud services are limited to a single database. When migrating a cluster with multiple databases to Tiger Cloud, one can either migrate each source database to a separate Tiger Cloud service or “merge” source databases to target schemas.
Superuser privileges
The tsdbadmin database user is the most powerful available on Tiger Cloud, but it is not a true superuser. Review your application for use of superuser privileged operations and mitigate before migrating.
Migrate partial continuous aggregates
In order to improve the performance and compatibility of continuous aggregates, TimescaleDB v2.7 replaces partial continuous aggregates with finalized continuous aggregates. To test your database for partial continuous aggregates, run the following query: If you have partial continuous aggregates in your database, migrate them from partial to finalized before you migrate your database. If you accidentally migrate partial continuous aggregates across PostgreSQL versions, you see the following error when you query any continuous aggregates: