PostgreSQL replication of data between schemas within same db

78 Views Asked by At

We use Precisely CDC to replicate data from an IBMi DB2 source system into an AWS RDS PostgreSQL database (v16.1). Enormous amounts of big data, into multiple tables with 100+ columns. The target tables are continuously used by analysts. They're heavily indexed (essential btree indexes).

Apply of rows to PostgreSQL is a frequent bottleneck. There's frequently buildup of large queues for multiple tables. We're working on ways to optimize this.

I'd like to change this so that CDC writes to non-indexed versions of the tables, within a schema that's exclusively used by CDC. Then replicate the data into indexed tables in a separate schema used by analysts.

I doubt that postgreSQL logical replication can be used for this, because it requires that source/target schema & table names are identical.

How would you handle realtime replication of data between schemas within the same db? Maybe it can be solved using table triggers? High performance is essential here.

I'm exploring options, nothing concrete has been tried yet. But letting analysts use views wouldn't alleviate the problem, as the tables that CDC applies rows to would still need to be indexed.

1

There are 1 best solutions below

0
SerhiiH On

For your purposes there is no silver bullet, really depends on what is your requirements for latency you allowing for "realtime". Here is what I can suggest for your case:

  1. Crete worker that will pick newly added data from the table with no indexes and insert in into the table with indexes with a multi-row insert.
  2. Use AWS DMS task that captures ongoing changes from the source data store
  3. Consider using another solutions, specifically for Big Data Analysis (like Redshift) + Option 1. (you need to evaluate performance of insertion by yourself)

Both solutions fit for real time, however both solutions will have some lag.