Postgres connection failed while Debezium connector streaming changes

122 Views Asked by At

On Our Spring boot Auditing service(used Debezium embedded engine) occurs error at different time intervals, I don't know the cause, please help me to understand and fix the problem. Another information: This same application deployed to dev and stable servers, the problem only occuring on dev server Application error is here:

2024-02-01 11:50:48.100 
2024-02-01T11:50:48.100+05:00 ERROR 1 --- [debezium-postgresconnector-user_management_topic_dev3-change-event-source-coordinator] io.debezium.pipeline.ErrorHandler        : Producer failure

org.postgresql.util.PSQLException: Database connection failed when writing to copy
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1176)
    at org.postgresql.core.v3.CopyDualImpl.flushCopy(CopyDualImpl.java:30)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:195)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:186)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:128)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:82)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:607)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.processMessages(PostgresStreamingChangeEventSource.java:220)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:182)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:41)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:205)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:172)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:118)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1030)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at org.postgresql.core.PGStream.flush(PGStream.java:721)
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1174)
    ... 17 common frames omitted

    
    
2024-02-01 11:50:48.102 
2024-02-01T11:50:48.102+05:00  WARN 1 --- [debezium-postgresconnector-user_management_topic_dev3-change-event-source-coordinator] io.debezium.pipeline.ErrorHandler        : Retry 2 of unlimited retries will be attempted
    
    
2024-02-01 11:50:48.104 
2024-02-01T11:50:48.104+05:00  INFO 1 --- [pool-34-thread-1] io.debezium.jdbc.JdbcConnection          : Connection gracefully closed
    
    
2024-02-01 11:50:48.106 
2024-02-01T11:50:48.106+05:00 ERROR 1 --- [debezium-postgresconnector-user_management_topic_dev3-change-event-source-coordinator] i.d.c.p.c.PostgresReplicationConnection  : Unexpected error while closing Postgres connection

org.postgresql.util.PSQLException: Unable to close connection properly
    at org.postgresql.jdbc.PgConnection.close(PgConnection.java:854)
    at io.debezium.jdbc.JdbcConnection.lambda$doClose$4(JdbcConnection.java:946)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1030)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at org.postgresql.core.PGStream.flush(PGStream.java:721)
    at org.postgresql.core.QueryExecutorCloseAction.close(QueryExecutorCloseAction.java:71)
    at org.postgresql.jdbc.PgConnectionCleaningAction.onClean(PgConnectionCleaningAction.java:89)
    at org.postgresql.util.LazyCleaner$Node.onClean(LazyCleaner.java:218)
    at org.postgresql.util.LazyCleaner$Node.clean(LazyCleaner.java:209)
    at org.postgresql.jdbc.PgConnection.close(PgConnection.java:851)
    ... 5 common frames omitted

    
    
2024-02-01 11:50:48.107 
2024-02-01T11:50:48.107+05:00  INFO 1 --- [debezium-postgresconnector-user_management_topic_dev3-change-event-source-coordinator] i.d.p.ChangeEventSourceCoordinator       : Finished streaming
    
    
2024-02-01 11:50:48.107 
2024-02-01T11:50:48.107+05:00  INFO 1 --- [debezium-postgresconnector-user_management_topic_dev3-change-event-source-coordinator] i.d.p.ChangeEventSourceCoordinator       : Connected metrics set to 'false'
    
    
2024-02-01 11:50:48.228 
2024-02-01T11:50:48.228+05:00  WARN 1 --- [pool-8-thread-1] i.d.connector.common.BaseSourceTask      : Going to restart connector after 10 sec. after a retriable exception
    
    
2024-02-01 11:50:48.230 
2024-02-01T11:50:48.230+05:00  INFO 1 --- [pool-8-thread-1] i.d.pipeline.signal.SignalProcessor      : SignalProcessor stopped
    
    
2024-02-01 11:50:48.230 
2024-02-01T11:50:48.230+05:00  INFO 1 --- [pool-8-thread-1] io.debezium.embedded.EmbeddedEngine      : Retriable exception thrown, connector will be restarted; errors.max.retries=-1

org.apache.kafka.connect.errors.RetriableException: An exception occurred in the change event producer. This connector will be restarted.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:63)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:185)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:41)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:205)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:172)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:118)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: org.postgresql.util.PSQLException: Database connection failed when writing to copy
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1176)
    at org.postgresql.core.v3.CopyDualImpl.flushCopy(CopyDualImpl.java:30)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:195)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:186)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:128)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:82)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:607)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.processMessages(PostgresStreamingChangeEventSource.java:220)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:182)
    ... 9 common frames omitted
Caused by: java.net.SocketException: Broken pipe
    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1030)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at org.postgresql.core.PGStream.flush(PGStream.java:721)
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1174)
    ... 17 common frames omitted

    
    
2024-02-01 11:50:48.231 
2024-02-01T11:50:48.231+05:00  INFO 1 --- [pool-8-thread-1] io.debezium.embedded.EmbeddedEngine      : Starting connector, attempt 1
    
    
2024-02-01 11:50:48.231 
2024-02-01T11:50:48.231+05:00  INFO 1 --- [pool-8-thread-1] i.d.connector.common.BaseSourceTask      : Stopping down connector
    
    
2024-02-01 11:50:48.233 
2024-02-01T11:50:48.233+05:00  INFO 1 --- [pool-8-thread-1] i.d.connector.common.BaseSourceTask      : Starting PostgresConnectorTask with configuration:

I looked postgres logs also:

2024-02-01 11:20:49.430 +05 [39888] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27ACE618')
2024-02-01 11:20:49.619 +05 [39888] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:20:49.619 +05 [39888] DETAIL:  Streaming transactions committing after 0/27ACE618, reading WAL from 0/27ACE4F8.
2024-02-01 11:20:49.619 +05 [39888] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27ACE618 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:20:49.619 +05 [39888] LOG:  logical decoding found consistent point at 0/27ACE4F8
2024-02-01 11:20:49.619 +05 [39888] DETAIL:  There are no running transactions.
2024-02-01 11:20:49.619 +05 [39888] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27ACE618 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:22:55.884 +05 [22] LOG:  checkpoint starting: time
2024-02-01 11:22:56.807 +05 [22] LOG:  checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.103 s, total=0.923 s; sync files=1, longest=0.103 s, average=0.103 s; distance=0 kB, estimate=65 kB
2024-02-01 11:25:49.686 +05 [39888] LOG:  terminating walsender process due to replication timeout
2024-02-01 11:25:49.686 +05 [39888] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27ACE618 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:26:15.322 +05 [39924] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:26:15.322 +05 [39924] DETAIL:  Streaming transactions committing after 0/27ACE618, reading WAL from 0/27ACE5E0.
2024-02-01 11:26:15.322 +05 [39924] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27AD0000')
2024-02-01 11:26:15.322 +05 [39924] LOG:  logical decoding found consistent point at 0/27ACE5E0
2024-02-01 11:26:15.322 +05 [39924] DETAIL:  There are no running transactions.
2024-02-01 11:26:15.322 +05 [39924] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27AD0000')
2024-02-01 11:26:15.824 +05 [39924] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:26:15.824 +05 [39924] DETAIL:  Streaming transactions committing after 0/27AD0000, reading WAL from 0/27ACE640.
2024-02-01 11:26:15.824 +05 [39924] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27AD0000 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:26:15.824 +05 [39924] LOG:  logical decoding found consistent point at 0/27ACE640
2024-02-01 11:26:15.824 +05 [39924] DETAIL:  There are no running transactions.
2024-02-01 11:26:15.824 +05 [39924] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27AD0000 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:27:55.886 +05 [22] LOG:  checkpoint starting: time
2024-02-01 11:27:56.107 +05 [22] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.202 s, sync=0.003 s, total=0.221 s; sync files=2, longest=0.002 s, average=0.001 s; distance=14 kB, estimate=60 kB
2024-02-01 11:30:45.558 +05 [39230] LOG:  could not receive data from client: Operation timed out
2024-02-01 11:31:58.019 +05 [39955] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:31:58.019 +05 [39955] DETAIL:  Streaming transactions committing after 0/27AD0000, reading WAL from 0/27ACE728.
2024-02-01 11:31:58.019 +05 [39955] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27AD0000')
2024-02-01 11:31:58.019 +05 [39955] LOG:  logical decoding found consistent point at 0/27ACE728
2024-02-01 11:31:58.019 +05 [39955] DETAIL:  There are no running transactions.
2024-02-01 11:31:58.019 +05 [39955] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27AD0000')
2024-02-01 11:31:58.022 +05 [39955] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:31:58.022 +05 [39955] DETAIL:  Streaming transactions committing after 0/27AD0000, reading WAL from 0/27ACE728.
2024-02-01 11:31:58.022 +05 [39955] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27AD0000 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:31:58.022 +05 [39955] LOG:  logical decoding found consistent point at 0/27ACE728
2024-02-01 11:31:58.022 +05 [39955] DETAIL:  There are no running transactions.
2024-02-01 11:31:58.022 +05 [39955] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27AD0000 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:32:36.151 +05 [39227] LOG:  could not receive data from client: Operation timed out
2024-02-01 11:32:36.151 +05 [39227] LOG:  unexpected EOF on client connection with an open transaction
2024-02-01 11:32:55.190 +05 [22] LOG:  checkpoint starting: time
2024-02-01 11:32:56.002 +05 [22] LOG:  checkpoint complete: wrote 9 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.804 s, sync=0.002 s, total=0.813 s; sync files=8, longest=0.002 s, average=0.001 s; distance=19 kB, estimate=56 kB
2024-02-01 11:39:38.058 +05 [39955] LOG:  terminating walsender process due to replication timeout
2024-02-01 11:39:38.058 +05 [39955] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27AD0000 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:42:55.129 +05 [22] LOG:  checkpoint starting: time
2024-02-01 11:42:55.748 +05 [22] LOG:  checkpoint complete: wrote 7 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.603 s, sync=0.007 s, total=0.619 s; sync files=7, longest=0.004 s, average=0.001 s; distance=18 kB, estimate=52 kB
2024-02-01 11:50:54.529 +05 [40052] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:50:54.529 +05 [40052] DETAIL:  Streaming transactions committing after 0/27AD0000, reading WAL from 0/27ACE728.
2024-02-01 11:50:54.529 +05 [40052] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27ADB620')
2024-02-01 11:50:54.530 +05 [40052] LOG:  logical decoding found consistent point at 0/27ACE728
2024-02-01 11:50:54.530 +05 [40052] DETAIL:  There are no running transactions.
2024-02-01 11:50:54.530 +05 [40052] STATEMENT:  SELECT pg_replication_slot_advance('user_management_connector_dev3', '0/27ADB620')
2024-02-01 11:50:54.547 +05 [40052] LOG:  starting logical decoding for slot "user_management_connector_dev3"
2024-02-01 11:50:54.547 +05 [40052] DETAIL:  Streaming transactions committing after 0/27ADB620, reading WAL from 0/27AD1E98.
2024-02-01 11:50:54.547 +05 [40052] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27ADB620 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')
2024-02-01 11:50:54.547 +05 [40052] LOG:  logical decoding found consistent point at 0/27AD1E98
2024-02-01 11:50:54.547 +05 [40052] DETAIL:  There are no running transactions.
2024-02-01 11:50:54.547 +05 [40052] STATEMENT:  START_REPLICATION SLOT "user_management_connector_dev3" LOGICAL 0/27ADB620 ("proto_version" '1', "publication_names" 'dbz_publication', "messages" 'true')

I saw some similar question and tryed increasing debezium config offset.flush.timeout.ms to 10000 from default value 5000, but did not help

0

There are 0 best solutions below