Rabbitmq client loosing connection and does not recover

1.1k Views Asked by At

I am using Rabbitmq 3.8.16 with Vertx 3.9.4 Rabbitmq client for my backend application and noticed that sometimes the client looses the connection and it fails to recover correctly (autorecovery is turned ON). It means that after the recovery, It stops to consume messages and the durable queues has no consumers.

This happens randomly and with different exceptions. Since I am using Vertx Rabbitmq client, I cannot create different channel per consumer/publisher, I have something like 10 publishers and 10 consumers on the same node (same connection same channel) sometimes reaching up to 400 messages per second. Can it cause this kind of issues ??

Other problem can be that I have a queues representing websocket connections to my backend, these websocket connections are short living so it quite often declare and then remove a queue representing this user session. Is that a problem for Rabbitmq ?

From the Rabbitmq log I found following exceptions which were related to the connection crash:

First one :

2021-09-08 08:14:15.321 [error] <0.24867.686> ** Generic server <0.24867.686> terminating
** Last message in was {'$gen_cast',{method,{'basic.publish',0,<<"NOTIFICATION_EXCHANGE">>,<<"BROADCAST_NOTIFICATION_ADDRESS">>,false,false},{content,60,none,<<0,0>>,rabbit_framing_amqp_0_9_1,[<<"{MY_MESSAGE}">>]},flow}}
** When Server state == {ch,{conf,running,rabbit_framing_amqp_0_9_1,1,<0.24846.686>,<0.24862.686>,<0.24846.686>,<<"172.20.24.177:60848 -> 172.20.28.73:5672">>,undefined,{user,<<"backend">>,[],[{rabbit_auth_backend_internal,none}]},<<"backend">>,<<"8a4a4f71-251a-4ab6-85f6-50d594db46b9">>,<0.24850.686>,[{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true},{<<"basic.nack">>,bool,true},{<<"publisher_confirms">>,bool,true},{<<"consumer_cancel_notify">>,bool,true}],none,200,134217728,900000,#{},1000000000},{lstate,<0.24877.686>,false},none,4927,{2,{[{4926,<<"amq.ctag-ckMzQHTLitAIZ86qk0_R0Q">>,1631088855286,{{'backend_SERVICE_QUEUE','[email protected]'},321}}],[{4925,<<"amq.ctag-xDoqkK1VmPptIM9OG3YaPw">>,1631088855286,{{'backend_SERVICE_QUEUE','[email protected]'},321}}]}},#{'backend_9fb541c5-a9ae-4bbc-99a4-ff444316b3f1' => {resource,<<"backend">>,queue,<<"9fb541c5-a9ae-4bbc-99a4-ff444316b3f1">>},'backend_5f188762-0a66-47e3-bfa8-76aa11481cb9' => {resource,<<"backend">>,queue,<<"5f188762-0a66-47e3-bfa8-76aa11481cb9">>},'backend_e4a3cec7-3f6f-4b05-bdf2-ade8a087425d' => {resource,<<"backend">>,queue,<<"e4a3cec7-3f6f-4b05-bdf2-ade8a087425d">>},'backend_e7a9834a-7ddc-4690-a317-4f19433edafd' => {resource,<<"backend">>,queue,<<"e7a9834a-7ddc-4690-a317-4f19433edafd">>},'backend_6fdcc521-4ff1-4721-bf12-cefffe1181eb' => ...,...},...}
** Reason for termination ==
** {{error,noproc},[{rabbit_fifo_client,enqueue,3,[{file,"src/rabbit_fifo_client.erl"},{line,168}]},{rabbit_quorum_queue,deliver,3,[{file,"src/rabbit_quorum_queue.erl"},{line,754}]},{rabbit_amqqueue,'-deliver/3-fun-3-',4,[{file,"src/rabbit_amqqueue.erl"},{line,2239}]},{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},{rabbit_amqqueue,deliver,3,[{file,"src/rabbit_amqqueue.erl"},{line,2236}]},{rabbit_channel,deliver_to_queues,2,[{file,"src/rabbit_channel.erl"},{line,2204}]},{rabbit_channel,handle_method,3,[{file,"src/rabbit_channel.erl"},{line,1375}]},{rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,643}]}]}
2021-09-08 08:14:15.321 [error] <0.24867.686> CRASH REPORT Process <0.24867.686> with 0 neighbours exited with reason: {error,noproc} in rabbit_fifo_client:enqueue/3 line 168 in gen_server2:terminate/3 line 1183
2021-09-08 08:14:15.322 [error] <0.24859.686> Supervisor {<0.24859.686>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.24846.686>, <0.24862.686>, <0.24846.686>, <<"172.20.24.177:60848 -> 172.20.28.73:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"backend">>,[],[{rabbit_auth_backend_internal,none}]}, <<"backend">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.24850.686>, <0.24877.686>) at <0.24867.686> exit with reason {error,noproc} in rabbit_fifo_client:enqueue/3 line 168 in context child_terminated
2021-09-08 08:14:15.322 [error] <0.24859.686> Supervisor {<0.24859.686>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.24846.686>, <0.24862.686>, <0.24846.686>, <<"172.20.24.177:60848 -> 172.20.28.73:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"backend">>,[],[{rabbit_auth_backend_internal,none}]}, <<"backend">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.24850.686>, <0.24877.686>) at <0.24867.686> exit with reason reached_max_restart_intensity in context shutdown
2021-09-08 08:14:15.323 [error] <0.24846.686> Error on AMQP connection <0.24846.686> (172.20.24.177:60848 -> 172.20.28.73:5672, vhost: 'backend', state: running), channel 1:
{{error,noproc},
[{rabbit_fifo_client,enqueue,3,
[{file,"src/rabbit_fifo_client.erl"},{line,168}]},
{rabbit_quorum_queue,deliver,3,
[{file,"src/rabbit_quorum_queue.erl"},{line,754}]},
{rabbit_amqqueue,'-deliver/3-fun-3-',4,
[{file,"src/rabbit_amqqueue.erl"},{line,2239}]},
{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},
{rabbit_amqqueue,deliver,3,[{file,"src/rabbit_amqqueue.erl"},{line,2236}]},
{rabbit_channel,deliver_to_queues,2,
[{file,"src/rabbit_channel.erl"},{line,2204}]},
{rabbit_channel,handle_method,3,
[{file,"src/rabbit_channel.erl"},{line,1375}]},
{rabbit_channel,handle_cast,2,
[{file,"src/rabbit_channel.erl"},{line,643}]}]}
2021-09-08 08:14:15.323 [warning] <0.24846.686> Non-AMQP exit reason '{{error,noproc},[{rabbit_fifo_client,enqueue,3,[{file,"src/rabbit_fifo_client.erl"},{line,168}]},{rabbit_quorum_queue,deliver,3,[{file,"src/rabbit_quorum_queue.erl"},{line,754}]},{rabbit_amqqueue,'-deliver/3-fun-3-',4,[{file,"src/rabbit_amqqueue.erl"},{line,2239}]},{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},{rabbit_amqqueue,deliver,3,[{file,"src/rabbit_amqqueue.erl"},{line,2236}]},{rabbit_channel,deliver_to_queues,2,[{file,"src/rabbit_channel.erl"},{line,2204}]},{rabbit_channel,handle_method,3,[{file,"src/rabbit_channel.erl"},{line,1375}]},{rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,643}]}]}'

Another one that I noticed is :

2021-09-02 14:59:43.260 [error] <0.17646.16> closing AMQP connection <0.17646.16> (172.20.23.201:35420 -> 172.20.36.197:5672):
missed heartbeats from client, timeout: 60s
2021-09-02 14:59:43.263 [info] <0.29592.28> Closing all channels from  '172.20.23.201:35420 -> 172.20.36.197:5672' because it has been closed

And the last one :

2021-09-08 09:47:14.948 [warning] <0.7610.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHES3OXCNJJBB4VW does not exist
2021-09-08 09:47:14.948 [warning] <0.7611.564> segment_writer: failed to open segment file /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHES0C5F3P5Y6WKO/00000001.segmenterror: enoent
2021-09-08 09:47:14.948 [warning] <0.199.564> segment_writer: failed to open segment file /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHES9RZ4RFC37MDG/00000001.segmenterror: enoent
2021-09-08 09:47:14.948 [warning] <0.7611.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHES0C5F3P5Y6WKO does not exist
2021-09-08 09:47:14.948 [warning] <0.199.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHES9RZ4RFC37MDG does not exist
2021-09-08 09:47:14.950 [warning] <0.7600.564> segment_writer: failed to open segment file /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHESAQL00HNNSPK4/00000001.segmenterror: enoent
2021-09-08 09:47:14.950 [warning] <0.7600.564> segment_writer: skipping segment as directory /var/lib/rabbitmq/mnesia/[email protected]/quorum/[email protected]/ORCHESAQL00HNNSPK4 does not exist

On the client side I noticed :

15:11:30.816 [vert.x-eventloop-thread-0]Fail to handle message
com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; cause: java.lang.IllegalStateException: Unsolicited delivery - see Channel.setDefaultConsumer to handle this case.
    at com.rabbitmq.client.impl.AMQConnection.ensureIsOpen(AMQConnection.java:172)
    at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:550)
    at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.createChannel(AutorecoveringConnection.java:165)
    at io.vertx.rabbitmq.impl.RabbitMQClientImpl.forChannel(RabbitMQClientImpl.java:476)
    at io.vertx.rabbitmq.impl.RabbitMQClientImpl.basicPublish(RabbitMQClientImpl.java:194)
    at io.vertx.reactivex.rabbitmq.RabbitMQClient.basicPublish(RabbitMQClient.java:363)
    at io.vertx.reactivex.rabbitmq.RabbitMQClient.lambda$rxBasicPublish$9(RabbitMQClient.java:376)
    at io.vertx.reactivex.rabbitmq.RabbitMQClient$$Lambda$699/0x00000000102c38c0.accept(Unknown Source)
    at io.vertx.reactivex.impl.AsyncResultCompletable.subscribeActual(AsyncResultCompletable.java:44) 

or

08:14:15.330 [AMQP Connection 10.100.56.15:5672] WARN  com.rabbitmq.client.impl.ForgivingExceptionHandler - An unexpected connection driver error occured (Exception message: Connection reset)
08:14:15.332 [AMQP Connection 10.100.56.15:5672] INFO  io.vertx.rabbitmq.impl.RabbitMQClientImpl - RabbitMQ connection shutdown! The client will attempt to reconnect automatically
com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=541, reply-text=INTERNAL_ERROR, class-id=0, method-id=0)
    at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:916)
    at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:906)
    at com.rabbitmq.client.impl.AMQConnection.handleConnectionClose(AMQConnection.java:844)
    at com.rabbitmq.client.impl.AMQConnection.processControlCommand(AMQConnection.java:799)
    at com.rabbitmq.client.impl.AMQConnection$1.processAsync(AMQConnection.java:242)
    at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:178)
    at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:111)
    at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:650)
    at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:597)

Since those errors are not clear to my, does anybody has an idea why is this happening ??

Thank you for your help

0

There are 0 best solutions below