Celery consumer connection to broker (rabbitmq) lost

I have Celery running in a Kubernetes pod. The command I use to start the worker is:

celery -A mycelery worker --time-limit 14400 -c 8 -Q myqueue1,myqueue2 --hostname mycelery_auto_worker@%h --loglevel INFO

The installed packages are listed below.

amqp==5.1.1
async-timeout==4.0.2
billiard==3.6.4.0
celery==5.2.7
certifi==2022.9.14
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
croniter==1.3.7
cryptography==38.0.1
Deprecated==1.2.13
docker==4.3.0
elastic-enterprise-search==7.14.0
elastic-transport==7.16.0
idna==3.4
kombu==5.2.4
ldap3==2.9.1
marshmallow==2.20.5
mongoengine==0.24.2
mysqlclient==2.0.1
netaddr==0.8.0
packaging==21.3
prettytable==3.4.1
prompt-toolkit==3.0.31
pyasn1==0.4.8
pycparser==2.21
PyJWT==2.5.0
pymongo==4.2.0
pymsteams==0.2.1
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2022.2.1
PyU4V==10.0.0.16
redis==4.3.4
requests==2.28.1
six==1.16.0
urllib3==1.26.12
vine==5.0.0
wcwidth==0.2.5
websocket-client==1.4.1
wrapt==1.14.1

It works fine most of the time, but intermittently the worker loses its connection to the broker and logs a connection timeout error, as shown below.

2022-09-23T08:16:57.086066685-04:00 stderr F [2022-09-23 12:16:57,085: INFO/MainProcess] missed heartbeat from myworker1@a0b963a2e03f
2022-09-23T08:16:57.086119393-04:00 stderr F [2022-09-23 12:16:57,086: INFO/MainProcess] missed heartbeat from myworker2@3312756a8f0d
2022-09-23T08:17:09.05803789-04:00 stderr F [2022-09-23 12:17:09,054: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
2022-09-23T08:17:09.058110205-04:00 stderr F Traceback (most recent call last):
2022-09-23T08:17:09.058118751-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 332, in start
2022-09-23T08:17:09.058124009-04:00 stderr F     blueprint.start(self)
2022-09-23T08:17:09.058128801-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
2022-09-23T08:17:09.058133403-04:00 stderr F     step.start(parent)
2022-09-23T08:17:09.058137355-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 628, in start
2022-09-23T08:17:09.058141089-04:00 stderr F     c.loop(*c.loop_args())
2022-09-23T08:17:09.058145102-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/celery/worker/loops.py", line 97, in asynloop
2022-09-23T08:17:09.058150138-04:00 stderr F     next(loop)
2022-09-23T08:17:09.058153921-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/kombu/asynchronous/hub.py", line 362, in create_loop
2022-09-23T08:17:09.058157717-04:00 stderr F     cb(*cbargs)
2022-09-23T08:17:09.058161636-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/kombu/transport/base.py", line 235, in on_readable
2022-09-23T08:17:09.058165427-04:00 stderr F     reader(loop)
2022-09-23T08:17:09.058169378-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/kombu/transport/base.py", line 217, in _read
2022-09-23T08:17:09.058174718-04:00 stderr F     drain_events(timeout=0)
2022-09-23T08:17:09.058178592-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/amqp/connection.py", line 525, in drain_events
2022-09-23T08:17:09.058182378-04:00 stderr F     while not self.blocking_read(timeout):
2022-09-23T08:17:09.058186046-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/amqp/connection.py", line 530, in blocking_read
2022-09-23T08:17:09.058189642-04:00 stderr F     frame = self.transport.read_frame()
2022-09-23T08:17:09.058193507-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/amqp/transport.py", line 294, in read_frame
2022-09-23T08:17:09.058197122-04:00 stderr F     frame_header = read(7, True)
2022-09-23T08:17:09.058218305-04:00 stderr F   File "/usr/local/lib/python3.9/site-packages/amqp/transport.py", line 627, in _read
2022-09-23T08:17:09.058222178-04:00 stderr F     s = recv(n - len(rbuf))
2022-09-23T08:17:09.058225496-04:00 stderr F TimeoutError: [Errno 110] Connection timed out

After this, a warning about tasks with late acknowledgement is logged, as shown below.

2022-09-26T10:49:18.975973308Z stderr F [2022-09-26 10:49:18,975: WARNING/MainProcess] /usr/local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py:367: CPendingDeprecationWarning:
2022-09-26T10:49:18.976021014Z stderr F In Celery 5.1 we introduced an optional breaking change which
2022-09-26T10:49:18.976034128Z stderr F on connection loss cancels all currently executed tasks with late acknowledgement enabled.
2022-09-26T10:49:18.976043266Z stderr F These tasks cannot be acknowledged as the connection is gone, and the tasks are automatically redelivered back to the queue.
2022-09-26T10:49:18.976052858Z stderr F You can enable this behavior using the worker_cancel_long_running_tasks_on_connection_loss setting.
2022-09-26T10:49:18.976060383Z stderr F In Celery 5.1 it is set to False by default. The setting will be set to True by default in Celery 6.0.

I have already set broker_connection_timeout and broker_transport_options as below.

broker_connection_timeout = 10.0
broker_transport_options = {
    "max_retries": 3,
    "interval_start": 0,
    "interval_step": 0.2,
    "interval_max": 0.5,
    "retry_policy": {
        "timeout": 5.0
    }
}
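
As I understand it, the interval values above describe a retry schedule in which the wait starts at interval_start, grows by interval_step after each failed attempt, and is capped at interval_max. The sketch below is a simplified illustration of that schedule under this assumption, not kombu's actual implementation. The names retry_delays and call_with_retries are hypothetical helpers, and the sleep argument exists only so the example can run without real waiting.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_delays(max_retries: int, interval_start: float,
                 interval_step: float, interval_max: float) -> List[float]:
    """Return the wait (in seconds) before each retry, capped at interval_max."""
    return [min(interval_start + n * interval_step, interval_max)
            for n in range(max_retries)]


def call_with_retries(func: Callable[[], T], *,
                      max_retries: int = 3,
                      interval_start: float = 0.0,
                      interval_step: float = 0.2,
                      interval_max: float = 0.5,
                      errors: Tuple[Type[BaseException], ...] = (OSError,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying up to max_retries times on the given errors.

    TimeoutError (as in the traceback above) is a subclass of OSError,
    so it is covered by the default errors tuple.
    """
    for delay in retry_delays(max_retries, interval_start,
                              interval_step, interval_max):
        try:
            return func()
        except errors:
            # Wait before the next attempt.
            sleep(delay)
    # Final attempt: any exception raised here propagates to the caller.
    return func()
```

Under this sketch, the values in the configuration above give waits of roughly 0, 0.2 and 0.4 seconds, so all three retries would be used up in well under a second.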

However, this still does not help. I checked the RabbitMQ logs, but they do not give a clear answer either. They contain warnings such as "client unexpectedly closed TCP connection", but I am not sure whether that is the cause of the lost consumer connection.

2022-09-23 20:20:44.339 [warning] <0.11840.2379> closing AMQP connection <0.11840.2379> (10.4.106.3:40596 -> 172.17.102.130:5672, vhost: 'my-vhost', user: 'my-vhost'):
client unexpectedly closed TCP connection
2022-09-23 20:20:44.710 [warning] <0.3980.2377> closing AMQP connection <0.3980.2377> (10.15.192.78:53718 -> 172.17.102.130:5672, vhost: 'my-vhost', user: 'my-vhost'):
client unexpectedly closed TCP connection
2022-09-23 20:20:44.934 [warning] <0.19511.2380> closing AMQP connection <0.19511.2380> (10.15.192.90:40776 -> 172.17.102.130:5672, vhost: 'my-vhost', user: 'my-vhost'):
client unexpectedly closed TCP connection
2022-09-23 20:20:45.577 [warning] <0.8721.2386> closing AMQP connection <0.8721.2386> (10.15.192.83:41954 -> 172.17.102.130:5672, vhost: 'my-vhost', user: 'my-vhost'):
client unexpectedly closed TCP connection
2022-09-23 20:20:51.147 [info] <0.23491.2386> accepting AMQP connection <0.23491.2386> (10.4.106.3:39168 -> 172.17.102.130:5672)
2022-09-23 20:20:51.150 [info] <0.23491.2386> connection <0.23491.2386> (10.4.106.3:39168 -> 172.17.102.130:5672): user 'my-vhost' authenticated and granted access to vhost 'my-vhost'
2022-09-23 20:20:52.955 [info] <0.18223.2385> accepting AMQP connection <0.18223.2385> (10.4.106.3:51670 -> 172.17.102.130:5672)
2022-09-23 20:20:52.959 [info] <0.18223.2385> connection <0.18223.2385> (10.4.106.3:51670 -> 172.17.102.130:5672): user 'my-vhost' authenticated and granted access to vhost 'my-vhost'
2022-09-23 20:21:05.505 [warning] <0.6386.2362> closing AMQP connection <0.6386.2362> (10.15.192.79:40658 -> 172.17.102.130:5672, vhost: 'my-vhost', user: 'my-vhost'):
client unexpectedly closed TCP connection

Is there any other way to retry the connection when it is lost?
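
For reference, the Celery settings that appear to govern reconnection can be sketched as a config module. This is only a sketch: the module name celeryconfig is hypothetical, and the values (in particular the heartbeat interval and unlimited retries) are illustrative assumptions rather than tested recommendations for this setup.

```python
# celeryconfig.py (hypothetical module name)

# Keep retrying when the broker connection is lost (Celery's default is True).
broker_connection_retry = True

# None means retry forever; the documented default is 100 attempts.
broker_connection_max_retries = None

# Seconds to wait while establishing a connection (same value as set above).
broker_connection_timeout = 10.0

# AMQP heartbeat interval in seconds; 60 is an illustrative value only.
broker_heartbeat = 60

# Opt in to the behaviour described in the deprecation warning above:
# cancel running late-ack tasks on connection loss so they are redelivered.
worker_cancel_long_running_tasks_on_connection_loss = True
```

A module like this would typically be loaded with app.config_from_object("celeryconfig").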
