Trying to get a Dask SSHCluster (https://docs.dask.org/en/latest/deploying-ssh.html) up and running results in:
RuntimeError: Cluster failed to start: Worker failed to start
Python == 3.11.6
Dask == 2023.11.0
Distributed == 2023.11.0
Windows 11 Pro, Version 10.0.22621 Build 22621
PyCharm 2023.2.5 (Community Edition), Build #PC-232.10227.11, built on November 14, 2023
If I run the following little program:
from dask.distributed import SSHCluster, Client

cluster = SSHCluster(["localhost"])
client = Client(cluster)
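To see what SSHCluster is doing before it fails, verbose logging helps; the DEBUG/WARNING lines in the output below came from something like this stdlib-only sketch:

```python
import logging

# Enable verbose output from the SSH deploy machinery so the remote
# commands SSHCluster runs (and their results) show up in the console.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("distributed.deploy.ssh").setLevel(logging.DEBUG)
```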
I get the following error message when executing it in PyCharm:
cluster=SSHCluster(["localhost"])
DEBUG:distributed.deploy.ssh:Created Scheduler Connection
WARNING:distributed.deploy.spec:Cluster closed without starting up
Traceback (most recent call last):
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\spec.py", line 325, in _start
self.scheduler = await self.scheduler
^^^^^^^^^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\spec.py", line 74, in _
await self.start()
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\ssh.py", line 250, in start
raise Exception(
Exception: Scheduler failed to set DASK_INTERNAL_INHERIT_CONFIG variable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\utils.py", line 408, in f
result = yield future
^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\tornado\gen.py", line 767, in run
value = future.result()
^^^^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\spec.py", line 335, in _start
raise RuntimeError(f"Cluster failed to start: {e}") from e
RuntimeError: Cluster failed to start: Scheduler failed to set DASK_INTERNAL_INHERIT_CONFIG variable
Here is a hint on how to solve this issue: changing "cmd /c ver" to "cmd.exe /c ver" in ssh.py, line 244 (from https://github.com/dask/distributed/issues/5411).
I changed that in ssh.py. After that I ran into the following error:
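For anyone wanting to try the same edit, a quick stdlib-only way to find the ssh.py that is actually in use (it just prints the path if distributed is importable in the active environment):

```python
from importlib.util import find_spec

# Print the path of the ssh.py module that SSHCluster imports, so the
# "cmd /c ver" line can be edited in the right environment.
try:
    spec = find_spec("distributed.deploy.ssh")
    print(spec.origin if spec else "distributed.deploy.ssh not found")
except ModuleNotFoundError:
    print("distributed is not importable in this environment")
```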
DEBUG:distributed.deploy.ssh:Created Scheduler Connection
INFO:distributed.deploy.ssh:Traceback (most recent call last):
INFO:distributed.deploy.ssh:File "<frozen runpy>", line 198, in _run_module_as_main
INFO:distributed.deploy.ssh:File "<frozen runpy>", line 88, in _run_code
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\cli\dask_spec.py", line 67, in <module>
INFO:distributed.deploy.ssh:main()
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\click\core.py", line 1157, in __call__
INFO:distributed.deploy.ssh:return self.main(*args, **kwargs)
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\click\core.py", line 1078, in main
INFO:distributed.deploy.ssh:rv = self.invoke(ctx)
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\click\core.py", line 1434, in invoke
INFO:distributed.deploy.ssh:return ctx.invoke(self.callback, **ctx.params)
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\click\core.py", line 783, in invoke
INFO:distributed.deploy.ssh:return __callback(*args, **kwargs)
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\cli\dask_spec.py", line 33, in main
INFO:distributed.deploy.ssh:spec.update(json.loads(spec))
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\json\__init__.py", line 346, in loads
INFO:distributed.deploy.ssh:return _default_decoder.decode(s)
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\json\decoder.py", line 337, in decode
INFO:distributed.deploy.ssh:obj, end = self.raw_decode(s, idx=_w(s, 0).end())
INFO:distributed.deploy.ssh:^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:distributed.deploy.ssh:File "D:\Users\fourb\anaconda3\envs\Py11\Lib\json\decoder.py", line 355, in raw_decode
INFO:distributed.deploy.ssh:raise JSONDecodeError("Expecting value", s, err.value) from None
INFO:distributed.deploy.ssh:json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
WARNING:distributed.deploy.spec:Cluster closed without starting up
Traceback (most recent call last):
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\spec.py", line 325, in _start
self.scheduler = await self.scheduler
^^^^^^^^^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\spec.py", line 74, in _
await self.start()
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\ssh.py", line 270, in start
raise Exception("Worker failed to start")
Exception: Worker failed to start
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\utils.py", line 408, in f
result = yield future
^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\tornado\gen.py", line 767, in run
value = future.result()
^^^^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\spec.py", line 335, in _start
raise RuntimeError(f"Cluster failed to start: {e}") from e
RuntimeError: Cluster failed to start: Worker failed to start
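For what it's worth, the JSONDecodeError at char 0 means the remote process received an empty (or otherwise non-JSON) --spec argument; my assumption, based on issue 5411, is that Windows shell quoting mangles it. The exact same error is easy to reproduce:

```python
import json

# json.loads on an empty string fails exactly like the worker log above:
# the --spec value apparently arrives empty after shell quoting on Windows
# (an assumption on my part, based on issue 5411).
try:
    json.loads("")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```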
Do you have any hints on how to solve this issue? SSH is configured for passwordless, key-based login.
If I start the cluster from the command line (https://docs.dask.org/en/latest/deploying-cli.html), it works:
(Py11) C:\Users\fourb>dask scheduler
-----------------------------------------------
WARNING:bokeh.server.util:Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
State start
Clear task state
Scheduler at: tcp://192.168.178.69:8786
dashboard at: http://192.168.178.69:8787/status
Registering Worker plugin shuffle
Register worker <WorkerState 'tcp://192.168.178.69:6057', status: init, memory: 0, processing: 0>
Starting worker compute stream, tcp://192.168.178.69:60357
Worker status init -> running - <WorkerState 'tcp://192.168.178.69:60357', status: running, memory: 0, processing: 0>
(Py11) C:\Users\fourb>dask worker tcp://192.168.178.69:8786`
WARNING:bokeh.server.util:Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
Start worker at: tcp://192.168.178.69:60357
Listening to: tcp://192.168.178.69:60357
dashboard at: 192.168.178.69:60358
Waiting to connect to: tcp://192.168.178.69:8786
Threads: 24
Memory: 31.92 GiB
Local Directory: C:\Users\fourb\AppData\Local\Temp\dask-scratch-space\worker-z6d6ss4p
-------------------------------------------------
Starting Worker plugin shuffle
Registered to: tcp://192.168.178.69:8786
Heartbeat: tcp://192.168.178.69:60357
Heartbeat: tcp://192.168.178.69:60357
Heartbeat: tcp://192.168.178.69:60357
Heartbeat: tcp://192.168.178.69:60357
Heartbeat: tcp://192.168.178.69:60357
Heartbeat: tcp://192.168.178.69:60357
If I try the following:
(Py11) C:\Users\fourb>dask-ssh localhost
D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\cli\dask_ssh.py:150: FutureWarning: dask-ssh is deprecated and will be removed in a future release; use `dask ssh` instead
warnings.warn(
---------------------------------------------------------------
Dask.distributed v2023.11.0
Worker nodes:
0: localhost
scheduler node: localhost:8786
[ worker localhost ] : D:\Users\fourb\anaconda3\envs\Py11\python.exe -m distributed.cli.dask_worker localhost:8786 --nthreads 0 --host localhost --memory-limit auto
[ scheduler localhost:8786 ] : D:\Users\fourb\anaconda3\envs\Py11\python.exe -m distributed.cli.dask_scheduler --port 8786
Exception in thread Thread-2 (async_ssh):
Traceback (most recent call last):
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\distributed\deploy\old_ssh.py", line 198, in async_ssh
channel.send(b"\x03") # Ctrl-C
^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\paramiko\channel.py", line 799, in send
return self._send(s, m)
^^^^^^^^^^^^^^^^
File "D:\Users\fourb\anaconda3\envs\Py11\Lib\site-packages\paramiko\channel.py", line 1196, in _send
raise socket.error("Socket is closed")
OSError: Socket is closed