How to reboot a remote machine via SSH while avoiding a timeout?


I have this piece of code in which I am trying to reboot a remote machine over SSH:

def check_and_reboot(ssh, hostname):
    # if not ssh_connect(ssh, hostname):
    #     print("SSH failed, restarting {}".format(hostname))
    # Step 2: If connection fails, reboot the machine
    if not reboot_machine(ssh, hostname):
        print("Failed to reboot machine {}. Exiting.".format(hostname))
        # return

    # Step 3: Run the command over SSH
    command = "ls -asl /raid/dfs/aiceph"
    ssh_connect(ssh, hostname)
    output, exit_code = execute_remote_command(ssh, command)

    # Step 4: If the command returns an error code, reboot the machine
    if exit_code != 0:
        try:
            # Execute the reboot command
            _, stdout, stderr = ssh.exec_command("sudo nohup /sbin/reboot -f > /dev/null 2>&1 &")

            # Wait for the command to complete
            exit_status = stdout.channel.recv_exit_status()

            # If the exit status is 0, the reboot command was sent successfully
            if exit_status == 0:
                print("Reboot command sent successfully")
                return True
            else:
                print("Failed to send reboot command with exit status {}".format(exit_status))
                return False

        except Exception as e:
            # If there is an exception, print the error and return False
            print("Error sending reboot command: {}".format(str(e)))
            return False

    # Step 5: Validation successful
    print("Validation successful for machine {}".format(hostname))

I set exit_code = 1 to verify the flow. The code gets stuck at step 4, because it keeps waiting for the reboot command to return. How can I get past this without waiting for a response from the reboot command?

Answer by jeb:

An explicit exit should avoid the timeout problem, but it is not reliable when it is simply appended to the reboot command, as in:

ssh <HOST> "sudo reboot now; exit 42"

This works sometimes, but not always, because there is a race condition: the reboot can take the connection down before exit 42 has run.

Therefore, the connection should be closed before the reboot command is executed:

ssh <HOST> "nohup bash -c 'sleep 1; sudo reboot now' & exit 42"

This starts a new bash under nohup that runs sleep 1; sudo reboot now, so it keeps running even after the SSH connection is lost.
The exit 42 should always execute before the sleep finishes, but if you are paranoid or your system is under high load, you can couple the two more tightly:

ssh <HOST> 'nohup bash -c "tail --pid=$$ -f /dev/null; sudo reboot now" & exit 42'

This waits until the parent shell has been closed by the exit 42, and only then reboots.
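
To connect this back to the Python code in the question, here is a minimal sketch of how the same commands could be sent through a paramiko-style client. It rests on several assumptions. The ssh object is assumed to behave like paramiko's SSHClient (exec_command returns stdin, stdout, stderr, and stdout.channel offers exit_status_ready and recv_exit_status). The helper names build_reboot_command and send_reboot are made up for illustration. The stream redirections are carried over from the question's command, with < /dev/null added. exit 0 replaces exit 42 so that the question's exit_status == 0 check still means success. Instead of blocking on recv_exit_status, the sketch polls for the exit status with a deadline.

```python
import shlex
import time

# Assumption: redirecting all three streams keeps the background job from
# holding the SSH channel open after the remote shell has exited.
DETACH = "> /dev/null 2>&1 < /dev/null"


def build_reboot_command(wait_for_parent=False, delay_seconds=1):
    """Build a remote shell command that reboots after the session shell exits."""
    if wait_for_parent:
        # $$ must be expanded by the remote outer shell, so the inner script
        # is wrapped in double quotes. tail --pid needs GNU coreutils.
        inner = '"tail --pid=$$ -f /dev/null; sudo reboot now"'
    else:
        # Fixed delay variant: the outer shell exits while sleep is running.
        inner = shlex.quote("sleep {}; sudo reboot now".format(int(delay_seconds)))
    return "nohup bash -c {} {} & exit 0".format(inner, DETACH)


def send_reboot(ssh, wait_for_parent=False, wait_seconds=5.0):
    """Send the reboot command and wait at most wait_seconds for the shell.

    Returns True or False if the remote shell reported an exit status in
    time, or None if the status is unknown (for example because the host
    went down first).
    """
    _, stdout, _ = ssh.exec_command(build_reboot_command(wait_for_parent))
    channel = stdout.channel
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if channel.exit_status_ready():
            return channel.recv_exit_status() == 0
        time.sleep(0.1)
    return None
```

In the question's step 4, send_reboot(ssh) could stand in for the exec_command and recv_exit_status pair. A return value of None would then need its own handling, for example by polling the host until SSH becomes reachable again.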