Unfortunately, I'm not able to train a LoRA with DreamBooth and I'm getting the errors below. Could anybody tell me what's wrong? Any help would be appreciated :)
`The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 2
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in --num_processes=1.
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
2024-01-31 20:03:04.188282: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
prepare tokenizer
prepare tokenizer
Using DreamBooth method.
Using DreamBooth method.
======================================
ModuleNotFoundError: No module named 'scipy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/igpu/sd/kohya_ss/./train_network.py", line 1012, in trainer.train(args)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 804003) of binary: /home/igpu/sd/kohya_ss/venv/bin/python
Traceback (most recent call last):
  File "/home/igpu/sd/kohya_ss/venv/bin/accelerate", line 8, in sys.exit(main())
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 977, in launch_command
    multi_gpu_launcher(args)
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
./train_network.py FAILED
Failures:
[1]:
  time : 2024-01-31_20:03:38
  host : igpu
  rank : 1 (local_rank: 1)
  exitcode : 1 (pid: 804004)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
  time : 2024-01-31_20:03:38
  host : igpu
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 804003)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html`
I was trying to train a LoRA with the DreamBooth method using kohya_ss.
I have two RTX 4090 GPUs installed.