Getting error (No module named 'scipy') for Lora training with Kohya


Unfortunately, I'm not able to train a LoRA with DreamBooth, and I'm getting the errors below. Could anybody tell me what's wrong? Any help would be appreciated :)

`The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 2
        More than one GPU was found, enabling multi-GPU training.
        If this was unintended please pass in --num_processes=1.
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
2024-01-31 20:03:04.188282: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
prepare tokenizer
prepare tokenizer

Using DreamBooth method.
Using DreamBooth method.

======================================
ModuleNotFoundError: No module named 'scipy'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/igpu/sd/kohya_ss/./train_network.py", line 1012, in <module>
    trainer.train(args)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 804003) of binary: /home/igpu/sd/kohya_ss/venv/bin/python
Traceback (most recent call last):
  File "/home/igpu/sd/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 977, in launch_command
    multi_gpu_launcher(args)
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/igpu/sd/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
./train_network.py FAILED
Failures:
[1]:
  time : 2024-01-31_20:03:38
  host : igpu
  rank : 1 (local_rank: 1)
  exitcode : 1 (pid: 804004)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
  time : 2024-01-31_20:03:38
  host : igpu
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 804003)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html`

I was trying to train a LoRA with the DreamBooth method in kohya_ss.

I have two RTX 4090 GPUs installed.
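
Since the log shows the training script running under /home/igpu/sd/kohya_ss/venv/bin/python, one way to narrow this down is to check whether that same interpreter can find scipy at all. The script below is only a diagnostic sketch meant to be run with the venv's Python. `module_available` is a hypothetical helper name, and the list of modules checked is an assumption for illustration, not something taken from kohya_ss.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module `name`."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Print the interpreter path so it can be compared with the venv path
    # that appears in the traceback.
    print("interpreter:", sys.executable)
    # Assumed list of modules to check; adjust as needed.
    for name in ("scipy", "torch", "accelerate"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If scipy is reported as missing, the usual next step would be to install it into that same venv (for example `pip install scipy` with the venv activated) and rerun the training.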
