Training the XSeg model for Deepfacelabs fails due to memory error

832 Views Asked by At

I'm new to deepfakes and I'm trying to do the 5XSeg) train.bat and everytime it finishes the filtering I get the following error. I use wf, and tried batch sizes from 1-8, always the same result. I have a Ryzen 5 3600, a 3080 Ti and 16 GB of RAM.

Using 26519 xseg labeled samples.
Traceback (most recent call last):
  File "multiprocessing\queues.py", line 234, in _feed
  File "multiprocessing\reduction.py", line 51, in dumps
MemoryError
Error:
Traceback (most recent call last):
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
Traceback (most recent call last):
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "multiprocessing\queues.py", line 234, in _feed
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
  File "multiprocessing\reduction.py", line 51, in dumps
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
         [[{{node MatMul}}]]
         [[concat_6/concat/_3]]
  (1) Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
         [[{{node MatMul}}]]
0 successful operations.
0 derived errors ignored.
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 263, in update_sample_for_preview
self.get_history_previews()
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 383, in get_history_previews
return self.onGetPreview (self.sample_for_preview, for_history=True)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_XSeg\Model.py", line 209, in onGetPreview
    I, M, IM, = [ np.clip( nn.to_data_format(x,"NHWC", self.model_data_format), 0.0, 1.0) for x in ([image_np,mask_np] + self.view (image_np) ) ]
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_XSeg\Model.py", line 141, in view
return nn.tf_sess.run ( [pred], feed_dict={self.model.input_t :input_np})
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
    run_metadata_ptr)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
    run_metadata)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
         [[node MatMul (defined at E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Dense.py:66) ]]
         [[concat_6/concat/_3]]
  (1) Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
         [[node MatMul (defined at E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Dense.py:66) ]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 XSeg/dense1/weight/read (defined at E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Dense.py:47)
Reshape_60 (defined at E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\ops\__init__.py:182)
Input Source operations connected to node MatMul:
 XSeg/dense1/weight/read (defined at E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Dense.py:47)
Reshape_60 (defined at E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\ops\__init__.py:182)
Original stack trace for 'MatMul':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_XSeg\Model.py", line 17, in __init__
super().__init__(*args, force_model_class_name='XSeg', **kwargs)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_XSeg\Model.py", line 103, in on_initialize
    gpu_pred_logits_t, gpu_pred_t = self.model.flow(gpu_input_t, pretrain=self.pretrain)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\facelib\XSegNet.py", line 85, in flow
return self.model(x, pretrain=pretrain)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\XSeg.py", line 124, in forward
x = self.dense1(x)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Dense.py", line 66, in forward
x = tf.matmul(x, weight)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 3655, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5713, in mat_mul
name=name)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_XSeg\Model.py", line 17, in __init__
super().__init__(*args, force_model_class_name='XSeg', **kwargs)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 216, in __init__
self.update_sample_for_preview(choose_preview_history=self.choose_preview_history)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 265, in update_sample_for_preview
self.sample_for_preview = self.generate_next_samples()
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 461, in generate_next_samples
    sample.append ( generator.generate_next() )
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\samplelib\SampleGeneratorBase.py", line 21, in generate_next
self.last_generation = next(self)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\samplelib\SampleGeneratorFace.py", line 112, in __next__
return next(generator)
  File "E:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\joblib\SubprocessGenerator.py", line 73, in __next__
    gen_data = self.cs_queue.get()
  File "multiprocessing\queues.py", line 94, in get
  File "multiprocessing\connection.py", line 216, in recv_bytes
  File "multiprocessing\connection.py", line 318, in _recv_bytes
  File "multiprocessing\connection.py", line 344, in _get_more_data
MemoryError

Reducing the batch size didn't help as well as increasing the page file. I tried to Google it but I couldn't find a solution.

0

There are 0 best solutions below