I have a Keras model with a ModelCheckpoint callback.
When I set the callback's path to the tmp folder, it works perfectly, but when I set it to another folder called kaggle, I get an error.
The error is quite long; this is the last part of it:
21/22 [===========================>..] - ETA: 0s - loss: 0.7804 - acc: 0.5048
2020-04-28 17:36:20.771950: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
[[{{node embedding/embedding_lookup}}]]
2020-04-28 17:36:20.778527: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
[[{{node embedding/embedding_lookup}}]]
[[dense_1_target/_2]]
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/DIRECTORY2/train.py", line 76, in <module>
Train(args)
File "/DIRECTORY2/train.py", line 28, in __init__
Train.train(params.read(configs))
File "/DIRECTORY2/train.py", line 69, in train
verbose = 1)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
steps_name='steps_per_epoch')
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
batch_outs = batch_function(*batch_data)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
outputs = self.train_function(ins) # pylint: disable=not-callable
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3443, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in __call__
return self._call_flat(args)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 660, in _call_flat
outputs = self._inference_function.call(ctx, args)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 434, in call
ctx=ctx)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[5,12] = 11086 is not in [0, 11086)
[[{{node embedding/embedding_lookup}}]] [Op:__inference_keras_scratch_graph_3082]
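The message says that the embedding lookup received the index 11086, while the layer only accepts indices in [0, 11086). As a sanity check, here is a minimal sketch of how the token ids could be checked against the vocabulary size before they reach the model. It assumes the model starts with an Embedding layer whose input_dim is 11086 and that each batch is a list of integer token-id sequences. find_out_of_range_ids is a hypothetical helper, not part of Keras:

```python
# Minimal sketch (assumption): the first layer is an Embedding whose input_dim
# is the vocabulary size (11086 in the traceback), and each batch passed to
# fit_generator is a list of integer token-id sequences.

def find_out_of_range_ids(batch, input_dim):
    """Return (row, col, value) for every token id outside [0, input_dim)."""
    bad = []
    for row, seq in enumerate(batch):
        for col, token_id in enumerate(seq):
            if not 0 <= token_id < input_dim:
                bad.append((row, col, token_id))
    return bad


if __name__ == "__main__":
    input_dim = 11086  # vocabulary size reported in the error message
    # Hypothetical batch: the last id equals input_dim, which Embedding rejects.
    batch = [[1, 2, 3], [10, 20, 11086]]
    print(find_out_of_range_ids(batch, input_dim))  # [(1, 2, 11086)]
```

Running this check on the batches produced for each folder would show whether the data itself differs, independently of where the checkpoint is written.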
I printed the permissions of both folders, and they appear to be the same!
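Mode bits alone do not show who owns a folder or whether the current user can actually read and write there, so a fuller comparison might look like this sketch. It assumes Python is run as the same Linux user that runs the training; describe and compare_dirs are hypothetical helpers, and the paths in the example are placeholders for the real tmp and kaggle folders. It only inspects the directories themselves, not the files inside them:

```python
import os
import stat


def describe(path):
    """Return mode string, owner ids and effective access for one directory."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "uid": st.st_uid,                   # numeric owner id
        "gid": st.st_gid,                   # numeric group id
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        "searchable": os.access(path, os.X_OK),
    }


def compare_dirs(a, b):
    """Return only the fields that differ between two directories."""
    da, db = describe(a), describe(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}


if __name__ == "__main__":
    # Placeholder paths: replace with the real tmp and kaggle directories.
    print(compare_dirs("/tmp", os.path.expanduser("~")))
```

An empty dictionary means the two directories match on every field checked here.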

Edit (1):
The directory that caused the error was transferred to my Linux user account from another Windows machine using WinSCP, while the other one (tmp) was created locally on Linux.
Edit (2):
I deleted the directory that caused the error, recreated it locally with the same name, and the error disappeared! I'm fairly sure the error is caused by directory permissions, but I don't know what the underlying source was.