I have a model trained under tf.distribute.MultiWorkerMirroredStrategy(), which runs without errors. However, the training time doesn't decrease as expected compared with single-worker training.
I checked some details and found two main symptoms that make me suspect something is wrong with the autosharding:
Each worker is caching all of the data from my data source.
The per-epoch output shows a strange accuracy value of 1.9, which is exactly the sum of the accuracies on the two workers. (I checked with 3 workers and the accuracy is then close to 3.)
I turned off shuffling when using tf.data.Dataset.list_files, as suggested in this tutorial, but the problem remains.
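For context, here is a minimal sketch of the kind of input pipeline and training setup I'm describing (the file pattern, TFRecord feature spec, and model are placeholders, not my actual code):

```python
import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()

def parse_example(serialized):
    # Placeholder parser: assumes TFRecords with a flat image vector and an int label.
    features = tf.io.parse_single_example(serialized, {
        "image": tf.io.FixedLenFeature([28 * 28], tf.float32),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    return features["image"], features["label"]

def make_dataset(batch_size):
    # shuffle=False as the tutorial suggests, so every worker lists the files in the
    # same order and autoshard can split them deterministically across workers.
    files = tf.data.Dataset.list_files("/path/to/data/*.tfrecord", shuffle=False)
    ds = tf.data.TFRecordDataset(files)
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
    # Make the sharding policy explicit instead of relying on AUTO.
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = (
        tf.data.experimental.AutoShardPolicy.FILE
    )
    return ds.with_options(options)

with strategy.scope():
    # Placeholder model just to show where the strategy scope applies.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(28 * 28,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

model.fit(make_dataset(batch_size=64), epochs=10)
```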