I am trying to run the DLRM preprocessing pipeline with Apache Beam (https://github.com/tensorflow/models/tree/master/official/recommendation/ranking/preprocessing). The dataset is the 10 GB Criteo Kaggle dataset, and I used the script shard_balancer.py to split it into 512 shards.
The problem is that when I run the pipeline on my local machine (DirectRunner), performance actually gets worse as I increase direct_num_workers (processing a single one of the 512 shards); the rough setup is sketched below.
Is it necessary to run Apache Beam on Google Cloud Dataflow? Are there optimizations I am missing? Or is this because the pipeline has to read/write the disk multiple times? Thanks.
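For reference, this is a minimal sketch of how I configure the local DirectRunner (the real work is done by the preprocessing script from the repo linked above; the shard filename and the toy transforms here are just placeholders, and `direct_num_workers` is the value I vary):

```python
# Minimal sketch of the local DirectRunner setup (placeholders, not the real pipeline).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DirectRunner",
    "--direct_num_workers=8",                  # value I vary: 1, 8, 16
    "--direct_running_mode=multi_threading",   # one of: in_memory, multi_threading, multi_processing
])

with beam.Pipeline(options=options) as pipeline:
    # Placeholder transforms; the actual pipeline applies the tf.Transform-based
    # preprocessing from the linked script to one Criteo shard.
    _ = (
        pipeline
        | "Read" >> beam.io.ReadFromText("criteo_shard_000.csv")  # hypothetical shard name
        | "Count" >> beam.combiners.Count.Globally()
    )
```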
I have tried this on both an AMD EPYC 7313 16-Core Processor and an Intel(R) Xeon(R) Gold 6248 CPU. With a single one of the 512 shards as input, the AMD machine takes 65 s (1 thread), 159 s (8 threads), and 357 s (16 threads); the Intel machine takes 93 s (1 thread), 318 s (8 threads), and 626 s (16 threads).
I expected the runtime to improve as the number of threads increases, but adding more threads only makes performance worse.