Can somebody please explain the following TensorFlow terms:
inter_op_parallelism_threads
intra_op_parallelism_threads
Or, please, provide links to the right source of explanation.
I have conducted a few tests by changing the parameters, but the results have not been consistent enough to draw a conclusion.
The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools used by TensorFlow to parallelize execution, as the comments there describe.

There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control over multi-core CPU parallelism:
If you have an operation that can be parallelized internally, such as matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option, therefore, controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.

If you have many operations that are independent in your TensorFlow graph (because there is no directed path between them in the dataflow graph), TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.

Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.
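To make the two-pool arrangement concrete, here is a rough sketch in plain Python. It is only an analogy and does not use TensorFlow's actual scheduler. The constants INTER_OP_THREADS and INTRA_OP_THREADS and the helpers parallel_sum and run_independent_ops are hypothetical names invented for this illustration. One pool runs independent "ops" concurrently, and every op that has a multithreaded implementation sends its chunks of work to a single shared second pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two options; these are plain Python
# constants for illustration, not TensorFlow settings.
INTER_OP_THREADS = 2  # how many independent "ops" may run at once
INTRA_OP_THREADS = 4  # worker threads shared by all ops for their chunks

inter_op_pool = ThreadPoolExecutor(max_workers=INTER_OP_THREADS)
intra_op_pool = ThreadPoolExecutor(max_workers=INTRA_OP_THREADS)


def parallel_sum(values, num_chunks=INTRA_OP_THREADS):
    """Toy 'op' with a multithreaded implementation: a chunked reduction."""
    chunk = max(1, -(-len(values) // num_chunks))  # ceiling division
    pieces = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Every op submits its chunks to the same shared intra-op pool.
    return sum(intra_op_pool.map(sum, pieces))


def run_independent_ops(inputs):
    """Run one parallel_sum per input; the ops do not depend on each other."""
    # Independent ops are scheduled on the inter-op pool.
    return list(inter_op_pool.map(parallel_sum, inputs))


results = run_independent_ops([list(range(100)), list(range(1000)), [1, 2, 3]])
print(results)  # [4950, 499500, 6]
```

In this sketch at most two reductions are in flight at a time, and together they never use more than four chunk workers, which mirrors the note above that concurrent operations share the intra-op pool. Because of CPython's global interpreter lock, the sketch models only the scheduling structure and will not show a real CPU speedup.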