How does Spark perform I/O?


It is my understanding that Spark uses parallel I/O to read files. I drew that conclusion from other Stack Overflow answers.

My question is: does Spark read data using an independent approach or a collective approach? In other words, does each worker read its own fixed chunk of the data, or do the workers communicate with each other and collaborate to read the data efficiently?


There are 2 best solutions below

2
Yugerten On

Each Apache Spark worker hosts one or more executors, and workers can be deployed in distributed or standalone mode.
Each worker processes its own portion of the data, independently of the other workers. For more detail see this answer or this link

0
A Khe On

The workers communicate through the driver, and each worker processes its own data.
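
To make the "independent" pattern described in the answers concrete, here is a minimal sketch in plain Python. It is not Spark's API or Spark's actual code: it is a simulation using threads, and the names `plan_splits`, `read_split`, and `read_in_parallel` are hypothetical. The assumption is that a coordinator (standing in for the driver) cuts the file into byte ranges up front, and each task then reads only its own range without talking to the other tasks. The only shared rule is how to treat a line that straddles a range boundary.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def plan_splits(file_size, num_splits):
    """Coordinator side: cut the file into contiguous (start, end) byte ranges."""
    if file_size == 0:
        return []
    step = -(-file_size // num_splits)  # ceiling division
    return [(s, min(s + step, file_size)) for s in range(0, file_size, step)]


def read_split(path, start, end):
    """Task side: return only the lines that begin inside [start, end)."""
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard through the next newline, so a
            # line that began in the previous range is left to that reader.
            f.seek(start - 1)
            f.readline()
        # A line belongs to this range if it starts before `end`,
        # even when it finishes past `end`.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n").decode("utf-8"))
    return lines


def read_in_parallel(path, num_splits):
    """Plan the ranges once, then let each task read its range on its own."""
    splits = plan_splits(os.path.getsize(path), num_splits)
    if not splits:
        return []
    with ThreadPoolExecutor(max_workers=len(splits)) as pool:
        return list(pool.map(lambda s: read_split(path, *s), splits))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "w", encoding="utf-8", newline="\n") as fh:
            fh.write("\n".join(f"row-{i}" for i in range(100)) + "\n")
        for i, part in enumerate(read_in_parallel(path, 4)):
            print(f"task {i} read {len(part)} lines")
```

Note that the tasks never exchange data while reading. The only coordination is the up-front planning of the ranges, which matches the "workers communicate through the driver" description above.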