How does ECL distribute the computation internally?


Let us consider the following scenario:

I have submitted some ECL code that works on a large dataset. The cluster has finished the computation and returned the output as a workunit. The nature of the output itself isn't of particular concern in this scenario.

What I wish to know is how the computation was handled internally within the HPCC architecture. Specifically, I want to know how many nodes of the cluster were involved in the computation and how much memory/disk space each node used for its part of the work.

Is there a simple way to find the above-mentioned information?

PS: The graph under the Metrics tab of an ECL workunit may provide details on how any workunit's computation was distributed. Links to videos or documents that explain how to interpret that graph would be greatly appreciated.

1 Answer

Richard Taylor:

A Thor cluster workunit executes exactly the same object code (the compiled .SO file) on every node, in parallel. That way, each node performs the same operations on the data that resides on that node or is sent to it during execution. If a particular node has no data and none is sent to it, no work is done on that node.
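
For example, a simple transform like the one below is run by every Thor node at the same time, each node processing only the records held in its own file parts. This is just a minimal sketch; the layout, dataset, and logical file name ('~demo::people') are hypothetical:

    IMPORT Std;

    // Hypothetical layout and dataset; in a real job 'people' would point
    // at an existing logical file on the cluster.
    PersonRec := RECORD
      UNSIGNED8 personid;
      STRING20  firstname;
      STRING30  lastname;
    END;
    people := DATASET('~demo::people', PersonRec, THOR);

    // This PROJECT runs in parallel on every node; each node transforms
    // only the rows it holds locally, so no data moves between nodes.
    upperNames := PROJECT(people,
                          TRANSFORM(PersonRec,
                                    SELF.lastname := Std.Str.ToUpperCase(LEFT.lastname),
                                    SELF := LEFT));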

Any disk write stores each node's current data to that node's storage (its local drive on bare metal, or the data plane on cloud native) as that node's file part. The file parts together form a single logical file, managed through the DFU (Distributed File Utility).
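
As a minimal sketch (reusing the hypothetical upperNames dataset from above and an illustrative logical file name), a disk write looks like this; each node writes its own file part, and together they are addressed by the single logical name:

    // Each node writes its local rows as one file part; the parts are
    // presented collectively as the logical file '~demo::people::upper'.
    OUTPUT(upperNames, , '~demo::people::upper', OVERWRITE);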

If your data is evenly distributed across the nodes, the performance on each node will be very similar. That is the "ideal scene" we're always aiming for.

But if your data is heavily skewed, the slowest node will be the one with the most data to work on (and the fastest, the least). So, if you want to improve workunit performance, the first/best thing to look at is data skew (which you can see in the execution graphs in ECL Watch).
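
If the graph shows heavy skew, one common remedy (again a sketch with hypothetical dataset and field names) is to DISTRIBUTE both inputs of an expensive operation on a hash of a reasonably unique key, so every node receives a similar share of the records and the LOCAL work stays balanced:

    // Hypothetical second dataset sharing the personid key.
    PaymentRec := RECORD
      UNSIGNED8   personid;
      DECIMAL10_2 amount;
    END;
    payments := DATASET('~demo::payments', PaymentRec, THOR);

    // Spread both inputs evenly by hashing the join key, then do the
    // heavy work locally on each node with no further data movement.
    peopleDist   := DISTRIBUTE(people,   HASH32(personid));
    paymentsDist := DISTRIBUTE(payments, HASH32(personid));

    matched := JOIN(peopleDist, paymentsDist,
                    LEFT.personid = RIGHT.personid,
                    LOCAL);   // safe because both inputs share the same distribution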