I have systems with a large number of cores, as well as a cluster. For a particular task no serial implementation is available, so I can only benchmark the time taken for runs on different input sizes. I see that even when the data size is increased by a factor of 10, the completion time increases by less than a factor of 10 on identical resources. I would like to know how to measure this kind of performance, as it does not appear to fall under the typical definitions of strong/weak scaling. It seems related to efficiency, but I am not certain. From what I could gather about the three:
- Strong scaling (Amdahl's law): speedup = 1 / ( s + p / N ) = T(1) / T(N)
- Weak scaling (Gustafson’s law): scaled speedup = s + p × N
- Efficiency: speedup / N
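
To make sure I am reading these correctly, here is a minimal sketch (in Python) of how I understand the three quantities; the serial fraction, parallel fraction, and worker count are made-up values, purely for illustration:

```python
# Illustrative only: hypothetical fractions and worker count, not real measurements.

def strong_scaling_speedup(s, p, N):
    """Amdahl's law: fixed total problem size, N workers."""
    return 1.0 / (s + p / N)

def weak_scaling_speedup(s, p, N):
    """Gustafson's law: problem size grows with N (scaled speedup)."""
    return s + p * N

def efficiency(speedup, N):
    """Parallel efficiency: speedup normalised by the worker count."""
    return speedup / N

s, p = 0.1, 0.9   # hypothetical serial / parallel fractions
N = 64            # hypothetical worker count

su = strong_scaling_speedup(s, p, N)
print(f"strong-scaling speedup : {su:.2f}")
print(f"weak-scaling speedup   : {weak_scaling_speedup(s, p, N):.2f}")
print(f"efficiency             : {efficiency(su, N):.2%}")
```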
As I don't have a speedup figure (due to the lack of a serial implementation) and N is constant, the only thing I can think of is taking ratios of efficiencies based on the strong-scaling definition. Is such a parameter used in CS? (A quick sketch of what I mean is below.)
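
Concretely, something like the following, again with hypothetical timings: if I assume the (unavailable) serial time would grow roughly linearly with the input size, then the ratio of strong-scaling efficiencies at a fixed N reduces to a ratio of throughputs, which I can compute from my measurements alone:

```python
# Hypothetical timings: same resources (N workers), two input sizes.
size_small, t_small = 1_000_000, 120.0     # items, seconds (made-up)
size_big,   t_big   = 10_000_000, 700.0    # 10x data, < 10x time (made-up)

# Throughput at each size.
thr_small = size_small / t_small
thr_big   = size_big / t_big

# Under the assumption that the serial time scales linearly with input size,
# the ratio of strong-scaling efficiencies at fixed N equals the ratio of
# throughputs: E_big / E_small == thr_big / thr_small.
print(f"throughput small : {thr_small:,.0f} items/s")
print(f"throughput big   : {thr_big:,.0f} items/s")
print(f"efficiency ratio (big vs small): {thr_big / thr_small:.2f}")
```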
Thanks for this note. The question gives good ground for an answer:

The observation has little to do with the DATA-size per se. The DATA-sizing is important, yet the core of the explanation lies in the internal functioning of distributed computing, where overheads matter: every run pays a roughly fixed price for setup, scheduling, and communication, and that price does not grow in proportion to the data. When the input grows 10×, only the compute-bound part of the work grows with it, so the total completion time grows by less than 10×.
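
As a minimal sketch of that point (all constants below are hypothetical, not measured), a simple cost model with a fixed per-job overhead, a per-worker coordination cost, and a data-proportional compute term already reproduces the sub-linear growth described in the question:

```python
# Hypothetical cost model: overheads that do not grow with the data,
# plus a compute part proportional to the input size.
def completion_time(size, workers,
                    t_setup=20.0,           # job launch / scheduling overhead [s] (made-up)
                    t_comm_per_worker=0.25, # per-worker coordination cost [s] (made-up)
                    t_per_item=3e-3):       # per-item compute cost on one worker [s] (made-up)
    overhead = t_setup + t_comm_per_worker * workers
    compute  = t_per_item * size / workers
    return overhead + compute

N = 64
t1  = completion_time(1_000_000, N)
t10 = completion_time(10_000_000, N)
print(f"T(1x size)  = {t1:7.1f} s")
print(f"T(10x size) = {t10:7.1f} s")
print(f"ratio       = {t10 / t1:.2f}x  (< 10x, because the overhead is amortised)")
```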
For further reading on Amdahl's argument and the Gustafson/Barsis scaled-speedup argument, feel free to continue here.