I have a question about the following code:
using UnityEngine;

private void Calc(int n) {
    float time = Time.realtimeSinceStartup;
    (int fact, int fact1, int fact2, int fact3) = (1, 1, 1, 1);
    for (int i = 1; i < n; i += 4) {
        fact  *= i;
        fact1 *= i + 1;
        fact2 *= i + 2;
        fact3 *= i + 3;
    }
    time = Time.realtimeSinceStartup - time;
    Debug.Log("calc: " + time);
}
private void Calc_2(int n) {
    float time = Time.realtimeSinceStartup;
    int fact = 1;
    for (int i = 2; i < n; i++) {
        fact *= i;
    }
    time = Time.realtimeSinceStartup - time;
    Debug.Log("calc_2: " + time);
}
When n = 2000000000, the running times of the two methods are very different:

    calc: 0.4483032
    calc_2: 1.34198

Why is there such a big performance difference between two similar pieces of code?
I am not coding in your environment so I might be wrong, but modern CPUs can execute several independent instructions at the "same" time (how many depends on how many parallel pipeline/execution stages there are inside the CPU core).
If you unroll your naive iteration (Calc_2) by hand, you get something like the sketch below (a rough illustration of the data dependencies, not actual compiler output):
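    // Calc_2 unrolled by 4: every multiplication depends on the previous one,
    // because they all read and write the same accumulator `fact`.
    int fact = 1;
    for (int i = 2; i + 3 < n; i += 4) {   // leftover tail iterations omitted for clarity
        fact *= i;      // the next line cannot start before this result is ready
        fact *= i + 1;  // waits for the previous multiply
        fact *= i + 2;  // waits again
        fact *= i + 3;  // waits again -> one long serial dependency chain
    }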
Now compare with your faster version (Calc); on a good compiler one iteration looks roughly like this (again just a sketch of the dependencies):
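    // Calc: four independent accumulators, so the four multiplications inside
    // one iteration do not depend on each other and can execute concurrently.
    (int fact, int fact1, int fact2, int fact3) = (1, 1, 1, 1);
    for (int i = 1; i < n; i += 4) {
        fact  *= i;      // independent of the three lines below
        fact1 *= i + 1;  // independent
        fact2 *= i + 2;  // independent
        fact3 *= i + 3;  // independent -> the CPU can overlap all four
    }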
So there are still a lot of waits, but look closer: each instruction now waits for a result that was produced (in most cases) four instructions earlier, so that result has most likely already been computed and there may be no wait at all. If your CPU core can keep at least 4 such computations in flight, it can effectively execute 4 iterations' worth of work at once. If you keep increasing the number of independent computations per iteration, the performance stops improving once you have more "parallel" computations than the CPU core has parallel execution resources.
The faster version also has 4 times fewer branches (because it runs 4 times fewer loop iterations), which has some additional minor impact on speed.
However, once you switch the factorial to a bigint (2000000000! will not fit in a 32-bit or even a 64-bit int), none of this will matter any more, because even basic operations like ++, --, +, -, *, / will no longer be single instructions but entire routines with their own complexity... For results that large you need to combine fast bigint algorithms, or better...
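For completeness, here is a minimal sketch of an exact bigint factorial using System.Numerics.BigInteger (assuming a .NET runtime where it is available). It only illustrates how each multiply becomes a whole routine whose cost grows with the size of the number, not something you would actually run for n = 2000000000:

    using System.Numerics;

    // Naive exact factorial with arbitrary-precision integers.
    // Each `result *= i` is now a full bigint multiplication, not one CPU instruction,
    // and it gets slower as `result` grows in digit count.
    static BigInteger Factorial(int n) {
        BigInteger result = BigInteger.One;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }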