Here is what I measured on a Cascade Lake platform:
Intel(R) Memory Latency Checker - v3.11
Command line parameters: --c2c_latency
Measuring cache-to-cache transfer latency (in ns)...
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                      Reader Numa Node
Writer Numa Node          0        1
            0             -    113.0
            1         112.6        -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                      Reader Numa Node
Writer Numa Node          0        1
            0             -    176.5
            1         173.5        -
In this case, the remote L2->L2 HITM latency depends on where the data address is homed: it is about 175 ns when the address is homed in the reader's socket, versus about 113 ns when it is homed in the writer's socket.
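To put a number on the gap, here is a minimal Python sketch that compares the two tables. The values are copied from the MLC output above; the dictionary and function names are hypothetical and only for illustration, and the script does not measure anything itself.

```python
# Latencies in ns, copied from the MLC --c2c_latency output above.
# Keys are (writer_node, reader_node).
homed_in_writer = {(0, 1): 113.0, (1, 0): 112.6}
homed_in_reader = {(0, 1): 176.5, (1, 0): 173.5}


def latency_gap(writer_homed, reader_homed):
    """Return {(writer, reader): (extra_ns, ratio)} for each node pair."""
    return {
        pair: (
            round(reader_homed[pair] - writer_homed[pair], 1),
            round(reader_homed[pair] / writer_homed[pair], 2),
        )
        for pair in writer_homed
    }


gaps = latency_gap(homed_in_writer, homed_in_reader)
for (writer, reader), (extra_ns, ratio) in gaps.items():
    print(f"writer {writer} -> reader {reader}: +{extra_ns} ns ({ratio}x)")
```

So homing the address in the reader's socket costs roughly 61 to 64 ns extra, about 1.5x the writer-homed latency, in both directions.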
The results are correct; I just want to know how to explain them. I think that in both cases the data is transferred from the remote L2 cache. However, it seems that the CPU does more work when the data address is homed in the reader's socket. Why is that?