What does L2 poison mean in CPU?


I have encountered the same problem as this. What does L2 poison mean?

I'm using an AMD CPU.

Peter Cordes On

I'd guess "L2 poison" means setting L2 cache entries to have wrong ECC values, so a cache hit (or even miss?) results in an uncorrectable ECC error. (See also https://lwn.net/Articles/348886/ which talks about "memory poisoning" and Linux's HWPOISON patch from 2009).

In general, "poisoning" in programming is when you initialize something to a state that will fault if used, or that at least has a recognizable pattern, so you can detect errors like reading uninitialized memory. For example, MSVC debug builds "poison" stack memory with 0xcc bytes, which form an invalid pointer that will fault if dereferenced and is easily recognizable in the fault address of an exception. 0xcc is also the x86 machine code for an int3 debug breakpoint, in case you somehow manage to jump to a stack address by accident. (This isn't hardening against code-injection, but it could conceivably catch a case where bad inline asm left ESP pointing to a pointer to stack memory instead of a return address.)
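As a rough illustration of that software sense of poisoning, here is a minimal C sketch. The helper names `poison` and `is_poisoned` are made up for this example and are not part of MSVC or any library; the only thing taken from the text above is the 0xcc fill byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill byte mentioned above for MSVC debug builds; also the x86 int3 opcode. */
#define POISON_BYTE 0xCC

/* Hypothetical helper: overwrite a buffer with the poison pattern. */
static void poison(void *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, POISON_BYTE, len);
}

/* Hypothetical helper: returns 1 if every byte still holds the poison
   pattern, i.e. nothing has been written since poison() was called. */
static int is_poisoned(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != POISON_BYTE)
            return 0;
    }
    return 1;
}
```

Calling `poison(&slot, sizeof slot)` on a `uint32_t slot` leaves it holding 0xCCCCCCCC, which stands out in a debugger or a fault address; once real data is stored, `is_poisoned` returns 0.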

IDK if current CPUs actually have a mechanism for software to poison L2 cache, either the whole thing or the entry for a specific cache-line.

In any case, the actual cause of the error in the question you linked appears to be uncorrectable ECC errors in DRAM, or in transmission of data from DRAM to memory controllers. Apparently the machine-check exception or whatever gets raised doesn't distinguish between cache ECC failure vs. DRAM? Or just Linux's driver doesn't?

(It's normal for most levels of cache to be protected by ECC, if they're write-back. It's been claimed that Intel L1d caches only use parity, to keep overhead low while supporting byte stores and wider misaligned stores with full performance as long as they don't cross a cache-line boundary. See the Q&A "Are there any modern CPUs where a cached byte store is actually slower than a word store?": yes, apparently most non-x86! I haven't seen discussion about AMD, but I assume AMD CPUs really do support full-performance unaligned stores in all cases where the same unaligned load wouldn't have any penalty. I phrased it that way because some AMD CPUs have a penalty (and lack of atomicity) for crossing a 32-byte boundary, not just a 64-byte cache-line boundary.)
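To make the parity point a bit more concrete, here is a toy C model of per-byte parity. It is only a sketch: `struct parity_line`, `store_byte` and `check_byte` are invented for illustration and say nothing about how any real Intel or AMD cache is built. It shows why per-byte parity is cheap for byte stores (only that byte's check bit is recomputed) and why parity can only detect an odd number of flipped bits, not correct them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 8 data bytes, each with its own parity bit. */
struct parity_line {
    uint8_t data[8];
    uint8_t parity;   /* bit i holds the parity of data[i] */
};

/* XOR of all bits in a byte: 1 if the number of set bits is odd. */
static unsigned byte_parity(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}

/* A byte store updates only that byte and its own parity bit;
   the neighboring bytes never have to be read. */
static void store_byte(struct parity_line *l, unsigned i, uint8_t v)
{
    assert(i < 8);
    l->data[i] = v;
    l->parity = (uint8_t)((l->parity & ~(1u << i)) | (byte_parity(v) << i));
}

/* Returns 1 if byte i is consistent with its stored parity bit. */
static int check_byte(const struct parity_line *l, unsigned i)
{
    assert(i < 8);
    return byte_parity(l->data[i]) == ((l->parity >> i) & 1u);
}
```

Flipping one bit of a stored byte makes `check_byte` fail for that byte, but nothing in the model says which bit flipped, and a second flip in the same byte goes unnoticed. Check bits computed over a wider word could correct errors, at the cost of a narrow store needing the rest of the word to recompute them.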


Semi-related: https://en.wikipedia.org/wiki/Cache_poisoning - Wikipedia only mentions stuff like DNS cache or ARP cache poisoning, where invalid entries actually get used after an attacker created some invalid entries via a vulnerability.