I often read online that LFENCE is pointless on x86 processors, i.e. that it does nothing, so instead of MFENCE we can painlessly use SFENCE, because MFENCE = SFENCE + LFENCE = SFENCE + NOP = SFENCE.
But if LFENCE is pointless, why are there four approaches to implementing sequential consistency on x86/x86_64?
1. LOAD (without fence) and STORE + MFENCE
2. LOAD (without fence) and LOCK XCHG
3. MFENCE + LOAD and STORE (without fence)
4. LOCK XADD(0) and STORE (without fence)
Taken from here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
And from Herb Sutter's talk, at the bottom of page 34: https://skydrive.live.com/view.aspx?resid=4E86B0CF20EF15AD!24884&app=WordPdf&wdo=2&authkey=!AMtj_EflYn2507c
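For context, the four mappings above are alternative ways for a compiler to implement C++11 sequentially consistent atomics on x86. A minimal sketch follows; the names `shared_value`, `seq_cst_store` and `seq_cst_load` are hypothetical, and the instructions mentioned in the comments are what compilers typically emit, not something this code controls.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable used only for this illustration.
std::atomic<int> shared_value{0};

// Sequentially consistent store.
// Under mapping (1) this would typically compile to MOV + MFENCE,
// under mapping (2) to a single XCHG (which is implicitly locked).
void seq_cst_store(int v) {
    shared_value.store(v, std::memory_order_seq_cst);
}

// Sequentially consistent load.
// Under mappings (1) and (2) this is typically a plain MOV,
// because the cost of the full barrier is paid on the store side.
int seq_cst_load() {
    return shared_value.load(std::memory_order_seq_cst);
}
```

Mappings (3) and (4) move the full barrier to the load side instead and leave the store as a plain MOV. A compiler has to pick one convention and use it consistently, which is why the fence in approach (3) cannot be judged in isolation from the store that precedes it.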
If LFENCE did nothing, approach (3) would amount to SFENCE + LOAD and STORE (without fence), but there is no point in executing SFENCE before a LOAD. That is, if LFENCE does nothing, approach (3) makes no sense.
Does the LFENCE instruction make any sense on x86/x86_64 processors?
ANSWER:
1. LFENCE is required in the cases described in the accepted answer below.
2. Approach (3) should be viewed not in isolation, but in combination with the preceding instructions. For example, approach (3):
MFENCE
MOV reg, [addr1]  // LOAD-1
MOV [addr2], reg  // STORE-1
MFENCE
MOV reg, [addr1]  // LOAD-2
MOV [addr2], reg  // STORE-2
We can rewrite the code of approach (3) as follows:
SFENCE
MOV reg, [addr1]  // LOAD-1
MOV [addr2], reg  // STORE-1
SFENCE
MOV reg, [addr1]  // LOAD-2
MOV [addr2], reg  // STORE-2
Here SFENCE does make sense: it prevents reordering of STORE-1 and LOAD-2, because the SFENCE after STORE-1 flushes the store buffer.
Bottom line (TL;DR):
LFENCE alone indeed seems useless for memory ordering; however, that does not make SFENCE a substitute for MFENCE. The "arithmetic" logic in the question is not applicable.
Here is an excerpt from Intel's Software Developer's Manual, volume 3, section 8.2.2 (edition 325384-052US, September 2014), the same one that I used in another answer
From here, it follows that:
- MFENCE is a full memory fence for all operations on all memory types, whether non-temporal or not.
- SFENCE only prevents reordering of writes (in other terminology, it's a StoreStore barrier), and is only useful together with non-temporal stores and other instructions listed as exceptions.
- LFENCE prevents reordering of reads with subsequent reads and writes (i.e. it combines LoadLoad and LoadStore barriers). However, the first two bullets of the excerpt say that LoadLoad and LoadStore barriers are always in place, no exceptions. Therefore LFENCE alone is useless for memory ordering.
To support the last claim, I looked at all places where LFENCE is mentioned in all 3 volumes of Intel's manual, and found none which would say that LFENCE is required for memory consistency. Even MOVNTDQA - the only non-temporal load instruction so far - mentions MFENCE but not LFENCE.
Update: see the answers on "Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?" for correct answers to the guesswork below.
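As a side illustration of the point that LoadLoad and LoadStore (and, for ordinary stores, StoreStore) ordering is always in place on x86: a message-passing pattern needs only release/acquire semantics in C++, and on x86 compilers typically implement those with plain MOV instructions and no fence at all. The sketch below uses hypothetical names (`payload`, `ready`, `run_message_passing`) and is not taken from the manual.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names chosen for this sketch.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // publication flag

// Runs one producer and one consumer; returns the payload the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int observed = -1;

    std::thread producer([] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // typically a plain MOV on x86
    });
    std::thread consumer([&observed] {
        // Acquire load: typically a plain MOV on x86, no LFENCE needed.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // the release/acquire pair guarantees 42 here
    });

    producer.join();
    consumer.join();
    return observed;
}
```

The C++ orderings are still required at the source level, because they also stop the compiler from reordering the accesses; what x86 makes unnecessary is an extra hardware fence instruction.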
Whether MFENCE is equivalent to a "sum" of the other two fences or not is a tricky question. At first glance, among the three fence instructions only MFENCE provides a StoreLoad barrier, i.e. prevents reordering of reads with earlier writes. However, the correct answer requires knowing more than the above rules; namely, it's important that all fence instructions are ordered with respect to each other. This makes the SFENCE LFENCE sequence more powerful than a mere union of individual effects: this sequence also prevents StoreLoad reordering (because loads cannot pass LFENCE, which cannot pass SFENCE, which cannot pass stores), and thus constitutes a full memory fence (but also see the note (*) below). Note however that order matters here, and the LFENCE SFENCE sequence does not have the same synergy effect.
However, while one can say that MFENCE ~ SFENCE LFENCE and LFENCE ~ NOP, that does not mean MFENCE ~ SFENCE. I deliberately use equivalence (~) and not equality (=) to stress that arithmetic rules do not apply here. The mutual effect of SFENCE followed by LFENCE makes the difference; even though loads are not reordered with each other, LFENCE is required to prevent reordering of loads with SFENCE.
(*) It still might be correct to say that MFENCE is stronger than the combination of the other two fences. In particular, a note to the CLFLUSH instruction in volume 2 of Intel's manual says that "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or serializing instructions or by another CLFLUSH instruction."
(Update: clflush is now defined as strongly ordered (like a normal store, so you only need mfence if you want to block later loads), but clflushopt is weakly ordered and can be fenced by sfence.)
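The StoreLoad reordering discussed above is usually demonstrated with the store-buffer litmus test: each thread stores to one variable and then loads the other. Below is a minimal C++ sketch of that test using sequentially consistent atomics; the function name `count_both_zero` is hypothetical. With `memory_order_seq_cst` the outcome where both threads read 0 is forbidden, so the function should always return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffer litmus test `iterations` times and counts how often
// both threads read 0, i.e. how often each load appeared to pass the
// other thread's earlier store (StoreLoad reordering).
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // STORE, then full barrier
            r1 = y.load(std::memory_order_seq_cst);  // LOAD must not pass the store
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // forbidden under sequential consistency
        }
    }
    return both_zero;
}
```

If the stores are weakened to `memory_order_release` and the loads to `memory_order_acquire`, the both-zero outcome becomes permitted, and x86 hardware can produce it because a store may still sit in the store buffer when the following load executes. Thread start-up cost makes that window hard to hit in a loop like this one, so not observing it is no proof of ordering.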