Memory Barriers on x86

Recently there have been some interesting discussions at work about the correctness of system-critical code and how memory barriers actually operate on x86. The topic is poorly understood, mainly because the Intel and AMD manuals are ambiguous on the subject.

x86 has a much stronger memory model than RISC architectures like ARM, which makes it easier for programmers to reason about memory ordering. The two simple rules to remember while writing concurrent or lock-free code on x86 are:

The x86 programming model allows memory loads and stores to be reordered as long as the reordering never changes the result of a single-threaded program.

AND

Loads may be reordered with older stores to different locations.

To put it differently, a single-threaded program always sees its own memory reads and writes in program order, and the only reordering another processor can observe is a load being performed ahead of an older store to a different location. The following rules hold true (as laid out by Bartosz here):

  1. Loads are not reordered with other loads.
  2. Stores are not reordered with other stores.
  3. Stores are not reordered with older loads.
  4. In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
  5. In a multiprocessor system, stores to the same location have a total order.
  6. In a multiprocessor system, locked instructions have a total order.
  7. Loads and stores are not reordered with locked instructions.
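The one reordering these rules leave open, a load passing an older store to a different location, is usually demonstrated with the "store buffering" litmus test. The following is a minimal sketch of it in C++, not a rigorous test harness. The function name count_both_zero and the variables x, y, r1 and r2 are made up for illustration. Note that with relaxed atomics the compiler is also free to reorder the accesses, so a non-zero count does not prove the processor did it.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test. Each thread stores to one variable and then
// loads the other. Returns how many of `iterations` runs ended with both
// loads seeing 0, an outcome that requires a load to pass an older store.
template <std::memory_order Order>
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] { x.store(1, Order); r1 = y.load(Order); });
        std::thread t2([&] { y.store(1, Order); r2 = x.load(Order); });
        t1.join();  // join() makes r1 and r2 safe to read below
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero<std::memory_order_seq_cst>(n) must always return 0, because sequentially consistent accesses forbid the both-zero outcome. Calling count_both_zero<std::memory_order_relaxed>(n) is allowed to return a non-zero count on x86, although starting fresh threads on every iteration makes the race window small, so it may take many runs to see one.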

One important point to remember is that "program order" here means the order of the instructions the compiler emitted. The compiler itself is also free to reorder memory accesses in any way it sees fit to optimize your code, as long as single-threaded behavior is preserved. This means it is not just the processor we need to worry about, but also the compiler. Compiler memory barriers, which stop the compiler from moving memory accesses across a given point, help with this.
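As a rough illustration of a compiler barrier, the sketch below uses std::atomic_signal_fence, which constrains the compiler without emitting any machine instruction. GCC and Clang users often write asm volatile("" ::: "memory") for the same purpose. The names payload, ready and publish are hypothetical. The sketch relies on x86 not reordering stores with other stores (rule 2), so it is not portable to weaker architectures, where a release store would be the safer choice.

```cpp
#include <atomic>

int payload = 0;            // hypothetical data being published
std::atomic<int> ready{0};  // hypothetical "data is available" flag

void publish(int value) {
    payload = value;
    // Compiler-only barrier: generates no instruction, but in practice
    // stops the compiler from sinking the payload store below the flag store.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    ready.store(1, std::memory_order_relaxed);
}
```

The barrier only pins down the order in which the compiler emits the two stores. Whether another processor then observes them in that order is up to the hardware, which on x86 is covered by the store ordering rule above.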

To prevent the processor from reordering loads with older stores, x86 provides the mfence instruction, a full barrier that makes all earlier loads and stores globally visible before any later load or store is performed. The lfence and sfence instructions order loads and stores respectively; since ordinary loads and stores are already ordered among themselves (rules 1 and 2), they matter mainly for weakly ordered accesses such as non-temporal stores. A certain class of instructions on x86 also orders memory loads and stores as a side effect. These include INVLPG (which invalidates a TLB entry), CPUID, IRET, interrupts, exceptions and a whole bunch of others documented here, along with any instruction carrying a lock prefix.
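A full barrier is what Dekker-style flag protocols need: each side stores its own flag and then loads the other side's flag, which is exactly the store-then-load pattern x86 may reorder. The sketch below uses std::atomic_thread_fence with sequential consistency, which compilers typically implement on x86 with mfence or a locked instruction. The names flag0, flag1, try_enter and leave are hypothetical, and this is a try-once illustration rather than a complete lock.

```cpp
#include <atomic>

std::atomic<int> flag0{0}, flag1{0};  // one hypothetical flag per thread

// Try once to enter the critical section. `mine` is the caller's flag and
// `other` is the competing thread's flag.
bool try_enter(std::atomic<int>& mine, std::atomic<int>& other) {
    mine.store(1, std::memory_order_relaxed);
    // Full barrier: the load below may not be performed ahead of the store
    // above. Without it both threads could read 0 and both enter.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (other.load(std::memory_order_acquire) == 0) return true;
    mine.store(0, std::memory_order_release);  // back off
    return false;
}

void leave(std::atomic<int>& mine) {
    // Release store: writes made inside the critical section become
    // visible before the flag is seen as cleared.
    mine.store(0, std::memory_order_release);
}
```

One thread would call try_enter(flag0, flag1) and the other try_enter(flag1, flag0). With the fence in place at most one of two concurrent attempts can succeed, though both may fail and have to retry.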

Compared to processors with weak memory models, x86 is a much easier platform to program on because of the stronger guarantees it provides on memory ordering. But even so, it's very easy to let your guard down on such a platform and let the compiler and processor hoodwink you into introducing hard-to-reproduce consistency bugs.
