Understand strange code found in ReadOnlyMemory<T>


I looked with an IL disassembler into the code of the Length property of the ReadOnlyMemory<T> struct (System.Memory for .NET Framework 4.6.1) and found this strange code:

public int Length => this._length & int.MaxValue;

What is the reason for AND-ing with int.MaxValue here?

Accepted answer, by Dai:

Disclaimers:

  • This answer concerns both the original design of Memory<T> (i.e. how it was prior to its redesign in 2018) and its layout in the builds of the current latest version of System.Memory.dll for .NET Framework 4.x specifically, not the far more recent versions that ship with .NET 5+ (not that the differences are significant, though - they aren't).

  • (As a disclaimer on style: I see the authors of Memory<T> chose to use underscore prefixes for instance field names - which is not something I would ever do myself - but I'm using their field names verbatim for clarity.)

  • Note that both Memory<T> and ReadOnlyMemory<T> have identical layouts - the only difference is in their exposed interface design - so I'm using the terms ReadOnlyMemory<T> and Memory<T> interchangeably here.


  • The Memory<T> type is a struct that needs to be lightweight, which means it can't have any more fields than absolutely necessary:
    1. It needs a pointer (strictly speaking, an object reference) to the actual memory target, such as a byte-array buffer or what-have-you.
      • In Memory<T> this is represented by an opaque Object field named _object, as it can represent not just a Byte[], but also a Char[] or String or any other type that Span<T> can work with.
    2. It needs an Int32 _length field to store the length of the memory-area/buffer that it's pointing to.
      • .NET has a practical limit of 2GiB per object, so there's no point making this Int64 either (and doing so would make Memory<T> consume 2x more storage).
      • And incidentally, a range of 0-2GiB only requires 31 bits to represent, not 32.
    3. It also needs an Int32 _index field to store the starting offset into _object at which its region begins.
  • So that means Memory<T> (if packed in-memory) already consumes 32 + 32 + 32 = 96 bits (12 bytes) on x86, or 64 + 32 + 32 = 128 bits (16 bytes) on x64 - which is pushing at the limit for how big a struct should become.
  • But that's not all: Memory<T> needs 2 more pieces of information:
    1. It needs to store information about the type of memory that _object points to so it can dereference it correctly; otherwise it would need to rely on (very expensive) runtime type-tests (e.g. the is operator) before it could safely perform any action, which would be ruinous for performance. (This is because the actual low-level operations for getting data out of a Byte[] are different from those for a Byte* - or a Char[] vs. a String.)
    2. It also needs to know if _object is a reference to a pinned-memory object - as that tells it if it needs to Pin() the memory before it can be used safely or if it can skip that step.
  • So that's two more separate bool fields to store - fortunately boolean values are simple 2-state values that can be represented by a single bit in-memory (whereas the System.Boolean (bool in C#) type occupies a whole 8 bits - and four bytes when marshalled to a Win32 BOOL value).
  • Given that Memory<T> is already using two 32-bit integers for _index and _length, and both of those fields will only ever hold a 31-bit value, there are 2 unused bits right there for the taking - so it sneakily appropriates the MSBs of _length and _index to store the "is-pinned-memory-or-not" and "is-.NET-array-or-not" state.
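The bit-stealing described above can be sketched as follows. The names here (`Pack`, `IsPinnedFlag`, etc.) are invented for illustration and are not the real System.Memory internals - but the masking in `GetLength` is exactly the `& int.MaxValue` trick the question asks about:

```csharp
using System;

// Hypothetical sketch of packing a flag bit into the MSB of a 31-bit length.
static class PackedLengthDemo
{
    private const int IsPinnedFlag      = unchecked((int)0x80000000); // MSB only
    private const int RemoveFlagsBitMask = int.MaxValue;              // 0x7FFFFFFF

    // Store a length plus an "is pinned" flag in a single Int32.
    public static int Pack(int length, bool isPinned)
        => isPinned ? (length | IsPinnedFlag) : length;

    // Equivalent of `public int Length => this._length & int.MaxValue;`:
    // masking with 0x7FFFFFFF clears the flag bit, leaving the 31-bit length.
    public static int GetLength(int packed) => packed & RemoveFlagsBitMask;

    // Reading the flag back: test the MSB.
    public static bool IsPinned(int packed) => (packed & IsPinnedFlag) != 0;
}
```

For example, `PackedLengthDemo.Pack(5, true)` yields `0x80000005` (a negative Int32), and `GetLength` recovers `5` from it - which is why the public Length getter must mask before returning.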

In 2018, at some point after System.Memory was first released for .NET Core - and also made available for .NET Framework 4.x users - the design of Memory<T> was changed to no longer require two flag bits (as type information about the _object can be carried "for free" within the type-system). Here's the diff, where we can see the old and new versions side-by-side, with their respective explanatory comments about both designs:

Quoth the source:

// The highest order bit of _index is used to discern whether _object is an array/string or an owned memory
// if (_index >> 31) == 1, object _object is an MemoryManager<T>
// else, object _object is a T[] or a string.
//     if (_length >> 31) == 1, _object is a pre-pinned array, so Pin() will not allocate a new GCHandle
//     else, Pin() needs to allocate a new GCHandle to pin the object.
// It can only be a string if the Memory<T> was created by
// using unsafe / marshaling code to reinterpret a ReadOnlyMemory<char> wrapped around a string as
// a Memory<T>.

In this diff, they implemented the change to reduce this down to just using _index for the flag, thus leaving _length alone. I speculate that because _length is accessed more often than _index, it made sense to use _index to store the flag bit instead of _length.
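A minimal sketch of the decode rules described in the quoted comment - illustrative only; the field values are passed in as plain ints here rather than read from a real Memory<T>:

```csharp
using System;

// Decode rules per the quoted comment (not the real System.Memory source).
static class MemoryTagBits
{
    // MSB of _index set => _object is a MemoryManager<T>;
    // otherwise it is a T[] or a string. An Int32 with its MSB set is negative.
    public static bool IsMemoryManager(int index) => index < 0;

    // MSB of _length set => the array is pre-pinned, so Pin() need not
    // allocate a new GCHandle.
    public static bool IsPrePinned(int length) => length < 0;

    // The public Length still masks the flag bit away, as in the question.
    public static int PublicLength(int length) => length & int.MaxValue;
}
```

Note that testing `< 0` is equivalent to the comment's `(_index >> 31) == 1` check, since `>>` on a signed Int32 in C# is an arithmetic shift.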


Unfortunately I'm not able to find the exact source for the .NET Framework port of System.Memory (if it even exists). The assembly metadata for our System.Memory.dll v4.5.5 says it's [assembly: AssemblyInformationalVersion("4.6.31308.01 @BuiltBy: cloudtest-841353dfc000000 @Branch: release/2.1-MSRC @SrcCode: https://github.com/dotnet/corefx/tree/32b491939fbd125f304031c35038b1e14b4e3958")] - but no such commit exists in the public corefx repo.