Understand strange code found in ReadOnlyMemory<T>


I looked with an IL disassembler into the code of the Length property of the ReadOnlyMemory<T> struct (System.Memory for .NET Framework 4.6.1) and found this strange code:

public int Length => this._length & int.MaxValue;

What is the reason for AND-ing with int.MaxValue here?

Accepted answer, by Dai:

Disclaimers:

  • This answer concerns both the original design of Memory<T> (i.e. how it was prior to its redesign in 2018) and its layout in the builds of the current latest version of System.Memory.dll for .NET Framework 4.x specifically, not the far more recent versions that ship with .NET 5+ (not that the differences are significant, though - they aren't).

  • (As a disclaimer on style: I see the authors of Memory<T> chose to use underscore prefixes for instance field names - which is not something I would ever do myself - but I'm using their field names verbatim for clarity.)

  • Note that both Memory<T> and ReadOnlyMemory<T> have identical layouts - the only difference is in their exposed interface design - so I'm using the terms ReadOnlyMemory<T> and Memory<T> interchangeably here.


  • The Memory<T> type is a struct that needs to be lightweight, which means it can't have any more fields than absolutely necessary:
    1. It needs a pointer (strictly speaking, an object reference) to the actual memory target, such as a byte-array buffer or what-have-you.
      • In Memory<T> this is represented by an opaque Object field named _object, as it can represent not just a Byte[], but also a Char[] or String or any other type that Span<T> can work with.
    2. It needs an Int32 _length field to store the length of the memory-area/buffer that it's pointing to.
      • .NET has a practical limit of 2GiB per object, so there's no point making this Int64 either (and doing so would make Memory<T> consume 2x more storage).
      • And incidentally, a range of 0-2GiB only requires 31 bits to represent, not 32.
    3. It also needs an Int32 _index field to store the starting offset into _object at which its region begins.
  • So that means Memory<T> (if packed in-memory) already consumes 32 + 32 + 32 = 96 bits (12 bytes) on x86, or 64 + 32 + 32 = 128 bits (16 bytes) on x64 - which is pushing at the limit for how big a struct should become.
  • But that's not all: Memory<T> needs 2 more pieces of information:
    1. It needs to store information about the type of memory that _object points to so it can dereference it correctly; otherwise it would need to rely on (very expensive) runtime type-tests (e.g. the is operator) before it could safely perform any action, which would be ruinous for performance. (This is because the actual low-level operations for getting data out of a Byte[] are different from those for a Byte* - or a Char[] vs. a String.)
    2. It also needs to know if _object is a reference to a pinned-memory object - as that tells it if it needs to Pin() the memory before it can be used safely or if it can skip that step.
  • So that's two more separate bool fields to store - fortunately boolean values are simple 2-state values that can be represented by a single bit in-memory (whereas the System.Boolean (bool in C#) type occupies a whole 8 bits - and four bytes when marshalled to a Win32 BOOL value).
  • Given that Memory<T> is already using two 32-bit integers for _index and _length, and both of those fields will only ever hold a 31-bit value, there are 2 unused bits right there for the taking - so it sneakily appropriates the MSBs of _length and _index to store the "is-pinned-memory-or-not" and "is-.NET-array-or-not" state.
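The bit-stealing described above can be sketched as follows. The names here (`Pack`, `IsPinnedFlag`, etc.) are invented for illustration and are not the real System.Memory internals - but the masking in `GetLength` is exactly the `& int.MaxValue` trick the question asks about:

```csharp
using System;

// Hypothetical sketch of packing a flag bit into the MSB of a 31-bit length.
static class PackedLengthDemo
{
    private const int IsPinnedFlag      = unchecked((int)0x80000000); // MSB only
    private const int RemoveFlagsBitMask = int.MaxValue;              // 0x7FFFFFFF

    // Store a length plus an "is pinned" flag in a single Int32.
    public static int Pack(int length, bool isPinned)
        => isPinned ? (length | IsPinnedFlag) : length;

    // Equivalent of `public int Length => this._length & int.MaxValue;`:
    // masking with 0x7FFFFFFF clears the flag bit, leaving the 31-bit length.
    public static int GetLength(int packed) => packed & RemoveFlagsBitMask;

    // Reading the flag back: test the MSB.
    public static bool IsPinned(int packed) => (packed & IsPinnedFlag) != 0;
}
```

For example, `PackedLengthDemo.Pack(5, true)` yields `0x80000005` (a negative Int32), and `GetLength` recovers `5` from it - which is why the public Length getter must mask before returning.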

In 2018, at some point after System.Memory was first released for .NET Core - and also made available for .NET Framework 4.x users - the design of Memory<T> was changed to no longer require two flag bits (as type information about the _object can be carried "for free" within the type-system). Here's the diff, where we can see the old and new versions side-by-side, with their respective explanatory comments about both designs:

Quoth the source:

// The highest order bit of _index is used to discern whether _object is an array/string or an owned memory
// if (_index >> 31) == 1, object _object is an MemoryManager<T>
// else, object _object is a T[] or a string.
//     if (_length >> 31) == 1, _object is a pre-pinned array, so Pin() will not allocate a new GCHandle
//     else, Pin() needs to allocate a new GCHandle to pin the object.
// It can only be a string if the Memory<T> was created by
// using unsafe / marshaling code to reinterpret a ReadOnlyMemory<char> wrapped around a string as
// a Memory<T>.

In this diff, they implemented the change to reduce this down to just using _index for the flag, thus leaving _length alone. I speculate that because _length is accessed more often than _index, it made sense to use _index to store the flag bit instead of _length.
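A minimal sketch of the decode rules described in the quoted comment - illustrative only; the field values are passed in as plain ints here rather than read from a real Memory<T>:

```csharp
using System;

// Decode rules per the quoted comment (not the real System.Memory source).
static class MemoryTagBits
{
    // MSB of _index set => _object is a MemoryManager<T>;
    // otherwise it is a T[] or a string. An Int32 with its MSB set is negative.
    public static bool IsMemoryManager(int index) => index < 0;

    // MSB of _length set => the array is pre-pinned, so Pin() need not
    // allocate a new GCHandle.
    public static bool IsPrePinned(int length) => length < 0;

    // The public Length still masks the flag bit away, as in the question.
    public static int PublicLength(int length) => length & int.MaxValue;
}
```

Note that testing `< 0` is equivalent to the comment's `(_index >> 31) == 1` check, since `>>` on a signed Int32 in C# is an arithmetic shift.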


Unfortunately I'm not able to find the exact source for the .NET Framework port of System.Memory (if it even exists). The assembly metadata for our System.Memory.dll v4.5.5 says it's [assembly: AssemblyInformationalVersion("4.6.31308.01 @BuiltBy: cloudtest-841353dfc000000 @Branch: release/2.1-MSRC @SrcCode: https://github.com/dotnet/corefx/tree/32b491939fbd125f304031c35038b1e14b4e3958")] - but no such commit exists in the public corefx repo.