Memory Fragmentation with byte[] in C#


The C#/.NET application I am working on makes use of huge byte arrays and is having memory fragmentation issues. We checked memory usage with CLRMemory.

Please refer to the attached image for the LOH and free space usage.

The code we use is as follows:

PdfLoadedDocument loadedDocument = new PdfLoadedDocument("myLoadedDocument.pdf");

// Operations on pdf document

using (var stream = new MemoryStream())
{
    loadedDocument.Save(stream);
    loadedDocument.Close(true);
    return stream.ToArray(); //byte[]
}

We use similar code in multiple places across our application, and we call it in a loop to generate bulk audits ranging from a few hundred to tens of thousands of documents.

  1. Is there a better way to handle this to avoid fragmentation?

As part of the audits, we also download large files from Amazon S3 using the following code:

using (var client = new AmazonS3Client(_accessKey, _secretKey, _region))
{
    var getObjectRequest = new GetObjectRequest();
    getObjectRequest.BucketName = "bucketName";
    getObjectRequest.Key = "keyName";

    using (var downloadStream = new MemoryStream())
    {
        using (var response = await client.GetObjectAsync(getObjectRequest))
        {
            using (var responseStream = response.ResponseStream)
            {
                await responseStream.CopyToAsync(downloadStream);
            }
            return downloadStream.ToArray(); //byte[]
        }
    }
}
  2. Is there a better alternative for downloading large files without them moving to the LOH, which is taking a toll on the garbage collector?
1 Answer

Answered by Marc Gravell

There are two different things here:

  1. the internals of MemoryStream
  2. the usage of .ToArray()

As for what happens inside MemoryStream: it is implemented as a single contiguous byte[], but you can mitigate a lot of that overhead by using RecyclableMemoryStream instead, via the Microsoft.IO.RecyclableMemoryStream NuGet package, which re-uses buffers between independent usages.
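
A minimal sketch of that swap, assuming the manager lives as a long-lived singleton (the field name here is illustrative):

using Microsoft.IO; // Microsoft.IO.RecyclableMemoryStream NuGet package

// one manager for the whole application, so the pooled blocks
// are actually reused between calls
private static readonly RecyclableMemoryStreamManager _streamManager
    = new RecyclableMemoryStreamManager();

// then, wherever you previously wrote "new MemoryStream()":
using (var stream = _streamManager.GetStream())
{
    // ... write to the stream as before ...
}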

For ToArray(), frankly: don't do that. When using a vanilla MemoryStream, the better approach is TryGetBuffer(...), which gives you the oversized backing buffer along with the offset and count of the valid data:

if (!memStream.TryGetBuffer(out var segment))
    throw new InvalidOperationException("Unable to obtain data segment; oops?");
// see segment.Offset, .Count, and .Array

It is then your job not to look outside those bounds. If you want to make that easier, consider treating the segment as a span (or memory) instead:

ReadOnlySpan<byte> muchSafer = segment;
// now you can't read out of bounds, and you don't need to apply the offset yourself

This TryGetBuffer(...) approach, however, does not work well with RecyclableMemoryStream, as it makes a defensive copy to prevent problems with independent data. In that scenario, you should treat the stream simply as a Stream: just write to it, rewind it (Position = 0), have the consumer read from it, and dispose it when they are done.
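
A minimal sketch of that pattern, assuming a hypothetical consumer delegate that accepts a Stream (the method and parameter names are illustrative):

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.IO;

static async Task SaveAndConsumeAsync(
    RecyclableMemoryStreamManager manager,
    PdfLoadedDocument document,          // Syncfusion type from the question
    Func<Stream, Task> consumer)         // hypothetical consumer delegate
{
    using (var stream = manager.GetStream("pdf-save")) // tag aids leak diagnostics
    {
        document.Save(stream);
        stream.Position = 0;    // rewind so the consumer reads from the start
        await consumer(stream); // consumer reads the stream directly; no ToArray() copy
    }                           // disposing returns the pooled blocks to the manager
}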


As a side note: when reading (or writing) using the Stream API, consider using the array pool (ArrayPool&lt;byte&gt;) for your scratch buffers; so instead of:

var buffer = new byte[1024];
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{...}

instead try:

var buffer = ArrayPool<byte>.Shared.Rent(1024); // from System.Buffers; may return a larger array than requested
try
{
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {...}
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}

In more advanced scenarios, it may be wise to use the pipelines API (System.IO.Pipelines) rather than the Stream API; the point here is that pipelines allow discontiguous buffers, so you never need ridiculously large buffers even when dealing with complex scenarios. This is a niche API, however, and has very limited support in public APIs.
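
For a flavour of that, here is a minimal sketch of draining a Stream through System.IO.Pipelines (the method name and chunk handling are illustrative):

using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

static async Task ConsumeAsync(Stream source)
{
    // the pipe manages small pooled segments internally, so even a huge
    // payload never requires one contiguous (LOH-sized) byte[]
    var reader = PipeReader.Create(source);
    while (true)
    {
        ReadResult result = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;

        foreach (ReadOnlyMemory<byte> chunk in buffer)
        {
            // process each modest-sized chunk here
        }

        reader.AdvanceTo(buffer.End); // everything handled; release the segments
        if (result.IsCompleted)
            break;
    }
    await reader.CompleteAsync();
}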