Running out of memory modifying a long string

I am receiving JSON files that each contain any number of PDF files. I have to split each file and remove the backslash characters to convert the contents into PDF files. This has worked while the bundled files were usually in the tens of thousands of megabytes. Recently I started getting files 10 times that size, and the program crashes with an out-of-memory error. Memory usage at that point is less than 3 GB, and the computer has 32 GB.

temp = buffer.IndexOf("FileData");

while (temp > 0)
{
   docBuffer = buffer;
   int gen = 1;
   string curFile = wdirl + outFileZ + "AAA" + gen.ToString("D3") + ".pdf";

   FileStream strmFileA = File.Create(curFile);
   docBuffer = docBuffer.Substring(temp + 11);
   temp = docBuffer.IndexOf('"');
   buffer = docBuffer.Substring(temp);
   docBuffer = docBuffer.Substring(0, temp);
   // string docBufferA = docBuffer.Replace("\\", string.Empty);
   StringBuilder docBufferA = new StringBuilder(docBuffer);
   docBufferA.Replace("\\", "");
   docBuffer = docBufferA.ToString();

   bytes = Convert.FromBase64String(docBuffer);
   writer = new BinaryWriter(strmFileA);
   writer.Write(bytes, 0, bytes.Length);
   writer.Close();
   temp = buffer.IndexOf("FileData");
}

I tried using StringBuilder while removing the backslashes and it put off the problem for a little while.

Answer by Akhilesh Pandey

Strings in C# are immutable. This means that every call to a method like Substring or Replace allocates a new string and copies the characters into it. With very large strings, these copies can quickly consume a lot of memory.
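To make the copying concrete, here is a minimal sketch that contrasts Substring with a span over the same characters. The class and method names (SliceDemo, FileDataCopy, FileDataView) are hypothetical, the offset of 11 is taken from the question's code, and the span APIs assume .NET Core 2.1 or later (or the System.Memory package).

```csharp
using System;

static class SliceDemo
{
    // Copying approach, as in the question: each Substring call allocates
    // a new string and copies the characters into it.
    public static string FileDataCopy(string buffer)
    {
        int start = buffer.IndexOf("FileData") + 11;   // skip past FileData":"
        string tail = buffer.Substring(start);          // copy 1: rest of the buffer
        return tail.Substring(0, tail.IndexOf('"'));    // copy 2: the value itself
    }

    // View approach: a span points into the original string, so slicing
    // it does not copy any characters.
    public static ReadOnlySpan<char> FileDataView(string buffer)
    {
        int start = buffer.IndexOf("FileData") + 11;
        ReadOnlySpan<char> tail = buffer.AsSpan(start);
        return tail.Slice(0, tail.IndexOf('"'));
    }
}
```

Both methods locate the same characters, but only the first one allocates new strings, and those allocations grow with the size of the input.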

To fix it, try the following approach:

1. Stream the JSON: Instead of loading the entire JSON file into memory at once, use a streaming parser like JsonTextReader from the Newtonsoft.Json library. This way, you can process the JSON content piece-by-piece without loading the entire thing into memory.

2. Directly Write to File Stream: Instead of constructing the decoded byte array in memory, you can directly write the decoded bytes to the file stream.

using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Open the JSON file for reading.
using (StreamReader file = File.OpenText(jsonFilePath))
using (JsonTextReader reader = new JsonTextReader(file))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.PropertyName && (string)reader.Value == "FileData")
        {
            // Move to the next token, which should be the file data.
            // The reader still returns this one value as a complete string.
            reader.Read();
            string fileData = (string)reader.Value;

            // Open a FileStream for writing the decoded bytes.
            // outputFilePath must differ for each FileData entry, or earlier PDFs are overwritten.
            using (FileStream fs = new FileStream(outputFilePath, FileMode.Create))
            {
                // Chunk size in characters. Adjust this value based on your needs.
                int bufferSize = 4 * 1024;
                int position = 0;

                // Cleaned characters that have not been decoded yet.
                StringBuilder pending = new StringBuilder(bufferSize + 4);

                while (position < fileData.Length)
                {
                    int length = Math.Min(bufferSize, fileData.Length - position);

                    // Append the next chunk, then remove any backslashes.
                    pending.Append(fileData, position, length);
                    pending.Replace("\\", "");
                    position += length;

                    // Base64 decodes in groups of four characters, so decode only
                    // whole groups and carry the remainder into the next pass.
                    bool isLast = position >= fileData.Length;
                    int usable = isLast ? pending.Length : pending.Length - (pending.Length % 4);
                    byte[] bytes = Convert.FromBase64String(pending.ToString(0, usable));
                    fs.Write(bytes, 0, bytes.Length);
                    pending.Remove(0, usable);
                }
            }
        }
    }
}
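
If even a single FileData value is too large to hold as one string, the value can be decoded character by character instead. The following is only a sketch under stated assumptions: Base64StreamDecoder and CopyDecoded are hypothetical names, the reader is assumed to be positioned at the first character of the Base64 value, and every backslash is treated as noise to drop, as in the question's code.

```csharp
using System;
using System.IO;

static class Base64StreamDecoder
{
    // Reads Base64 characters from source until the closing double quote
    // (or end of input), skips backslashes, and writes the decoded bytes
    // to output. Returns the number of bytes written.
    public static long CopyDecoded(TextReader source, Stream output)
    {
        // The length must be a multiple of 4 so that every full buffer
        // holds only complete Base64 groups.
        char[] chars = new char[4096];
        int filled = 0;
        long written = 0;
        int c;

        while ((c = source.Read()) != -1 && c != '"')
        {
            if (c == '\\')
            {
                continue; // drop the backslash, keep the character after it
            }

            chars[filled++] = (char)c;
            if (filled == chars.Length)
            {
                written += Flush(chars, filled, output);
                filled = 0;
            }
        }

        // Decode whatever is left, including any '=' padding.
        if (filled > 0)
        {
            written += Flush(chars, filled, output);
        }
        return written;
    }

    private static int Flush(char[] chars, int count, Stream output)
    {
        byte[] bytes = Convert.FromBase64CharArray(chars, 0, count);
        output.Write(bytes, 0, bytes.Length);
        return bytes.Length;
    }
}
```

With this approach only the 4096-character buffer and its decoded bytes are in memory at any time. A caller would open the JSON with a StreamReader, advance it to just past the opening quote of each FileData value, and pass it to CopyDecoded together with the target FileStream.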