I am receiving files via JSON that can include any number of PDF files. I have to split the bundle and remove the backslash characters to convert each entry back into a PDF file. This worked when the bundled file sizes were usually in the tens of thousands of megabytes. Recently I started getting files 10 times that size, and the program crashes with an out-of-memory exception. Memory usage at that point is less than 3GB, and the machine has 32GB.
temp = buffer.IndexOf("FileData");
int gen = 1; // sequence number for the output PDFs
while (temp > 0)
{
    docBuffer = buffer;
    string curFile = wdirl + outFileZ + "AAA" + gen.ToString("D3") + ".pdf";
    gen++; // advance the counter so each PDF gets a unique name
    FileStream strmFileA = File.Create(curFile);
    // Skip past "FileData":" (11 characters) to the start of the Base64 value.
    docBuffer = docBuffer.Substring(temp + 11);
    // The closing quote marks the end of the Base64 value.
    temp = docBuffer.IndexOf('"');
    buffer = docBuffer.Substring(temp); // remainder, for the next iteration
    docBuffer = docBuffer.Substring(0, temp);
    // Remove the JSON escape backslashes before decoding.
    // string docBufferA = docBuffer.Replace("\\", string.Empty);
    StringBuilder docBufferA = new StringBuilder(docBuffer);
    docBufferA.Replace("\\", "");
    docBuffer = docBufferA.ToString();
    bytes = Convert.FromBase64String(docBuffer);
    writer = new BinaryWriter(strmFileA);
    writer.Write(bytes, 0, bytes.Length);
    writer.Close();
    temp = buffer.IndexOf("FileData");
}
I tried using a StringBuilder to remove the backslashes, but that only put off the problem for a little while.
Strings in C# are immutable: every call to Substring or Replace creates a new string, so each pass of your loop copies the remaining multi-hundred-megabyte buffer several times over. Worse, any string over 85KB is allocated on the Large Object Heap, which is not compacted by default, so the address space fragments; that is the likely reason you see an OutOfMemoryException while the process is using less than 3GB on a 32GB machine. It is also worth confirming the process runs as 64-bit, since a 32-bit process is limited to roughly 2-4GB of address space regardless of installed RAM.
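To get a feel for the scale, here is a small sketch (assuming .NET Core or .NET Framework 4.8, where GC.GetAllocatedBytesForCurrentThread is available) that measures what just ten Substring calls on a ~200MB string allocate:

    using System;

    static class SubstringAllocDemo
    {
        static void Main()
        {
            string buffer = new string('a', 100_000_000); // ~200MB of char data
            long before = GC.GetAllocatedBytesForCurrentThread();

            // Each Substring copies nearly the whole remaining buffer.
            for (int i = 0; i < 10; i++)
                buffer = buffer.Substring(1000);

            long after = GC.GetAllocatedBytesForCurrentThread();
            Console.WriteLine((after - before) / (1024 * 1024)
                + " MB allocated by 10 substrings");
        }
    }

On a buffer that size, ten substrings churn through roughly 2GB of allocations, and your loop does several such copies per extracted file.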
To fix it, try the following approaches:
1. Stream the JSON: Instead of loading the entire JSON file into memory at once, use a streaming parser such as JsonTextReader from the Newtonsoft.Json library. That way you process the JSON content piece by piece without ever holding the whole document in memory, and the parser unescapes string values for you, so the manual backslash removal disappears entirely (see the first sketch after this list).
2. Write Directly to the File Stream: Instead of building each decoded byte array in memory, decode the Base64 text in chunks and write the decoded bytes straight to the output file stream (see the second sketch after this list).
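Here is a minimal sketch of the streaming approach, assuming each PDF appears as a string property named "FileData"; the SplitBundle method name and the "AAA" file-name prefix (carried over from your code) are just placeholders. JsonTextReader.ReadAsBytes() decodes a Base64 string value directly to a byte array, so only one document's bytes are in memory at a time and no copy of the bundle text is ever built:

    using System.IO;
    using Newtonsoft.Json;

    static class BundleSplitter
    {
        public static void SplitBundle(string inputPath, string outDir)
        {
            int gen = 1; // sequence number for the output PDFs
            using (StreamReader file = File.OpenText(inputPath))
            using (JsonTextReader reader = new JsonTextReader(file))
            {
                while (reader.Read())
                {
                    if (reader.TokenType == JsonToken.PropertyName
                        && (string)reader.Value == "FileData")
                    {
                        // ReadAsBytes() advances to the property value and
                        // decodes the Base64 string to bytes; the reader has
                        // already removed the JSON escape backslashes.
                        byte[] bytes = reader.ReadAsBytes();
                        string curFile = Path.Combine(
                            outDir, "AAA" + gen.ToString("D3") + ".pdf");
                        File.WriteAllBytes(curFile, bytes);
                        gen++;
                    }
                }
            }
        }
    }

Each PDF's decoded bytes are still materialized one document at a time, which is usually fine; the multi-gigabyte bundle string and all its Substring copies are gone.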
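If even a single decoded PDF is too large to buffer comfortably, the Base64 text can be decoded a chunk at a time and written straight to disk. The sketch below uses CryptoStream with FromBase64Transform from System.Security.Cryptography; the TextReader source is an assumption standing in for however you obtain the Base64 text in pieces, and the escape backslashes must already have been removed (for example by the JSON reader), since the transform tolerates whitespace but not backslashes:

    using System.IO;
    using System.Security.Cryptography;
    using System.Text;

    static class Base64FileWriter
    {
        public static void DecodeBase64ToFile(TextReader base64Source,
                                              string outputPath)
        {
            using (FileStream output = File.Create(outputPath))
            using (CryptoStream decoder = new CryptoStream(
                output, new FromBase64Transform(), CryptoStreamMode.Write))
            {
                char[] chunk = new char[8192];
                int read;
                while ((read = base64Source.Read(chunk, 0, chunk.Length)) > 0)
                {
                    // The transform consumes ASCII bytes and writes the decoded
                    // binary to the underlying file stream as it goes.
                    byte[] ascii = Encoding.ASCII.GetBytes(chunk, 0, read);
                    decoder.Write(ascii, 0, ascii.Length);
                }
            } // disposing the CryptoStream flushes the final Base64 block
        }
    }

With this approach memory usage stays flat at the chunk size no matter how large the embedded files get.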