I have several txt files, each containing more than 3 million lines. Each line holds a customer connection record with a Customer ID, an IP address, and other fields.
I need to find a specific IP address and get the Customer ID related to it.
I read the file, split it into an array, and search each line with a foreach loop, but because there are so many lines, the following error occurs:
Exception of type 'System.OutOfMemoryException' was thrown.
The txt files are compressed, so I have to decompress them first. I use the code below:
string decompressTxt = this.Decompress(new FileInfo(filePath));
char[] delRow = { '\n' };
string[] rows = decompressTxt.Split(delRow);
for (int i = 0; i < rows.Length; i++)
{
    if (rows[i].Contains(ip))
    {
        // get the Customer ID from this row
    }
}
string Decompress(FileInfo fileToDecompress)
{
    string newFileName = "";
    string newFileText = "";
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        string currentFileName = fileToDecompress.FullName;
        newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);
        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
        newFileText = File.ReadAllText(newFileName);
        File.Delete(newFileName);
    }
    return newFileText;
}
Okay, so there are a lot of things you're doing that aren't necessary, even before we get to how you're running out of memory.
First off, you don't need an intermediate file for decompression; just read off the GZipStream directly.
But wait, did you think that you had to use File.ReadAllText to read text, and that's why you decompress to a file first? That's unnecessary. When you want to read text from a stream, you can just use a StreamReader (this is what File.ReadAllText uses underneath).
The reader can also be used to read line by line without having to fit the entire file in memory, only each individual line, one at a time. Just call ReadLine() until it returns null.
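For example, the core line-by-line pattern looks like this (a minimal sketch over a plain text file; the file name is illustrative and it assumes using System.IO is in scope):

using (StreamReader reader = new StreamReader("connections.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // only the current line is held in memory
    }
}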
Putting it all together, here's code that decompresses the data and reads it one line at a time, without having to split anything. Not only does it scale to very large files, it's also much faster.
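Something along these lines (a sketch: the Program and ReadCompressedLines names and the Main wrapper are just for illustration, and how you pull the Customer ID out of a matching line depends on your actual format):

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

class Program
{
    // Streams the compressed file one line at a time; only the current
    // line is ever held in memory.
    static IEnumerable<string> ReadCompressedLines(string filePath)
    {
        using (FileStream fileStream = File.OpenRead(filePath))
        using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzipStream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    static void Main(string[] args)
    {
        string filePath = args[0];
        string ip = args[1];

        foreach (string line in ReadCompressedLines(filePath))
        {
            if (line.Contains(ip))
            {
                // The Customer ID is somewhere on this line; extracting it
                // depends on your format, e.g. line.Split(',')[0] if it is
                // the first comma-separated field.
                Console.WriteLine(line);
            }
        }
    }
}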