I'm having trouble reading a .txt file. The file contains a huge amount of data (88,604,154 lines, about 2,695.79 MB), and I have to analyze the data and then plot a histogram of it.
The problem is that it takes ages for the computer to read that much data, so I thought I could read the data in parts and then combine the parts. I did a little search and came up with this code:
import resource

file_name = '/home/lam/Downloads/C3--Trace--00001.txt'

# indices of the lines I want to keep (1 to 50000); a set makes the membership test fast
lines_num = set(range(1, 50001))

with open(file_name, 'r') as fp:
    lines = []
    for i, line in enumerate(fp):
        if i in lines_num:
            lines.append(line.strip())
        elif i > 50000:
            # past the range I care about, stop reading
            break
# no explicit close needed: the with-block closes the file
With this I can get a certain range of lines (for example from line 1 to 50000), but I would have to repeat the code about 1775 times to read all the data and append it all to one list. How can I write a function for this?
You need to read in chunks until there are no more chunks available:
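A minimal sketch of that loop (the 1 MB chunk size and the output filename 'copy.txt' are just placeholders, not taken from your post):

chunk_size = 1024 * 1024  # read roughly 1 MB of text per iteration (placeholder value)
with open('/home/lam/Downloads/C3--Trace--00001.txt', 'r') as src, \
        open('copy.txt', 'w') as dst:
    while True:
        chunk = src.read(chunk_size)   # read() advances the file pointer by itself
        if not chunk:                  # an empty string means end of file
            break
        dst.write(chunk)               # write the chunk into the other file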
Here I'm reading one chunk at a time and writing that data into another file.
The read function moves the file pointer forward automatically, so you don't need to keep track of an index yourself.
You could also use the code you shared, but remove the break statement (and the line-number filtering) so the loop reads the whole file:
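A sketch of that variant, assuming you simply want every stripped line collected in one list:

lines = []
with open('/home/lam/Downloads/C3--Trace--00001.txt', 'r') as fp:
    for line in fp:                 # iterates lazily, one line at a time
        lines.append(line.strip())  # no break: keep going until the file ends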
Example of how to calculate the mean:
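A minimal sketch, assuming each line of the file holds a single numeric value (adjust the parsing if your lines have several columns):

total = 0.0
count = 0
with open('/home/lam/Downloads/C3--Trace--00001.txt', 'r') as fp:
    for line in fp:
        line = line.strip()
        if not line:
            continue              # skip blank lines
        total += float(line)      # assumes one number per line
        count += 1

mean = total / count if count else float('nan')
print(mean)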