Process and delete lines from a file, stop if a CancellationToken is sent


I have to process the lines of a file that contain path and status information and start an upload operation for each line/path. In some cases the upload might fail, so I have to keep the line; in other cases the upload succeeds and the line can be deleted. The main problem is that everything runs in the background and the user can close the software whenever they want. In that case I set a CancellationToken and the file operation finishes. Until now the file has always been very small, so I copied all the lines that were still needed into a new file and replaced the old one. Simplified code:

// filename, filename_tmp, filename_backup, delimiter, canceltoken,
// rowCnt, uploadCnt, data_path, errorMsg and UploadData are defined elsewhere.
bool valid = false;
bool cancelled = false;
using (StreamWriter sw = new StreamWriter(filename_tmp, false))
{
    try
    {
        // alternative: foreach (string line in File.ReadLines(filename))
        using (StreamReader sr = new StreamReader(filename))
        {
            // process each line of the file
            while (sr.Peek() >= 0)
            {
                string line = sr.ReadLine();
                rowCnt++;
                // split the line into its fields
                string[] content = line.Split(delimiter);
                data_path = content[0];
                if (canceltoken.IsCancellationRequested && (rowCnt > 3))
                {
                    cancelled = true;
                }
                else
                {
                    // start the upload
                    valid = UploadData(data_path);
                }
                // keep the line if it was skipped due to cancellation or the upload failed
                if (cancelled || !valid)
                {
                    sw.WriteLine("{0},{1},{2},{3}", data_path, uploadCnt,
                        DateTime.Now.ToString(), errorMsg);
                }
            }
        }
    }
    catch (Exception ex)
    {
        // real error handling omitted in this simplified example
        errorMsg = ex.Message;
    }
}
File.Replace(filename_tmp, filename, filename_backup);

Now we have a situation where the file can get very big, and I'm afraid that copying everything into a new file will take too long. The user currently gets a message that there are still processes running and that the software will close afterwards. So far the software closed after 1-5 seconds; now it will take longer, and I don't want the user to have to kill the process via Task Manager. What is the best way to process lines and delete them afterwards? I have full control over the file because I write it myself, so I can define the format and the writer (e.g. StreamWriter vs. BinaryWriter).

I thought of two possible options:

  1. Process the whole file. Set a status flag for every row (e.g. 1 = remove me, 2 = has to be processed/kept). After the file has been processed, iterate over the lines again and copy the needed ones. In case of a cancellation, keep the old file.

I would love to do something like:

var linesToKeep = File.ReadLines(fileName).Where(l => !l.Contains("remove me"));
File.WriteAllLines(tempFile, linesToKeep);

But that would require writing to the same line to change the status, and I'm not sure whether that works. I could use a BinaryWriter to overwrite the "flag" in place, but then I couldn't use the snippet above and would need to iterate over every line again.
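
Overwriting a status flag in place does work as long as every record has a fixed length and the flag sits at a known offset; a plain FileStream with Seek is enough. Below is a minimal sketch, assuming a made-up fixed record size of 256 bytes with the flag as the first byte (the class and method names are illustrative only, not part of the actual file format described in the question):

using System;
using System.IO;

static class StatusFlagWriter
{
    // Assumption: every record is exactly RecordSize bytes and starts with a
    // one-byte status flag (1 = remove me, 2 = keep / process again).
    private const int RecordSize = 256;

    public static void MarkForRemoval(string path, long recordIndex)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            // Jump straight to the flag byte of the record and overwrite it;
            // no other bytes of the file are touched.
            fs.Seek(recordIndex * RecordSize, SeekOrigin.Begin);
            fs.WriteByte(1);
        }
    }
}

The second pass that actually removes the flagged records is still needed, as described above, but it only copies lines and no longer depends on the uploads.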

  2. Process the file from the end using Seek. If I used a BinaryWriter I would know the exact length of each line, so that would not be a problem. Write lines whose upload failed and that need to be processed again to an extra file, and "cut" the original file at the last processed line with FileStream.SetLength. This would result in two files (the original one with the unprocessed lines, and a second file with the lines that need to be processed again). But I don't know yet how to handle the extra file. I could process it first the next time the software starts, but then I could end up with more and more files, which seems wrong.
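
To make option 2 concrete: with fixed-length records, processing from the end and cutting the file is only a Seek plus SetLength per record. A rough sketch under those assumptions (the record size, the failedPath retry file, and the upload delegate are placeholders, not the question's actual format):

using System;
using System.IO;
using System.Threading;

static class TailProcessor
{
    // Assumption: fixed-length records of RecordSize bytes, newest record last.
    private const int RecordSize = 256;

    public static void ProcessFromEnd(string path, string failedPath,
        Func<byte[], bool> upload, CancellationToken token)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        using (var failed = new FileStream(failedPath, FileMode.Append, FileAccess.Write))
        {
            var buffer = new byte[RecordSize];
            while (fs.Length >= RecordSize && !token.IsCancellationRequested)
            {
                long recordStart = fs.Length - RecordSize;
                fs.Seek(recordStart, SeekOrigin.Begin);
                fs.Read(buffer, 0, RecordSize); // simplified: assumes a full read

                // Failed uploads go to the extra file so they can be retried later.
                if (!upload(buffer))
                    failed.Write(buffer, 0, RecordSize);

                // "Cut" the original file at the last processed record, so a
                // cancellation leaves only the unprocessed records behind.
                fs.SetLength(recordStart);
            }
        }
    }
}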

I'm stuck here and have no idea how to proceed. Any hints would be much appreciated.

1 Answer

Michael

Instead of writing (or marking) all unprocessed lines after cancellation, I would invert the problem: define one job per line, write the jobs into a "queue", and delete each job after its upload finishes.

To persist your queue, you can use LiteDB. It's a small and handy NoSQL file database, so you don't have the overhead of an O/R mapper.

The logic may look like this (a rough sketch follows the list):

  1. Define an UploadJob class with your content and some additional properties (URL, ...)
  2. Write the list of jobs into a LiteDB collection.
  3. Iterate over each job.
  4. After a job has completed successfully, delete that item from LiteDB.
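
A minimal sketch of that flow, assuming LiteDB and a hypothetical UploadData placeholder (class, file, collection and property names are made up for illustration):

using System;
using System.Linq;
using System.Threading;
using LiteDB;

public class UploadJob
{
    public int Id { get; set; }          // auto-incremented by LiteDB
    public string DataPath { get; set; }
    public int UploadCount { get; set; }
}

public static class UploadQueue
{
    // Placeholder for the real upload; returns true on success.
    private static bool UploadData(string path) => true;

    // Step 2: persist one job per line.
    public static void Enqueue(string dataPath)
    {
        using (var db = new LiteDatabase("uploadqueue.db"))
        {
            db.GetCollection<UploadJob>("jobs")
              .Insert(new UploadJob { DataPath = dataPath });
        }
    }

    // Steps 3 and 4: work through the queue and delete finished jobs.
    public static void Run(CancellationToken token)
    {
        using (var db = new LiteDatabase("uploadqueue.db"))
        {
            var jobs = db.GetCollection<UploadJob>("jobs");

            // Materialize the list so we can delete while iterating.
            foreach (var job in jobs.FindAll().ToList())
            {
                if (token.IsCancellationRequested)
                    break; // unfinished jobs stay persisted for the next run

                if (UploadData(job.DataPath))
                    jobs.Delete(job.Id); // remove only after a successful upload
            }
        }
    }
}

On a cancelled shutdown nothing has to be copied or rewritten: whatever is still in the collection is simply picked up again on the next start.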