`fseek` issue when printing the last N lines to a file using C++


I'm using the following code to print the last N lines of one file to another.

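#include <cstdio>   // FILE, fopen_s, fseek, ftell, fgetc, fgets, fprintf, perror
#include <cstdlib>  // exit, EXIT_FAILURE
#include <iostream>
#include <fstream>
#include <string>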

using namespace std;

void printLastNLines(const std::string& inputFileName, const std::string& outputFileName, int N);
int main()
{
    printLastNLines("test.csv", "test2.csv", 200);

}
void printLastNLines(const std::string& inputFileName, const std::string& outputFileName, int N) {
    FILE* in, * out;
    int count = 0;
    long int pos;
    char s[100];

    fopen_s(&in, inputFileName.c_str(), "rb");
    /* always check return of fopen */
    if (in == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    fopen_s(&out, outputFileName.c_str(), "wb");
    if (out == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    
    
    fseek(in, 0, SEEK_END);
    pos = ftell(in);
    
    while (pos) {
        pos--;
        fseek(in, pos, SEEK_SET); 
        char c = fgetc(in);
        if (c == '\n') {
            if (count++ == N) break;
        }
    }
    //fseek(in, pos, SEEK_SET);
    /* Write line by line, is faster than fputc for each char */
    while (fgets(s, sizeof(s), in) != NULL) {
        fprintf(out, "%s", s);
    }
    fclose(in);
    fclose(out);
}

The contents of the sample file test.csv are given below:

2
3

However, when I run the code, test2.csv contains the following (note that the first line is there but doesn't contain any characters):

3

Can anyone explain what's wrong with the code? In general, even when I give it a bigger file, the first character of the first line is always missing.

I assumed it has something to do with the file pointer position, so I added another fseek(in, pos, SEEK_SET); (currently commented out), and the first line with 2 started printing. However, I'm not sure why this extra fseek is needed. When I debugged the code, the last fseek executed in the loop was in fact fseek(in, 0, SEEK_SET);. Why do we need an extra fseek(in, 0, SEEK_SET); to make it work?

There is 1 best solution below

Gurnet On

The solution is basically already visible in your source code, but you commented it out: //fseek(in, pos, SEEK_SET);

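The root cause is that you use fgetc to read a character and compare it against the newline character \n. Every fgetc advances the file pointer by one, so after each read the pointer is one past the character you just examined. When the loop terminates because pos has reached 0 (the file has fewer than N lines, as in your sample), the last character read was the one at offset 0, so the file pointer sits at offset 1 and the first character is missing from the output.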
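The pointer movement described above can be checked directly. The following is a minimal sketch; positionAfterRead is a hypothetical helper name, not part of the original code. It seeks to an offset, reads one byte with fgetc, and reports where the file pointer ends up.

```cpp
#include <cstdio>

// Hypothetical helper for illustration: seek to `pos`, read a single byte
// with fgetc, and return the file position afterwards.
// Returns -1 if the seek fails or there is no byte to read at `pos`.
long positionAfterRead(std::FILE* f, long pos) {
    if (std::fseek(f, pos, SEEK_SET) != 0) return -1;
    if (std::fgetc(f) == EOF) return -1;
    return std::ftell(f);  // one past the byte that was just read
}
```

For a file opened in binary mode, positionAfterRead(f, 0) yields 1, which is exactly the position the question's code is left in when its loop runs all the way down to pos == 0.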

So, you could uncomment your //fseek(in, pos, SEEK_SET); statement, but this is not reliable either. When the loop ends through break, pos is the offset of the \n itself, so the output would then begin with that newline. In addition, depending on your operating system, a new line may be marked with \r\n, that is, a carriage return followed by a line feed. This is most probably true for your system, and then the seek operation may also not do what you expect.

So, there is no easy portable solution. There is a quick fix, but it is not recommended: you may try to set the file pointer based on your knowledge of the line-ending marker.
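One way such a quick fix could look is sketched below. It rests on several assumptions: both files are opened in binary mode so that byte offsets are exact, every line ends in \n (which also covers \r\n, since the \r is simply copied along), and the portable std::fopen is used in place of fopen_s. The function name copyLastNLines and the bool return value are hypothetical choices for this illustration. The key point is that the start offset is recorded explicitly instead of relying on where the last fgetc left the file pointer.

```cpp
#include <cstdio>
#include <string>

// Sketch only: copies the last n lines of inputFileName to outputFileName.
// Returns false if a file cannot be opened or a seek fails.
bool copyLastNLines(const std::string& inputFileName,
                    const std::string& outputFileName, int n) {
    std::FILE* in = std::fopen(inputFileName.c_str(), "rb");
    if (in == nullptr) return false;
    std::FILE* out = std::fopen(outputFileName.c_str(), "wb");
    if (out == nullptr) { std::fclose(in); return false; }

    bool ok = std::fseek(in, 0, SEEK_END) == 0;
    const long size = ok ? std::ftell(in) : -1L;
    ok = ok && size >= 0;

    long start = 0;  // default: fewer than n lines, so copy the whole file
    if (ok && n <= 0) {
        start = size;  // nothing to copy
    } else if (ok) {
        long pos = size;
        // A trailing '\n' only terminates the last line; do not count it.
        if (pos > 0 && std::fseek(in, pos - 1, SEEK_SET) == 0 &&
            std::fgetc(in) == '\n') {
            --pos;
        }
        int count = 0;
        while (pos > 0) {
            --pos;
            if (std::fseek(in, pos, SEEK_SET) != 0) { ok = false; break; }
            if (std::fgetc(in) == '\n' && ++count == n) {
                start = pos + 1;  // first byte after the n-th newline from the end
                break;
            }
        }
    }

    // Always reposition explicitly before copying.
    if (ok && std::fseek(in, start, SEEK_SET) == 0) {
        char buffer[4096];
        std::size_t got;
        while ((got = std::fread(buffer, 1, sizeof buffer, in)) > 0) {
            std::fwrite(buffer, 1, got, out);
        }
    } else {
        ok = false;
    }
    std::fclose(in);
    std::fclose(out);
    return ok;
}
```

With the sample file from the question, copyLastNLines("test.csv", "test2.csv", 200) copies the whole file, because the loop reaches offset 0 and start stays 0. Seeking once per byte is slow on large files, so this is meant to illustrate the pointer handling rather than to be an efficient implementation.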

But it would be better to use C++ for your solution:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std::string_literals;

void printLastNLines(const std::string& inputFileName, const std::string& outputFileName, int N) {
    // Open files and check, if they could be opened
    if (std::ifstream ifs(inputFileName); ifs) 
        if (std::ofstream ofs(outputFileName); ofs) {

            // We will read all lines into a vector
            std::vector<std::string> lines{};
            for (std::string line{}; std::getline(ifs, line); lines.push_back(line));

            // If N is greater than the number of lines that we read, then limit the value
            size_t numberOfLines = N < 0 ? 0 : N;
            if (numberOfLines > lines.size()) numberOfLines = lines.size();

            // And now we write the lines to the output file
            for (size_t i = lines.size() - numberOfLines; i< lines.size(); ++i)
                ofs << lines[i] << '\n';
        }
        else std::cout << "\nError: Could not open output file '" << outputFileName << "'\n";
    else std::cout << "\nError: Could not open input file '" << inputFileName << "'\n";
}
int main() {
    printLastNLines("r:\\test.csv"s, "r:\\test2.csv"s, 2);
}
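
The version above keeps every line of the input in memory. If that is a concern for large files, a possible variation (a different technique from the one in the answer, shown only as a sketch) is to keep a sliding window of at most N lines in a std::deque. The function name writeLastNLines and the choice to take generic streams are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <deque>
#include <sstream>   // only needed for the string-stream usage shown below
#include <string>
#include <utility>

// Sketch only: reads `in` line by line and writes the last n lines to `out`.
// At most n lines are held in memory at any time.
void writeLastNLines(std::istream& in, std::ostream& out, std::size_t n) {
    std::deque<std::string> window;
    for (std::string line; std::getline(in, line); ) {
        window.push_back(std::move(line));
        if (window.size() > n) window.pop_front();  // drop the oldest line
    }
    for (const std::string& kept : window) out << kept << '\n';
}

// Usage with string streams (std::ifstream / std::ofstream work the same way):
//   std::istringstream input("1\n2\n3\n");
//   std::ostringstream output;
//   writeLastNLines(input, output, 2);   // output.str() is "2\n3\n"
```

As in the answer's code, std::getline strips the \n and the output writes one back, so a final line without a trailing newline gains one in the output.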