fgetc vs getline or fgets - which is most flexible


I am reading data from a regular file and I was wondering which would allow for the most flexibility.

I have found that fgets and getline both read in a line (the former up to a maximum number of characters, the latter with dynamic memory allocation). With fgets, if the line is longer than the given size, the rest of the line is not read but remains buffered in the stream. With getline, I am worried that it may attempt to allocate a large block of memory for an obscenely long line.

The obvious solution for me seems to be turning to fgetc, but this comes with the problem that there will be many calls to the function, thereby resulting in the read process being slow.

Is this compromise in either case between flexibility and efficiency unavoidable, or can it be worked through?


There are 2 best solutions below

3
chux - Reinstate Monica On BEST ANSWER

Much is case dependent.

getline() is not part of the standard C library (it is specified by POSIX.1-2008). Its behavior may differ depending on the implementation and on which other standards it follows, which is an advantage for the standard fgetc()/fgets().

... case between flexibility and efficiency unavoidable, ...

OP is missing the higher priorities.

  • Functionality - If code cannot function right with the selected function, why use it? Example: fgets() and reading null characters create issues.

  • Clarity - without clarity, feel the wrath of the poor soul who later has to maintain the code.
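The null-character issue mentioned above can be made concrete with a small sketch. The helper name `apparent_line_length` is hypothetical and only for illustration: it reads one line with fgets() and returns what strlen() reports, which stops at the first null character even though fgets() stored the bytes after it.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() and return the length
 * that strlen() reports for it. Returns 0 if nothing could be read. */
size_t apparent_line_length(FILE *f, char *buf, size_t size)
{
    assert(f != NULL && buf != NULL && size >= 2 && size <= INT_MAX);

    if (fgets(buf, (int)size, f) == NULL)
        return 0;

    /* strlen() stops at the first '\0', whether it is the terminator
     * written by fgets() or a null character that came from the file. */
    return strlen(buf);
}
```

For a line containing the six bytes `a`, `b`, `\0`, `c`, `d`, `\n`, this reports 2, so code that trusts strlen() silently loses the rest of the line.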


would allow for the most flexibility. (?)

  • fgetc() allows for the most flexibility at the low level - yet helper functions using it to read lines tend to fail corner cases.

  • fgets() allows for the most flexibility at mid level - still have to deal with long lines and those with embedded null characters, but at least the low level of slogging in the weeds is avoided.

  • getline() is useful when high portability is not needed and the risk of letting the user overwhelm resources is not a concern.


For robust handling of user/file input to read a line, create a wrapping function (e.g. int my_read_line(size_t size, char *buf, FILE *f)) and call that and only that in user code. Then when issues arise, they can be handled locally, regardless of the low level input function selected.
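A minimal sketch of such a wrapper, built on fgets(), might look like the following. The return convention (1, 0, -1) and the choice to discard the rest of an over-long line are assumptions made for illustration, not part of the answer, and embedded null characters are not handled here.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Assumed return convention (illustrative only):
 *    1  a complete line is in buf, trailing newline removed
 *    0  end of file, nothing was read
 *   -1  read error, or the line did not fit (the excess is discarded) */
int my_read_line(size_t size, char *buf, FILE *f)
{
    assert(buf != NULL && f != NULL && size >= 2 && size <= INT_MAX);

    if (fgets(buf, (int)size, f) == NULL)
        return ferror(f) ? -1 : 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* strip the newline */
        return 1;
    }

    /* No newline stored: look at the next character to tell a line that
     * exactly fits (or a final line without newline) from a long line. */
    int ch = fgetc(f);
    if (ch == '\n' || (ch == EOF && !ferror(f)))
        return 1;

    /* Line too long (or read error): skip to the end of the line. */
    while (ch != EOF && ch != '\n')
        ch = fgetc(f);
    return -1;
}
```

Because user code only ever calls my_read_line(), the policy for long lines can later be changed in this one place, for example to grow a buffer instead of discarding.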

0
Luis Colorado On

The three functions you mention do different things:

  • fgetc() reads a single character from a FILE * stream. Input is buffered, so you can process the file character by character without the overhead of a system call for each character. When your problem can be handled in a character-oriented way, it is the best choice.
  • fgets() reads a single line from a FILE * stream; it is like calling fgetc() repeatedly to fill the character array you pass to it, so that you can read line by line. Its drawback is that it makes a partial read when the input line is longer than the buffer size you specify. This function also reads buffered input, so it is very efficient. If you know that your lines are bounded, this is the best way to read your data line by line. Sometimes you want to process lines of unbounded size and must redesign your approach to use the available memory. In that case the function below is probably the better choice.
  • getline() is relatively new and is not part of ANSI C, so you may port your program to a platform that lacks it. It is the most flexible, at the price of being the least efficient. It takes a reference to a pointer, which is realloc()ated as more data has to be stored. It does not bound the line length, at the cost of possibly exhausting all the memory available on the system. Both the buffer pointer and the buffer size are passed by reference so that they can be updated, which tells you where the new string is located and what the new size is. The buffer must be free()d after use.
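The pass-by-reference and free() points for getline() can be sketched as follows, assuming a POSIX.1-2008 system. The function name `longest_line_length` is hypothetical and chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* request getline() from POSIX.1-2008 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: return the length of the longest line in f,
 * not counting the trailing newline. */
size_t longest_line_length(FILE *f)
{
    assert(f != NULL);

    char *line = NULL;   /* getline() allocates when *lineptr is NULL */
    size_t cap = 0;      /* updated by getline() to the buffer capacity */
    size_t longest = 0;
    ssize_t len;

    /* The same buffer is reused and grown as needed on every call. */
    while ((len = getline(&line, &cap, f)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;
        if ((size_t)len > longest)
            longest = (size_t)len;
    }

    free(line);          /* the caller owns the buffer and must free it */
    return longest;
}
```

Note that nothing in this loop limits how large the buffer may grow, which is exactly the resource concern raised about getline().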

The reason for having three functions rather than one is that different cases have different needs, and selecting the most efficient one for the case at hand is normally the best choice.

If you plan to use only one, you will probably end up in a situation where the function you selected as the most flexible is not the best choice, and you will probably fail.