I'm trying to implement a simplified Boyer-Moore string search algorithm that reads its input text from a file. The algorithm requires that I start at a given file position and read its characters backwards, periodically jumping forward a precomputed number of characters. The jumps are computed based on the pattern's length and indices, so I was storing them as type size_t. What function should I use to read file characters at specific positions, and what type should I use to store these positions? I'm new to C, but these are the options I've considered:
Fseek
I could use fseek and getc to jump around the file, but this uses a long int as its character index. I don't know if it's safe to cast between this and a size_t, and regardless, the GNU C manual recommends against fseeking text streams for portability reasons.
Fsetpos
This is supposed to be more portable, but I don't think I can use this to jump forward or backward an arbitrary number of characters.
Binary Stream
I could get around the fseek compatibility issue by opening the file as a binary stream. But I don't know if this could cause other compatibility issues when dealing with pattern/input text, and anyways, this doesn't solve the issue of casting between long int and size_t.
File Descriptor
I could use file descriptors instead of streams. But then I need to cast between size_t and off_t, and I don't know how safe that is. I would also give up FILE's buffering, which I'm not sure is advisable. I could try to roll my own buffering, or maybe use an alternate library, but this seems like a massive pain.
My first implementation passed the input text as a command line argument, so it didn't deal with file IO at all. But I don't think this would scale well for large text inputs, and the more I've read about file IO the more stuck I feel. What do you suggest?
size_t ⇔ long conversions
If long is large enough for a file offset, and if your size_t value represents a file offset, then there's no problem with converting between these two. (And no need for an explicit cast.)
Portability
So is long actually large enough for a file offset? long is well known to be its minimum size on Windows, 32 bits. Even in 64-bit programs. So there could be portability issues if you plan on handling files with a size of 2 GiB or greater while using the fseek interface. You should have no problems with smaller files.
Jumping forward or backward an arbitrary number of characters
The CRLF line endings used in Windows will bite you here, no matter what interface you use.
It's very likely you can work around this problem. It depends on your definition of "character", and it might depend on how precise the jump needs to be. You haven't provided enough information for us to help you.
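To make the points above concrete, here is a minimal sketch, not a definitive implementation. It assumes that the file is opened in binary mode ("rb"), that a "character" means a single byte, and that positions are kept as size_t in the search code. The helper names pos_to_long and byte_at are hypothetical. The range check is the part that matters on platforms where long is 32 bits: it refuses offsets that fseek cannot represent instead of silently wrapping.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert a size_t position to the long that fseek
   expects. Returns 0 on success, or -1 if the value does not fit in a long
   (possible where long is 32 bits and the file is 2 GiB or larger). */
static int pos_to_long(size_t pos, long *out)
{
    if (pos > (unsigned long)LONG_MAX)
        return -1;
    *out = (long)pos;
    return 0;
}

/* Hypothetical helper: read the byte at absolute offset pos.
   Assumes fp was opened in binary mode, so offsets are plain byte counts
   and a CRLF line ending shows up as two separate bytes.
   Returns the byte as an unsigned char value converted to int, or EOF if
   the offset is unrepresentable, the seek fails, or pos is past the end. */
static int byte_at(FILE *fp, size_t pos)
{
    long off;

    if (pos_to_long(pos, &off) != 0)
        return EOF;
    if (fseek(fp, off, SEEK_SET) != 0)
        return EOF;
    return getc(fp);
}
```

With a stream obtained from fopen(path, "rb"), a backward scan from position i would call byte_at(fp, i), byte_at(fp, i - 1), and so on, and a forward jump is just a larger index on the next call. Seeking before every read is simple but not fast; reading a block with fread and indexing into it would likely be quicker, at the cost of more bookkeeping. For files of 2 GiB or more on platforms with a 32-bit long, a platform-specific interface such as POSIX fseeko or Windows _fseeki64 would be needed instead of fseek.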