The C Standard specifies that, for a stream opened in binary mode, ftell() returns the number of characters from the beginning of the file:
... obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
If the text file contains a multibyte character such as ñ (two bytes in UTF-8), then the position reported for every character after ñ is greater than that character's column in the text file. To be specific, by "position" I mean the column you would get by reading the text file as a linear sequence of symbols.
For example, the string " ñ ñññ a ñ a" has 12 characters, but printing ftell() inside this loop:
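#include <stdio.h>

/* Print each byte read and the file position just after it. */
void printPosition(FILE *file){
    int c;
    long i;
    while((c=fgetc(file)) != EOF){
        i = ftell(file);
        printf("%c %ld\n", c, i);   /* ftell returns long, so use %ld */
    }
}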
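gives the output below (├ and ▒ are how a Windows console code page such as 437 displays the two UTF-8 bytes 0xC3 and 0xB1 that encode ñ):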
1
├ 2
▒ 3
4
├ 5
▒ 6
├ 7
▒ 8
├ 9
▒ 10
11
a 12
13
├ 14
▒ 15
16
a 17
I tried opening the file in both text and binary read mode and got the same result.
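A self-contained reproduction of the effect, as a minimal sketch: the hypothetical helper `ftell_after_reading` writes the UTF-8 bytes of the example string to a `tmpfile()` stream (which C opens in binary update mode), reads them back with `fgetc`, and returns `ftell()` after the last byte. It shows that the "characters" counted by `ftell()` on a binary stream are bytes. The string is spelled with `\xC3\xB1` escapes so the result does not depend on the encoding of the source file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `text` to a temporary binary stream, read it
   back byte by byte, and return ftell() after the last byte (-1 on error). */
long ftell_after_reading(const char *text)
{
    FILE *f = tmpfile();            /* opened as if with "wb+" */
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    rewind(f);                      /* a positioning call is required between writing and reading */
    long pos = 0;
    while (fgetc(f) != EOF)
        pos = ftell(f);             /* offset in bytes, not in displayed characters */
    fclose(f);
    return pos;
}

/* " ñ ñññ a ñ a": 12 characters, with each ñ encoded as the bytes 0xC3 0xB1 */
static const char example[] =
    " \xC3\xB1 \xC3\xB1\xC3\xB1\xC3\xB1 a \xC3\xB1 a";
```

Calling `ftell_after_reading(example)` returns 17, the same value as the last line of the output above, because `strlen(example)` is also 17: twelve characters, five of which take two bytes.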
If your platform supports a UTF-8-compatible locale, you can use wide characters and read the file one wide character at a time. You can see the program and the output it produces on godbolt: https://godbolt.org/z/cdbrGKPss
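The godbolt program is not reproduced here, so the following is only a minimal sketch of the same idea. It assumes a C library where one of the locale names `"C.UTF-8"` or `"en_US.UTF-8"` is available (locale names are platform-specific). The hypothetical `printColumns` mirrors the question's `printPosition`, but reads with `fgetwc` and counts wide characters instead of calling `ftell`.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Try a couple of common UTF-8 locale names (an assumption: availability
   varies by platform). Returns 1 on success, 0 if none could be set. */
int useUtf8Locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Print each wide character with its 1-based column and return the number
   of characters read, or -1 on an encoding or read error. The UTF-8 locale
   must be active before the first fgetwc() call on this stream. */
long printColumns(FILE *file)
{
    wint_t wc;
    long column = 0;
    while ((wc = fgetwc(file)) != WEOF) {
        column++;
        printf("%lc %ld\n", wc, column);   /* %lc takes a wint_t */
    }
    return feof(file) ? column : -1;       /* WEOF without end-of-file means an error */
}
```

Usage: call `useUtf8Locale()` first, then open the file with `fopen(path, "r")` and pass it to `printColumns`. With the example string, the last line printed is `a 12` rather than the `a 17` obtained from `ftell()`.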
I am not sure that "linear sequence of symbols" makes sense in Unicode. The required reading on Unicode is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). You might also be interested in the libunistring and ICU libraries for Unicode handling in C.
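To illustrate why "linear sequence of symbols" is ambiguous, here is a minimal sketch (the helper name `countCodePoints` is hypothetical) that counts Unicode code points in a UTF-8 string by skipping continuation bytes, which have the bit pattern 10xxxxxx. The precomposed ñ (U+00F1, bytes C3 B1) counts as one code point, but the same visible ñ written as n followed by the combining tilde U+0303 (bytes CC 83) counts as two, so even reading wide character by wide character does not always give one symbol per displayed glyph. It assumes valid UTF-8 input and does no validation.

```c
#include <stddef.h>

/* Count code points in a valid UTF-8 string: every byte except the
   continuation bytes (10xxxxxx) starts a new code point. */
size_t countCodePoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

Counting user-perceived characters (grapheme clusters) correctly requires the Unicode segmentation rules, which is where libraries such as ICU come in.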