Questions regarding "strxfrm()" function in "C"

First, I'm aware of other threads on this matter, like this one.

Unfortunately, the explanations there are not very clear to me, and the results of my own tests confuse me further. Let's start from the very beginning with this function.

The function definition is:

size_t strxfrm(argument 1, argument 2, argument 3);

Where:

size_t is the integer type of the value, returned by the function.

argument 1 is a value of type "char *" and serves as the destination.

argument 2 is a value of type "const char *" and serves as the source.

argument 3 is an integer of type size_t and determines how many elements from "argument 2" will be copied into "argument 1", overwriting the values there.

So far - so good.

But by definition, the function returns

"The length of the transformed string, not including the terminating null-character."

By "transformed string" I understand "the destination", i.e. "argument 1". The problem is that when I test the return value, it displays the length of "argument 2", i.e. "the source". For example:

#include <stdio.h>
#include <string.h>

int main()
{
    char arr1[100] = "Hello, World!", arr2[] = "Baxlazazasad";


    int retValue = strxfrm(arr1, arr2, 3);
    printf("Content of arr1:\t%s\nLength of arr1:\t%lu\n\nContent of arr2:\t%s\nLength of arr2:\t%lu\n\n", arr1, strlen(arr1), arr2, strlen(arr2));
    printf("retValue =\t%i\n", retValue);
    return 0;
}

Output:

Content of arr1:        Baxlo, World!
Length of arr1: 13

Content of arr2:        Baxlazazasad
Length of arr2: 12

retValue =      12

My second question regarding the function "strxfrm()" is about its action. It is clear that the function simply copies "argument 3"-count of symbols from "argument 2" into "argument 1". Why is then the function considered "function for string compare" and not for "string copying"?

0___________ On BEST ANSWER

Why is then the function considered "function for string compare" and not for "string copying"?

Who is considering it this way? It is wrong. It copies and transforms. The result of transformation is in a "form such that the result of strcmp(3) on two strings that have been transformed with strxfrm() is the same as the result of strcoll(3) on the two strings before their transformation."

and the results from the tests I provided are confusing me further.

That is because the example is invalid. As the documentation says: "The strxfrm() function returns the number of bytes required to store the transformed string in dest excluding the terminating null byte ('\0'). If the value returned is n or more, the contents of dest are indeterminate."

The correct example:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char arr1[100] = "Hello, World!", arr2[] = "Baxlazazasad";

    /* n is 3, so arr1 may only be read if the transformed string,
       plus its terminator, fits in 3 bytes (return value below 3). */
    size_t retValue = strxfrm(arr1, arr2, 3);
    if(retValue >= 3)
    {
        printf("The result of this operation is indeterminate\n");
    }
    else
    {
        printf("Content of arr1:\t%s\nLength of arr1:\t%zu\n\nContent of arr2:\t%s\nLength of arr2:\t%zu\n\n", arr1, strlen(arr1), arr2, strlen(arr2));
        printf("retValue =\t%zu\n", retValue);
    }
    return 0;
}

Eric Postpischil On

The function definition is:

size_t strxfrm(argument 1, argument 2, argument 3);

The function declaration in the C standard is:

size_t strxfrm(char * restrict s1, const char * restrict s2, size_t n);

The problem is, when I test the return value - it displays the length of "argument 2" i.e. "the source".

The function returns the length of the proper output string, meaning the string that would be the result if there were enough room in the destination buffer.

Given some input, the ideal output string has some length l. If the argument n is l+1 or more, then strxfrm puts all l characters of the ideal output string in the destination and a terminating null character (which is why space for l+1 characters is needed). If n is l or less, strxfrm is not able to put all of the desired characters in the buffer. In this case, it still returns l so that a caller knows how much space they should allocate, so they can allocate more space and call strxfrm again.

In the latter case, the C standard does not require strxfrm to put anything particular in the destination buffer. It might have started work on the buffer and put something there. It might have left it incomplete with no null terminator. It might have put a null terminator in it. Or it might just have checked the length and not started work.

This is the meaning of C 2018 7.24.4.5 3, which specifies the return value of strxfrm:

The strxfrm function returns the length of the transformed string (not including the terminating null character). If the value returned is n or more, the contents of the array pointed to by s1 are indeterminate.

So the intended use of strxfrm with some source string s2 is:

  • Start with some initial buffer s1 with room for n characters. It is allowed for s1 to be a null pointer and n to be zero, or you can use a larger value for n with an actual buffer.
  • Execute size_t r = strxfrm(s1, s2, n);.
  • If the return value r from strxfrm is less than n, you are done. Otherwise:
    • Allocate a new buffer with r+1 bytes and set s1 to point to it.
    • Execute strxfrm(s1, s2, r+1);.

It is clear that the function simply copies "argument 3"-count of symbols from "argument 2" into "argument 1".

That is not clear, and it is not true. It may be that in the cases you tried, strxfrm copied characters from s2 to s1. However, there are other cases, dependent on the locale, where certain characters in s2 result in not only different characters in s1 but different numbers of characters.

John Bollinger On

But by definition, the function returns

"The length of the transformed string, not including the terminating null-character."

Yes, that's what POSIX says, for instance. But you have omitted an important qualifier:

If the value returned is n or more, the contents of the array pointed to by s1 are unspecified.

That is in fact your case: you specified n as 3, but the return value is 12.

By "transformed string" I understand "the destination", i.e. "argument 1".

In light of the qualification I called out above, no, that's a misunderstanding. The "transformed string" means the transformed form of the source string, which might or might not have been recorded in the destination array.

The design here accommodates the fact that the transformation performed may in some cases produce a result longer than the source, and there is no good way to be sure exactly how long it will be without attempting the transformation itself. You are meant to pass the size of the destination buffer as the third argument, and you can judge from the return value

  1. whether that was enough, and
  2. if not, how much space you actually need.

The problem is, when I test the return value - it displays the length of "argument 2" i.e. "the source".

That is a reasonably likely result. There's nothing wrong with that. It just shows that the transformed form is the same length as the source. But you told it that the destination buffer only has capacity for 3 characters, which is not enough, so the contents are unspecified. In particular, strxfrm is not obligated to write a string terminator anywhere in that space.

My second question regarding the function "strxfrm()" is about its action. It is clear that the function simply copies "argument 3"-count of symbols from "argument 2" into "argument 1".

Well, no. That's what it did in your test, but it is by no means clear (or correct) that it does that in every case, even if we assume that the destination is large enough to accommodate the transformed string, including its terminator (for otherwise, the contents of the destination are unspecified).

Why is then the function considered "function for string compare" and not for "string copying"?

Because under some circumstances, it will behave differently. The details are unspecified and locale-dependent, but the description is that of a normalization function. For example, in a Unicode locale, it might convert the source into one of the standard Unicode normalization forms. In that particular case, normalization would leave many strings unchanged, but not all of them. The point of the transformation is that the results can then be compared with strcmp(), which is why the C standard lists strxfrm among the comparison functions.