Differences in Casting: char vs. unsigned char


In this function, why do we need to cast with unsigned char? Can't we cast with char and get the same result since both have a range of "255"? Why choose unsigned char?

Suppose there is no ASCII code equal to -126. I can say the same about 255; both will give you a garbage value. If you tell me we choose it because we are working with bytes, and the maximum value of it is 255, I would say we are just comparing. So, in s1 and s2, the result will always be an ASCII code. Why do we choose unsigned char?

#include "libft.h"

int ft_strncmp(const char *s1, const char *s2, size_t n)
{
    size_t  i;

    i = 0;
    if (n == 0)
        return (0);
    while (i < n && (s1[i] != '\0' || s2[i] != '\0'))
    {
        if (s1[i] != s2[i])
            return ((unsigned char)s1[i] - (unsigned char)s2[i]);
        i++;
    }
    return (0);
}


chux - Reinstate Monica On

The standard C library performs string functions as if the characters were unsigned char. The C standard's description of <string.h> says:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).

Because char may be signed or unsigned, subtracting two char values gives a different result than subtracting the same bytes as unsigned char whenever one of the char values is negative. Casting to unsigned char therefore forms the difference the same way the C library does.
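To make the difference concrete, here is a minimal sketch. The helper names diff_as_signed and diff_as_unsigned are made up for illustration, and the signed variant casts to signed char explicitly so that it behaves the same whether or not plain char is signed on your compiler. It assumes 8-bit bytes and two's complement.

```c
/* Hypothetical helpers for illustration; not part of libft or the C library. */

/* The difference as it comes out when char is a signed type. */
int diff_as_signed(char a, char b)
{
    return (signed char)a - (signed char)b;
}

/* The difference as the C library's string functions define it. */
int diff_as_unsigned(char a, char b)
{
    return (unsigned char)a - (unsigned char)b;
}
```

Assuming ASCII, comparing 'a' (0x61) with the byte 0xE9 gives 97 - (-23) = 120 from the signed version but 97 - 233 = -136 from the unsigned one, so the two disagree about which string sorts first.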


Pedantic

  • On rare implementations, char and int have the same width, so the subtraction cannot be relied on to return a difference with the correct sign. Use comparisons instead.

  • On the nearly obsolete non-two's-complement formats, ((unsigned char *)s1)[i] can differ from (unsigned char)s1[i]. For strings, the pointer form is preferred.

Below fixes both issues:

int ft_strncmp(const char *s1, const char *s2, size_t n) {
  const unsigned char *u1 = (const unsigned char *)s1;
  const unsigned char *u2 = (const unsigned char *)s2;
  size_t  i = 0;
  // if (n == 0)      // Not needed
  //    return (0);
  while (i < n && (u1[i] != '\0' || u2[i] != '\0')) {
    if (u1[i] != u2[i]) {
      return (u1[i] > u2[i]) - (u1[i] < u2[i]);
    } 
    i++;
  }
  return 0;
}

or

int ft_strncmp_alt(const char *s1, const char *s2, size_t n) {
  const unsigned char *u1 = (const unsigned char *)s1;
  const unsigned char *u2 = (const unsigned char *)s2;
  size_t  i = 0;
  while (i < n && (u1[i] == u2[i]) && u1[i]) {
    i++;
  }
  if (i == n) {
    return 0;
  } 
  return (u1[i] > u2[i]) - (u1[i] < u2[i]);
}
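
Both versions above return (a > b) - (a < b) instead of a difference. As a minimal sketch of why that is safe, the idiom can be isolated in a helper (cmp_uchar is a made-up name, not part of libft):

```c
/* Hypothetical helper: three-way comparison of two bytes.
 * Each relational expression yields 0 or 1, so the result is
 * always -1, 0, or 1 and no subtraction of byte values occurs. */
int cmp_uchar(unsigned char a, unsigned char b)
{
    return (a > b) - (a < b);
}
```

Only the sign of the result matters to callers of strncmp(), so returning -1, 0, or 1 is still conforming behavior.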
Liza Jindgar On

The use of unsigned char in this context is related to how character comparison works, especially when dealing with characters outside the ASCII range.

In the C standard, the behavior of functions like memcmp and strcmp is defined in terms of unsigned character values. When you cast characters to unsigned char before comparison, you ensure that the comparison is done in an unsigned context. This is important when dealing with characters that have negative values in the signed char range.

In your specific example, casting to unsigned char is used to handle characters that might have negative values when treated as signed chars. This is relevant because the standard allows characters to have negative values in a signed char representation, and comparing them directly as signed chars may not produce the correct result.

Consider the case where char is signed and has a range of -128 to 127. A byte whose unsigned value is greater than 127 (e.g., 255) reads as a negative value through a signed char (255 becomes -1 in two's complement). Casting it to unsigned char ensures that it is treated as a positive value during the comparison.

By casting both s1[i] and s2[i] to unsigned char in your ft_strncmp function, you are explicitly stating that the comparison should be done in an unsigned context, avoiding issues related to signed char representations.

So, while you are correct that the ASCII range is 0 to 127, and values above 127 might be treated as garbage in some contexts, using unsigned char in character comparison functions is good practice: it handles every possible character value correctly, including on implementations where plain char is signed.

John Bollinger On

In this function, why do we need to cast with unsigned char?

Because the function is duplicating the behavior of the standard library function strncmp(), which compares the bytes of the arguments as if they have type unsigned char.
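
As a rough illustration of that behavior, the sketch below asks the standard function itself. The helper names sign_of and std_sign are made up for this example, and the sample bytes assume that 'a' has a code below 0xE9, as it does in ASCII.

```c
#include <string.h>

/* Hypothetical helper: reduce a result to -1, 0, or 1, because only
 * the sign of strncmp()'s return value is specified. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Sign of the standard comparison over at most n bytes. */
int std_sign(const char *s1, const char *s2, size_t n)
{
    return sign_of(strncmp(s1, s2, n));
}
```

Here std_sign("\xE9", "a", 1) should be 1 on any conforming implementation, whether or not plain char is signed, because the byte 0xE9 is compared as 233 rather than as a negative number.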

Can't we cast with char and get the same result since both have a range of "255"?

Not reliably, no. The C language specification explicitly allows char to have the same range and behavior as either unsigned char or signed char, and the latter is pretty common. Where the signed char equivalence applies (and supposing 8-bit bytes and two's complement representation, the latter of which is not guaranteed prior to C23), the range of char is -128 to 127.

You could still do the comparison with type char, but that would produce different results on some systems than on others.

(Also: the elements are already chars. No casts would be needed to do the comparison in that type.)
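
If you want to see which case applies on your system, a small check along these lines should work. The function name char_is_signed is made up for this sketch; CHAR_MIN comes from <limits.h>.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
 * implementation and 0 otherwise. CHAR_MIN is 0 when char is
 * unsigned and negative when it is signed. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With GCC and Clang, the -fsigned-char and -funsigned-char options change the answer, which is one way to try out both behaviors on a single machine.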

Why choose unsigned char?

Because that produces the desired order, whereas char might not. And because using unsigned char yields a consistent order across implementations, even if you wanted to implement a different order.

Suppose there is no ASCII code equal to -126. I can say the same about 255; both will give you a garbage value.

ASCII has very little to do with it. C does not assume that char values are specifically ASCII codes. The runtime character set can be different from and incompatible with ASCII -- EBCDIC, say -- and there are machines in use today where that is the case. There is no assumption of or reliance on any particular character set here.