Assignment: create my own memcpy. Why cast the destination and source pointers to unsigned char* instead of char*?


I'm trying to create my own versions of C functions, and when I got to memcpy and memset I assumed that I should cast the destination and source pointers to char *. However, I've seen many examples where the pointers were cast to unsigned char * instead. Why is that?

void *mem_cpy(void *dest, const void *src, size_t n) {
    if (dest == NULL || src == NULL)
        return NULL;
    size_t i = 0;  /* size_t to match n and avoid a signed/unsigned mismatch */
    char *dest_arr = (char *)dest;
    char *src_arr = (char *)src;
    while (i < n) {
        dest_arr[i] = src_arr[i];
        i++;
    }
    return dest;
}

3 Answers

Answer by ShadowRanger (0 votes)

It doesn't matter for this case, but a lot of folks working with raw bytes will prefer to explicitly specify unsigned char (or with stdint.h types, uint8_t) to avoid weirdness if they have to do math with the bytes. char has implementation-defined signedness, and that means, when the integer promotions & usual arithmetic conversions are applied, a char with the high bit set is treated as a negative number if signed, and a positive number if unsigned.

While neither behavior is necessarily wrong for a given problem, the fact that the behavior can change between compilers, or even with different flags on the same compiler, means you often need to be explicit about signedness, using either signed char or unsigned char as appropriate. And 99% of the time, the behaviors of unsigned char are what you want, so people tend to default to it even when it's not strictly required.
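
As a minimal sketch of that weirdness (the first line of output is implementation-defined: -1 where char is signed, 255 where it's unsigned):

#include <stdio.h>

int main(void) {
    char c = '\xFF';         /* high bit set; value depends on char's signedness */
    unsigned char uc = 0xFF;

    printf("%d\n", c);       /* -1 if char is signed, 255 if unsigned */
    printf("%d\n", uc);      /* always 255 */
    return 0;
}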

Answer by Lundin (9 votes)

There's no particular reason in this specific case, it's mostly stylistic.

But in general it is always best to stick to unsigned arithmetic when dealing with raw data. That is: unsigned char or uint8_t.

The char type is problematic because it has implementation-defined signedness and is therefore avoided in such code. Is char signed or unsigned by default?
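
For example, here's a sketch of a simple byte checksum (the sum_bytes name is just for illustration). With plain char, bytes with the high bit set would sign-extend to negative ints on signed-char platforms and skew the sum; unsigned char gives the same result everywhere:

#include <stddef.h>

/* Sums the raw bytes of a buffer. Accessing the data as unsigned char
 * makes the result independent of the platform's char signedness. */
unsigned sum_bytes(const void *buf, size_t n) {
    const unsigned char *p = buf;
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}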


NOTE: this is dangerous and poor style:

char *src_arr = (char *)src;

(The cast just swept the problem under the carpet: it silently discards the const qualifier on src.)

Since you correctly used "const correctness" for src, the correct type is const char *src_arr;. I'd change the code to:

unsigned char *dest_arr = dest;
const unsigned char *src_arr = src;
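
For illustration, a sketch of the whole function with those declarations might look like this (NULL checks dropped, per the note further down):

void *mem_cpy(void *dest, const void *src, size_t n) {
    unsigned char *dest_arr = dest;        /* implicit conversion from void *, no cast needed */
    const unsigned char *src_arr = src;    /* const is preserved */
    for (size_t i = 0; i < n; i++)
        dest_arr[i] = src_arr[i];
    return dest;
}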

A good rule of thumb for beginners is to never use a cast. I'm serious. Some 90% of all casts we see on SO in beginner-level programs are wrong, in one way or another.


Btw (advanced topic), there's a reason why memcpy has this prototype:

void *memcpy(void * restrict s1,
      const void * restrict s2,
      size_t n);

The restrict qualifier on the pointers tells the user of the function: "hey, I'm counting on you not to pass two pointers to the same object, or pointers that may overlap". Passing overlapping pointers would cause problems in various situations and on various targets, so ruling that out is a good idea.

It's much more likely that the user passes overlapping pointers than null pointers, so if you are going to have slow, superfluous error checking against NULL, you should also restrict-qualify the pointers.

If the user passes null pointers I'd just let the function crash, instead of slowing it down with extra branches that are pointless bloat in some 99% of all use cases.
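
To illustrate the overlap problem, here's a sketch assuming the mem_cpy from the question (a forward byte-by-byte copy) is in scope:

#include <stdio.h>

int main(void) {
    char buf[8] = "abcdefg";
    /* dest overlaps src: the forward copy re-reads bytes it has
     * already overwritten, so every copied byte becomes 'a'. */
    mem_cpy(buf + 1, buf, 6);
    printf("%s\n", buf);    /* "aaaaaaa", not the shifted "aabcdef" */
    return 0;
}

The standard function specified to handle overlapping regions is memmove().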

Answer by chux - Reinstate Monica (7 votes)

Why ... unsigned char* instead of char*?

Short answer: because the functionality differs in select operations when char is signed, and the C spec specifies unsigned char-like functionality for the str...() and mem...() functions.


When does it make a difference?

When a function (like memcmp(), strcmp(), etc.) compares for order and one byte is negative while the other is positive, the relative order of the two bytes differs. Example: -1 < 1, yet viewed as unsigned char values: 255 > 1.
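
A sketch of that difference, comparing the same bit pattern as signed and as unsigned:

#include <stdio.h>

int main(void) {
    signed char   s = -1;    /* bit pattern 0xFF on 2's complement */
    unsigned char u = 0xFF;  /* same bits, viewed as unsigned */

    printf("%d\n", s < 1);   /* 1: -1 < 1  */
    printf("%d\n", u < 1);   /* 0: 255 > 1 */
    return 0;
}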

When does it not make a difference?

When copying data and comparing for equality*1.


Non-2's complement

*1 One's complement and sign-magnitude encodings are expected to be dropped in the upcoming version C2x. Until then, those signed encodings support two zeros. For the str...() and mem...() functions, C specifies data access as unsigned char. This means only the +0 is a null character and ordering depends on pure binary, unsigned, encoding.