32-bit integer representation in big and little endian


I have a textbook that says:

It is important to understand that in both the big endian and little endian systems, a 32-bit integer with the numerical value of, say, 6, is represented by the bits 110 in the rightmost (low-order) 3 bits of a word and zeros in the leftmost 29 bits.

Is this accurate?


There are 2 best solutions below

unalignedmemoryaccess On

It is true, if you have a 32-bit data type. The bit pattern 00000000 00000000 00000000 00000110 (most significant bit on the left) will always be 6, regardless of little or big endian.
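A minimal sketch of that point in C. The helper names low3 and high29 are made up for illustration. Masks and shifts operate on the value, not on bytes in memory, so the results do not depend on the host's byte order.

```c
#include <stdint.h>

/* Hypothetical helper: the low-order (rightmost) 3 bits of the value. */
static uint32_t low3(uint32_t v)
{
    return v & 0x7u;
}

/* Hypothetical helper: the remaining high-order 29 bits, shifted down. */
static uint32_t high29(uint32_t v)
{
    return v >> 3;
}
```

For the value 6, low3 gives 6 (binary 110) and high29 gives 0 on both big and little endian machines.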

This is not true of the in-memory (byte-level) representation, which differs between little and big endian. In little endian, the least significant byte is stored first (at the lowest address) and the most significant byte last.

Imagine a 32-bit number in a uint32_t variable, holding the value b1<<24 | b2<<16 | b3<<8 | b4, where each bx is a byte.

  • In a big endian system, memory will look like: b1,b2,b3,b4
  • In a little endian system, memory will look like: b4,b3,b2,b1
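The two layouts can be observed with a short sketch like the following. The function names compose_u32 and bytes_in_memory are assumptions for illustration; memcpy is used so that no aliasing or alignment rules come into play.

```c
#include <stdint.h>
#include <string.h>

/* Build the value arithmetically; the result is the same on every host. */
static uint32_t compose_u32(uint8_t b1, uint8_t b2, uint8_t b3, uint8_t b4)
{
    return ((uint32_t)b1 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b3 << 8)  |  (uint32_t)b4;
}

/* Copy the bytes of `value` into `out` in memory order (lowest address first).
   The order found in `out` is what depends on the host's endianness. */
static void bytes_in_memory(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}
```

Calling bytes_in_memory(compose_u32(1, 2, 3, 4), out) should leave 1,2,3,4 in out on a big endian host and 4,3,2,1 on a little endian one.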

An example using C syntax (see Footnote 1):

uint8_t bytes[4] = {0x01, 0x02, 0x03, 0x04};
uint32_t value = *(uint32_t *)bytes;

value can be either 0x01020304 (big) or 0x04030201 (little), depending on the endianness. (There are systems like the PDP-11 where the byte order isn't either of those. The question titled PDP Endian and bit shifts is a PDP version of this question, on the difference between bit layout in an integer vs. byte access to memory.)


Footnote 1: The pointer cast in that example actually has strict-aliasing undefined behaviour; the conversion is only safe when done with memcpy, when compiling with -fno-strict-aliasing, or with compilers like MSVC that define the behaviour even without any options. It also potentially has alignment UB if the compiler happens to align the array by less than alignof(uint32_t). memcpy also makes that safe.
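A sketch of the memcpy route, for comparison. The function names load_u32_native and load_u32_be are hypothetical. The first reads the bytes in the host's own order, so its result still depends on endianness; the second assembles the value with shifts and gives the same result everywhere.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret 4 bytes as a uint32_t in the host's byte order.
   memcpy has no strict-aliasing or alignment requirements. */
static uint32_t load_u32_native(const uint8_t bytes[4])
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Treat bytes[0] as the most significant byte, whatever the host is. */
static uint32_t load_u32_be(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}
```

With bytes {0x01, 0x02, 0x03, 0x04}, load_u32_native returns 0x01020304 on a big endian host and 0x04030201 on a little endian one, while load_u32_be returns 0x01020304 on both.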

Going the other way, pointing an unsigned char* at &value is well-defined because char* and unsigned char* are special types that can alias anything.
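As a rough illustration of that direction, the sketch below looks at the first byte of a uint32_t through an unsigned char pointer. The function name is_little_endian is an assumption, and the check only distinguishes little endian from everything else.

```c
#include <stdint.h>

/* Returns 1 if the least significant byte of a uint32_t sits at the
   lowest address (little endian), 0 otherwise. */
static int is_little_endian(void)
{
    uint32_t value = 6;
    /* Well-defined: unsigned char may alias any object. */
    const unsigned char *p = (const unsigned char *)&value;
    return p[0] == 6;
}
```

Either way, the variable still holds the value 6; only the address at which the byte 0x06 is found changes.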

Of course, the CPU itself doesn't care about C rules; if you're writing in assembly, access to memory or not is explicit and any optimizations are up to the programmer.

Erik Eidt On

A number is a number, no matter the number base. The value of any given number doesn't change; only the string of text we use to describe that number changes from one base to another.

Let's use a larger number to make things more interesting.

  • 60₁₀ is 60 in base 10
  • 3C₁₆ is hex for 60 in base 10
  • 00111100₂ is binary for 60 in base 10
  • 74₈ is octal for 60 in base 10
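The list above can be reproduced with a small conversion routine. This is only a sketch; the name to_base and its signature are made up for illustration, and it does not pad with leading zeros.

```c
#include <limits.h>
#include <stddef.h>

/* Write `n` in `base` (2..16) into `buf` as a NUL-terminated string.
   Returns buf, or NULL if the base is unsupported or buf is too small. */
static char *to_base(unsigned n, unsigned base, char *buf, size_t size)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[sizeof(unsigned) * CHAR_BIT];   /* enough digits for base 2 */
    size_t len = 0;

    if (base < 2 || base > 16)
        return NULL;
    do {
        tmp[len++] = digits[n % base];       /* least significant digit first */
        n /= base;
    } while (n != 0);
    if (len + 1 > size)
        return NULL;
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];           /* reverse into reading order */
    buf[len] = '\0';
    return buf;
}
```

The same unsigned value 60 goes in each time; only the text that comes out differs with the base.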

The statement you quoted rests on this fact: numbers are the same no matter their external visualization (e.g. as a string in a number base), and also no matter their endian-ness, if you can determine one, though that concept is irrelevant to this particular examination.


0110 in base 2, as the number 6 (in base 10), is neither big nor little endian; it is just how humans write numbers in text, and it is called Positional Notation.

The form is digits in a base, in positions, where each digit is multiplied by the base raised to the power of its position, i.e. 62 in base 10 = 6×10¹ + 2×10⁰. The 6 is in what we call the ten's position (10¹) and the 2 is in the one's position (10⁰).
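Positional notation can be evaluated mechanically, as in this sketch. The function name from_digits is hypothetical, and the code assumes a base from 2 to 10 and a string containing only valid digits for that base. It accumulates left to right, which gives the same sum as multiplying each digit by its power of the base.

```c
/* Evaluate a digit string written in positional notation.
   Assumes 2 <= base <= 10 and that every character is a valid digit. */
static unsigned from_digits(const char *s, unsigned base)
{
    unsigned value = 0;
    for (; *s != '\0'; s++)
        value = value * base + (unsigned)(*s - '0');  /* shift left one position, add digit */
    return value;
}
```

For example, from_digits("62", 10) computes (0×10 + 6)×10 + 2, which equals 6×10¹ + 2×10⁰.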


Endian-ness is a property of using multiple units to store a larger item, where the order of the items stored is at issue.  Therefore, we have to have addresses for each of the sub-units for endian-ness to be a factor. 

I might argue first and foremost that numbers written out are just strings in some number base using our traditional positional notation.

But if you were to take the power position as an address — 0 for the one's position (10⁰), and 1 for the ten's position (10¹) — then this would be little endian, as the lowest address contains the least significant digit.
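To make that addressing idea concrete, here is a sketch that stores each digit at the array index equal to its power. The name digits_by_power and its parameters are assumptions for illustration.

```c
#include <stddef.h>

/* Store the digits of `n` so that out[k] is the digit for base^k.
   Returns the number of digits written, or 0 if the base is below 2
   or `cap` is too small. */
static size_t digits_by_power(unsigned n, unsigned base,
                              unsigned char out[], size_t cap)
{
    size_t count = 0;

    if (base < 2)
        return 0;
    do {
        if (count == cap)
            return 0;
        out[count++] = (unsigned char)(n % base);  /* index 0 = one's position */
        n /= base;
    } while (n != 0);
    return count;
}
```

For 62 in base 10 this yields out[0] = 2 and out[1] = 6: the lowest index holds the least significant digit, which is the little endian arrangement described above.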

Of course, while this is a little endian format, we do write it backwards, hence Positional Notation is reverse little endian.

In order to describe 0110 as big endian, we would have to number the positions from left to right. That is fine if you look at it as a simple string, but it disagrees with the addressing by positional notation, which is what gives numbers their values.