checksums sum8, sum16, sum24, sum32, fletcher4 fletcher8 and so on original implementation

975 Views Asked by At

Just for brains training, I decided to write simple library which accumulates different crc, checksum, hash-function algos. So in Wikipedia I found enough information about implementations most of them...but not for all

for example checksum algorithms, see below

Name Length Type
BSD checksum (Unix) 16 bits sum with circular rotation
SYSV checksum (Unix) 16 bits sum with circular rotation
sum8 8 bits sum
sum16 16 bits sum
sum24 24 bits sum
sum32 32 bits sum
fletcher-4 4 bits sum
fletcher-8 8 bits sum
fletcher-16 16 bits sum
fletcher-32 32 bits sum
Adler-32 32 bits sum
xor8 8 bits sum
Luhn algorithm 1 decimal digit sum
Verhoeff algorithm 1 decimal digit sum
Damm algorithm 1 decimal digit Quasigroup operation

But I can not find reference implementation of these algorithms.

for example sum8-32 - what is it ?? - simple sum of all bytes or not? Or fletcher-4 and fletcher-8 - where I could find reference(default) implementation for this algos

Maybe there exists papers or books or well tested libraries which provides implementation of those algos... I can't even find(or generate) any test vectors for testing my own implementation

1

There are 1 best solutions below

0
bryc On

Sum

Short for checksum, which was traditionally implemented using addition. The number (8, 16...) refers to the result's word size. So sum8 means the result is saved to a single byte, which ignores the overflow entirely.

Essentially, you are calculating the sum of bytes, then masking the result to remove the expected overflow.

There is no real standard. Variants sometimes invert the bits of the final sum for easier verification (CPUs like to compare against zero). Or some versions add the carry bit (which is a 0 or a 1) into the result as well.

XOR

Same concept as Sum, except the XOR operation is used instead of addition. When reading one byte at a time and xoring with the result, it will always be 8 bits.

CRC

A "proper" error-detection algorithm using polynomial division. It is most effective when used to check for errors in small amounts of data (<20 bytes in network packets or EEPROM chips etc.), typically outcompeting both 'checksums', and (most) equally-sized hash functions.

It is not designed to be a hash function, but because the operations used typically result in significant diffusion of bits, it has been used as such. It is also not as efficient as a typical hash because it requires more operations. However, Modern CPUs have special CRC32 instructions which are magnitudes faster, and so have found use as just one part of a larger algorithm.

There are no official reference implementations. It is defined mathematically. However various resources and libraries are available to let you verify if your implementation is correct.

Though the traditional way, is to do two nested loops: one that XORs the input bytes into the CRC, and one that updates the CRC based on the polynomial.

Fletcher

A specific published implementation of a checksum, which stores two different modulo 2^n-1 sums, which can be configured to return 8, 16 or 32 results. It was designed to allow the checksum to be position-dependent, something that Sum or XOR alone is not.

The two final sums are combined to form a larger final sum. One sum then serves as the "high word", and one serves as the low word.

BSD Checksum

Essentially this is just an unsigned 16-bit checksum, but it uses a rotate left operation beforehand, allowing it to be position-dependent.

Luhn / Verhoeff / Damm

These are check digit algorithms, exclusively designed for use with very small amounts of data, such as barcodes and serial numbers.

They typically are restricted to a specific type of input and size, but as a result, are quite effective. So in effect, their strict description prevents them from being used as a general purpose checksum.

Final thoughts

That list should be taken with a grain of salt, as it represents just a small set of loosely-connected error-detection schemes. Some of them involve such simple operations that giving them specific names as the list does, might mislead readers into thinking it's more than the sum of its parts, so to speak. Some of them such as BSD and SYSV, are not formally defined, but a random example of a custom checksum that one particular project implemented. So the list is incomplete in that respect, as many such other checksum schemes have been devised that result in different outputs.

Some such as CRC and Fletcher may have come from a published paper, but are typically described analytically and mathematically, and may not contain a reference implementation, or even pseudo-code at all.

So generally speaking, reading the paper, using trial-and-error and comparing other third-party implementations, is necessary to verify if an implementation is correct.