Just for brain training, I decided to write a simple library that collects various CRC, checksum, and hash-function algorithms. On Wikipedia I found enough information about the implementations of most of them... but not all.
For example, the checksum algorithms listed below:
| Name | Length | Type |
|---|---|---|
| BSD checksum (Unix) | 16 bits | sum with circular rotation |
| SYSV checksum (Unix) | 16 bits | sum with circular rotation |
| sum8 | 8 bits | sum |
| sum16 | 16 bits | sum |
| sum24 | 24 bits | sum |
| sum32 | 32 bits | sum |
| fletcher-4 | 4 bits | sum |
| fletcher-8 | 8 bits | sum |
| fletcher-16 | 16 bits | sum |
| fletcher-32 | 32 bits | sum |
| Adler-32 | 32 bits | sum |
| xor8 | 8 bits | sum |
| Luhn algorithm | 1 decimal digit | sum |
| Verhoeff algorithm | 1 decimal digit | sum |
| Damm algorithm | 1 decimal digit | Quasigroup operation |
But I cannot find reference implementations of these algorithms.
For example, sum8 through sum32: what are they? A simple sum of all bytes, or something else? Or fletcher-4 and fletcher-8: where could I find a reference (default) implementation of these algorithms?
Maybe there are papers, books, or well-tested libraries that provide implementations of these algorithms... I can't even find (or generate) any test vectors for checking my own implementation.
Sum
Short for checksum, which was traditionally implemented using addition. The number (8, 16, ...) refers to the result's word size. So sum8 means the result is stored in a single byte, with the overflow ignored entirely. Essentially, you calculate the sum of the bytes, then mask the result to discard the expected overflow.
There is no real standard. Variants sometimes invert the bits of the final sum for easier verification (CPUs like to compare against zero). Or some versions add the carry bit (which is a 0 or a 1) into the result as well.
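As a minimal sketch (the function names and the exact wrapping behaviour are just one plausible reading, since there is no standard), sum8 and sum16 in C might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* One plausible reading of "sum8": add every byte and let the 8-bit
   register wrap, i.e. the result is the byte sum modulo 256. */
uint8_t sum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Same idea with a 16-bit result (byte sum modulo 65536). */
uint16_t sum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```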
XOR
Same concept as Sum, except XOR is used instead of addition. Since the input is read and XORed into the result one byte at a time, the result is always 8 bits.
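A sketch under the same caveat (the name xor8 is just illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* xor8: XOR every byte together; the running value never leaves 8 bits. */
uint8_t xor8(const uint8_t *data, size_t len)
{
    uint8_t x = 0;
    for (size_t i = 0; i < len; i++)
        x ^= data[i];
    return x;
}
```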
CRC
A "proper" error-detection algorithm using polynomial division. It is most effective when used to check for errors in small amounts of data (<20 bytes in network packets or EEPROM chips etc.), typically outcompeting both 'checksums', and (most) equally-sized hash functions.
It is not designed to be a hash function, but because the operations used typically result in significant diffusion of bits, it has been used as such. It is also not as efficient as a typical hash because it requires more operations. However, Modern CPUs have special CRC32 instructions which are magnitudes faster, and so have found use as just one part of a larger algorithm.
There are no official reference implementations; a CRC is defined mathematically. However, various resources and libraries are available that let you verify whether your implementation is correct.
The traditional way is to use two nested loops: an outer loop that XORs each input byte into the CRC, and an inner loop that updates the CRC, bit by bit, based on the polynomial, as in the sketch below.
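For example, a bit-by-bit sketch of the common CRC-32 variant (reflected, polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF, as used by zlib and PNG):

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: the outer loop XORs each byte into the register,
   the inner loop applies the (reflected) polynomial one bit at a time. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Handy test vector for this variant:
   crc32_bitwise((const uint8_t *)"123456789", 9) == 0xCBF43926 */
```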
Fletcher
A specific published checksum that maintains two different running sums modulo 2^n - 1, and which can be configured to return 8-, 16- or 32-bit results. It was designed to make the checksum position-dependent, something that Sum or XOR alone is not.
The two final sums are combined to form a larger final sum. One sum then serves as the "high word", and one serves as the low word.
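For instance, Fletcher-16 keeps two running sums modulo 255; a sketch (the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: two running sums modulo 255. The second sum accumulates
   the first, which is what makes the result position-dependent.
   sum2 becomes the high byte, sum1 the low byte. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Wikipedia's worked example:
   fletcher16((const uint8_t *)"abcde", 5) == 0xC8F0 */
```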
BSD Checksum
Essentially this is just an unsigned 16-bit checksum, except that the running sum is rotated by one bit (a circular right shift in the classic BSD implementation) before each byte is added, which makes it position-dependent.
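A sketch of that loop, following the classic BSD sum(1) behaviour:

```c
#include <stddef.h>
#include <stdint.h>

/* BSD checksum: rotate the 16-bit running sum right by one bit,
   then add the next byte; uint16_t arithmetic wraps modulo 2^16. */
uint16_t bsd_checksum(const uint8_t *data, size_t len)
{
    uint16_t checksum = 0;
    for (size_t i = 0; i < len; i++) {
        checksum = (uint16_t)((checksum >> 1) | ((checksum & 1u) << 15));
        checksum = (uint16_t)(checksum + data[i]);
    }
    return checksum;
}
```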
Luhn / Verhoeff / Damm
These are check digit algorithms, designed exclusively for very small amounts of data such as barcodes and serial numbers.
They are typically restricted to a specific type and size of input, but are quite effective as a result. In effect, their strict definition prevents them from being used as general-purpose checksums. A sketch of the Luhn check follows below.
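As an illustration, the Luhn check on a decimal string (the function name is just illustrative):

```c
#include <stddef.h>
#include <string.h>

/* Luhn check: walking the digits from the right, double every second
   digit (subtracting 9 if the doubled value exceeds 9), sum everything,
   and accept the string when the total is divisible by 10. */
int luhn_is_valid(const char *digits)
{
    int sum = 0;
    int double_it = 0;               /* the rightmost digit is not doubled */
    for (size_t i = strlen(digits); i-- > 0; ) {
        int d = digits[i] - '0';
        if (double_it) {
            d *= 2;
            if (d > 9)
                d -= 9;
        }
        sum += d;
        double_it = !double_it;
    }
    return sum % 10 == 0;
}

/* e.g. the textbook test number "79927398713" passes this check */
```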
Final thoughts
That list should be taken with a grain of salt, as it represents just a small set of loosely connected error-detection schemes. Some of them involve such simple operations that giving them specific names, as the list does, might mislead readers into thinking they are more than the sum of their parts, so to speak. Others, such as BSD and SYSV, are not formally defined; they are simply examples of custom checksums that particular projects happened to implement. So the list is incomplete in that respect, as many other such checksum schemes have been devised that produce different outputs.
Some, such as CRC and Fletcher, may come from a published paper, but such papers typically describe the algorithm analytically and mathematically, and may not contain a reference implementation or even pseudocode.
So generally speaking, reading the paper, using trial and error, and comparing against third-party implementations is necessary to verify that an implementation is correct.