I'm trying to implement Huffman compression and decompression of .txt file. Compression works fine: in brief, I build binary tree where symbols are leafs, encode each symbol with List<int> of 0's and 1's, form a long string of that 0's and 1's for the whole text(by Join method) and write each 8-bit word(contains 8 binary digits, which were just of Char) to the file. Wow, file decreased in size!

Now on the next stage - DECOMPRESSION - I need to read byte by byte from file using BinaryReader, for example, and convert it to List<int> or String again, but not to convert to int or char, which occupy 4 or 1 bytes respectively.

P.S That's my assumption. Maybe are there better ideas? I would like to know how to deal with this issue.

1

There are 1 best solutions below

9
FKarlsson On

Well, you've already created a binary tree where the leafs corresponds to all of the different characters, right? That's all information you need have to decompress/decode the binary file.

There's a good explanation from Geeksforgeeks:

  1. We start from root and do following until a leaf is found.
  2. If current bit is 0, we move to left node of the tree.
  3. If the bit is 1, we move to right node of the tree.
  4. If during traversal, we encounter a leaf node, we print character of that particular leaf node and then again continue the iteration of the encoded data starting from step 1

You know there's a leaf node when there's no child nodes either to the right or to the left in the tree.

Hope this helps!