I'm struggling to understand how to decode, say, a text file that has been compressed using Huffman's method. Let's say I'm reading a text file, I get a list of all the characters and the frequency in which they occur, I create a Huffman tree and all the characters have a specific code assigned do them. Say, a: 110 b: 11 c: 010 etc.
When I want to decompress this text file and print/read its contents, how do I do that? How do I know if the file reads "abc" or "bac"?
A small solution I made up was after the Huffman tree has been created, I read the file all over again and create an array to insert every character code as I read it.
Say, a while loop where I read a character until I've reached EOF. character = a; insert 110 into array. Character = b; insert 11 into array until we are left with 11011010. But I feel like there should be a better way.
EDIT: The codes for a,b, and c are random, not actual Huffman codes. I put in random ones as it's irrelevant for the question, I'm only interested in how it would be decoded with or without a real life example. But here's an example of Huffman code for "Hello World."
l: 11 o: 001 H: 100 e: 0101 spacebar: 0000 w: 0001 r: 101 d: 011 .: 0100
A Huffman code is a prefix code, which means that no code can be a prefix of any other code. Your example of a Huffman code is most definitely not a Huffman code. There you have
11(c), which is a prefix of110(b). That cannot be the result of a correct implementation of Huffman's algorithm.Update for question update:
You are incorrect. The codes are extremely relevant for the question. The examples you gave cannot be unambiguously decoded.
Second update of question:
It is still not clear what you're asking, but here is an answer to the question: "How do I decode a stream of bits that are a Huffman-coded sequence of symbols?"
Here is the tree for the example prefix code:
You see that if you follow any sequence of branches to a symbol, the branches you followed are the bits in that code. That is exactly how you decode the incoming stream of bits.