Huffman Coding Complexity

130 Views Asked by At

In Huffman coding we perform a series of iterations in which each time we remove from a priority queue the two minimum elements and add to it an element corresponding to the sum of the two elements removed. Instead of a priority queue, could we have the elements sorted, each time removing the first two and entering in the correct taxonomic order their sum? Why don't we do that?

3

There are 3 best solutions below

0
templatetypedef On

You absolutely could do this. Huffman coding works by repeatedly finding the two smallest-weight trees and linking them together. We typically use a standard priority queue for this because the cost of finding the two lowest-weight trees and inserting the resulting tree back into the priority queue is O(log n). Across all steps of building the tree, this takes time O(n log n).

Your proposal would also maintain the ordering of the items and so would let you quickly find the two smallest-weight trees in time O(1). However, inserting the new tree into the proper position in the sorted sequence may take time O(n) due to the cost of shifting over elements. This would make the cost of building the Huffman tree O(n2), slower than using a binary heap.

(In a sense, your proposal is equivalent to the original Huffman algorithm but using a sorted array as a priority queue rather than a binary heap. For the same reason we typically don’t use sorted arrays as priority queues, though, you’d be better off with a binary heap.)

0
Matt Timmermans On

You're on the right track, but as @templatetypedef mentions, your method has a significant cost when inserting the combined symbol back into the sorted list.

In practice, we use a clever trick to avoid that problem:

  1. Create a list of your symbols with probabilities, and sort it in ascending order

  2. Create an initially empty list for combined symbols. This will remain sorted as we work.

  3. While there is more than one symbol in the lists:

    1. Remove the two smallest symbols from the beginnings of two lists
    2. Combine them and add them to the end of the combined list. Because the new symbol has a higher combined probability than any combined symbol created before, this list remains sorted.

Since both lists remain sorted as the algorithm progresses, the smallest probability symbol will always be the first one of one of the two lists, so no priority queue or searching is required to find it.

0
Mark Adler On

It is often done in O(n) time, and in fact in place, on an already-sorted list of frequencies. See https://people.eng.unimelb.edu.au/ammoffat/abstracts/compsurv19moffat.pdf