27 November, 2014

Huffman coding: Tutorial

Lets say you have a set of characters and their frequency of use and you want to create a huffman encoding for them:
  FREQUENCY        VALUE 
  ---------       -----
     A            0.35
     B            0.1
     C            0.2
     D            0.2
     E            0.15

Creating a huffman tree is simple.


Step 1 : 

Sort this list by frequency.



So after sorting in ascending order, it would look something like this
  FREQUENCY        VALUE
  ---------       -----
     B            0.1
     E            0.15
     C            0.2        (You can break the tie between C and D arbitrarily)
     D            0.2
     A            0.35

Step 2  : 

make the two-lowest elements into leaves, creating a parent node with a frequency that is the sum of the two lower element's frequencies:

       0.25:v1*
       _/  \_
      /      \
    B:0.1   E:0.15

The two elements are removed from the list and the new parent node, with frequency 0.25, is inserted into the list by frequency. So now the list, sorted by frequency is:
      0.2 : C      0.2 : D      0.25 : v1*      0.35 : A


Step 3 : 

Repeat the loop, combining the two lowest elements again. This results in:
     
                0.4:v2*
              _/       \_
            /               \
   0.2:C         0.2:D

and the list is now:
   
    0.25 : v1*       0.35 : A        0.4 : v2*


Step 4 : 


            0.6:v3*  
       /  \
0.25:v1*   0.35:A



The new sorted list is 

    0.4 : v2*      0.6 : v3*




Step 5:

1.0:v4*
        /   \
 0.4:v2*     0.6:v3*

Redrawing the tree would give us something similar to this
   
                  1.0:v4*
            ___/       \____________
           /                                        \
       0.4:v2*                              0.6:v3*
     _/     \_                             __/      \____
    /            \                         /                      \
 0.2:C         0.2:D             0.25:v1*          0.35:A
                                     _/            \_
                                    /                   \
                                   0.1:B          0.15:E
     



Now the list is just one element containing 1.0:v4* so you are done.

This element becomes the root of your binary huffman tree. To generate a huffman code you traverse the tree to the value you want, outputting a 0 every time you take a left-hand branch, and a 1 every time you take a right-hand branch(normally you traverse the tree backwards from the code you want and build the binary huffman encoding string backwards as well, since the first bit must start from the top).

                  1.0:v4*
          0 ___/    \_____1_______
           /                                        \
       0.4:v2*                                0.6:v3*
   0 _/     \_  1                     0   __/       \____   1
    /             \                          /                       \
 0.2:C       0.2:D                0.25:v1*                0.35:A
                                 0   _/           \ _  1
                                    /                    \
                                0.1:B               0.15:E





Example: The encoding for the value C (0.2) is 00.
                 The encoding for the value  A(0.35) is 11.

Similarly 

     D(0.2)    =  01
     B(0.1)    = 100
     E(0.15)  = 101


Decoding a huffman encoding is just as easy : as you read bits in from your input stream you traverse the tree beginning at the root, taking the left hand path if you read a 0 and the right hand path if you read a 1. When you hit a leaf, you have found the code.
Generally, any huffman compression scheme also requires the huffman tree to be written out as part of the file, otherwise the reader cannot decode the data. For a static tree, you don't have to do this since the tree is known and fixed.

Hope it helped!

No comments:

Post a Comment