Problem Statement
The goal is to implement Huffman coding, which involves:
- Building a frequency table from the input characters.
- Creating a binary tree based on the frequencies.
- Generating binary codes for each character from the tree.
Key Concepts
- Priority Queue: We will use a min-heap (priority queue) to efficiently extract the two nodes with the lowest frequency during the tree construction.
- Binary Tree: Each character and its frequency will be represented as a leaf node in a binary tree. Internal nodes will represent the combined frequency of their children.
- Code Generation: Traverse the tree to generate codes based on the path taken (left for 0 and right for 1).
Dynamic Programming Approach
- Step 1: Count the frequency of each character in the input string.
- Step 2: Build a priority queue from the frequency data.
- Step 3: Create the Huffman tree by repeatedly merging the two least frequent nodes.
- Step 4: Generate codes by traversing the tree.
C Program for Huffman Coding
Here’s a complete C program to implement Huffman coding:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Node structure for the Huffman tree
struct Node {
char character;
int frequency;
struct Node *left, *right;
};
// Min-Heap structure
struct MinHeap {
int size;
int capacity;
struct Node **array;
};
// Function to create a new min-heap node
struct Node* newNode(char character, int frequency) {
struct Node* temp = (struct Node*)malloc(sizeof(struct Node));
temp->left = temp->right = NULL;
temp->character = character;
temp->frequency = frequency;
return temp;
}
// Function to create a min-heap
struct MinHeap* createMinHeap(int capacity) {
struct MinHeap* minHeap = (struct MinHeap*)malloc(sizeof(struct MinHeap));
minHeap->size = 0;
minHeap->capacity = capacity;
minHeap->array = (struct Node**)malloc(minHeap->capacity * sizeof(struct Node*));
return minHeap;
}
// Function to swap two nodes
void swapNode(struct Node** a, struct Node** b) {
struct Node* t = *a;
*a = *b;
*b = t;
}
// Function to heapify the min-heap
void minHeapify(struct MinHeap* minHeap, int idx) {
int smallest = idx;
int left = 2 * idx + 1;
int right = 2 * idx + 2;
if (left < minHeap->size && minHeap->array[left]->frequency < minHeap->array[smallest]->frequency) {
smallest = left;
}
if (right < minHeap->size && minHeap->array[right]->frequency < minHeap->array[smallest]->frequency) {
smallest = right;
}
if (smallest != idx) {
swapNode(&minHeap->array[smallest], &minHeap->array[idx]);
minHeapify(minHeap, smallest);
}
}
// Function to check if size is 1
int isSizeOne(struct MinHeap* minHeap) {
return (minHeap->size == 1);
}
// Function to extract the minimum node
struct Node* extractMin(struct MinHeap* minHeap) {
struct Node* temp = minHeap->array[0];
minHeap->array[0] = minHeap->array[minHeap->size - 1];
--minHeap->size;
minHeapify(minHeap, 0);
return temp;
}
// Function to insert a new node
void insertMinHeap(struct MinHeap* minHeap, struct Node* node) {
++minHeap->size;
int i = minHeap->size - 1;
while (i && node->frequency < minHeap->array[(i - 1) / 2]->frequency) {
minHeap->array[i] = minHeap->array[(i - 1) / 2];
i = (i - 1) / 2;
}
minHeap->array[i] = node;
}
// Function to build the Huffman tree
struct Node* buildHuffmanTree(char characters[], int frequencies[], int size) {
struct Node *left, *right, *top;
struct MinHeap* minHeap = createMinHeap(size);
for (int i = 0; i < size; ++i) {
minHeap->array[i] = newNode(characters[i], frequencies[i]);
}
minHeap->size = size;
while (!isSizeOne(minHeap)) {
left = extractMin(minHeap);
right = extractMin(minHeap);
top = newNode('$', left->frequency + right->frequency);
top->left = left;
top->right = right;
insertMinHeap(minHeap, top);
}
return extractMin(minHeap);
}
// Function to print the Huffman codes
void printCodes(struct Node* root, int arr[], int top) {
if (root->left) {
arr[top] = 0;
printCodes(root->left, arr, top + 1);
}
if (root->right) {
arr[top] = 1;
printCodes(root->right, arr, top + 1);
}
if (!root->left && !root->right) {
printf("%c: ", root->character);
for (int i = 0; i < top; ++i) {
printf("%d", arr[i]);
}
printf("\n");
}
}
// Main function
int main() {
char characters[] = {'a', 'b', 'c', 'd', 'e', 'f'};
int frequencies[] = {5, 9, 12, 13, 16, 45};
int size = sizeof(characters) / sizeof(characters[0]);
struct Node* root = buildHuffmanTree(characters, frequencies, size);
int arr[100], top = 0;
printf("Huffman Codes:\n");
printCodes(root, arr, top);
return 0;
}
Explanation of the Code
- Node Structure:
- Each node represents a character and its frequency, along with pointers to its left and right children.
- Min-Heap Implementation:
- We create a min-heap to store the nodes. This helps in efficiently extracting the two nodes with the smallest frequencies.
- Functions are provided for min-heap operations like insertion, extraction, and heapification.
- Building the Huffman Tree:
- We build the tree by repeatedly extracting the two smallest nodes, creating a new internal node with these two as children, and inserting it back into the min-heap.
- Generating Huffman Codes:
- We traverse the tree recursively to generate codes. Moving left corresponds to appending a
0
to the code, while moving right appends a1
.
- We traverse the tree recursively to generate codes. Moving left corresponds to appending a
- Main Function:
- The main function initializes a set of characters and their frequencies, builds the Huffman tree, and prints the generated codes.
Input and Output Example
The program uses a predefined set of characters and their frequencies, so there’s no user input in this version. Here’s an example of the output:
Output
yamlCopy codeHuffman Codes:
e: 000
a: 001
d: 01
b: 10
c: 110
f: 111
Explanation of the Output
In the output, each character is followed by its corresponding Huffman code. The most frequent characters (like ‘f’) get the longest codes, while the least frequent characters (like ‘e’) get shorter codes.
Conclusion
This C program implements Huffman coding efficiently using dynamic data structures like trees and heaps. It provides a clear way to compress data based on character frequencies, making it valuable for various applications in data compression and encoding. You can modify the character set and frequencies to see how the Huffman codes change based on different inputs.