PPMd Compression

7-Zip icon

PPMd is a lossless compression algorithm. It is effective at compressing text files containing natural language text. The 7-Zip open-source compression utility provides several compression options—including the PPMd algorithm.

This article looks at the PPMd Prediction by Partial Matching algorithm in 7-Zip.

Original file:

URL: http://shakespeare.mit.edu/macbeth/full.html
Compressed gzip on server: 55186 bytes
Decompressed html on disk: 195747 bytes

PPMd Test:

Format:            7z
Compression level: Ultra
Dictionary size:   192 MB
Word size:         32
Solid block size:  'solid'
SIZE:              37211 bytes

GZIP Test:

Format:            GZip
Compression level: Ultra
Dictionary size:   32 KB
Word size:         258 bytes
SIZE:              52530 bytes

LZMA Test:

Format:            7z
Compression level: Ultra
Dictionary size:   64 MB
Word size:         273
Solid block size:  'solid'
SIZE:              47921 bytes

BZIP2 Test:

Format:            7z
Compression level: Ultra
Dictionary size:   900 KB
Solid block size:  'solid'
SIZE:              39892 bytes

Introduction

First, the acronym PPMd stands for Prediction by Partial Matching, and it describes an algorithm that chooses how to compress data further in the stream by the data it has most recently encountered. Wikipedia describes this approach as an adaptive statistical data compression technique.

Prediction by partial matching

PPMd in 7-Zip

It is simple to use the PPMd algorithm in recent versions of the 7-Zip open-source compression utility for Windows and other platforms. To compress a file with the PPMd algorithm, which will yield a "filename.7z" archive, right-click on the file in Windows explorer and select 7-Zip... Add to archive. When the archive format is specified as 7z, you can select "PPMd" for the compression method.

Results

Note

Here we summarize the figures at the top of this document. The document tested was the Macbeth play hosted on the MIT campus network. This document is fairly large (around 200 KB) when uncompressed. On the MIT servers, the document is gzipped and is about 55000 bytes. When the document was compressed with PPMd, the size was reduced to about 37000 bytes, which was a better ratio than GZIP, BZIP2, and LZMA yielded. PPMd here had an improvement of about 32.57% percent over the GZIP algorithm used at MIT.

Uses

Because the PPMd algorithm yields outstanding compression ratios for text files, which make up most of the important information in the world, it could greatly decrease the bandwidth on many internet sites if it were implemented in all software. Unfortunately, this is not imminent so its best use is to compress private text files or database dumps containing massive amounts of important information.

Summary

We saw how the PPMd compression algorithm implemented in the 7-Zip compression utility can yield excellent results on text files in the real world. The PPMd algorithms were used as the basis for the PAQ algorithms, which implement the best lossless compression in the world and received the Hutter Prize (which promotes artificial intelligence research). The results in this article are not optimal, but show how PPMd can be useful in decreasing text file compression sizes by about 30% over more popular algorithms.

Compression Tips
Home