Compression. Data increases at a phenomenal pace. All things are digitized and stored. This requires more and more storage. With compression, we tame this data dragon.
Ratios. A data compression ratio indicates how well an algorithm works. It is commonly the uncompressed size divided by the compressed size. Often we trade time for space. A slower, more thorough algorithm yields a greater ratio.
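Ratio example. As a minimal sketch, assuming the ratio is defined as uncompressed size divided by compressed size, we can compute it directly. The class and method names here are hypothetical.

```csharp
using System;

static class RatioSketch
{
    // Ratio = uncompressed size / compressed size. Higher means better compression.
    public static double Ratio(long originalBytes, long compressedBytes)
    {
        if (compressedBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(compressedBytes));
        return (double)originalBytes / compressedBytes;
    }

    // Fraction of the original size that was saved (0.75 means 75% smaller).
    public static double SpaceSaving(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(originalBytes));
        return 1.0 - (double)compressedBytes / originalBytes;
    }
}
```

For a 1000-byte file compressed to 250 bytes, RatioSketch.Ratio(1000, 250) is 4.0 and RatioSketch.SpaceSaving(1000, 250) is 0.75.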
Strategies. We use 7-Zip to compress files. Then we explore the compression tools in the .NET Framework. We even compress images and minify CSS files.
7-Zip. This compression utility is an open-source project developed by Igor Pavlov. It has an excellent compression ratio, often greater than that of many other tools.
DEFLATE. We test DEFLATE in 7-Zip. It is used in GZIP. We test various DEFLATE command-line options and present an optimal command line.
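Command-line example. As a rough sketch, we can describe a 7-Zip GZIP (DEFLATE) run from C# with ProcessStartInfo. This is not the optimal command line mentioned above, only a plain one using the a command with the -tgzip and -mx=9 switches. The class name, method name and executable path are assumptions.

```csharp
using System.Diagnostics;

static class SevenZipSketch
{
    // Builds (but does not start) a process description for a GZIP run.
    // exePath is an assumption: point it at wherever 7z.exe or 7za.exe lives.
    public static ProcessStartInfo BuildGzipCommand(string exePath, string source, string target)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            // a = add to archive, -tgzip = GZIP container, -mx=9 = highest level.
            Arguments = $"a -tgzip -mx=9 \"{target}\" \"{source}\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }
}
```

To run it, pass the result to Process.Start and call WaitForExit on the returned process.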
PPMd. PPM stands for Prediction by Partial Matching, and PPMd is one implementation of it. It is often effective on text-based files. This is a good option if we must compress Shakespeare plays.
PAQ8. With compression, we can trade time for smaller file sizes (less space). PAQ is a more advanced yet slower algorithm. It mixes the predictions of many context models.
DeflOpt. This utility applies an additional optimization pass to some DEFLATE-compressed files. It improves compression ratios, but the improvements are small.
C# programs. We can directly compress and decompress data in the C# language. The code is reliable and tested. These examples use the System.IO.Compression namespace.
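Round-trip example. A minimal sketch of compressing and decompressing a byte array with GZipStream might look like this. The GzipSketch class and its method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

static class GzipSketch
{
    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(data, 0, data.Length);
        } // Disposing the GZipStream writes the final block and the trailer.
        // ToArray still works after the MemoryStream has been closed.
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }
}
```

Calling Decompress on the result of Compress should return the original bytes.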
ASP.NET sites. We can build GZIP compression directly into an ASP.NET website. But often it is better (and easier) to use IIS and let it compress.
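Accept-Encoding example. Before sending GZIP output, a site should check that the client accepts it. The sketch below parses an Accept-Encoding header value by hand. It is simplified: it ignores the * wildcard and does not touch any ASP.NET types. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

static class AcceptEncodingSketch
{
    // Returns true when the header lists gzip and does not rule it out with q=0.
    public static bool AcceptsGzip(string headerValue)
    {
        if (string.IsNullOrWhiteSpace(headerValue)) return false;

        foreach (string part in headerValue.Split(','))
        {
            string[] pieces = part.Split(';');
            string coding = pieces[0].Trim();
            if (!coding.Equals("gzip", StringComparison.OrdinalIgnoreCase)) continue;

            // A quality value of zero means "not acceptable".
            for (int i = 1; i < pieces.Length; i++)
            {
                string p = pieces[i].Trim();
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase)
                    && double.TryParse(p.Substring(2), NumberStyles.Float,
                                       CultureInfo.InvariantCulture, out double q)
                    && q == 0)
                {
                    return false;
                }
            }
            return true;
        }
        return false;
    }
}
```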
Test GZIP files. GZIP files have specific header bytes. We detect and rewrite GZIP files directly in the C# language. These methods help in programs that handle compressed files.
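Header example. A GZIP file begins with the bytes 0x1F and 0x8B, followed by a compression method byte (8 for DEFLATE). A minimal detection sketch, with a hypothetical class and method name, could check these bytes.

```csharp
static class GzipHeaderSketch
{
    // Checks the two magic bytes and the DEFLATE method byte.
    // The fixed part of a GZIP header is 10 bytes, so shorter data is rejected.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null
            && data.Length >= 10
            && data[0] == 0x1F
            && data[1] == 0x8B
            && data[2] == 8;
    }
}
```

This only inspects the header. It does not verify that the rest of the data decompresses correctly.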
Classes. The .NET Framework provides the System.IO.Compression namespace. In the past, developers had to turn to third-party compression libraries. This is no longer required.
Difference. GZipStream is implemented with DeflateStream. It is a simple wrapper type around an actual DeflateStream. GZipStream adds the GZIP header and trailer around the DEFLATE data.
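Wrapper example. To see the difference, we can compress the same bytes with both types and compare the output. This is a sketch with hypothetical names. It assumes both streams use the same default settings.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class WrapperSketch
{
    // Writes data through the stream produced by wrap and returns the result.
    static byte[] Run(byte[] data, Func<Stream, Stream> wrap)
    {
        using var output = new MemoryStream();
        using (Stream s = wrap(output))
        {
            s.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }

    // Raw DEFLATE data with no container.
    public static byte[] Deflate(byte[] data) =>
        Run(data, o => new DeflateStream(o, CompressionMode.Compress));

    // The same kind of data wrapped in a GZIP header and trailer.
    public static byte[] GZip(byte[] data) =>
        Run(data, o => new GZipStream(o, CompressionMode.Compress));
}
```

The GZip result should start with 0x1F 0x8B and be slightly longer than the Deflate result, because of the added header and trailer.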
CompressionMode. When we use DeflateStream or GZipStream, we pass in an existing stream (like a MemoryStream or FileStream). We also supply a CompressionMode or CompressionLevel.
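CompressionLevel example. The sketch below passes a MemoryStream and a CompressionLevel to GZipStream and reports the compressed size. The LevelSketch name is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

static class LevelSketch
{
    public static int CompressedSize(byte[] data, CompressionLevel level)
    {
        using var output = new MemoryStream();
        // The CompressionLevel overload always compresses, so no CompressionMode is needed.
        using (var gzip = new GZipStream(output, level))
        {
            gzip.Write(data, 0, data.Length);
        }
        // Use ToArray here: the stream is closed, so its Length property would throw.
        return output.ToArray().Length;
    }
}
```

On repetitive input, CompressionLevel.Optimal typically gives a much smaller size than CompressionLevel.NoCompression, at the cost of more time.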
ZipFile. In .NET 4.5, the ZipFile class makes compression easier. This class, with its methods CreateFromDirectory and ExtractToDirectory, enables compression and extraction of a directory.
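ZipFile example. A short sketch of both methods follows. The ZipSketch class and the paths passed in are hypothetical. On the .NET Framework, the project may also need a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System.IO;
using System.IO.Compression;

static class ZipSketch
{
    // Zips sourceDir into zipPath, then unpacks that archive into extractDir.
    public static void RoundTrip(string sourceDir, string zipPath, string extractDir)
    {
        // CreateFromDirectory throws if the archive already exists.
        if (File.Exists(zipPath))
            File.Delete(zipPath);

        ZipFile.CreateFromDirectory(sourceDir, zipPath);

        // ExtractToDirectory creates extractDir when it is missing.
        // It throws if a file being extracted already exists there.
        ZipFile.ExtractToDirectory(zipPath, extractDir);
    }
}
```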
Values. Strings in the C# language use two bytes for each character. But ASCII text requires only one byte per character. By storing such text in byte arrays, we can reduce the memory usage of data.
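ASCII example. As a small sketch, we can convert a string to an ASCII byte array and compare the sizes. The AsciiSketch name is hypothetical. Note that Encoding.ASCII replaces non-ASCII characters with a question mark, so this only suits text known to be ASCII.

```csharp
using System.Text;

static class AsciiSketch
{
    // One byte per character.
    public static byte[] ToAsciiBytes(string text) => Encoding.ASCII.GetBytes(text);

    public static string FromAsciiBytes(byte[] bytes) => Encoding.ASCII.GetString(bytes);

    // Size of the character data as UTF-16, the encoding used by C# strings.
    public static int Utf16ByteCount(string text) => Encoding.Unicode.GetByteCount(text);
}
```

For the 11-character string "compression", ToAsciiBytes returns 11 bytes, while Utf16ByteCount reports 22.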
Styles. Every time a visitor loads a website, the CSS content is downloaded and processed. We reduce the amount of time this requires by minifying CSS text.
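Minify example. The sketch below is a naive CSS minifier built on Regex. It removes comments and unneeded whitespace. It does not parse CSS, so it can damage string or url() values that contain braces, commas, semicolons or comment markers. The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

static class CssMinifySketch
{
    public static string Minify(string css)
    {
        // Remove /* ... */ comments, including multi-line ones.
        css = Regex.Replace(css, @"/\*.*?\*/", "", RegexOptions.Singleline);
        // Collapse every run of whitespace to a single space.
        css = Regex.Replace(css, @"\s+", " ");
        // Drop spaces around braces, semicolons and commas.
        css = Regex.Replace(css, @"\s*([{};,])\s*", "$1");
        // Drop spaces after a colon (spaces before it can matter in selectors).
        css = Regex.Replace(css, @":\s+", ":");
        // The last semicolon in a rule is optional.
        css = css.Replace(";}", "}");
        return css.Trim();
    }
}
```

Given the input "body {\n  color: red; /* main */\n  margin: 0 auto;\n}\n", Minify returns "body{color:red;margin:0 auto}".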
Images. We optimize images such as PNGs and ICOs. With lossless compression, we change an image's internal data structures to require less space, with no visual change.
Research. We use an algorithm called Huffman coding for many kinds of lossless data compression. It represents the most frequent symbols with the shortest codes.
"We can encode data more efficiently... if we assign shorter codes to the frequent symbols." (SICP)
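Huffman example. As a sketch of the idea, we can build Huffman codes for the characters in a string. We repeatedly merge the two least frequent groups and prepend a bit to each symbol's code. This version assumes .NET 6 or later for PriorityQueue. The HuffmanSketch name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class HuffmanSketch
{
    // Returns a bit string ("0" and "1" characters) for each distinct character.
    public static Dictionary<char, string> BuildCodes(string text)
    {
        var codes = new Dictionary<char, string>();
        if (string.IsNullOrEmpty(text)) return codes;

        // Each queue entry is a group of symbols plus its total frequency.
        var queue = new PriorityQueue<(List<char> Symbols, int Weight), int>();
        foreach (var group in text.GroupBy(c => c))
        {
            int count = group.Count();
            codes[group.Key] = "";
            queue.Enqueue((new List<char> { group.Key }, count), count);
        }

        // Merge the two lightest groups until one remains.
        while (queue.Count > 1)
        {
            var left = queue.Dequeue();
            var right = queue.Dequeue();
            foreach (char c in left.Symbols) codes[c] = "0" + codes[c];
            foreach (char c in right.Symbols) codes[c] = "1" + codes[c];
            left.Symbols.AddRange(right.Symbols);
            int weight = left.Weight + right.Weight;
            queue.Enqueue((left.Symbols, weight), weight);
        }

        // A text with one distinct symbol still needs a one-bit code.
        if (codes.Count == 1)
        {
            char only = codes.Keys.First();
            codes[only] = "0";
        }
        return codes;
    }
}
```

For the text "aaaabbc", the most frequent symbol a receives a one-bit code, while b and c each receive two bits.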
Compression of data saves not just space. By representing the data in a more compact way, algorithms acting upon that data touch fewer memory regions. This makes them faster.