Compression



Compression. Data increases at a phenomenal pace. All things are digitized, recorded, stored. This requires more and more storage. In compression, we tame this data dragon.



Ratios. A data compression ratio indicates how well an algorithm works. It is the uncompressed size divided by the compressed size. Often we trade time for space. A slower, more thorough algorithm yields a greater ratio.
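As a minimal sketch, the ratio can be computed from two byte counts. The class name, method names, and sizes here are hypothetical and chosen only for illustration.

```csharp
using System;

static class RatioExample
{
    // Ratio: original size divided by compressed size. Higher is better.
    public static double CompressionRatio(long originalBytes, long compressedBytes)
    {
        if (originalBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(originalBytes));
        if (compressedBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(compressedBytes));
        return (double)originalBytes / compressedBytes;
    }

    // Space saving: the fraction of the original size that was removed.
    public static double SpaceSaving(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(originalBytes));
        return 1.0 - (double)compressedBytes / originalBytes;
    }

    static void Main()
    {
        // Hypothetical sizes: a 1000-byte file compressed to 250 bytes.
        Console.WriteLine(CompressionRatio(1000, 250));
        Console.WriteLine(SpaceSaving(1000, 250));
    }
}
```

With these sizes the ratio is 4 and the space saving is three quarters of the original.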



Strategies. We use 7-Zip to compress files. Then we explore the compression tools in the .NET Framework. We even compress images and minify CSS files.



7-Zip. This compression utility is an open-source project developed by Igor Pavlov. It often achieves an excellent compression ratio, greater than that of many other tools.

7-Zip Command-Line

DEFLATE. We test DEFLATE in 7-Zip. It is used in GZIP. We test various DEFLATE command-line options and present an optimal command line.

DEFLATE

PPMd. PPM stands for Prediction by Partial Matching, and PPMd is a variant of it. It is often effective on certain kinds of text-based files. This is a good option if we must compress Shakespeare plays.

PPMd

PAQ8. With compression, we can trade time for smaller file sizes (less space). PAQ is a more advanced yet slower algorithm. It uses context mixing: it combines the predictions of several models based on context.

PAQ8

DeflOpt. This utility applies an additional optimization to some files that are already compressed. It improves compression ratios, but the improvements are small.

DeflOpt

C# programs. We can directly compress and decompress data in the C# language. The code is reliable and tested. These examples use the System.IO.Compression namespace.

Compress
Decompress
GZipStream
7-Zip Executable
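A minimal round-trip sketch with GZipStream from System.IO.Compression follows. The class and method names are hypothetical, and the sample text is invented for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GZipRoundTrip
{
    // Compress a byte array into GZIP format in memory.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final block and trailer.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompress GZIP data back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Repetitive sample text compresses well.
        byte[] original = Encoding.UTF8.GetBytes(new string('a', 1000));
        byte[] packed = Compress(original);
        byte[] restored = Decompress(packed);

        Console.WriteLine("Original bytes:   " + original.Length);
        Console.WriteLine("Compressed bytes: " + packed.Length);
        Console.WriteLine("Restored bytes:   " + restored.Length);
    }
}
```

The GZipStream must be disposed before the output is read. Otherwise the last block may not have been written.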

ASP.NET sites. We can build GZIP compression directly into an ASP.NET website. But often it is better (and easier) to use IIS and let it compress.

Accept-Encoding
HTTP Compression
GZIP Output

Test GZIP files. GZIP files have specific header bytes. We detect and rewrite GZIP files directly in the C# language. These methods help in programs that handle compressed files.

GZIP File Test
GZIP Header Flag Byte
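One way to test for a GZIP file is to check its first bytes. This sketch assumes the layout from the GZIP specification (RFC 1952): two identification bytes 0x1F and 0x8B, then a compression method byte, then a flag byte. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GZipHeaderTest
{
    // True if the data begins with the two GZIP identification bytes.
    public static bool LooksLikeGZip(byte[] data)
    {
        return data != null
            && data.Length >= 2
            && data[0] == 0x1F
            && data[1] == 0x8B;
    }

    // Returns the header flag byte (fourth byte), or -1 if the data is not GZIP.
    public static int GetFlagByte(byte[] data)
    {
        if (!LooksLikeGZip(data) || data.Length < 4)
            return -1;
        return data[3];
    }

    static void Main()
    {
        // Build real GZIP data in memory so the sketch is self-contained.
        byte[] gzipData;
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] text = Encoding.ASCII.GetBytes("sample text");
                gzip.Write(text, 0, text.Length);
            }
            gzipData = output.ToArray();
        }

        Console.WriteLine(LooksLikeGZip(gzipData));
        Console.WriteLine(LooksLikeGZip(Encoding.ASCII.GetBytes("plain text")));
    }
}
```

This check only inspects the header. It does not prove the rest of the file is valid.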

Classes. The .NET Framework provides the System.IO.Compression namespace. In the past, developers had to turn to third-party compression libraries. This is no longer required.



Difference. GZipStream is implemented with DeflateStream. It is a simple wrapper type around an actual DeflateStream. GZipStream adds support for the GZIP header and trailer, which includes a CRC-32 checksum.



CompressionMode. When we use DeflateStream or GZipStream, we pass in an existing stream (like a MemoryStream or FileStream). We also supply a CompressionMode or CompressionLevel.

CompressionLevel
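The sketch below passes an existing MemoryStream to DeflateStream, once with a CompressionLevel and once with a CompressionMode. It assumes .NET 4.5 or later, where the CompressionLevel constructor is available. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class LevelExample
{
    // Compress with a chosen CompressionLevel (this implies compress mode).
    public static byte[] Deflate(byte[] data, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, level))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Decompress by passing CompressionMode.Decompress.
    public static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Invented, repetitive sample data.
        var builder = new StringBuilder();
        for (int i = 0; i < 200; i++)
            builder.Append("compression level test ");
        byte[] data = Encoding.ASCII.GetBytes(builder.ToString());

        Console.WriteLine("Input:   " + data.Length);
        Console.WriteLine("Fastest: " + Deflate(data, CompressionLevel.Fastest).Length);
        Console.WriteLine("Optimal: " + Deflate(data, CompressionLevel.Optimal).Length);
    }
}
```

The exact output sizes depend on the runtime, so none are listed here.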

ZipFile. In .NET 4.5, the ZipFile class makes compression easier. Its methods CreateFromDirectory and ExtractToDirectory compress a directory into an archive and extract an archive into a directory.

ZipFile
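Here is a minimal sketch of both ZipFile methods. It works in a temporary directory and cleans up afterwards. The class name, file names, and directory names are hypothetical. On the .NET Framework, a reference to the System.IO.Compression.FileSystem assembly is assumed.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZipFileExample
{
    // Writes text to a file, zips its directory, extracts it, and reads it back.
    public static string RoundTrip(string text)
    {
        string root = Path.Combine(Path.GetTempPath(), "zipdemo-" + Guid.NewGuid().ToString("N"));
        string source = Path.Combine(root, "source");
        string archive = Path.Combine(root, "archive.zip"); // kept outside the source directory
        string target = Path.Combine(root, "extracted");

        Directory.CreateDirectory(source);
        try
        {
            File.WriteAllText(Path.Combine(source, "note.txt"), text);

            // Compress the whole directory into one archive.
            ZipFile.CreateFromDirectory(source, archive);

            // Extract the archive into a new directory.
            ZipFile.ExtractToDirectory(archive, target);

            return File.ReadAllText(Path.Combine(target, "note.txt"));
        }
        finally
        {
            // Remove all temporary files.
            Directory.Delete(root, true);
        }
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip("hello zip"));
    }
}
```

The archive is written outside the source directory so that it is not included in itself.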

Values. Strings in the C# language use two bytes for each character. But ASCII strings require only one byte per character. By using byte arrays, we can reduce memory usage of data.

ASCII Strings
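A short sketch of this idea uses Encoding.ASCII to convert a string to a byte array. The class and method names are hypothetical. Note that Encoding.ASCII replaces non-ASCII characters with a question mark, so this only suits text known to be ASCII.

```csharp
using System;
using System.Text;

static class AsciiBytesExample
{
    // Converts an ASCII-only string to one byte per character.
    public static byte[] ToAsciiBytes(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    // Converts the bytes back to a string when it is needed again.
    public static string FromAsciiBytes(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    static void Main()
    {
        string text = "compression";
        byte[] bytes = ToAsciiBytes(text);

        // Character data only; object overhead is not counted here.
        Console.WriteLine("String char bytes: " + text.Length * sizeof(char));
        Console.WriteLine("ASCII bytes:       " + bytes.Length);
        Console.WriteLine(FromAsciiBytes(bytes));
    }
}
```

For the 11-character sample, the character data takes 22 bytes as a string and 11 bytes as an ASCII array.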

Styles. Every time a visitor loads a website, the CSS content is downloaded and processed. We reduce the amount of time this requires by minifying CSS text.

Minify CSS
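As a rough sketch, a few regular expressions can minify simple CSS. This is not a full CSS parser: it assumes the text has no strings or url() values containing braces, semicolons, or comment markers. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CssMinifySketch
{
    public static string Minify(string css)
    {
        // Remove /* ... */ comments, including multi-line ones.
        css = Regex.Replace(css, @"/\*.*?\*/", "", RegexOptions.Singleline);

        // Collapse runs of whitespace (spaces, tabs, newlines) to one space.
        css = Regex.Replace(css, @"\s+", " ");

        // Remove whitespace around braces, semicolons, and commas.
        css = Regex.Replace(css, @"\s*([{};,])\s*", "$1");

        // Remove whitespace after colons, as in "color: red".
        css = Regex.Replace(css, @":\s+", ":");

        // The last semicolon in a block is optional.
        css = css.Replace(";}", "}");

        return css.Trim();
    }

    static void Main()
    {
        string css = "body {\n  color: red; /* note */\n  margin: 0;\n}\n";
        Console.WriteLine(Minify(css)); // body{color:red;margin:0}
    }
}
```

For production use, a tested minifier is a safer choice than this sketch.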

Images. We optimize images such as PNGs and ICOs. With lossless compression, we change an image's internal data structures to require less space, with no visual change.

favicon.ico
PNG

Research. We use an algorithm called Huffman coding for many kinds of lossless data compression. It represents the most frequent symbols with the shortest codes.

We can encode data more efficiently... if we assign shorter codes to the frequent symbols.

SICP
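The idea of shorter codes for frequent symbols can be sketched with a small Huffman code builder. This is an illustrative version, not a tuned implementation: it sorts the node list on each step instead of using a priority queue. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HuffmanSketch
{
    sealed class Node
    {
        public char? Symbol;   // set only on leaf nodes
        public int Weight;     // frequency of the symbol or subtree
        public Node Left, Right;
    }

    // Builds a table that maps each character to its bit string.
    public static Dictionary<char, string> BuildCodes(string text)
    {
        var codes = new Dictionary<char, string>();
        var nodes = text.GroupBy(c => c)
                        .Select(g => new Node { Symbol = g.Key, Weight = g.Count() })
                        .ToList();
        if (nodes.Count == 0) return codes;
        if (nodes.Count == 1)
        {
            // A single distinct symbol still needs a one-bit code.
            codes[nodes[0].Symbol.Value] = "0";
            return codes;
        }

        // Repeatedly merge the two lightest nodes into one parent.
        while (nodes.Count > 1)
        {
            nodes = nodes.OrderBy(n => n.Weight).ToList();
            var parent = new Node
            {
                Left = nodes[0],
                Right = nodes[1],
                Weight = nodes[0].Weight + nodes[1].Weight
            };
            nodes.RemoveRange(0, 2);
            nodes.Add(parent);
        }

        Assign(nodes[0], "", codes);
        return codes;
    }

    // Walks the tree: left adds "0", right adds "1".
    static void Assign(Node node, string prefix, Dictionary<char, string> codes)
    {
        if (node.Symbol.HasValue)
        {
            codes[node.Symbol.Value] = prefix;
            return;
        }
        Assign(node.Left, prefix + "0", codes);
        Assign(node.Right, prefix + "1", codes);
    }

    // Encodes text as a string of '0' and '1' characters.
    public static string Encode(string text)
    {
        var codes = BuildCodes(text);
        return string.Concat(text.Select(c => codes[c]));
    }

    static void Main()
    {
        foreach (var pair in BuildCodes("aaaabbc"))
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        Console.WriteLine(Encode("aaaabbc").Length + " bits");
    }
}
```

For the sample "aaaabbc", the frequent letter a receives a 1-bit code and the rarer letters b and c receive 2-bit codes, for 10 bits in total.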

Compression of data saves not just space. By representing the data in a more compact way, algorithms acting upon that data touch fewer memory regions. This makes them faster.
