C# Decompress GZIP

GZIP data usually must be decompressed before it can be used. A byte array containing GZIP bytes can be converted into a byte array containing the original bytes. A small wrapper method around GZipStream and MemoryStream performs the decompression.

This C# program decompresses a GZIP byte array using GZipStream.

Example

First, this program shows how to take a byte array that contains GZIP data and transform it into a byte array that contains the original bytes. The example reads a specific GZIP-compressed file under C:\; to run the program on your computer, change the path to point to a GZIP file of your own. The program contains a single Decompress method, which receives a GZIP byte array and returns the uncompressed byte array.

Program that decompresses GZIP file [C#]

using System;
using System.IO;
using System.IO.Compression;

class Program
{
    static void Main()
    {
	// Open a compressed file on disk.
	// ... Then decompress it with the method below.
	// ... Then write the length of each array.
	byte[] file = File.ReadAllBytes("C:\\perlgzips\\~stat.gz");
	byte[] decompressed = Decompress(file);
	Console.WriteLine(file.Length);
	Console.WriteLine(decompressed.Length);
    }

    static byte[] Decompress(byte[] gzip)
    {
	// Create a GZIP stream in decompression mode.
	// ... Then create a buffer and write into a MemoryStream while reading from the GZIP stream.
	using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
	{
	    const int size = 4096;
	    byte[] buffer = new byte[size];
	    using (MemoryStream memory = new MemoryStream())
	    {
		int count = 0;
		do
		{
		    count = stream.Read(buffer, 0, size);
		    if (count > 0)
		    {
			memory.Write(buffer, 0, count);
		    }
		}
		while (count > 0);
		return memory.ToArray();
	    }
	}
    }
}

Output
    (Please change the filename in the program to a GZIP file.)

9106
36339

Program details. The Decompress method receives an array of GZIP bytes. First, it instantiates a GZipStream. The backing store of the GZipStream is a MemoryStream wrapped around the GZIP byte array. The second argument to the GZipStream constructor is the CompressionMode.Decompress enumerated constant, which tells the stream to expand data as it is read.

Allocation. A byte buffer array of 4096 elements is allocated. The exact size is not critical; 4096 is a conventional choice that matches a common memory page size, and any reasonably sized buffer works. The loop then reads decompressed bytes from the GZipStream into the buffer and writes them to the MemoryStream until Read returns zero, which signals the end of the stream. Finally, ToArray returns the accumulated bytes.
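
As a point of comparison, the read/write loop described above can be delegated to Stream.CopyTo, which is available from .NET Framework 4 onward. The following is a minimal sketch, not the program's original method: the names CopyToSketch, DecompressWithCopyTo, and Compress are hypothetical, and Compress exists only to produce GZIP input for the demonstration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class CopyToSketch
{
    // Same contract as Decompress: GZIP bytes in, original bytes out.
    public static byte[] DecompressWithCopyTo(byte[] gzip)
    {
        using (MemoryStream input = new MemoryStream(gzip))
        using (GZipStream stream = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream memory = new MemoryStream())
        {
            // CopyTo runs the buffered read/write loop internally;
            // the second argument is the buffer size in bytes.
            stream.CopyTo(memory, 4096);
            return memory.ToArray();
        }
    }

    // Hypothetical helper, used here only to create GZIP test data.
    public static byte[] Compress(byte[] raw)
    {
        using (MemoryStream memory = new MemoryStream())
        {
            // Dispose the GZipStream before ToArray so all data is flushed.
            using (GZipStream stream = new GZipStream(memory, CompressionMode.Compress))
            {
                stream.Write(raw, 0, raw.Length);
            }
            return memory.ToArray();
        }
    }

    static void Main()
    {
        byte[] raw = Encoding.UTF8.GetBytes(new string('x', 10000));
        byte[] gzip = Compress(raw);
        byte[] restored = DecompressWithCopyTo(gzip);
        Console.WriteLine(gzip.Length);
        Console.WriteLine(restored.Length);
    }
}
```

The explicit loop remains useful when you need to target older framework versions or inspect each chunk as it is read.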

Use

Let's note some uses of this code, which reads in a byte array and then decompresses it to another byte array. Because GZIP compression is often used for websites, you can store web pages as byte arrays in compressed form and decompress them only when required. Because the GZIP version is more compact, it is also a good form for storing the pages on disk.
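
The storage idea above might look like the following sketch. It is an illustration under stated assumptions rather than part of the original program: the PageStore class, the Compress helper, the sample HTML, and the temporary file name are all hypothetical, and Decompress uses Stream.CopyTo in place of the explicit loop for brevity.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class PageStore
{
    // Hypothetical helper: compress raw bytes into GZIP form.
    public static byte[] Compress(byte[] raw)
    {
        using (MemoryStream memory = new MemoryStream())
        {
            // Dispose the GZipStream before ToArray so all data is flushed.
            using (GZipStream stream = new GZipStream(memory, CompressionMode.Compress))
            {
                stream.Write(raw, 0, raw.Length);
            }
            return memory.ToArray();
        }
    }

    // Same contract as the Decompress method: GZIP bytes in, original bytes out.
    public static byte[] Decompress(byte[] gzip)
    {
        using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
        using (MemoryStream memory = new MemoryStream())
        {
            stream.CopyTo(memory); // performs the buffered read/write loop
            return memory.ToArray();
        }
    }

    static void Main()
    {
        // Sample page content (an assumption for this demo).
        string html = "<!DOCTYPE html><html><body>" + new string('a', 2000) + "</body></html>";
        string path = Path.Combine(Path.GetTempPath(), "page.html.gz");

        // Store the page on disk in compressed form.
        File.WriteAllBytes(path, Compress(Encoding.UTF8.GetBytes(html)));

        // Later: read the compressed bytes and expand them when required.
        byte[] stored = File.ReadAllBytes(path);
        string restored = Encoding.UTF8.GetString(Decompress(stored));

        Console.WriteLine("Bytes on disk:  {0}", stored.Length);
        Console.WriteLine("Chars restored: {0}", restored.Length);
        Console.WriteLine("Round trip ok:  {0}", restored == html);
        File.Delete(path);
    }
}
```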

Decompress web page

This C# console program decompresses web pages downloaded in GZIP format. It uses types from the System.IO, System.IO.Compression, and System.Net namespaces. When you pass it a URL on the command line, such as http://en.wikipedia.org/, it requests the page in GZIP form by setting the Accept-Encoding header. Next, it passes the downloaded byte array to the Decompress method. Finally, it converts the decompressed byte array to a string.

Program that decompresses web pages [C#]

using System;
using System.IO;
using System.IO.Compression;
using System.Net;

class Program
{
    static byte[] Decompress(byte[] gzip)
    {
	using (GZipStream stream = new GZipStream(new MemoryStream(gzip),
						  CompressionMode.Decompress))
	{
	    const int size = 4096;
	    byte[] buffer = new byte[size];
	    using (MemoryStream memory = new MemoryStream())
	    {
		int count = 0;
		do
		{
		    count = stream.Read(buffer, 0, size);
		    if (count > 0)
		    {
			memory.Write(buffer, 0, count);
		    }
		}
		while (count > 0);
		return memory.ToArray();
	    }
	}
    }

    static void Main(string[] args)
    {
	try
	{
	    Console.WriteLine("*** Decompress web page ***");
	    Console.WriteLine("    Specify file to download");
	    Console.WriteLine("Downloading: {0}", args[0]);

	    // Download url.
	    using (WebClient client = new WebClient())
	    {
		client.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
		byte[] data = client.DownloadData(args[0]);
		byte[] decompress = Decompress(data);
		// Decode as UTF-8 (used by most web pages); ASCII would replace non-ASCII characters with '?'.
		string text = System.Text.Encoding.UTF8.GetString(decompress);

		Console.WriteLine("Size from network: {0}", data.Length);
		Console.WriteLine("Size decompressed: {0}", decompress.Length);
		Console.WriteLine("First chars:       {0}", text.Substring(0, 5));
	    }
	}
	finally
	{
	    Console.WriteLine("[Done]");
	    Console.ReadLine();
	}
    }
}

Output [argument = http://en.wikipedia.org/]

*** Decompress web page ***
    Specify file to download
Downloading: http://en.wikipedia.org/
Size from network: 15228
Size decompressed: 56362
First chars:       <!DOC
[Done]

File size difference. The compressed page from the example required 15,228 bytes. The expanded form required 56,362 bytes, about 3.7 times larger. Requesting the GZIP version of the page with WebClient therefore saves about 41,000 bytes of transfer, which should improve network (and likely overall) performance. Expanding a page in memory is typically much faster than downloading those additional bytes.

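Note: This console program demonstrates how you can download a GZIP page and expand it in memory. It does not contain adequate error handling: if a server ignores the Accept-Encoding header and returns an uncompressed page, GZipStream cannot parse the data and the program fails (typically with an InvalidDataException). This could be solved by checking the data before decompressing it or by using exception handling.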
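
One way to handle the failure case described in the note is sketched below. It assumes that checking the two GZIP magic bytes (0x1F 0x8B, as defined in RFC 1952) is an acceptable test for whether the server actually compressed the response; the names SafeDecompress, LooksLikeGzip, and DecompressIfGzip are hypothetical and do not appear in the original program.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class SafeDecompress
{
    // GZIP data begins with the magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Returns the decompressed bytes, or the input unchanged when it is not GZIP.
    public static byte[] DecompressIfGzip(byte[] data)
    {
        if (!LooksLikeGzip(data))
        {
            return data; // The server probably sent the page uncompressed.
        }
        try
        {
            using (GZipStream stream = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
            using (MemoryStream memory = new MemoryStream())
            {
                stream.CopyTo(memory);
                return memory.ToArray();
            }
        }
        catch (InvalidDataException)
        {
            // The header matched but the body was not valid GZIP; fall back to the raw bytes.
            return data;
        }
    }

    static void Main()
    {
        // An uncompressed response passes through unchanged.
        byte[] plain = Encoding.ASCII.GetBytes("<!DOCTYPE html>");
        byte[] result = DecompressIfGzip(plain);
        Console.WriteLine(Encoding.ASCII.GetString(result));
    }
}
```

In the download program, the call Decompress(data) could be replaced with a call such as DecompressIfGzip(data). Inspecting the response's Content-Encoding header is another option, at the cost of a little more code.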

Summary

We looked at how to decompress an array of GZIP bytes into an array of the original bytes. The C# method shown receives a GZIP byte array and returns the original byte array, using GZipStream and MemoryStream to convert between the two. Finally, we noted that storing website pages as compressed byte arrays can improve efficiency, and this code is useful there.
