C# GZIP File Test

GZIP compression

A GZIP file has certain header bytes. We detect GZIP files by testing these bytes in the C# programming language. Files have two header bytes set. We evaluate other options for detecting GZIP.

Introduction

Note

First, we should reference the "RFC 1952 GZIP File Format Specification Version 4.3". This is the document shown in the screenshot. It contains useful information about the file structure of GZIP files. It states that the first two bytes contain fixed values:

ID1 (IDentification 1)
ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format.

Description of quote from the RFC. The above text indicates that the first two bytes of GZIP files are 31 and 139. Therefore, we can test those two bytes in C# for a valid GZIP file.

GZIP file format specification

Implementation

In this example, I use the following console program written in the C# language to test a file called "2About.gz", which contains GZIP data produced by 7-Zip.

Program that tests gzip files [C#]

using System;
using System.IO;

class Program
{
    static void Main()
    {
	byte[] gzip = File.ReadAllBytes("2About.gz");
	Console.WriteLine(GZipTool.IsGZipHeader(gzip)); // True

	byte[] text = File.ReadAllBytes("Program.cs");
	Console.WriteLine(GZipTool.IsGZipHeader(text)); // False
    }
}

/// <summary>
/// GZIP utility methods.
/// </summary>
public static class GZipTool
{
    /// <summary>
    /// Checks the first two bytes in a GZIP file, which must be 31 and 139.
    /// </summary>
    public static bool IsGZipHeader(byte[] arr)
    {
	return arr.Length >= 2 &&
	    arr[0] == 31 &&
	    arr[1] == 139;
    }
}

Output

True
False
Main method

Overview. The code first shows the Main() method, which demonstrates how the IsGZipHeader method works. The first part of Main reads in a GZIP file, "2About.gz". The result value from IsGZipHeader on this file is true.

Description of GZipTool static class. This class is a public static class, which means you can access it from other namespaces and files. It does not store state. It contains one method: IsGZipHeader.

Description of IsGZipHeader(byte[]) method. This method is a static method that returns true if the byte data passed to it contains the values 31 and 139 in its first two bytes.

Alternative

Programming tip

There is an alternative way to check an entire file to see if it is a GZIP file and its data is valid. You can simply try to decompress it, and catch errors with a try/catch block. By simply trying to decompress the data, you will also detect other errors in the file. However, this is much slower than testing two bytes.

Note on file extensions. In many systems, file extensions can be changed and are not reliable. I would not want a system to fail when a critical file was renamed to ".gzip" from ".gz". Therefore, it is good to scan each file, testing its header.

Summary

Here we saw how you can test the first two bytes in a file to detect the signature bytes for GZIP. It is reliable and tested. It is much faster and more suitable for some situations than trying to decompress the file blindly.

Compression Tips
.NET