C# CSV File Method


This method splits CSV files. It turns a comma-separated values file into smaller files, each containing part of the original data. Some database import tools accept only one megabyte of CSV records at a time, and this C# method is suited to that case.

This C# program example splits a CSV file into many separate files.

Example


Here we see a static class that uses methods from System.IO to divide a large input CSV file, such as example.csv, into smaller files of at most one megabyte each. First, note the method call in Main, which specifies a size of 1024 times 1024 bytes, or one megabyte.

Program that uses CSV files [C#]

using System;

class Program
{
    static void Main()
    {
	// Split this CSV file into 1 MB chunks.
	CSVSplitTool.SplitCSV("example.csv", "split", 1024 * 1024);
    }
}

/// <summary>
/// Tool for splitting CSV files at a certain byte size on a line break.
/// </summary>
static class CSVSplitTool
{
    /// <summary>
    /// Split CSV files on line breaks before a certain size in bytes.
    /// </summary>
    public static void SplitCSV(string file, string prefix, int size)
    {
	// Read lines from source file
	string[] arr = System.IO.File.ReadAllLines(file);

	int total = 0;
	int num = 0;
	var writer = new System.IO.StreamWriter(GetFileName(prefix, num));

	// Loop through all source lines
	for (int i = 0; i < arr.Length; i++)
	{
	    // Current line
	    string line = arr[i];
	    // Length of current line in characters (equals bytes only for single-byte text such as ASCII)
	    int length = line.Length;

	    // See if adding this line would exceed the size threshold
	    if (total + length >= size)
	    {
		// Create a new file
		num++;
		total = 0;
		writer.Dispose();
		writer = new System.IO.StreamWriter(GetFileName(prefix, num));
	    }
	    // Write the line to the current file
	    writer.WriteLine(line);

	    // Add length of line (characters, treated as bytes) to running size
	    total += length;

	    // Add size of newlines
	    total += Environment.NewLine.Length;
	}
	writer.Dispose();
    }

    /// <summary>
    /// Get an output file name based on a number.
    /// </summary>
    static string GetFileName(string prefix, int num)
    {
	return prefix + "_" + num.ToString("00") + ".txt";
    }
}

Description. Here are the details of the CSVSplitTool.SplitCSV method. It is a static method in a static class: this class stores no state and cannot be instantiated. You call the method on the class name with dot notation. It receives three parameters: the source file name, the output file name prefix, and the maximum size in bytes of each output file. The second parameter, prefix, becomes the first part of every output file name.
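
To make the role of the prefix parameter concrete, here is a small sketch of how the output names are built. It copies the naming rule from GetFileName in the listing; the class name FileNameDemo is only an illustration and is not part of the original program.

```csharp
using System;

static class FileNameDemo
{
    // Same rule as GetFileName in the listing: prefix, underscore,
    // a number padded to at least two digits, then ".txt".
    public static string GetFileName(string prefix, int num)
    {
        return prefix + "_" + num.ToString("00") + ".txt";
    }

    static void Main()
    {
        Console.WriteLine(GetFileName("split", 0));   // split_00.txt
        Console.WriteLine(GetFileName("split", 7));   // split_07.txt
        Console.WriteLine(GetFileName("split", 123)); // split_123.txt
    }
}
```

The "00" format pads to two digits but does not truncate, so names grow to three digits once there are more than 100 output files.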


Internal implementation. The method uses System.IO.File.ReadAllLines to read the entire source CSV file into a string array. It then loops over these lines. In the for loop, it keeps a running total of the line lengths plus newlines, and when adding the next line would reach or exceed the size limit, it closes the current output file and starts a new one. It generates file names programmatically with GetFileName. This example will generate the file names "split_00.txt", "split_01.txt" and so on.
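
The listing loads every line into memory and counts line.Length, which is a character count and matches the byte count only for single-byte text. The following is a minimal sketch of a variant, not the method from the article: it assumes UTF-8 output without a byte order mark, reads lazily with File.ReadLines, and counts encoded bytes. The names StreamingCsvSplit and Split are hypothetical, and the boundary test differs slightly from the listing (a file may reach the limit exactly, and a new file is never started while the current one is empty).

```csharp
using System;
using System.IO;
using System.Text;

static class StreamingCsvSplit
{
    // Hypothetical variant of SplitCSV. Returns the number of files written.
    public static int Split(string file, string prefix, int maxBytes)
    {
        var encoding = new UTF8Encoding(false); // UTF-8, no byte order mark
        int newlineBytes = encoding.GetByteCount(Environment.NewLine);
        int total = 0;
        int num = 0;
        var writer = new StreamWriter(GetFileName(prefix, num), false, encoding);
        try
        {
            // File.ReadLines yields one line at a time instead of an array.
            foreach (string line in File.ReadLines(file))
            {
                int lineBytes = encoding.GetByteCount(line) + newlineBytes;

                // Start a new file if this line would push the current,
                // non-empty file past the limit.
                if (total > 0 && total + lineBytes > maxBytes)
                {
                    writer.Dispose();
                    num++;
                    total = 0;
                    writer = new StreamWriter(GetFileName(prefix, num), false, encoding);
                }
                writer.WriteLine(line);
                total += lineBytes;
            }
        }
        finally
        {
            writer.Dispose();
        }
        return num + 1;
    }

    // Same naming rule as the listing.
    static string GetFileName(string prefix, int num)
    {
        return prefix + "_" + num.ToString("00") + ".txt";
    }

    static void Main()
    {
        // Tiny sample input so the sketch runs on its own.
        File.WriteAllLines("sample.csv", new[] { "1,2,3", "4,5,6", "7,8,9" });
        int count = Split("sample.csv", "part", 16);
        Console.WriteLine("Files written: " + count);
    }
}
```

A single line longer than maxBytes still produces an oversized file, because a line is never broken in the middle.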

Verify correctness


Here we verify that the method above works correctly. The example input is a 6,409,636-byte CSV file containing 60,000 lines, each with 10 fields. Each field is a random number.

File output. The output files total 6.11 MB (6,409,636 bytes), which is exactly the size of the input file. Because 6,409,636 bytes is six times 1,048,576 bytes plus a remainder, the split yields seven files: the first six are just under 1024 KB each, displayed as 0.99 MB in the file manager, and the final file is 116 KB, containing the remainder. The lines in the output files were also checked for accuracy. The first file split occurs after line 9816. Therefore, line 9816 is the final line in the first output file, and line 9817 is the first line in the second output file.
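
The same checks can be scripted. The sketch below is one possible way to do it, not the procedure used for the figures above; the class SplitCheck and its methods are hypothetical. It assumes fewer than 100 output files (so that sorting by name matches the split order) and that the input ends with a newline and uses the same newline convention and encoding as the output, otherwise the byte totals can differ even when every line matches.

```csharp
using System;
using System.IO;
using System.Linq;

static class SplitCheck
{
    // True if the parts, read in the given order, contain exactly
    // the same lines as the source file.
    public static bool LinesMatch(string source, string[] parts)
    {
        var original = File.ReadLines(source);
        var combined = parts.SelectMany(p => File.ReadLines(p));
        return original.SequenceEqual(combined);
    }

    // Sum of the sizes of all parts in bytes.
    public static long TotalBytes(string[] parts)
    {
        return parts.Sum(p => new FileInfo(p).Length);
    }

    static void Main()
    {
        if (!File.Exists("example.csv"))
        {
            Console.WriteLine("example.csv not found");
            return;
        }
        // Collect split_00.txt, split_01.txt, ... in name order.
        string[] parts = Directory.GetFiles(".", "split_*.txt");
        Array.Sort(parts, StringComparer.Ordinal);

        Console.WriteLine("Lines match: " + LinesMatch("example.csv", parts));
        Console.WriteLine("Bytes match: " +
            (TotalBytes(parts) == new FileInfo("example.csv").Length));
    }
}
```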

Summary


In this article, we saw a simple static method that splits CSV files based on byte size. You can use it to split your CSV files at any size boundary, typically one or two megabytes. This is useful when importing CSV files into a database.
