C# File

File handling with open, close, read and write

Files store persistent objects. In memory, objects cease to exist when a program ends. But files exist until deletion. They are handled with types in System.IO. Files cause errors and performance problems, so we must be careful.

StreamReader

For text files, StreamReader and StreamWriter are often the most useful types. That is why we are starting with them. We use StreamReader in a using block, a special syntax form. It begins with the "using" keyword.

Often: We achieve better performance with StreamReader and StreamWriter than with static File methods.

StreamReader, StringWriter, ReadLine

Based on: .NET 4.5.1

Program that uses StreamReader, ReadLine: C#

using System.IO;

class Program
{
    static void Main()
    {
	// Read every line in the file.
	using (StreamReader reader = new StreamReader("file.txt"))
	{
	    string line;
	    while ((line = reader.ReadLine()) != null)
	    {
		// Do something with the line.
		string[] parts = line.Split(',');
	    }
	}
    }
}

Intro

Before any file can be opened, it must be addressed. File paths are complex. They include the volume, directory, name and extension. These parts together lead to increased complexity. The Path type helps reduce this.
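
As a minimal sketch of how the Path type takes a path apart, consider the program below. The path parts "data", "reports" and "summary.txt" are hypothetical names chosen for illustration. Path methods work on the string alone, so the file does not need to exist.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Hypothetical path parts, joined with the platform's separator.
        string path = Path.Combine("data", "reports", "summary.txt");

        Console.WriteLine(Path.GetDirectoryName(path));            // Directory part.
        Console.WriteLine(Path.GetFileName(path));                 // summary.txt
        Console.WriteLine(Path.GetFileNameWithoutExtension(path)); // summary
        Console.WriteLine(Path.GetExtension(path));                // .txt

        // Build a sibling path with a different extension.
        Console.WriteLine(Path.ChangeExtension(path, ".bak"));
    }
}
```

Using Path.Combine avoids hard-coding a separator character into the program.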

Path

Directory: You can manipulate directories on the file system. The Directory type, and its static methods, is necessary for this.
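
The following sketch shows a few of the static Directory methods working together. The folder name "file-demo" and the file names are assumptions made for this example. Everything is created under the system temp folder and removed at the end.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Hypothetical directory name under the system temp folder.
        string dir = Path.Combine(Path.GetTempPath(), "file-demo");

        // CreateDirectory does nothing if the directory already exists.
        Directory.CreateDirectory(dir);
        File.WriteAllText(Path.Combine(dir, "a.txt"), "first");
        File.WriteAllText(Path.Combine(dir, "b.txt"), "second");

        // List the text files in the directory.
        foreach (string file in Directory.GetFiles(dir, "*.txt"))
        {
            Console.WriteLine(Path.GetFileName(file));
        }

        // Remove the directory and everything inside it.
        Directory.Delete(dir, true);
        Console.WriteLine(Directory.Exists(dir)); // False
    }
}
```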

Directory

FileInfo: You can get information about a file from the file system with FileInfo. This does not load the entire file into memory.
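
Here is a small sketch of FileInfo in use. It creates its own temporary file so that it can run anywhere; the file content "hello" is an assumption made for the example.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Create a small temporary file so the example is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "hello");

        // FileInfo exposes file system metadata through its properties.
        FileInfo info = new FileInfo(path);
        Console.WriteLine(info.Name);
        Console.WriteLine(info.Length); // Size in bytes: 5
        Console.WriteLine(info.LastWriteTimeUtc);
        Console.WriteLine(info.Extension);

        File.Delete(path);
    }
}
```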

FileInfo

Markup

Some files have lots of brackets and tags. These are usually HTML files. Sometimes they are XML files. You could write custom methods for each program, but standardized approaches exist. They usually make your life easier.

HTML: A universal language, HTML is used throughout the world. But handling it in a C# program leads to problems.

HTML

XML: This is a standardized, text-based markup language. XML is easy to use. The System.Xml namespace helps.

XML

ReadAllText

Next we test ReadAllText. This program uses this method to load in the file "file.txt" on the C: volume. Then it prints the contents of the file. The data is now stored in a string object.

File.ReadAllText

And: It includes the System.IO namespace at the top. The System namespace includes the Console class.

Console
System.IO namespace: C#

//
// Include this namespace for all the examples.
//
using System.IO;

Program that uses ReadAllText: C#

using System;
using System.IO;

class Program
{
    static void Main()
    {
	string file = File.ReadAllText("C:\\file.txt");
	Console.WriteLine(file);
    }
}

ReadAllLines

Here we read all the lines from a file and place them in an array. This code reads all lines in "file.txt" with File.ReadAllLines. This is concise code. One call opens the file, reads every line and closes the file.

File.ReadAllLines

Performance. When you read in a file with File.ReadAllLines, many strings are allocated and put into an array in a single method call. But with StreamReader you can allocate each string as you pass over the file by calling ReadLine.

Tip: This makes StreamReader more efficient unless you need all the file data in memory at once.

Program that uses ReadAllLines: C#

using System.IO;

class Program
{
    static void Main()
    {
	// Read in every line in specified file.
	// ... This will store all lines in an array in memory,
	// ... which you may not want or need.
	string[] lines = File.ReadAllLines("file.txt");
	foreach (string line in lines)
	{
	    // Do something with line
	    if (line.Length > 80)
	    {
		// Example code
	    }
	}
    }
}

List

Next, we use the List constructed type with file handling methods. List and ArrayList are useful data structures. They allow object collections to rapidly expand or shrink. We use LINQ to get a List of lines from a file in one line.

ToList
Program that uses ReadAllLines with List: C#

using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
	// Read in all lines in the file,
	// ... and then convert to a List with LINQ.
	List<string> fileLines = File.ReadAllLines("file.txt").ToList();
    }
}

Count lines

Here we need to count the number of lines in a file but don't want to write lots of code. Note that the example here doesn't have ideal performance characteristics. We reference the Length property on the array returned.

Line Count

Also: We can use LINQ to query the lines in a file. Please see the next example for more details.

Program that counts lines: C#

using System.IO;

class Program
{
    static void Main()
    {
	// Another method of counting lines in a file.
	// ... This is not the most efficient way.
	// ... It counts empty lines.
	int lineCount = File.ReadAllLines("file.txt").Length;
    }
}
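
For comparison, here is a sketch of a line count that avoids building the array. It is one possible approach, not the only one. The helper name CountLines is an assumption made for this example; it uses StreamReader and keeps only one line in memory at a time.

```csharp
using System;
using System.IO;

class Program
{
    // Hypothetical helper: counts lines without storing them.
    public static int CountLines(string path)
    {
        int count = 0;
        using (StreamReader reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        // Write a small temporary file with one empty line in the middle.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new string[] { "cat", "", "dog" });

        Console.WriteLine(CountLines(path)); // 3 (the empty line is counted)
        File.Delete(path);
    }
}
```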

Query

Does a line containing a specific string exist in the file? Maybe you want to see if a name or location exists in a line in the file. We harness the power of LINQ to find any matching line.

LINQ

Tip: This query uses the Count() extension method, which enumerates every line in order to count the matches.

Count
Program that uses LINQ on file: C#

using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
	// One way to see if a certain string is a line
	// ... in the specified file. Uses LINQ to count elements
	// ... (matching lines), and then sets |exists| to true
	// ... if more than 0 matches were found.
	bool exists = (from line in File.ReadAllLines("file.txt")
		       where line == "Some line match"
		       select line).Count() > 0;
    }
}
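
When only existence matters, a possible alternative is the Any() extension method, which stops at the first match. The sketch below pairs it with File.ReadLines (available from .NET 4.0) so that lines are read lazily. The helper name ContainsLine and the sample data are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Linq;

class Program
{
    // Hypothetical helper: true if any line equals the target string.
    // Any() stops enumerating as soon as one line matches.
    public static bool ContainsLine(string path, string target)
    {
        return File.ReadLines(path).Any(line => line == target);
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new string[] { "first", "Some line match", "last" });

        Console.WriteLine(ContainsLine(path, "Some line match")); // True
        Console.WriteLine(ContainsLine(path, "missing"));         // False
        File.Delete(path);
    }
}
```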

ReadLines

In contrast to File.ReadAllLines, File.ReadLines does not read in every line immediately upon calling it. Instead, it reads lines only as they are needed. It is best used in a foreach-loop.

File.ReadLines, Foreach
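
A short sketch of File.ReadLines in a foreach-loop follows. The sample lines are assumptions made for this example. The loop exits early to show that later lines are never enumerated.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new string[] { "cat", "dog", "arrow" });

        // Lines are produced one at a time as the loop advances.
        foreach (string line in File.ReadLines(path))
        {
            Console.WriteLine(line);
            if (line == "dog")
            {
                // Stop early: the loop never asks for the line after this one.
                break;
            }
        }

        File.Delete(path);
    }
}
```

This prints "cat" and "dog" only. Leaving the foreach-loop disposes the enumerator, which closes the file.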

WriteAllLines

We can write an array to a file. When you are done with your in-memory processing, you often need to write the data to disk. Fortunately the File class offers an excellent WriteAllLines method.

Then: It receives the file path and then the array to write. This will replace all the file contents.

Program that writes array to file: C#

using System.IO;

class Program
{
    static void Main()
    {
	// Write a string array to a file.
	string[] stringArray = new string[]
	{
	    "cat",
	    "dog",
	    "arrow"
	};
	File.WriteAllLines("file.txt", stringArray);
    }
}

Output

cat
dog
arrow

WriteAllText

A simple method, File.WriteAllText receives two arguments. It receives the path of the output file, and the exact string contents of the text file. Sometimes you need just a simple text file. In those times, this is an ideal method.

Program that uses File.WriteAllText: C#

using System.IO;

class Program
{
    static void Main()
    {
	File.WriteAllText("C:\\perls.txt",
	    "Dot Net Perls");
    }
}

AppendAllText. You can append text to a file with a single method call. We could instead read in the file, append to it in memory, and then write it out completely again. That is slow. It is more efficient to use an append.
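
A minimal sketch of File.AppendAllText is shown here. It writes to a temporary file, and the strings "first" and "second" are assumptions made for the example.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // Each call opens the file, adds text at the end, and closes it.
        File.AppendAllText(path, "first" + Environment.NewLine);
        File.AppendAllText(path, "second" + Environment.NewLine);

        // Both lines are present, in the order they were appended.
        Console.Write(File.ReadAllText(path));
        File.Delete(path);
    }
}
```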

File.AppendAllText

Also: The File.AppendText method returns a StreamWriter instance that you can use to append string data to the specified file.

But: It is not covered on this site in detail. It is usually easier to use the StreamWriter constructor directly.
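
As a sketch of the constructor approach, the StreamWriter(string, bool) overload accepts an append flag as its second argument. The temporary file and the sample lines are assumptions made for this example.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "existing" + Environment.NewLine);

        // The second argument is the append flag:
        // true adds to the end, false would overwrite the file.
        using (StreamWriter writer = new StreamWriter(path, true))
        {
            writer.WriteLine("appended 1");
            writer.WriteLine("appended 2");
        }

        Console.Write(File.ReadAllText(path));
        File.Delete(path);
    }
}
```

One writer can append many lines while the file stays open, which avoids reopening the file for each line.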

ReadAllBytes

Here we use File.ReadAllBytes to read an image (a PNG) into memory. With this code, we could cache an image in memory for performance. This works well. It greatly outperforms reading the image from disk each time it is needed.

File.ReadAllBytes
Program that caches binary file: C#

using System.IO;

static class ImageCache
{
    static byte[] _logoBytes;
    public static byte[] Logo
    {
        get
        {
            // Read the file only on first access. Later calls return the cached bytes.
            if (_logoBytes == null)
            {
                _logoBytes = File.ReadAllBytes("Logo.png");
            }
            return _logoBytes;
        }
    }
}

The File.WriteAllBytes method does exactly as you might expect. It writes the bytes in a byte array to a file at the location specified. We show code that compresses byte data. It uses File.WriteAllBytes.
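
The sketch below is one way this combination could look; it is not the code from the linked article. It compresses a byte array with GZipStream from System.IO.Compression and saves the result with File.WriteAllBytes. The helper name Compress and the sample data are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

class Program
{
    // Hypothetical helper: returns a GZIP-compressed copy of the input bytes.
    public static byte[] Compress(byte[] raw)
    {
        using (MemoryStream memory = new MemoryStream())
        {
            // The GZipStream must be closed before the buffer is complete.
            using (GZipStream gzip = new GZipStream(memory, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return memory.ToArray();
        }
    }

    static void Main()
    {
        // Sample data: 10,000 zero bytes.
        byte[] raw = new byte[10000];
        byte[] compressed = Compress(raw);

        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, compressed);

        // The file on disk holds exactly the bytes in the array.
        Console.WriteLine(new FileInfo(path).Length == compressed.Length); // True
        File.Delete(path);
    }
}
```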

File.WriteAllBytes: Compress

TextReader

The TextReader and TextWriter types are the base classes that other, more useful types derive from. Usually they are not useful on their own. But they help us form a more comprehensive view of the Framework.

TextReader, TextWriter

Tip: The .NET Framework uses inheritance to implement its types. Studying it helps us learn to be better programmers.
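
One place the base class does help is as a parameter type. In the sketch below, a method that accepts a TextReader works with both a StringReader and a StreamReader, because both derive from TextReader. The helper name CountWords and the sample text are assumptions made for this example.

```csharp
using System;
using System.IO;

class Program
{
    // Hypothetical helper: accepts any TextReader, whatever its source.
    public static int CountWords(TextReader reader)
    {
        int words = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            words += line.Split(new char[] { ' ' },
                StringSplitOptions.RemoveEmptyEntries).Length;
        }
        return words;
    }

    static void Main()
    {
        // Text held in memory.
        using (TextReader reader = new StringReader("one two\nthree"))
        {
            Console.WriteLine(CountWords(reader)); // 3
        }

        // The same method, reading from a file.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "four five six seven");
        using (TextReader reader = new StreamReader(path))
        {
            Console.WriteLine(CountWords(reader)); // 4
        }
        File.Delete(path);
    }
}
```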

Binary

The .NET Framework has two types that make reading or writing a binary file much easier: BinaryReader and BinaryWriter. These types introduce a level of abstraction over the raw data. We avoid dealing with the raw bits and bytes.
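
A minimal round trip with these two types might look like the sketch below. The values written (42, 3.5 and "cat") are assumptions made for the example. The reader must request values in the same order and with the same types that the writer used.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // Write three typed values.
        using (BinaryWriter writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            writer.Write(42);    // int: 4 bytes
            writer.Write(3.5);   // double: 8 bytes
            writer.Write("cat"); // string: stored with a length prefix
        }

        // Read them back in the same order.
        using (BinaryReader reader = new BinaryReader(File.Open(path, FileMode.Open)))
        {
            int number = reader.ReadInt32();
            double value = reader.ReadDouble();
            string text = reader.ReadString();
            Console.WriteLine("{0} {1} {2}", number, value, text);
        }

        File.Delete(path);
    }
}
```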

BinaryReader, BinaryWriter

Seek: You can seek to a specific location in your file with the Seek method. We demonstrate, and benchmark, this method.
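
As a brief illustration (not the benchmark from the linked article), this sketch calls Seek on a FileStream. The five sample bytes are assumptions made for the example. SeekOrigin selects whether the offset counts from the start, the current position, or the end.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 10, 20, 30, 40, 50 });

        using (FileStream stream = File.OpenRead(path))
        {
            // Jump to the byte at index 3, counting from the start.
            stream.Seek(3, SeekOrigin.Begin);
            Console.WriteLine(stream.ReadByte()); // 40

            // Jump to the last byte, counting back from the end.
            stream.Seek(-1, SeekOrigin.End);
            Console.WriteLine(stream.ReadByte()); // 50
        }

        File.Delete(path);
    }
}
```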

Seek

Actions

You can copy, delete, rename or get time information about files. These actions are available through the File type and the FileInfo type. As always, some methods are more useful than others.
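
The sketch below runs several of these actions through the static File type. The folder name "file-actions-demo" and the file names are assumptions made for this example. Renaming is done with File.Move, since the File type has no separate rename method.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Hypothetical working folder under the system temp folder.
        string dir = Path.Combine(Path.GetTempPath(), "file-actions-demo");
        Directory.CreateDirectory(dir);

        string original = Path.Combine(dir, "original.txt");
        string copy = Path.Combine(dir, "copy.txt");
        string renamed = Path.Combine(dir, "renamed.txt");

        File.WriteAllText(original, "data");

        // Copy, overwriting any existing destination file.
        File.Copy(original, copy, true);

        // File.Move throws if the destination exists, so clear it first.
        if (File.Exists(renamed))
        {
            File.Delete(renamed);
        }
        File.Move(copy, renamed); // Rename.

        Console.WriteLine(File.Exists(copy));    // False
        Console.WriteLine(File.Exists(renamed)); // True
        Console.WriteLine(File.GetLastWriteTimeUtc(renamed));

        // Clean up the folder and its files.
        Directory.Delete(dir, true);
    }
}
```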

File.Copy, File.Delete, File.Exists, File.GetLastWriteTimeUtc, File.Move, File.Open, File.Replace

Stream

Streams take many forms in the .NET Framework. Sometimes leaving a file on the disk would impact performance or stability in a negative way. In these cases, please consider MemoryStream: it stores a file in memory, as a Stream.
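
As a sketch of this idea, the program below loads a file's bytes once and then wraps them in a MemoryStream. The temporary file and its two lines are assumptions made for the example. Any code that expects a Stream, such as StreamReader, can then read the data without touching the disk again.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "line one" + Environment.NewLine + "line two");

        // Load the bytes, then delete the file: the data now lives in memory.
        byte[] bytes = File.ReadAllBytes(path);
        File.Delete(path);

        using (MemoryStream memory = new MemoryStream(bytes))
        using (StreamReader reader = new StreamReader(memory))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
    }
}
```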

Stream, MemoryStream, BaseStream

Web

Not every file you want to use is located on your local disk. A file may be remote. You may need to access the network to download a file from a server. The WebClient type helps here.

WebClient

Custom

Many file-handling methods are included in the .NET Framework. But often developers must create custom methods to handle unusual cases. Data may be badly formed or inconsistent. Edge cases must be handled.

Office: It is common to need to control Microsoft Excel with C# code. We introduce a fast approach.

Excel, Word

CSV files: These are text-based databases. With the System.IO namespace, you can read them into your C# program.

TextFieldParser: Parse CSV, CSV: Separate Files

Equality: How can you tell if two files are exactly equal? Unfortunately, the file system's metadata is not sufficient.
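
One straightforward approach is to compare the contents byte by byte. The sketch below does this; it is not necessarily the method used in the linked article. The helper name FilesEqual is an assumption made for the example. It loads both files fully into memory, so it suits small files best.

```csharp
using System;
using System.IO;

class Program
{
    // Hypothetical helper: true if both files hold identical bytes.
    public static bool FilesEqual(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);

        // Different lengths can never be equal.
        if (a.Length != b.Length)
        {
            return false;
        }
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        string first = Path.GetTempFileName();
        string second = Path.GetTempFileName();
        File.WriteAllText(first, "same");
        File.WriteAllText(second, "same");
        Console.WriteLine(FilesEqual(first, second)); // True

        File.WriteAllText(second, "sane");
        Console.WriteLine(FilesEqual(first, second)); // False

        File.Delete(first);
        File.Delete(second);
    }
}
```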

File Equals

Performance

When you access a file in Windows, the operating system puts that file into a memory cache. We provide a benchmark of file system caches. And we show how temporal locality can be used to improve performance.

1. Understand file caches. Operating systems provide their own file caching mechanisms. This is key to good IO performance.

Windows File Cache

2. Access files together. You can change the ordering of operations so that all file reads (or writes) occur in one part of runtime.

Temporal Locality

3. Use MemoryMappedFile. It is possible in version 4.0 of the .NET Framework to map files into memory with the MemoryMappedFile class.

MemoryMappedFile

4. Avoid operations. On the FileStream type, the Length property is not cheap. It accesses the file system. This can be slow.

FileStream Length
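
To illustrate the fourth point, the sketch below reads the Length property once and stores it in a local variable, instead of evaluating it on every loop iteration. The sample bytes are assumptions made for the example, and no timing is claimed here.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4 });

        int sum = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read the property once, outside the loop.
            long length = stream.Length;
            for (long i = 0; i < length; i++)
            {
                sum += stream.ReadByte();
            }
        }

        Console.WriteLine(sum); // 10
        File.Delete(path);
    }
}
```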

The performance of file handling is an important part of computer programming. Often, optimizing how files are used is the most effective way to make a program faster. We use caches and clever algorithms to do this.

One of the most significant sources of inefficiency is unnecessary input/output (I/O). McConnell, p. 598

We can build small and fast storage, or large and slow storage, but not storage that is both large and fast. Aho et al., p. 454

Summary

File handling is hard. Even with the helpful types provided in the .NET Framework, it is fraught with errors. We must account for disk errors and invalid data. Testing is essential.
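
As a closing sketch, here is one way a read could guard against common failures. The helper name TryReadText and its choice to return null on failure are assumptions made for this example; a real program may prefer to log or rethrow. The more specific FileNotFoundException is caught before its base type IOException.

```csharp
using System;
using System.IO;

class Program
{
    // Hypothetical helper: returns the file text, or null if it cannot be read.
    public static string TryReadText(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException)
        {
            Console.WriteLine("File not found: " + path);
        }
        catch (IOException ex)
        {
            // Covers other I/O failures, such as a missing directory or a locked file.
            Console.WriteLine("I/O error: " + ex.Message);
        }
        catch (UnauthorizedAccessException ex)
        {
            Console.WriteLine("Access denied: " + ex.Message);
        }
        return null;
    }

    static void Main()
    {
        // A name that should not exist in the temp folder.
        string missing = Path.Combine(Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + ".txt");
        Console.WriteLine(TryReadText(missing) == null); // True
    }
}
```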
