
Java Whitespace Methods

This Java article provides methods for handling whitespace in strings. It converts between Windows and UNIX newline sequences.
Whitespace. In a String, some characters like the space, newlines, and tabs are considered whitespace. These special chars can cause problems.
For example, multiple whitespace chars together may need to be combined (condensed). And Windows and UNIX newlines may need to be normalized (converted).
Remove, condense whitespace. This program changes whitespace in Strings. It removes all whitespace chars. And it can collapse whitespace.

RemoveAllWhitespace: This uses the regular expression "\s" in the replaceAll method. It removes all whitespace chars.

CollapseWhitespace: This method replaces each sequence of one or more whitespace characters with a single space. It normalizes spacing in a string.

Java program that condenses whitespace

public class Program {

    static String removeAllWhitespace(String value) {
        // Remove all whitespace characters.
        return value.replaceAll("\\s", "");
    }

    static String collapseWhitespace(String value) {
        // Replace all whitespace blocks with single spaces.
        return value.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String value = " Hi,\r\n\t\thow are you?";
        // Test our methods.
        String result = removeAllWhitespace(value);
        System.out.println(result);
        result = collapseWhitespace(value);
        System.out.println(result);
    }
}

Output

Hi,howareyou?
 Hi, how are you?

Convert with toCharArray. This example uses another approach to whitespace. It converts a String to a char array and changes the array's elements.

Result: The method changes common whitespace characters all into spaces. This algorithm may perform faster than a regular expression.

Note: In some programs, where performance is key, I recommend handling Strings with char-testing algorithms.

But: In most situations, I suggest a regular expression—the code is shorter and probably less likely to have bugs.

Java program that uses toCharArray, converts whitespace

public class Program {

    static String convertWhitespaceToSpaces(String value) {
        // Convert String to a character array.
        char[] array = value.toCharArray();
        for (int i = 0; i < array.length; i++) {
            // Modify all newlines and tabs to be spaces.
            switch (array[i]) {
            case '\r':
            case '\n':
            case '\t':
                array[i] = ' ';
                break;
            }
        }
        // Return the modified string.
        return new String(array);
    }

    public static void main(String[] args) {
        String value = "I hope\nyou are\twell!";
        // Test the conversion method.
        System.out.println(convertWhitespaceToSpaces(value));
    }
}

Output

I hope you are well!
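The same char-testing idea can also be applied to collapsing whitespace. The following is a minimal sketch, not a tuned implementation: the class and method names are hypothetical, and it assumes Character.isWhitespace is an acceptable test, which may treat a few more characters as whitespace than the regex "\s" does.

```java
class CollapseSketch {

    // Hypothetical helper: collapses each run of whitespace into one space
    // by testing chars directly instead of using a regular expression.
    static String collapseWhitespaceChars(String value) {
        StringBuilder builder = new StringBuilder(value.length());
        boolean inWhitespace = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isWhitespace(c)) {
                // Emit a single space for the first char of each run only.
                if (!inWhitespace) {
                    builder.append(' ');
                    inWhitespace = true;
                }
            } else {
                builder.append(c);
                inWhitespace = false;
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String value = " Hi,\r\n\t\thow are you?";
        // Brackets make the leading space visible.
        // Prints: [ Hi, how are you?]
        System.out.println("[" + collapseWhitespaceChars(value) + "]");
    }
}
```

Whether this is actually faster than replaceAll depends on the input and the runtime, so it is worth measuring before replacing the shorter regex version.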
Convert to UNIX newlines. UNIX uses just one character, \n, for newlines. But Windows uses two: the \r\n sequence. We can convert Windows newlines to UNIX ones.

Result: This program changes the two Windows newline sequences but leaves the UNIX one alone. So the String length is reduced by 2 chars.

Java program that converts newlines to UNIX form

public class Program {

    static String convertToUNIXNewlines(String value) {
        // Normalize the newlines in the String.
        return value.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        // This string contains both Windows and UNIX newlines.
        String value = "One two\r\nthree\r\nfour\nfive";
        // Replace Windows newlines.
        String result = convertToUNIXNewlines(value);
        // Write length before and after.
        System.out.println(value.length());
        System.out.println(result.length());
    }
}

Output

25
23
Convert to Windows newlines. This method handles the reverse conversion: it converts from UNIX to Windows newlines. This is more complex.

First: We normalize all newlines to UNIX newlines. In this way we avoid corrupting existing newlines.

Then: We change all UNIX newlines to Windows newlines. The final String has the correct number of newline chars.

Java program that converts newlines to Windows form

public class Program {

    static String convertToUNIXNewlines(String value) {
        return value.replace("\r\n", "\n");
    }

    static String convertToWindowsNewlines(String value) {
        // Convert to UNIX lines to normalize all newlines.
        // ... Then replace with Windows newlines.
        value = convertToUNIXNewlines(value);
        return value.replace("\n", "\r\n");
    }

    public static void main(String[] args) {
        // This string contains 2 UNIX newlines.
        String value = "Cat\nDog\nFish\r\nBird";
        String result = convertToWindowsNewlines(value);
        // Write lengths.
        // ... The two UNIX newlines were converted.
        // ... The Windows newline was ignored.
        System.out.println(value.length());
        System.out.println(result.length());
    }
}

Output

18
20
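The two-step conversion can also be approximated in a single pass. This sketch assumes Java 8 or later, where the regex linebreak matcher \R is available; the class and method names are hypothetical. It is not strictly equivalent to the two replace calls: \R also matches a lone \r and a few less common line terminators (such as form feed and U+2028), and those are converted too.

```java
class LinebreakSketch {

    // Hypothetical helper: converts any linebreak to a Windows newline.
    static String toWindowsNewlines(String value) {
        // \R matches \r\n as one unit, so existing Windows newlines
        // are replaced with themselves rather than doubled.
        return value.replaceAll("\\R", "\r\n");
    }

    public static void main(String[] args) {
        // Same test string as the two-step program: 2 UNIX newlines, 1 Windows newline.
        String value = "Cat\nDog\nFish\r\nBird";
        String result = toWindowsNewlines(value);
        System.out.println(value.length());  // prints 18
        System.out.println(result.length()); // prints 20
    }
}
```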
Convert file newlines. This program converts a file's newlines to UNIX newlines. It reads the file in as a byte array and converts it to a string.

Caution: This program is not perfect. It is slower than ideal and may cause some encoding issues.

Write: The program converts the String back into a byte array and writes it to the same location, replacing the original file.

Java program that converts newlines in file

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class Program {

    static String convertToUNIXNewlines(String value) {
        return value.replace("\r\n", "\n");
    }

    public static void main(String[] args) throws IOException {
        // Get a Path object.
        Path path = FileSystems.getDefault().getPath("C:\\programs\\file.txt");
        // Read all bytes for the file and convert it to a string.
        byte[] data = Files.readAllBytes(path);
        String data2 = new String(data);
        // Fix newlines.
        data2 = convertToUNIXNewlines(data2);
        // Write converted bytes.
        byte[] data3 = data2.getBytes();
        Files.write(path, data3);
    }
}
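One likely source of the encoding issues mentioned above is that new String(data) and getBytes() use the platform's default charset. The sketch below names the charset explicitly instead. It assumes the file is UTF-8 text, which may not hold for your files, and the class and method names are hypothetical. It works on a temporary file so that no real data is touched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileNewlineSketch {

    // Hypothetical helper: rewrites the file in place with UNIX newlines.
    static void convertFileToUnixNewlines(Path path) throws IOException {
        // Assumption: the file is UTF-8. Bytes that are not valid UTF-8
        // would be replaced during decoding, so other encodings may be damaged.
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        text = text.replace("\r\n", "\n");
        Files.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Create a temporary file with one Windows and one UNIX newline.
        Path path = Files.createTempFile("newlines", ".txt");
        Files.write(path, "One\r\nTwo\n".getBytes(StandardCharsets.UTF_8));

        convertFileToUnixNewlines(path);

        // The file shrinks from 9 bytes to 8.
        System.out.println(Files.size(path)); // prints 8
        Files.delete(path);
    }
}
```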
Trim. For leading or trailing whitespace, the trim() method is ideal. It does not require any special code. This is loosely like a chomp() method in other languages, although trim() strips both ends of the String.
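As a small illustration, trim() can be combined with the collapsing regular expression used earlier. The class and method names here are hypothetical; this is a sketch of one way to combine the two steps.

```java
class TrimSketch {

    // Hypothetical helper: trims both ends, then collapses inner whitespace.
    static String tidy(String value) {
        return value.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String value = "  Hi,\r\n\t\thow are you?  \n";
        // Brackets show that no leading or trailing whitespace remains.
        // Prints: [Hi, how are you?]
        System.out.println("[" + tidy(value) + "]");
    }
}
```

Trimming first means the collapse step never produces a leading or trailing space.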
In processing text, we remove or combine whitespace. These methods help with that task. Further steps may (for example) remove stopwords.
© 2007-2019 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.
Dot Net Perls