String Whitespace Methods
This page was last reviewed on Jan 11, 2022.
Dot Net Perls
Whitespace. In a String, some characters like the space, newlines, and tabs are considered whitespace. These special chars can cause problems.
For example, multiple whitespace chars together may need to be combined (condensed). And Windows and UNIX newlines may need to be normalized (converted).
Remove, condense whitespace. This program changes whitespace in Strings. It removes all whitespace chars. And it can collapse whitespace.
Detail This uses the regular expression "\s" in the replaceAll method. It removes all whitespace chars.
Detail This method replaces all sequences of one or more whitespace with a space. It normalizes spacing in a string.
public class Program {

    static String removeAllWhitespace(String value) {
        // Remove all whitespace characters.
        return value.replaceAll("\\s", "");
    }

    static String collapseWhitespace(String value) {
        // Replace all whitespace blocks with single spaces.
        return value.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String value = " Hi,\r\n\t\thow are you?";

        // Test our methods.
        String result = removeAllWhitespace(value);
        System.out.println(result);

        result = collapseWhitespace(value);
        System.out.println(result);
    }
}
Hi,howareyou?
 Hi, how are you?
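String.replaceAll compiles its regular expression on every call. When the same replacement runs many times, the pattern can be compiled once and reused. The following is a minimal sketch of that idea, not part of the original program; the class name WhitespacePatterns and its constant names are hypothetical.

```java
import java.util.regex.Pattern;

class WhitespacePatterns {
    // Hypothetical constants: each pattern is compiled once and reused.
    private static final Pattern ANY_WHITESPACE = Pattern.compile("\\s");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String removeAllWhitespace(String value) {
        // Same effect as value.replaceAll("\\s", "").
        return ANY_WHITESPACE.matcher(value).replaceAll("");
    }

    static String collapseWhitespace(String value) {
        // Same effect as value.replaceAll("\\s+", " ").
        return WHITESPACE_RUN.matcher(value).replaceAll(" ");
    }

    public static void main(String[] args) {
        String value = " Hi,\r\n\t\thow are you?";
        System.out.println(removeAllWhitespace(value));
        System.out.println(collapseWhitespace(value));
    }
}
```

Whether this is measurably faster depends on how often the methods are called; for a single call the plain replaceAll version is simpler.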
Convert with toCharArray. This example uses another approach to whitespace. It converts a String to a char array and changes the array's elements.
Result The method changes common whitespace characters all into spaces. This algorithm may perform faster than a regular expression.
Note In some programs, where performance is key, I recommend handling Strings with char-testing algorithms.
But In most situations, I suggest a regular expression—the code is shorter and probably less likely to have bugs.
public class Program {

    static String convertWhitespaceToSpaces(String value) {
        // Convert String to a character array.
        char[] array = value.toCharArray();
        for (int i = 0; i < array.length; i++) {
            // Modify all newlines and tabs to be spaces.
            switch (array[i]) {
            case '\r':
            case '\n':
            case '\t':
                array[i] = ' ';
                break;
            }
        }
        // Return the modified string.
        return new String(array);
    }

    public static void main(String[] args) {
        String value = "I hope\nyou are\twell!";

        // Test the conversion method.
        System.out.println(convertWhitespaceToSpaces(value));
    }
}
I hope you are well!
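The same char-testing idea can also collapse runs of whitespace, which the array version above cannot do because it never changes the length. This is a minimal sketch under stated assumptions: the class name CharCollapse is hypothetical, and it uses Character.isWhitespace, which accepts a slightly different set of characters than the "\s" regular expression.

```java
class CharCollapse {
    // Collapse each run of whitespace into one space, without a regular expression.
    static String collapseWhitespace(String value) {
        StringBuilder builder = new StringBuilder(value.length());
        boolean inWhitespace = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isWhitespace(c)) {
                // Emit one space for the first whitespace char in a run only.
                if (!inWhitespace) {
                    builder.append(' ');
                    inWhitespace = true;
                }
            } else {
                builder.append(c);
                inWhitespace = false;
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String value = "I hope\n\nyou  are\t\twell!";
        System.out.println(collapseWhitespace(value));
    }
}
```

A StringBuilder is used here instead of a char array because the result can be shorter than the input.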
Convert to UNIX newlines. UNIX uses just one character, \n, for newlines. But Windows uses two: the \r\n sequence. We can convert Windows newlines to UNIX ones.
Result This program changes the two Windows newline sequences but leaves the UNIX one alone. So the String length is reduced by 2 chars.
public class Program {

    static String convertToUNIXNewlines(String value) {
        // Normalize the newlines in the String.
        return value.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        // This string contains both Windows and UNIX newlines.
        String value = "One two\r\nthree\r\nfour\nfive";

        // Replace Windows newlines.
        String result = convertToUNIXNewlines(value);

        // Write length before and after.
        System.out.println(value.length());
        System.out.println(result.length());
    }
}
25
23
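The replace call above only handles the \r\n sequence. If a String may also contain lone \r characters, one option is the "\R" linebreak matcher available in Java 8 and later regular expressions. This is a hedged sketch, not the original method; the class name NewlineNormalizer is hypothetical. Note that "\R" also matches other line terminators such as form feed and the Unicode line separator, so it is broader than a plain newline fix.

```java
class NewlineNormalizer {
    static String normalizeToUnix(String value) {
        // \R matches any linebreak sequence and treats "\r\n" as one unit.
        return value.replaceAll("\\R", "\n");
    }

    public static void main(String[] args) {
        // This string mixes Windows, lone-CR, and UNIX newlines.
        String value = "One\r\ntwo\rthree\nfour";
        String result = normalizeToUnix(value);

        // Write length before and after.
        System.out.println(value.length());
        System.out.println(result.length());
    }
}
```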
Convert to Windows newlines. This method handles the reverse conversion: it converts from UNIX to Windows newlines. This is more complex.
Detail We normalize all newlines to UNIX newlines. In this way we avoid corrupting existing newlines.
Then We change all UNIX newlines to Windows newlines. The final String has the correct number of newline chars.
public class Program {

    static String convertToUNIXNewlines(String value) {
        return value.replace("\r\n", "\n");
    }

    static String convertToWindowsNewlines(String value) {
        // Convert to UNIX lines to normalize all newlines.
        // ... Then replace with Windows newlines.
        value = convertToUNIXNewlines(value);
        return value.replace("\n", "\r\n");
    }

    public static void main(String[] args) {
        // This string contains 2 UNIX newlines and 1 Windows newline.
        String value = "Cat\nDog\nFish\r\nBird";
        String result = convertToWindowsNewlines(value);

        // Write lengths.
        // ... The two UNIX newlines were converted.
        // ... The existing Windows newline was preserved.
        System.out.println(value.length());
        System.out.println(result.length());
    }
}
18
20
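The two-step approach can also be written as a single pass with a negative lookbehind, which matches a \n only when no \r comes directly before it. This is an alternative sketch, not the method used above; the class name WindowsNewlines is hypothetical, and lone \r characters are left unchanged.

```java
class WindowsNewlines {
    static String convertToWindowsNewlines(String value) {
        // (?<!\r) is a negative lookbehind: skip "\n" that already follows "\r".
        return value.replaceAll("(?<!\\r)\\n", "\r\n");
    }

    public static void main(String[] args) {
        String value = "Cat\nDog\nFish\r\nBird";
        String result = convertToWindowsNewlines(value);

        // Write lengths before and after.
        System.out.println(value.length());
        System.out.println(result.length());
    }
}
```

The two-step version is arguably easier to read; the lookbehind version avoids building an intermediate String.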
Convert file newlines. This program converts a file's newlines to UNIX newlines. It reads the file in as a byte array and converts it to a string.
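Warning This program is not perfect. It reads the entire file into memory, which is slow for large files. And it decodes and encodes with the platform default charset, so a file in another encoding may be corrupted.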
Detail The program converts the String back into a byte array and writes it to the same location, replacing the original file.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class Program {

    static String convertToUNIXNewlines(String value) {
        return value.replace("\r\n", "\n");
    }

    public static void main(String[] args) throws IOException {
        // Get a Path object.
        Path path = FileSystems.getDefault().getPath("C:\\programs\\file.txt");

        // Read all bytes for the file and convert it to a string.
        byte[] data = Files.readAllBytes(path);
        String data2 = new String(data);

        // Fix newlines.
        data2 = convertToUNIXNewlines(data2);

        // Write converted bytes.
        byte[] data3 = data2.getBytes();
        Files.write(path, data3);
    }
}
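One way to reduce the encoding risk is to name the charset explicitly when decoding and encoding. The sketch below assumes the file is UTF-8 text, which may not hold for every file. The class name FileNewlines and the method convertFileToUnixNewlines are hypothetical, and a temporary file stands in for a real path so the example can run anywhere.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileNewlines {
    // Rewrite the file in place with UNIX newlines, assuming UTF-8 content.
    static void convertFileToUnixNewlines(Path path) throws IOException {
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        text = text.replace("\r\n", "\n");
        Files.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Create a temporary file containing Windows newlines.
        Path path = Files.createTempFile("newlines", ".txt");
        Files.write(path, "One\r\nTwo\r\n".getBytes(StandardCharsets.UTF_8));

        convertFileToUnixNewlines(path);

        // Write the file size in bytes after conversion, then clean up.
        System.out.println(Files.size(path));
        Files.delete(path);
    }
}
```

This still loads the whole file into memory, so it shares that limitation with the program above.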
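Trim. For leading or trailing whitespace, the trim() method is ideal. It does not require any special code. It removes every leading and trailing char at or below U+0020 (the space), which covers spaces, tabs, and newlines. This is similar to chomp() in other languages, except that trim() works on both ends of the String.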
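As a short illustration, trim() can be combined with the collapsing regular expression to clean both the ends and the interior of a String. This is a minimal sketch; the class name TrimDemo and the method trimAndCollapse are hypothetical. It also shows one limit of trim(): a non-breaking space (U+00A0) is above U+0020, so it is kept.

```java
class TrimDemo {
    static String trimAndCollapse(String value) {
        // Remove leading and trailing whitespace, then collapse inner runs.
        return value.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String value = " \t Hello,  world \r\n";

        // trim() alone leaves the double space inside the text.
        System.out.println("[" + value.trim() + "]");
        System.out.println("[" + trimAndCollapse(value) + "]");

        // Non-breaking spaces are not removed by trim().
        String nbsp = "\u00A0text\u00A0";
        System.out.println(nbsp.trim().length());
    }
}
```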
In processing text, we remove or combine whitespace. These methods help with that task. Further steps may (for example) remove stopwords.
© 2007-2024 Sam Allen.