Duplicate Words
Remove duplicate words from a string by using the Dictionary class.
This page was last reviewed on Jun 13, 2021.
Duplicate words. Strings in C# often contain duplicate words. And often these duplicate words are not useful. It is possible to remove them.
Stopword notes. This is similar to the concept of removing stop words, which are common words (such as "the" or "and") that carry little meaning on their own. A lookup table like Dictionary can be used in a loop to test each word.
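The stop word idea can be sketched with the same kind of Dictionary lookup. This is a minimal illustration, not code from the original page: the StopWordFilter class, the RemoveStopWords method and the four-word stop list are all hypothetical, and a real stop list would be much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StopWordFilter
{
    // Hypothetical stop list, for illustration only.
    static readonly Dictionary<string, bool> Stops = new Dictionary<string, bool>
    {
        { "a", true }, { "and", true }, { "of", true }, { "the", true }
    };

    public static string RemoveStopWords(string input)
    {
        var result = new StringBuilder();

        // Split on spaces and common punctuation, skipping empty entries.
        string[] words = input.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Keep the word only if its lowercase form is not in the stop list.
            if (!Stops.ContainsKey(word.ToLowerInvariant()))
            {
                result.Append(word).Append(' ');
            }
        }
        return result.ToString().Trim();
    }
}

class Program
{
    static void Main()
    {
        // Prints: color bird sun
        Console.WriteLine(StopWordFilter.RemoveStopWords("The color of the bird and the sun"));
    }
}
```

The only difference from duplicate removal is that the table is filled ahead of time rather than as words are encountered.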
Input and output. Consider a string like "yellow bird blue bird." We want our algorithm to figure out that the word "bird" is repeated, and to remove the second occurrence.
Input:  yellow bird blue bird
Output: yellow bird blue
Example code. We use a Dictionary because its lookups take constant time on average. We will be processing words in a loop, and we need to check each word against all words already encountered.
Note Using 2 Lists would require a linear search for each word, giving quadratic time overall and potentially making your program slow on large data sets.
Detail This method uses StringBuilder for performance. The Dictionary stores words already encountered.
Detail By passing a new char array to string Split, we can deal with punctuation.
Detail Here var declares the Dictionary with an inferred type. It is a way to simplify the syntax of the program.
using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main()
    {
        string s = "yellow bird, blue bird, yellow sun";
        Console.WriteLine(s);
        Console.WriteLine(RemoveDuplicateWords(s));
    }

    public static string RemoveDuplicateWords(string v)
    {
        // Keep track of words found in this Dictionary.
        var d = new Dictionary<string, bool>();

        // Build up the result string in this StringBuilder.
        StringBuilder b = new StringBuilder();

        // Split the input on spaces and punctuation.
        string[] a = v.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        // Loop over each word.
        foreach (string current in a)
        {
            // Lowercase each word.
            string lower = current.ToLower();

            // If we haven't already encountered the word, append it to the result.
            if (!d.ContainsKey(lower))
            {
                b.Append(current).Append(' ');
                d.Add(lower, true);
            }
        }

        // Return a string.
        return b.ToString().Trim();
    }
}
yellow bird, blue bird, yellow sun
yellow bird blue sun
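A possible variant. The Dictionary in the example above only records whether a word has been seen, so a HashSet can fill the same role. The sketch below is an alternative, not the method shown on this page: the HashSetVariant class name is hypothetical, and it assumes that StringComparer.OrdinalIgnoreCase is an acceptable replacement for the ToLower call (the two can differ for some culture-specific characters).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class HashSetVariant
{
    public static string RemoveDuplicateWords(string value)
    {
        // The comparer makes "Bird" and "bird" count as the same word.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var builder = new StringBuilder();

        string[] words = value.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Add returns false when the word is already in the set.
            if (seen.Add(word))
            {
                builder.Append(word).Append(' ');
            }
        }
        return builder.ToString().Trim();
    }
}

class Program
{
    static void Main()
    {
        // Prints: yellow bird blue sun
        Console.WriteLine(HashSetVariant.RemoveDuplicateWords("yellow bird, blue bird, yellow sun"));
    }
}
```

Because Add both tests and inserts, the separate ContainsKey and Add calls collapse into one step. The average lookup cost stays the same as with the Dictionary.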
Stopwords. I used this code, and also a variant that removes stop words, to implement a full-text-search feature in a Windows Forms program. A special full-text search database is useful.
A summary. We combined Dictionary with StringBuilder to develop a method that removes duplicate English words efficiently. The code does a lookup on each word as it encounters it.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Jun 13, 2021.
© 2007-2023 Sam Allen.