Distinct Method: Get Unique Elements Only
This page was last reviewed on Dec 21, 2021.
Dot Net Perls
Distinct. This method removes duplicate elements from a collection and returns only the distinct (unique) elements. It is an extension method provided by the System.Linq namespace.
Distinct returns an IEnumerable<T> sequence. We can loop over this sequence with foreach, or call other extension methods on it. The work is deferred: no elements are examined until the result is enumerated.
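A short sketch of deferred execution and chaining (the names numbers, unique and sortedUnique are illustrative). Because Distinct does not read its source until the result is enumerated, values added to the list after the call still show up, and other LINQ methods such as OrderBy and ToArray can be chained onto the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 5, 3, 5, 1 };

// Calling Distinct only sets up the query; the list is not read yet.
IEnumerable<int> unique = numbers.Distinct();

// Change the source before the query is enumerated.
numbers.Add(7);
numbers.Add(3);

// Enumerating now reads the updated list, so 7 is included.
foreach (int value in unique)
{
    Console.WriteLine(value);
}

// Other extension methods chain onto the result.
int[] sortedUnique = numbers.Distinct().OrderBy(n => n).ToArray();
Console.WriteLine(string.Join(",", sortedUnique)); // 1,3,5,7
```

Call ToArray or ToList directly on Distinct when the unique values should be captured at that moment rather than recomputed on each enumeration.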
An example. We declare and allocate an array on the managed heap. The array contains 6 elements, but only 4 different numbers: 2 and 4 each appear twice, and this is key to the program's output.
Next We apply the Distinct extension method to the array reference, and then assign the result to an implicitly typed local variable.
Finally We loop over the result and display the distinct elements in the processed array.
using System;
using System.Linq;

// Declare an array with some duplicated elements in it.
int[] array1 = { 1, 2, 2, 3, 4, 4 };

// Invoke Distinct extension method.
var result = array1.Distinct();

// Display results.
foreach (int value in result)
{
    Console.WriteLine(value);
}

1
2
3
4
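Which elements count as duplicates depends on the default equality of the element type. A minimal sketch (variable names are illustrative): value tuples compare by their fields, so equal pairs are removed, while arrays compare by reference, so two arrays with the same contents are both kept.

```csharp
using System;
using System.Linq;

// Value tuples compare field by field, so (1, 2) appears only once.
(int X, int Y)[] pairs = { (1, 2), (1, 2), (3, 4) };
var uniquePairs = pairs.Distinct().ToArray();
Console.WriteLine(uniquePairs.Length); // 2

// Arrays use reference equality: these hold the same numbers,
// but they are different objects, so Distinct keeps both.
int[][] arrays = { new[] { 1, 2 }, new[] { 1, 2 } };
var uniqueArrays = arrays.Distinct().ToArray();
Console.WriteLine(uniqueArrays.Length); // 2
```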
IEqualityComparer. We can pass an IEqualityComparer to Distinct to control how elements are compared. Most programs can rely on the default equality comparison, so this overload is needed less often.
Note We can "transform" elements in an IEqualityComparer. Here we treat each int as its parity (whether it is even or odd).
using System;
using System.Linq;
using System.Collections.Generic;

class EqualityParity : IEqualityComparer<int>
{
    public bool Equals(int x, int y)
    {
        // Consider all even numbers the same, and all odd the same.
        // Use "& 1" rather than "% 2": for negative odd numbers, % 2 is -1, not 1.
        return (x & 1) == (y & 1);
    }

    public int GetHashCode(int obj)
    {
        // Equal elements must have equal hash codes, so hash the parity too.
        return (obj & 1).GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        int[] array1 = { 9, 11, 13, 15, 2, 4, 6, 8 };
        // This will remove all except the first even and the first odd element.
        var distinctResult = array1.Distinct(new EqualityParity());
        // Display results.
        foreach (var result in distinctResult)
        {
            Console.WriteLine(result);
        }
    }
}

9
2
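For common cases, a custom comparer class may not be needed. A sketch with illustrative data: StringComparer.OrdinalIgnoreCase already implements IEqualityComparer<string>, and DistinctBy (available starting with .NET 6) compares elements by a key selector, which can express the same parity rule without a class.

```csharp
using System;
using System.Linq;

string[] words = { "Apple", "apple", "Pear", "PEAR", "plum" };

// StringComparer.OrdinalIgnoreCase implements IEqualityComparer<string>,
// so it can be passed straight to Distinct.
string[] caseless = words.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
Console.WriteLine(string.Join(" ", caseless)); // Apple Pear plum

// DistinctBy (.NET 6 and later) compares a key instead of the element.
// The key here is parity; "& 1" also treats negative odd numbers as odd.
int[] numbers = { 9, 11, -13, 15, 2, 4, 6, 8 };
int[] byParity = numbers.DistinctBy(n => n & 1).ToArray();
Console.WriteLine(string.Join(" ", byParity)); // 9 2
```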
Benchmark duplicate methods. A simple loop can often be written to remove duplicates. On a small int array, a nested for-loop can execute much faster than the Distinct method.
Version 1 We use the Distinct method. The code is short and easy to read, which is a benefit.
Version 2 A nested loop scans the following elements for a duplicate. An element is added only if no following element is the same, so the last occurrence of each value is kept.
Result On this short int array, the nested loops are more than 3 times faster. But the outcome depends on the data: the nested loop does O(n²) comparisons, while Distinct uses a hash-based set, so on large arrays Distinct is likely to be faster.
using System;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static IEnumerable<int> Test1(int[] array)
    {
        // Use distinct to check for duplicates.
        return array.Distinct();
    }

    static IEnumerable<int> Test2(int[] array)
    {
        // Use nested loop to check for duplicates.
        List<int> result = new List<int>();
        for (int i = 0; i < array.Length; i++)
        {
            // Check for duplicates in all following elements.
            bool isDuplicate = false;
            for (int y = i + 1; y < array.Length; y++)
            {
                if (array[i] == array[y])
                {
                    isDuplicate = true;
                    break;
                }
            }
            if (!isDuplicate)
            {
                result.Add(array[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        int[] array1 = { 1, 2, 2, 3, 4, 4 };
        const int _max = 1000000;

        var s1 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            // Version 1: benchmark distinct.
            var result = Test1(array1);
            if (result.Count() != 4)
            {
                break;
            }
        }
        s1.Stop();

        var s2 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            // Version 2: benchmark nested loop.
            var result = Test2(array1);
            if (result.Count() != 4)
            {
                break;
            }
        }
        s2.Stop();

        // Convert total milliseconds to nanoseconds per iteration.
        Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
    }
}

185.44 ns    Distinct method
51.11 ns     Nested for-loops
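Test2 in the benchmark keeps the last occurrence of each value (for { 2, 1, 2 } it returns 1 2, while Distinct returns 2 1), and it still allocates a List. The sketch below is a hypothetical alternative that was not part of the benchmark and has not been timed here: CompactUnique checks preceding elements instead, so it keeps the first occurrence, and it writes the unique values to the front of the input instead of allocating.

```csharp
using System;

// Hypothetical helper CompactUnique: moves the first occurrence of each value
// to the front of the span, in order, and returns how many unique values there are.
// It makes no heap allocations, but it overwrites the input and does O(n^2)
// comparisons, so it only suits small arrays.
static int CompactUnique(Span<int> values)
{
    int count = 0;
    for (int i = 0; i < values.Length; i++)
    {
        // Only the unique prefix [0, count) needs to be searched.
        bool seen = false;
        for (int j = 0; j < count; j++)
        {
            if (values[j] == values[i])
            {
                seen = true;
                break;
            }
        }
        if (!seen)
        {
            // count <= i, so this never overwrites an element not yet examined.
            values[count] = values[i];
            count++;
        }
    }
    return count;
}

int[] array1 = { 2, 1, 2 };
int uniqueCount = CompactUnique(array1);
// Only the first uniqueCount slots are meaningful afterward.
Console.WriteLine(string.Join(" ", array1[..uniqueCount])); // 2 1
```

Because the input is modified, pass a copy (for example, array.Clone()) when the original order and contents are still needed.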
Discussion. The Distinct method is not ideal for all purposes. Internally, Distinct is implemented as an iterator that records the elements it has already returned in a hash-based set.
Detail Using Distinct causes heap allocations: the iterator object and its internal set are allocated on the managed heap. For optimal performance on small collections, you could use loops instead.
And With small data sets, the overhead of the iterator and these allocations likely outweighs any asymptotic advantage of hashing.
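To make the iterator-plus-set idea concrete, here is a simplified sketch. DistinctSketch is a hypothetical name, and this is not the library's actual source (its internals vary between .NET versions); it only shows the general technique of yielding an element the first time a HashSet<T> accepts it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DistinctSketch: not the library's source, just the core idea.
// The compiler turns this yield-based local function into an iterator class,
// so nothing runs until the result is enumerated.
static IEnumerable<T> DistinctSketch<T>(IEnumerable<T> source, IEqualityComparer<T>? comparer = null)
{
    // The set of elements returned so far (allocated on the heap).
    var seen = new HashSet<T>(comparer);
    foreach (T item in source)
    {
        // HashSet<T>.Add returns false when an equal element is already present.
        if (seen.Add(item))
        {
            yield return item;
        }
    }
}

int[] array1 = { 1, 2, 2, 3, 4, 4 };
int[] sketched = DistinctSketch(array1).ToArray();
Console.WriteLine(string.Join(" ", sketched)); // 1 2 3 4
```

Each HashSet lookup is O(1) on average, which is where the asymptotic advantage over nested loops comes from; the iterator object and the set are the allocations mentioned above.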
A summary. We used the Distinct extension method from System.Linq. This method provides a declarative, functional syntax for what is typically an imperative processing task.
Distinct has practical performance drawbacks in some programs. Where performance matters, especially on small collections, a for-loop may be faster (though it can be harder to maintain).
© 2007-2024 Sam Allen.