C#: Finding List Duplicates Using the GroupBy Linq Extension and Other GroupBy Operations and Usage Explained
Finding Identical Strings using GroupBy
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => txt)
       .Where(grouping => grouping.Count() > 1)
       .ToList()
       .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
                                               groupItem.Key,
                                               groupItem.Count(),
                                               string.Join(" ", groupItem.ToArray())));

// Writes this out to the console:
//
// Alpha duplicated 2 times with these values Alpha Alpha
Line By Line Explanation
Line 1: Our generic list is defined with the string “Alpha” duplicated, while “Beta”, “Gamma” and “Delta” each appear only once.
Line 3: We call the GroupBy extension method. It extends any enumerable sequence (IEnumerable<TSource>), which here is our list. Its main argument is a key selector, a lambda of type Func<TSource, TKey>: TSource is the input (each string in the list), and whatever the lambda returns becomes the key for the grouping.
The key in this scenario is the string itself, since that is what we want to find duplicates of. With a complex object the key would more likely be a property or field of that object, but that is not the case here, so the key is the actual item found in our list. Each distinct key appears exactly once in GroupBy's result, much like the keys of a dictionary, and that is the crux of how we can use GroupBy to find all identical strings.
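To illustrate the remark about complex objects, here is a minimal sketch that groups by a property instead of by the item itself. The Customer class, its Name and Email properties, and the FindDuplicateEmails helper are hypothetical names invented for this illustration; they are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for this illustration.
public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class DuplicateEmailExample
{
    // Returns the e-mail addresses shared by more than one customer.
    public static List<string> FindDuplicateEmails(IEnumerable<Customer> customers)
    {
        return customers.GroupBy(c => c.Email)       // the key is a property, not the whole object
                        .Where(g => g.Count() > 1)   // keep only keys seen at least twice
                        .Select(g => g.Key)
                        .ToList();
    }

    public static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Name = "Ann",  Email = "ann@example.com" },
            new Customer { Name = "Bob",  Email = "bob@example.com" },
            new Customer { Name = "Anna", Email = "ann@example.com" },
        };

        foreach (string email in FindDuplicateEmails(customers))
            Console.WriteLine(email);   // prints ann@example.com
    }
}
```

The pipeline is the same as in the string example; only the key selector changes.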
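Line 3: Before moving on we must understand what GroupBy returns. By definition it returns IEnumerable<IGrouping<TKey, TSource>>. This can be broken down as follows: the outer IEnumerable is a sequence holding one IGrouping per distinct key, and each IGrouping<TKey, TSource> exposes that key through its Key property while also being an enumerable sequence of the TSource values that share the key.
If one is familiar with the Dictionary class then one has worked with KeyValuePair, and an IGrouping is similar: a key paired with its values. The difference is that the values are reached by enumerating the grouping directly, as described above, rather than through a Value property as with KeyValuePair.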
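The shape of the GroupBy result can be seen by walking it with plain foreach loops. The following is a small sketch using the same list as above; the GroupingShapeExample class and its Describe helper are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingShapeExample
{
    // Builds one "Key: value value ..." line per group to show both parts of an IGrouping.
    public static List<string> Describe(IEnumerable<string> items)
    {
        var lines = new List<string>();
        IEnumerable<IGrouping<string, string>> groups = items.GroupBy(txt => txt);

        foreach (IGrouping<string, string> group in groups)
        {
            var values = new List<string>();
            foreach (string value in group)   // the grouping itself is the sequence of values
                values.Add(value);

            lines.Add(group.Key + ": " + string.Join(" ", values));
        }
        return lines;
    }

    public static void Main()
    {
        var theList = new List<string> { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };
        foreach (string line in Describe(theList))
            Console.WriteLine(line);
        // Alpha: Alpha Alpha
        // Beta: Beta
        // Gamma: Gamma
        // Delta: Delta
    }
}
```

Groups come back in the order in which each key first appears in the source, which is why “Alpha” is listed first.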
Line 4: GroupBy hands back one IGrouping per key, so we need to weed out the keys that occur only once and keep those that occur two or more times. The Where extension does this job for us: its lambda keeps only the groupings whose Count() is greater than 1, which leaves exactly the duplicates sought.
Line 5: Convert the IEnumerable returned by the Where extension into an actual List object. This is done for two reasons.
The first is that ForEach is an instance method of List<T>; it is not available on a raw IEnumerable.
Secondly, and possibly more importantly, GroupBy is a deferred execution operation, meaning that the data is not generated until it is actually needed. By calling ToList we execute the query and get the data back immediately. This can come into play if the original data may change. If the data can change, it's best to get the data upfront from such deferred execution methods.
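The effect of deferred execution can be observed by changing the list after the query is defined. This is a minimal sketch under the same sample data; the DeferredExecutionExample class and its CountDuplicateKeys helper are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionExample
{
    // Returns { duplicate keys in the ToList snapshot, duplicate keys when the lazy query is re-run }.
    public static int[] CountDuplicateKeys()
    {
        var theList = new List<string> { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

        // Nothing is grouped yet; this only describes the query.
        IEnumerable<IGrouping<string, string>> lazyQuery =
            theList.GroupBy(txt => txt).Where(g => g.Count() > 1);

        // ToList runs the query now and keeps the results.
        List<IGrouping<string, string>> snapshot = lazyQuery.ToList();

        // Change the source after the snapshot was taken.
        theList.Add("Beta");

        // The snapshot still holds only "Alpha"; enumerating the lazy query again
        // re-reads the list and now also finds "Beta".
        return new[] { snapshot.Count, lazyQuery.Count() };
    }

    public static void Main()
    {
        int[] counts = CountDuplicateKeys();
        Console.WriteLine("Snapshot: {0}, re-run query: {1}", counts[0], counts[1]);
        // Snapshot: 1, re-run query: 2
    }
}
```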
Line 6: The goal is to display all the duplicates found and demonstrate how an IGrouping object is used. In our example the query yields only one IGrouping, but if the data were changed there could be several. The following lines display the information about each IGrouping returned.
Line 7: The Key property of an IGrouping object gives us the unique key that defines the group. Note that although here the key is the same as the data, in other uses of GroupBy the key and the grouped values will usually differ. In our example the key is “Alpha”, which fills {0} in the format string.
Line 8: The Count() extension method tells us how many values are in the grouping. Our example returns two.
Line 9: We enumerate the values of the current grouping and list them out. In this case there are two values, both “Alpha”.
GroupBy Usage with Only Two Defined Keys: Regex Example
// Requires: using System.Text.RegularExpressions;
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

// The key is a bool: true when the string ends in "ta", false otherwise.
theList.GroupBy(txt => Regex.IsMatch(txt, "ta$"))
       .ToList()
       .ForEach(groupItem => Console.WriteLine("{0} found {1} times with these values: {2}",
                                               groupItem.Key,
                                               groupItem.Count(),
                                               string.Join(" ", groupItem.ToArray())));

// Output
// False found 3 times with these values: Alpha Alpha Gamma
// True found 2 times with these values: Beta Delta
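When the two groups need to be looked up by key rather than printed, one option is to turn the groupings into a dictionary. The sketch below assumes that goal; the RegexBucketsExample class and its SplitByEnding helper are hypothetical names, while GroupBy, ToDictionary and TryGetValue are the standard .NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexBucketsExample
{
    // Splits the items into a "matches" bucket (true) and a "does not match" bucket (false).
    public static Dictionary<bool, List<string>> SplitByEnding(IEnumerable<string> items)
    {
        return items.GroupBy(txt => Regex.IsMatch(txt, "ta$"))
                    .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        var theList = new List<string> { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };
        Dictionary<bool, List<string>> buckets = SplitByEnding(theList);

        // A key exists only if at least one item produced it, so use TryGetValue.
        List<string> matches;
        if (buckets.TryGetValue(true, out matches))
            Console.WriteLine(string.Join(" ", matches));   // Beta Delta
    }
}
```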
GroupBy Usage with One Key or an Unrelated Key
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => 1)
       .ToList()
       .ForEach(groupItem => Console.WriteLine("The Key ({0}) found a total of {1} times with a total letter count of {2} characters",
                                               groupItem.Key,
                                               groupItem.Count(),
                                               groupItem.Sum(it => it.Count())));

// Output:
// The Key (1) found a total of 5 times with a total letter count of 24 characters
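Because every item maps to the same constant key, the single group is simply the whole list, so its aggregates should agree with the ones computed on the list directly. The following sketch checks that; the ConstantKeyExample class and its MatchesDirectAggregates helper are illustrative names, and the code assumes a non-empty list (Single() throws on an empty sequence).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConstantKeyExample
{
    // True when the one-and-only group reports the same count and letter total as the list itself.
    public static bool MatchesDirectAggregates(List<string> items)
    {
        // A constant key puts every item into one group; Single() asserts there is exactly one.
        IGrouping<int, string> onlyGroup = items.GroupBy(txt => 1).Single();

        return onlyGroup.Count() == items.Count
            && onlyGroup.Sum(s => s.Length) == items.Sum(s => s.Length);
    }

    public static void Main()
    {
        var theList = new List<string> { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };
        Console.WriteLine(MatchesDirectAggregates(theList));   // True
    }
}
```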