Posts tagged ‘GroupBy’

C#: Finding List Duplicates Using the GroupBy Linq Extension and Other GroupBy Operations and Usage Explained

Woman against mirror showing her reflection as she seductively looks outward from picture. Her reflection is like a duplicate found in a list.
When one is dealing with lists such as strings there can be a situation where duplicates can be encountered and one such way of finding identical strings is to use the Linq extension GroupBy.  This article also provides an in depth explanation of that least used and somewhat misunderstood extension. Examples are give with related and non related keys to help one understand the flexibility of the extension.
Note that this code can be used in any .Net version from 3.5 and greater.

Finding Identical Strings using GroupBy

Searching for duplicates in lists can be done in different ways but with the introduction of GroupBy extension in Linq to Object queries one has a powerful tool to find those duplicates. GroupBy-s premise is to group items by a key and since keys in dictionaries are required to be unique, using this method to find duplicate items makes sense.
So let us define our problem in terms of a list of strings and somewhere within that list are duplicates which we want reported. For the sake of simplicity I won’t deal with case sensitivities to keep the example tight. The solution is as below with a line by line explanation of what is going on.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => txt)
        .Where(grouping => grouping.Count() > 1)
        .ToList()
        .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
                                                 groupItem.Key, 
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Writes this out to the console:
//
// Alpha duplicated 2 times with these values Alpha Alpha

Line By Line Explanation

Line 1: Our generic list is defined with duplicate strings “Alpha” while Beta Gamma and Delta are only found once.
Line 3:
Using the extension method GroupBy. This extension is based off of an enumerable item (IEnumerable<TSource>) which is of course our list. The primary argument is and a Lambda function (Func<TSource, TKey>) where we will simply define our TSource as the input (our string of the list) and its lambda operation as the key for our grouping.
The key in our case for this scenario is our string which we want to find the duplicate in the list. If we were dealing with a complex object then the key might be a property or field off of the object to use as the key in other scenarios but it is not. So our key will be the actual item found within our list. Since GroupBy’s result behaves like a dictionary, each key must be unique and that is the crux of how we can use GroupBy to divine all identical strings.
Line 3: Before moving on we must understand what GroupBy will return. By definition it returns IEnumerable<IGrouping<TKey, TSource>>. This can be broken down as such:

  • IEnumerable simply means that it will return a list or multiple of items which will be of IGrouping<> type.
  • IGrouping is a tuple type object where it contains the key of the grouped item and its corresponding value.  The nuance of this item is that when it is accessed directly it simply returns the TSource item (the non key part, just its value).
If one is familiar with the Dictionary class then one has worked with the KeyValuePair and this is the same except for the direct access of the value as mentioned above is not found in KeyValuePair.
Line 4: With GroupBy returning individual lists of key value pairs of IGrouping objects we need to weed out the single item keys and return only ones of two of more and the Where does this job for us. By specifying a lambda to weed out the lists which only have 1 found key, that gives us the duplicates sought.
Line 5:
Change from IEnumerable returned from Where extension and get an actual List object. This is done for two reasons.
The first is that the ForEach is an extension method specific to a list and not a raw IEnumerable.
Secondly and possibly more importantly, the GroupBy extension is a differed execution operation meaning that the data is not generated until it is actually needed. By going to ToList we are executing the operation and getting the data back immediately. This can come into play if the original data may change. If the data can change its best to get the data upfront from such differed execution methods.
Line 6: The goal is to display all the duplicates found and demonstrate how an IGrouping object is used. In our example we only have one IGrouping result list but if the data were changed we could have multiple. The following will display the information about our current IGrouping list returned.
Line 7: By accessing the named Key property of a IGrouping object we get an individual unique key result which defines the list of grouping data found. Note just because we have grouped our data by a key which is the same as the data, doesn’t mean in another use of Groupby that the data will be the same. In our example the key is “Alpha” which we send to {0} of the string.Format.
Line 8: The count property shows us how many values are found in the grouping. Our example returns two.
Line 9: We will enumerate the values of this current grouping and list out the data values. In this case there are two values both being “Alpha”.

GroupBy Usage with Only Two Defined Keys Regex Example

Now that one understands the GroupBy, one must not think that multiple unique keys are the be all end all to its usage. Sometimes we may want group things into found and not found groupings. The following example takes our greek letter list above and finds all the  words ending in “ta”.
Here is how it is done:
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy( txt => Regex.IsMatch( txt, "ta$" ))
       .ToList()
       .ForEach(groupItem => Console.WriteLine("{0} found {1} times with these values: {2}",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Output
// False found 3 times with these values: Alpha Alpha Gamma
// True found 2 times with these values: Beta Delta
Using our old friend Regex we are going to check to see if the current string ends in ta. If it does it will be in the key grouping of True and if not it will be found in the False grouping by the result of IsMatch. The result shows how we have manipulated the groupings to divine that Beta and Delta are the only two in our list which match the criteria. Hence demonstrating how we can further use the GroupBy method.

GroupBy Usage with one Key or a Non Related Key

I have actually had a need to where I grouped all items in to one key and performed an aggregate method on the result. The tip here is to show that one doesn’t have to group items by related keys. In the following example we through everything into group 1. We could have called the group anything frankly and sometimes it is needed.
This final example shows how the GroupBy can be flexible.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => 1 )
        .ToList()
        .ForEach(groupItem => Console.WriteLine("The Key ({0}) found a total of {1} times with a total letter count of {2} characters",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 groupItem.Sum(it => it.Count())));

// Output:
// The Key (1) found a total of 5 times with a total letter count of 24 characters
Share