Archive for category Extension Methods

C# Linq: Find Missing Values in a Sequence of Numbers and Other Sequence Related lists as IEnumerable Extensions

Water bubble and wavesI recently had a need to determine if a sequence of integers was solid, not broken, and whether it contained a gap of any missing numbers to report to the end user. I researched the issue and was dismayed that the examples found on the internet seemed to have unnecessary overheads of copied arrays or other artifacts and were too cumbersome for my needs. From that I wrote these C# extensions which work for any .Net above 3.5. Below I document the extension methods to get the missing numbers of a sequence, quickly determine if a sequence is broken and finally report where the existing numbers before the break of the sequence. See the end for the whole extension class for easier copying.

Problem Definition

If one has a set of numbers say

{ 2, 4, 7, 9 }

it has a broken sequence because there are missing numbers from the set 2-9. Those missing numbers are:

{ 3, 5, 6, 8 }

Find Missing Numbers In a Sequence

This method is the one I created first. It uses the Linq Aggregate extension to enumerate over the numbers in the set. If there is a gap which is greater than 1 between the numbers (the difference below is > 0 but same concept) then it reports the numbers missing between the two.

public static IEnumerable<int> SequenceFindMissings(this IList<int> sequence)
{

    var missing = new List<int>();

    if ((sequence != null) && (sequence.Any()))
    {
        sequence.Aggregate((seed, aggr) =>
                            {
                                var diff = (aggr - seed) - 1;

                                if (diff > 0)
                                    missing.AddRange(Enumerable.Range((aggr - diff), diff));

                                return aggr;
                            });
    }

    return missing;
}

Quickly Determine Broken Sequence

Is the sequence broken from the first number to the last in the set?

public static bool IsSequenceBroken(this IEnumerable<int> sequence)
{
    bool broken = false;

    if (sequence != null) 
    {
        var sequenceAsList = sequence.ToList();

        if (sequenceAsList.Any())
        {
            int lastValue = sequence.First();

            broken = sequence.Any(value =>
                                    {
                                        if ((value - lastValue) > 1)
                                            return true;

                                        lastValue = value;

                                        return false;
                                    }); 
        }
    }

    return broken;
}

Report Last Valid Number Before The Break

This is useful in situations where one needs to report where the break happens, say the user is editing in a grid and one highlights the existing number which precedes the missing number(s).

Example here returns a 2 and 5 which are the numbers which precede the break.

   (new List() { 1, 2, 4, 5, 7, 8}).SequenceReportMissingsBreakStarts()

Here is the method:

public static IEnumerable<int> SequenceReportMissingsBreakStarts(this IList<int> sequence)
{

    var breaks = new List<int>();

    if ((sequence != null) && (sequence.Any()))
    {

        sequence.Aggregate((seed, aggr) =>
                            {
                                var diff = (aggr - seed) - 1;

                                if (diff > 0)
                                    breaks.Add(seed);
                                return aggr;
                            });
    }

    return breaks;
}

Full Extension Source With Comments

Here is the code for an easier copy

public static class SequenceExtensions
{
    /// <summary>
    /// Take a sequence of numbers and if there are any gaps greater than 1 between the numbers,
    /// report true.
    /// </summary>
    /// <param name="sequence">A set of numbers to check.</param>
    /// <returns>True if the there is a break in the sequence of numbers.</returns>
    public static bool IsSequenceBroken(this IEnumerable<int> sequence)
    {
        bool broken = false;

        if (sequence != null)
        {
            var sequenceAsList = sequence.ToList();

            if (sequenceAsList.Any())
            {
                int lastValue = sequence.First();

                broken = sequence.Any(value =>
                                        {
                                            if ((value - lastValue) > 1)
                                                return true;

                                            lastValue = value;

                                            return false;
                                        });
            }
        }

        return broken;
    }

    /// <summary>
    /// Take a sequence of numbers and report the missing numbers. Stop at first break found.
    /// </summary>
    /// <param name="sequence">Set of Numbers</param>
    /// <returns>True of sequence has missing numbers</returns>
    public static IEnumerable<int> SequenceFindMissings(this IList<int> sequence)
    {

        var missing = new List<int>();

        if ((sequence != null) && (sequence.Any()))
        {
            sequence.Aggregate((seed, aggr) =>
                                {
                                    var diff = (aggr - seed) - 1;

                                    if (diff > 0)
                                        missing.AddRange(Enumerable.Range((aggr - diff), diff));

                                    return aggr;
                                });
        }

        return missing;

    }

    /// <summary>
    /// A missing break start in a sequence is where the drop off occurs in the sequence.
    /// For example 3, 5, has a missing break start of the #3 for #4 is the missing.
    /// </summary>
    /// <param name="sequence">Set of Numbers</param>
    /// <returns>The list of break numbers which exist before the missing numbers.</returns>
    public static IEnumerable<int> SequenceReportMissingsBreakStarts(this IList<int> sequence)
    {

        var breaks = new List<int>();

        if ((sequence != null) && (sequence.Any()))
        {

            sequence.Aggregate((seed, aggr) =>
                                {
                                    var diff = (aggr - seed) - 1;

                                    if (diff > 0)
                                        breaks.Add(seed);
                                    return aggr;
                                });
        }

        return breaks;

    }
}

Hope this helps!

Share

Tags: , , , ,

C#: Combine Two Values Together From a List Into Pairs

Sometimes in C# .Net (see Notes section on usage for before .Net 4)  one might have a list of items and want to make them into pairs. Possibly to take those pairs and place the them into a dictionary. If the items are in a list, that list is linear by nature and using linq is not an option when using the extension ToDictionary.

I have created extension methods to create paired values from any list and those methods are named AsPairs and AsPairsSafe. If the list is odd in length, the final number will be combined with the system default for that type as handled in the method AsPairs. If that is not desired then call AsPairsSafe which will skip the last odd value.

Here is the usage of the AsPairs extension on integers and strings:

var ints = new List<int> { 1, 2, 3, 4, 5 };

IEnumerable<Tuple<int,int>> asTuplePairs = ints.AsPairs();

/* asTuplePairs looks like this(Note value 5 is paired with a default value of 0)
1,2 
3,4
5,0
*/

var strings = new List<string> { "Alpha", "Beta", "Gamma", "Delta", "Omega" };

IEnumerable<Tuple<string,string>> asTuplePairsStrings = strings.AsPairs();

/* asTuplePairsStrings (note Omega is paired a default value of Null)
Alpha, Beta 
Gamma, Delta
Omega, NULL

Whereas the call to AsPairsSafe would return without Omega:
Alpha, Beta 
Gamma, Delta
*/

Here are the extension methods:

public static class MyExtensions
{
   // Create Pairs from a list. If the list is odd add a default value for the final pair. 
   public static IEnumerable<Tuple<T, T>> AsPairs<T>(this List<T> list)
   {
      int index = 0;

      while (index < list.Count())
      {
         if (index + 1 > list.Count())
            yield break;

         if (index + 1 == list.Count())    
            yield return new Tuple<T,T>(list[index++],  default(T));
         else
            yield return new Tuple<T,T>(list[index++],  list[index++]);
      }
   }

   // Create Pairs from a list. Note if the list is not even in count, the last value is skipped.
   public static IEnumerable<Tuple<T, T>> AsPairsSafe<T>(this List<T> list)
   {
      int index = 0;

      while (index < list.Count())
      {
         if (index + 1 >= list.Count())
            yield break;

         yield return new Tuple<T,T>(list[index++],  list[index++]);
      }

   }
}

Notes

Tuple is a .Net 4 item. If you are using a previous version of .Net use the KeyValuePair structure instead.

Share

Tags:

C#: Finding List Duplicates Using the GroupBy Linq Extension and Other GroupBy Operations and Usage Explained

Woman against mirror showing her reflection as she seductively looks outward from picture. Her reflection is like a duplicate found in a list.
When one is dealing with lists such as strings there can be a situation where duplicates can be encountered and one such way of finding identical strings is to use the Linq extension GroupBy.  This article also provides an in depth explanation of that least used and somewhat misunderstood extension. Examples are give with related and non related keys to help one understand the flexibility of the extension.
Note that this code can be used in any .Net version from 3.5 and greater.

Finding Identical Strings using GroupBy

Searching for duplicates in lists can be done in different ways but with the introduction of GroupBy extension in Linq to Object queries one has a powerful tool to find those duplicates. GroupBy-s premise is to group items by a key and since keys in dictionaries are required to be unique, using this method to find duplicate items makes sense.
So let us define our problem in terms of a list of strings and somewhere within that list are duplicates which we want reported. For the sake of simplicity I won’t deal with case sensitivities to keep the example tight. The solution is as below with a line by line explanation of what is going on.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => txt)
        .Where(grouping => grouping.Count() > 1)
        .ToList()
        .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
                                                 groupItem.Key, 
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Writes this out to the console:
//
// Alpha duplicated 2 times with these values Alpha Alpha

Line By Line Explanation

Line 1: Our generic list is defined with duplicate strings “Alpha” while Beta Gamma and Delta are only found once.
Line 3:
Using the extension method GroupBy. This extension is based off of an enumerable item (IEnumerable<TSource>) which is of course our list. The primary argument is and a Lambda function (Func<TSource, TKey>) where we will simply define our TSource as the input (our string of the list) and its lambda operation as the key for our grouping.
The key in our case for this scenario is our string which we want to find the duplicate in the list. If we were dealing with a complex object then the key might be a property or field off of the object to use as the key in other scenarios but it is not. So our key will be the actual item found within our list. Since GroupBy’s result behaves like a dictionary, each key must be unique and that is the crux of how we can use GroupBy to divine all identical strings.
Line 3: Before moving on we must understand what GroupBy will return. By definition it returns IEnumerable<IGrouping<TKey, TSource>>. This can be broken down as such:

  • IEnumerable simply means that it will return a list or multiple of items which will be of IGrouping<> type.
  • IGrouping is a tuple type object where it contains the key of the grouped item and its corresponding value.  The nuance of this item is that when it is accessed directly it simply returns the TSource item (the non key part, just its value).
If one is familiar with the Dictionary class then one has worked with the KeyValuePair and this is the same except for the direct access of the value as mentioned above is not found in KeyValuePair.
Line 4: With GroupBy returning individual lists of key value pairs of IGrouping objects we need to weed out the single item keys and return only ones of two of more and the Where does this job for us. By specifying a lambda to weed out the lists which only have 1 found key, that gives us the duplicates sought.
Line 5:
Change from IEnumerable returned from Where extension and get an actual List object. This is done for two reasons.
The first is that the ForEach is an extension method specific to a list and not a raw IEnumerable.
Secondly and possibly more importantly, the GroupBy extension is a differed execution operation meaning that the data is not generated until it is actually needed. By going to ToList we are executing the operation and getting the data back immediately. This can come into play if the original data may change. If the data can change its best to get the data upfront from such differed execution methods.
Line 6: The goal is to display all the duplicates found and demonstrate how an IGrouping object is used. In our example we only have one IGrouping result list but if the data were changed we could have multiple. The following will display the information about our current IGrouping list returned.
Line 7: By accessing the named Key property of a IGrouping object we get an individual unique key result which defines the list of grouping data found. Note just because we have grouped our data by a key which is the same as the data, doesn’t mean in another use of Groupby that the data will be the same. In our example the key is “Alpha” which we send to {0} of the string.Format.
Line 8: The count property shows us how many values are found in the grouping. Our example returns two.
Line 9: We will enumerate the values of this current grouping and list out the data values. In this case there are two values both being “Alpha”.

GroupBy Usage with Only Two Defined Keys Regex Example

Now that one understands the GroupBy, one must not think that multiple unique keys are the be all end all to its usage. Sometimes we may want group things into found and not found groupings. The following example takes our greek letter list above and finds all the  words ending in “ta”.
Here is how it is done:
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy( txt => Regex.IsMatch( txt, "ta$" ))
       .ToList()
       .ForEach(groupItem => Console.WriteLine("{0} found {1} times with these values: {2}",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Output
// False found 3 times with these values: Alpha Alpha Gamma
// True found 2 times with these values: Beta Delta
Using our old friend Regex we are going to check to see if the current string ends in ta. If it does it will be in the key grouping of True and if not it will be found in the False grouping by the result of IsMatch. The result shows how we have manipulated the groupings to divine that Beta and Delta are the only two in our list which match the criteria. Hence demonstrating how we can further use the GroupBy method.

GroupBy Usage with one Key or a Non Related Key

I have actually had a need to where I grouped all items in to one key and performed an aggregate method on the result. The tip here is to show that one doesn’t have to group items by related keys. In the following example we through everything into group 1. We could have called the group anything frankly and sometimes it is needed.
This final example shows how the GroupBy can be flexible.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => 1 )
        .ToList()
        .ForEach(groupItem => Console.WriteLine("The Key ({0}) found a total of {1} times with a total letter count of {2} characters",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 groupItem.Sum(it => it.Count())));

// Output:
// The Key (1) found a total of 5 times with a total letter count of 24 characters
Share

Tags: , ,