Posts Tagged Extension Methods

C#: Finding List Duplicates Using the GroupBy Linq Extension and Other GroupBy Operations and Usage Explained

Woman against mirror showing her reflection as she seductively looks outward from picture. Her reflection is like a duplicate found in a list.
When one is dealing with lists such as strings there can be a situation where duplicates can be encountered and one such way of finding identical strings is to use the Linq extension GroupBy.  This article also provides an in depth explanation of that least used and somewhat misunderstood extension. Examples are give with related and non related keys to help one understand the flexibility of the extension.
Note that this code can be used in any .Net version from 3.5 and greater.

Finding Identical Strings using GroupBy

Searching for duplicates in lists can be done in different ways but with the introduction of GroupBy extension in Linq to Object queries one has a powerful tool to find those duplicates. GroupBy-s premise is to group items by a key and since keys in dictionaries are required to be unique, using this method to find duplicate items makes sense.
So let us define our problem in terms of a list of strings and somewhere within that list are duplicates which we want reported. For the sake of simplicity I won’t deal with case sensitivities to keep the example tight. The solution is as below with a line by line explanation of what is going on.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => txt)
        .Where(grouping => grouping.Count() > 1)
        .ToList()
        .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
                                                 groupItem.Key, 
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Writes this out to the console:
//
// Alpha duplicated 2 times with these values Alpha Alpha

Line By Line Explanation

Line 1: Our generic list is defined with duplicate strings “Alpha” while Beta Gamma and Delta are only found once.
Line 3:
Using the extension method GroupBy. This extension is based off of an enumerable item (IEnumerable<TSource>) which is of course our list. The primary argument is and a Lambda function (Func<TSource, TKey>) where we will simply define our TSource as the input (our string of the list) and its lambda operation as the key for our grouping.
The key in our case for this scenario is our string which we want to find the duplicate in the list. If we were dealing with a complex object then the key might be a property or field off of the object to use as the key in other scenarios but it is not. So our key will be the actual item found within our list. Since GroupBy’s result behaves like a dictionary, each key must be unique and that is the crux of how we can use GroupBy to divine all identical strings.
Line 3: Before moving on we must understand what GroupBy will return. By definition it returns IEnumerable<IGrouping<TKey, TSource>>. This can be broken down as such:

  • IEnumerable simply means that it will return a list or multiple of items which will be of IGrouping<> type.
  • IGrouping is a tuple type object where it contains the key of the grouped item and its corresponding value.  The nuance of this item is that when it is accessed directly it simply returns the TSource item (the non key part, just its value).
If one is familiar with the Dictionary class then one has worked with the KeyValuePair and this is the same except for the direct access of the value as mentioned above is not found in KeyValuePair.
Line 4: With GroupBy returning individual lists of key value pairs of IGrouping objects we need to weed out the single item keys and return only ones of two of more and the Where does this job for us. By specifying a lambda to weed out the lists which only have 1 found key, that gives us the duplicates sought.
Line 5:
Change from IEnumerable returned from Where extension and get an actual List object. This is done for two reasons.
The first is that the ForEach is an extension method specific to a list and not a raw IEnumerable.
Secondly and possibly more importantly, the GroupBy extension is a differed execution operation meaning that the data is not generated until it is actually needed. By going to ToList we are executing the operation and getting the data back immediately. This can come into play if the original data may change. If the data can change its best to get the data upfront from such differed execution methods.
Line 6: The goal is to display all the duplicates found and demonstrate how an IGrouping object is used. In our example we only have one IGrouping result list but if the data were changed we could have multiple. The following will display the information about our current IGrouping list returned.
Line 7: By accessing the named Key property of a IGrouping object we get an individual unique key result which defines the list of grouping data found. Note just because we have grouped our data by a key which is the same as the data, doesn’t mean in another use of Groupby that the data will be the same. In our example the key is “Alpha” which we send to {0} of the string.Format.
Line 8: The count property shows us how many values are found in the grouping. Our example returns two.
Line 9: We will enumerate the values of this current grouping and list out the data values. In this case there are two values both being “Alpha”.

GroupBy Usage with Only Two Defined Keys Regex Example

Now that one understands the GroupBy, one must not think that multiple unique keys are the be all end all to its usage. Sometimes we may want group things into found and not found groupings. The following example takes our greek letter list above and finds all the  words ending in “ta”.
Here is how it is done:
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy( txt => Regex.IsMatch( txt, "ta$" ))
       .ToList()
       .ForEach(groupItem => Console.WriteLine("{0} found {1} times with these values: {2}",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Output
// False found 3 times with these values: Alpha Alpha Gamma
// True found 2 times with these values: Beta Delta
Using our old friend Regex we are going to check to see if the current string ends in ta. If it does it will be in the key grouping of True and if not it will be found in the False grouping by the result of IsMatch. The result shows how we have manipulated the groupings to divine that Beta and Delta are the only two in our list which match the criteria. Hence demonstrating how we can further use the GroupBy method.

GroupBy Usage with one Key or a Non Related Key

I have actually had a need to where I grouped all items in to one key and performed an aggregate method on the result. The tip here is to show that one doesn’t have to group items by related keys. In the following example we through everything into group 1. We could have called the group anything frankly and sometimes it is needed.
This final example shows how the GroupBy can be flexible.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => 1 )
        .ToList()
        .ForEach(groupItem => Console.WriteLine("The Key ({0}) found a total of {1} times with a total letter count of {2} characters",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 groupItem.Sum(it => it.Count())));

// Output:
// The Key (1) found a total of 5 times with a total letter count of 24 characters
Share

Tags: , ,

C#: String Extension SplitOn to Split Text into Specific Sizes

Update 7/11/2016 Updated Code To use Linq

This code string extension which will take a specific max line size and fit only words into those lines.

For example if one split on this sentence: “The Fall River” would make three strings of “The “, “Fall “ and “River”.

I have placed this into a string extension which could be called such as

List<string> items = "The Fall River".SplitOn( 5 );

Here is the extension method

// using System.Text.RegularExpressions;
public static class StringExtensions
{

    /// <summary>Use this function like string.Split but instead of a character to split on, 
    /// use a maximum line width size. This is similar to a Word Wrap where no words will be split.</summary>
    /// Note if the a word is longer than the maxcharactes it will be trimmed from the start.
    /// <param name="initial">The string to parse.</param>
    /// <param name="MaxCharacters">The maximum size.</param>
    /// <remarks>This function will remove some white space at the end of a line, but allow for a blank line.</remarks>
    /// 
    /// <returns>An array of strings.</returns>
    public static List<string> SplitOn( this string initial, int MaxCharacters )
    {

        List<string> lines = new List<string>();

        if ( string.IsNullOrEmpty( initial ) == false )
        {
            string targetGroup = "Line";
            string pattern = string.Format( @"(?<{0}>.{{1,{1}}})(?:\W|$)", targetGroup, MaxCharacters );

            lines = Regex.Matches(initial, pattern, RegexOptions.Multiline | RegexOptions.CultureInvariant )
                         .OfType<Match>()
                         .Select(mt => mt.Groups[targetGroup].Value)
                         .ToList();          
        }
        return lines;
    }
}

Here is an example with output of its usage:

string text = "The rain in spain falls mainly on the plain of Jabberwocky falls.";

List<string> lines = text.SplitOn( 20 );

foreach ( string line in lines )
    Console.WriteLine( line );

/*
The rain in spain
falls mainly on the
plain of Jabberwocky
falls.
 */

foreach ( string line in text.SplitOn( 11 ) )
    Console.WriteLine( line );

/*
The rain in
spain falls
mainly on
the plain
of
Jabberwocky
falls.
 */
Share

Tags:

C#: Does the String Have Content – My Newest Favorite Extension Method

Everyone knows that one should do two things before using a string in C#. One is to check to see if it is null and the other is to check to see if it actually has content, not just blank. The primary way of doing that is to use the static method on the stringIsNullOrEmpty such as

string txt;

if (string.IsNullOrEmpty( txt ) == false)
{
   // Valid string do something
}

I have religiously been using that string checking paradigm (see my 2007 rants on it here (Is String.IsNullOrEmpty Hazardous to your Code’s health in .Net 2.0/3.0?)) for awhile now.

Issues

But I had two issues with using string.IsNullOrEmpty:

  1. The negative result of the method call (false) is telling me that the string is OK to use. Every so often I would call it without checking for that false condition. Grrr! I only want to do actions on it if it valid, not invalid.
  2. My way of working/thinking is that I type in the string name first and then think of testing it. Visual Studio’s Intellisense doesn’t help me, for I have already typed in the variable. I have to go back and add string.IsNullOrEmpty.

Solution

With the advent of extension methods I can solve both issues by making an extension method of what I want. By giving it a name which suggests a positive result means that the string is viable, I solve #1. By making it an extension method, I solve #2 above because it now shows up in Intellisense! Here is the code written as an extension

/// <summary>
/// Does the string contains text? Same as (IsNullOrEmpty() == false).
/// </summary>
/// <param name="txt">Text to look at.</param>
/// <returns>True if valid and contains text.</returns>
public static bool ContainsText( this string txt )
{
    return string.IsNullOrEmpty( txt ) == false;
}

Because I name it with a C, it shows up in the first set of selections in intellisense. That is a bonus.

Note this works even if the string is null! Give it a try!

Share

Tags: , ,

INI Files Meet Regex and Linq in C# to Avoid the WayBack Machine of Kernal32.Dll

BetweenStonesWhat if you are stuck having to deal with older technology such as INI files while using the latest and greatest C# and .Net there is available? This article discusses an alternate way to read INI files and extract the data from those dusty tomes while  easily accessing the resulting data from dictionaries. Once the data resides in the dictionaries we can easily extract the data using the power of the indexer on section name followed by key name within the section. Such as IniFile[“TargetSection”][“TargetKey”] which will return a string of the value of that key in the ini file for that section.

Note all the code is one easy code section at the bottom of the article so don’t feel you have to copy each sections code.

Overview

If you are reading this, chances are you know what INI files are and don’t need a refresher. You may have looked into using the Win32 Kern32.dll method GetPrivateProfileSection to achieve your goals. Ack!  “Set the Wayback machine Sherman!” Thanks but no thanks.

Here is how to do this operation using Regular Expressions (Kinda a way back machine but very useful) and Linq to Object to get the values into a dictionary format so we can write this line of code to access the data within the INI file:

string myValue = IniFile[“SectionName”][“KeyName”];

The Pattern

Let me explain the Regex Pattern. If you are not so inclined to understand the semantics of it skip to the next section.

string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
 (?<Section>[^\]]*)         # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
   (?!\[)                    # Stop capture groups if a [ is found; new section
   (?<Key>[^=]*?)            # Any text before the =, matched few as possible
   (?:=)                     # Get the = now
   (?<Value>[^\r\n]*)        # Get everything that is not an Line Changes
   (?:[\r\n]{0,4})           # MBDC \r\n
  )+                        # End Capture groups";

Our goal is to use Named Match groups. Each match will have its section name in the named group called  “Section”  and all of the data, which is the key and value pairs will be named “Key” and “Value” respectively.  The trick to the above pattern is found in line eight. That stops the match when a new section is hit using the Match Invalidator (?!). Otherwise our key/values would bleed into the next section if not stopped.

The Data

Here is the data for your perusal.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";

We are interested in “Window Name” and “Directory”.

The Linq

Ok, if you thought the regex pattern was complicated, the Linq to Objects has some tricks up its sleeve as well. Primarily since our pattern matches create a single match per section with the accompany key and value data in two separate named match capture collections, that presents a problem. We need to join the the capture collections together, but there is no direct way to do that for the join in Linq because that link is only an indirect by the collections index number.

How do we get the two collections to be joined?

Here is the code:

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
 select new
 {
  Section = m.Groups["Section"].Value,

  kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
     join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
     select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

  } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Explanation:

  • Line 1: Our end goal object is a Dictionary where the key is the Section name and the value is a sub-dictionary with all the keys and values found in that section.
  • Line 2: The regex needs IPW because we have commented the pattern. It needs multiline because we are spanning multiple lines and need ^ to match each individual line and not just the beginning.
  • Line 5: This is the easiest item, simply access the named capture group “Section” for the section name.
  • Line 7 (.Captures) : Each one of the keys and values are in the specialized capture collection property off of the match.
  • Line 7 (.Cast<Capture>) : Since capture is specialized list and not a true generic list, such as List<string> we are going to Cast it(Cast<(Of <(TResult>) it (to IEnumerable<(Of <(T>)>),so we can access the standard query operators, i.e. the extension methods which are available to IEnumerable<T>. Short answer, so we can call .Select.
  • Line 7 (.Select): Because each list does not have a direct way to associate the data, we are going to create a new object that has a property which will have that index number, along with the target data value. That will allow us join it to the other list.
  • Line 7 (Lambda) : The lambda has two parameters, the first is our actual regex Capture object represented by a. The i is the index value which we need for the join. We then call new and create a new entity with two properties, the first is actual value of the Key found of the Capture class property “Value” and the second is i the index value.
  • Line 8 (Join) : We are going to join the data together using the direct properties of our new entity, but first we need to recreate the magic found in Line 7 for our Values capture collection. It is the same logic as the previous line so I will not delve into its explanation in detail.
  • Line 8 (on cpKey.i equals cpValue.i) : This is our association for the join on the new entities and yay, where index value i equals the other index value i allows us to do that. This is the keystone of all we are doing.
  • Line 9 (new KeyValuePair) : Ok we are now creating each individual linq projection item of the data as a KeyValuePair object. This could be removed for a new entity, but I choose to use the KeyValuePair class.
  • Line 9 (ToDictionary) : We want to easily access these key value pairs in the future, so we are going to place the Key into a Key of a dictionary and the dictionary key’s value from the actual Value.
  • Line 11 (ToDictionary) : Here is where we take the projection of the previous lines of code and create the end goal dictionary where the key name is the section and the value is the sub dictionary created in Line 9.

Whew…what is the result?

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs

Summary

Thanks to the power of regular expressions and Linq we don’t have to use the old methods to extract and process the data. We can easily access the information using the newer structures. Hope this helps and that you may have learned something new from something old.

Code All in One Place

Here is all the code so you don’t have to copy it from each section above. Don’t forget to include the using System.Text.RegularExpressions to do it all.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";
string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
     (?<Section>[^\]]*)     # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
  (?!\[)                    # Stop capture groups if a [ is found; new section
  (?<Key>[^=]*?)            # Any text before the =, matched few as possible
  (?:=)                     # Get the = now
  (?<Value>[^\r\n]*)        # Get everything that is not an Line Changes
  (?:[\r\n]{0,4})           # MBDC \r\n
  )+                        # End Capture groups";

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
    select new
    {
        Section = m.Groups["Section"].Value,

        kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                 join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

    } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs
Share

Tags: , ,

C# 101: Extension Method Creation For Variables and Enums

Class101 With the advent of C# 3.0 in .Net 3.5 one can create extension methods for common tasks for any existing types and enums which one finds in their code. This is a quick how-to article which provides those two examples using an integer and a user defined enum.

Basic Extension Example

Sometimes one needs to know when an integer value is negative. Here is an extension method to do that and its usage example.

public static class MyExtensions
{
    public static bool IsNegative( this int value )
    {
        return (value < 0) ? true : false;
    }
}
  1. Step one create a static class.
  2. Create a static method return type can vary to suit the target operations needs.
  3. Use the this keyword in the input parameters of the type which the extension will be used by.

In the above case we want integers to report if they are negative or not. Here is the usage:

int value = 1;

Console.WriteLine( value + " is " + value.IsNegative() );
value = --value;

Console.WriteLine( value + " is " + value.IsNegative() );

value = --value;
Console.WriteLine( value + " is " + value.IsNegative() );

/* Output
1 is False
0 is False
-1 is True
*/

Enum Example

Enums can have extensions in C# as well and are very functional.

This example is similar to the previous but we will deal with software development states. The first two states are test related and we will create an extension to determine if the state is in test or not.

public enum States { Test, UserAcceptance, Development, Production, Maintainence };

public static class MyExtensions
{
    public static bool IsInTest( this States state )
    {
        return state < States.LastTestState;
    }

}

So every state before Development is considered a “Testing” state. Here is its usage:

States current = States.Test;
Console.WriteLine("State: " + current + " is " + current.IsInTest());

current = States.Development;
Console.WriteLine( "State: " + current + " is " + current.IsInTest() );

/* Output:
State: Test is True
State: Development is False
*/

HTH

HTH

Share

Tags: , , ,

C#: Splitting Data From a String and Extracting out Decimals and Integers into Separate Lists Using Extension Methods

Sometimes you have the need to extract text out of a string, once extracted you need to get the right data into lists. Say one list of integers and one list of decimals. Using the extension methods now found in .Net one can easily do those operations.

Example

Here is the data in a string, “12 34 56.1 18 19.25 41”. We want the integers separated out from the decimals into their own lists respectively as their native values (int/decimal) and not strings. Using Where, Select and a couple of Lambdas we can get to the intended  result.

string data =@"12 34 56.1 18 19.25 41";

string[] dataAsStrings = data.Split( ' ' );

List<decimal> decs = ( dataAsStrings.Where( itm => itm.Contains( '.' ) ) )
                  .Select( itm => Decimal.Parse( itm ) )
                  .ToList();

List<int> ints = ( dataAsStrings.Where( itm => itm.Contains( '.' ) == false ) ).Select( itm => int.Parse( itm ) ).ToList();

foreach ( decimal dc in decs )
    Console.WriteLine( dc );
/* Outputs 56.1 19.25 (on seperate lines)*/

foreach ( int it in ints)
    Console.WriteLine( it );
/* Outputs 12 34 18 41 (on seperate lines)*/
  • Step one is top split the string using string.Split.
  • Step 2 is take that split data and enumerate over it using the Where method.
  • Within the Where we use a lambda which basically says for each item in the list, if it contains a period select it and return it. Note the result of most Extension methods is deemed a projection which is really a list of the values. Where, Select and ToList all return projections.
  • Once the Where is done we call/chain the Select Extension. The select is saying take the projection from the Where and on each item parse out the value and return it in its native, non string format.
  • Using the ToList on the projection created by the Select we take the items and specify it to be in a List<> format.

We didn’t have to do the ToList and could have said something like this:

var decs = ( dataAsStrings.Where( itm => itm.Contains( '.' ) ) )
        .Select( itm => Decimal.Parse( itm ) );
foreach ( decimal dc in decs )
    Console.WriteLine( dc );
/* Outputs 56.1 19.25 (on seperate lines)*/

Which would have been just as vaild. At runtime the var would have been internally known as IEnumerable<decimal> under the covers and as you can see that is just a different type of list and works just the same.

Share

Tags: , , , , , ,