Posts Tagged Linq-to-Object

C#: Finding List Duplicates Using the GroupBy Linq Extension and Other GroupBy Operations and Usage Explained

When working with lists, such as lists of strings, one can run into duplicates, and one way of finding identical strings is the Linq extension GroupBy. This article also provides an in-depth explanation of that little-used and somewhat misunderstood extension. Examples are given with related and unrelated keys to show the flexibility of the extension.
Note that this code can be used in .Net 3.5 and later.

Finding Identical Strings using GroupBy

Searching for duplicates in lists can be done in different ways, but with the introduction of the GroupBy extension in Linq-to-Object queries one has a powerful tool for finding them. GroupBy's premise is to group items by a key. Each distinct key produces exactly one group, much as a dictionary holds each key only once, so any group containing more than one item marks a duplicate.
So let us define our problem in terms of a list of strings, somewhere within which are duplicates we want reported. To keep the example tight I won’t deal with case sensitivity. The solution is below, followed by a line-by-line explanation of what is going on.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => txt)
        .Where(grouping => grouping.Count() > 1)
        .ToList()
        .ForEach(groupItem => Console.WriteLine("{0} duplicated {1} times with these values {2}",
                                                 groupItem.Key, 
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Writes this out to the console:
//
// Alpha duplicated 2 times with these values Alpha Alpha

Line By Line Explanation

Line 1: Our generic list is defined with the duplicate string “Alpha”, while Beta, Gamma and Delta are each found only once.
Line 3:
We use the extension method GroupBy. This extension hangs off of any enumerable (IEnumerable<TSource>), which is of course our list. Its primary argument is a lambda function (Func<TSource, TKey>) that takes a TSource as input (a string from our list) and returns the key to group by.
The key in this scenario is the string itself, since that is what we want to find duplicates of. If we were dealing with a complex object, the key might instead be a property or field of that object. Since GroupBy produces exactly one group per distinct key, much like a dictionary, that is the crux of how we can use GroupBy to divine all identical strings.
Before moving on we must understand what GroupBy returns. By definition it returns IEnumerable<IGrouping<TKey, TSource>>. This can be broken down as such:

  • IEnumerable simply means that it will return a sequence of zero or more items, each of which is of the IGrouping<> type.
  • IGrouping holds the key of the group (its Key property) together with the items that share that key. The nuance is that an IGrouping is itself enumerable: enumerating it directly yields the TSource items (the non-key part, just the values).
If one is familiar with the Dictionary class then one has worked with KeyValuePair. IGrouping is similar, except that it carries a whole sequence of values per key and can be enumerated directly, which KeyValuePair cannot.
Line 4: GroupBy hands back one IGrouping per key, so we need to weed out the keys with a single item and keep only those with two or more. The Where does this job for us: its lambda keeps only the groups whose count is greater than 1, and that gives us the duplicates sought.
Line 5:
We change from the IEnumerable returned by the Where extension to an actual List object. This is done for two reasons.
The first is that ForEach is a method of List<T> and is not available on a raw IEnumerable.
Secondly, and possibly more importantly, GroupBy is a deferred execution operation, meaning that the data is not generated until it is actually needed. By calling ToList we execute the operation and get the data back immediately. This comes into play if the original data may change: if it can, it's best to get the data up front from such deferred execution methods.
Line 6: The goal is to display all the duplicates found and demonstrate how an IGrouping object is used. In our example we have only one IGrouping in the result, but if the data were changed we could have several. The following lines display the information about each IGrouping returned.
Line 7: By accessing the Key property of an IGrouping object we get the unique key which defines that group. Note that just because we have grouped our data by a key which is the same as the data doesn’t mean that in another use of GroupBy the key and the data will be the same. In our example the key is “Alpha”, which we send to {0} of the format string.
Line 8: The Count() extension method shows us how many values are found in the grouping. Our example returns two.
Line 9: We enumerate the values of the current grouping and list them out. In this case there are two values, both being “Alpha”.
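
The deferred execution point made for Line 5 is easier to see in a small sketch. The class and method names here (DeferredGroupByDemo, Run) are made up for illustration and are not part of the example above. The sketch defines the same duplicate query twice, once left deferred and once snapshotted with ToList, then changes the source list and compares what each one reports.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredGroupByDemo
{
    // Returns { duplicate keys seen by the deferred query,
    //           duplicate keys held by the ToList() snapshot }.
    public static int[] Run()
    {
        var theList = new List<string> { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

        // Deferred: nothing is grouped yet, the query only remembers theList.
        IEnumerable<IGrouping<string, string>> deferred =
            theList.GroupBy(txt => txt).Where(g => g.Count() > 1);

        // Snapshot: ToList() runs the query now and stores the result.
        List<IGrouping<string, string>> snapshot = deferred.ToList();

        theList.Add("Beta"); // the source changes after both were defined

        // The deferred query runs against the changed list (Alpha and Beta),
        // while the snapshot still holds only the Alpha group.
        return new[] { deferred.Count(), snapshot.Count };
    }
}
```

Calling DeferredGroupByDemo.Run() gives 2 for the deferred query and 1 for the snapshot, which is why the ToList call matters when the data can change underneath you.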

GroupBy Usage with Only Two Defined Keys Regex Example

Now that one understands GroupBy, one must not think that multiple unique keys are the be-all and end-all of its usage. Sometimes we may want to group things into found and not-found groupings. The following example takes our Greek letter list above and finds all the words ending in “ta”.
Here is how it is done:
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy( txt => Regex.IsMatch( txt, "ta$" ))
       .ToList()
       .ForEach(groupItem => Console.WriteLine("{0} found {1} times with these values: {2}",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 string.Join(" ", groupItem.ToArray())));
// Output
// False found 3 times with these values: Alpha Alpha Gamma
// True found 2 times with these values: Beta Delta
Using our old friend Regex we check whether the current string ends in “ta”. If it does, it lands in the grouping with the key True; if not, it lands in the False grouping, the key being the result of IsMatch. The result shows how we have manipulated the groupings to divine that Beta and Delta are the only two items in our list which match the criteria, demonstrating another use of the GroupBy method.
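
If the goal is to pull out one side of such a True/False split later on, a lookup can be handier than walking the groups. What follows is only a sketch of that idea using the standard ToLookup extension; the class and method names (TwoKeyLookupDemo, Partition) are hypothetical. Note that, unlike GroupBy, ToLookup executes immediately.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoKeyLookupDemo
{
    // Splits the words into "ends in ta" (key true) and everything else (key false).
    public static ILookup<bool, string> Partition(IEnumerable<string> words)
    {
        return words.ToLookup(txt => Regex.IsMatch(txt, "ta$"));
    }
}
```

With the Greek letter list, Partition(theList)[true] yields Beta and Delta. Asking a lookup for a key that never occurred returns an empty sequence rather than throwing, so the False side can be requested safely even when every word matched.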

GroupBy Usage with one Key or a Non Related Key

I have actually had a need to group all items under one key and perform an aggregate method on the result. The tip here is that one doesn’t have to group items by related keys. In the following example we throw everything into group 1. We could have called the group anything, frankly, and sometimes that is exactly what is needed.
This final example shows how the GroupBy can be flexible.
List<string> theList = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Delta" };

theList.GroupBy(txt => 1 )
        .ToList()
        .ForEach(groupItem => Console.WriteLine("The Key ({0}) found a total of {1} times with a total letter count of {2} characters",
                                                 groupItem.Key,
                                                 groupItem.Count(),
                                                 groupItem.Sum(it => it.Length)));

// Output:
// The Key (1) found a total of 5 times with a total letter count of 24 characters
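
One place the single-key trick can pay off is when several aggregates are wanted from one pass over the group. The sketch below assumes that use case; SingleKeyAggregateDemo and Summarize are illustrative names of my own, not anything from the example above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SingleKeyAggregateDemo
{
    // Groups everything under one arbitrary key, then computes several
    // aggregates over that single group.
    // Returns { item count, total letters, longest word length }.
    public static int[] Summarize(IEnumerable<string> words)
    {
        return words.GroupBy(txt => 1)
                    .Select(g => new[] { g.Count(), g.Sum(w => w.Length), g.Max(w => w.Length) })
                    .Single(); // throws if words is empty, since there is then no group at all
    }
}
```

For the Greek letter list this returns 5, 24 and 5. An empty input produces no group, so a caller that may pass an empty list would want SingleOrDefault and a null check instead.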

C#: Handling Title Case in Strings with Articles and Prepositions


This was an issue a user presented in the forums, and I felt my response was intricate enough to share with the world as a whole. The user wanted to have a string converted to title case, but did not want the first letter of any article or preposition to be upper case like the rest of the sentence. This article discusses how to do that in C#.

For example the user was interested in changing

“ALL QUIET ON THE WESTERN FRONT”

   to

“All Quiet on the Western Front.”

.Net Framework Almost Does It

Thanks to the TextInfo class, obtained from the current CultureInfo object, we can use the method ToTitleCase to work with our current language. One wrinkle is that ToTitleCase leaves words that are entirely upper case alone, treating them as acronyms, so the sentence has to be lower-cased first. Even then, the problem is that we get this:

“All Quiet On The Western Front”
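
To see what the framework does on its own, here is a minimal sketch. It assumes the invariant culture rather than the thread's culture so the result does not depend on machine settings, and the names TitleCaseBaseline, Direct and LowerFirst are made up for the illustration.

```csharp
using System.Globalization;

public static class TitleCaseBaseline
{
    // Assumption: invariant culture, so the sketch behaves the same everywhere.
    private static readonly TextInfo Text = CultureInfo.InvariantCulture.TextInfo;

    // ToTitleCase applied directly: all-uppercase words are left untouched.
    public static string Direct(string s)
    {
        return Text.ToTitleCase(s);
    }

    // Lower-casing first lets ToTitleCase capitalize every word.
    public static string LowerFirst(string s)
    {
        return Text.ToTitleCase(s.ToLowerInvariant());
    }
}
```

Direct("ALL QUIET ON THE WESTERN FRONT") hands the sentence back unchanged, while LowerFirst gives "All Quiet On The Western Front", articles and prepositions included.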

Give it some Help

The .Net code does not know to skip the articles and prepositions, so we will augment it. The following code uses Linq-to-Object and Regex and handles the majority of the target articles and prepositions. I have placed it into an extension method below:

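/* Usings required (the method must live in a static class):
using System.Collections.Generic; using System.Linq;
using System.Globalization; using System.Threading;
using System.Text.RegularExpressions;
*/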

/// <summary>
/// An Extension Method to allow us to do "The Title Of It".asTitleCase()
/// which would return a TitleCased string.
/// </summary>
/// <param name="title">Title to work with.</param>
/// <returns>Output title as TitleCase</returns>
public static string asTitleCase ( this string title)
{
    string WorkingTitle = title;

    if ( string.IsNullOrEmpty( WorkingTitle ) == false )
    {
        char[] space = new char[] { ' ' };

        List<string> artsAndPreps = new List<string>()
            { "a", "an", "and", "any", "at", "from", "into", "of", "on", "or", "some", "the", "to", };

        //Get the culture property of the thread.
        CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
        //Create TextInfo object.
        TextInfo textInfo = cultureInfo.TextInfo;

        //Convert to title case.
        WorkingTitle = textInfo.ToTitleCase( title.ToLower() );

        List<string> tokens = WorkingTitle.Split( space, StringSplitOptions.RemoveEmptyEntries ).ToList();

        WorkingTitle = tokens[0];

        tokens.RemoveAt(0);

        WorkingTitle += tokens.Aggregate<String, String>( String.Empty, ( String prev, String input )
                                => prev +
                                    ( artsAndPreps.Contains( input.ToLower() ) // If True
                                        ? " " + input.ToLower()              // Return the prep/art lowercase
                                        : " " + input ) );                   // Otherwise return the valid word.

        // Handle an "Out Of" but not at the start of the sentence
        WorkingTitle = Regex.Replace( WorkingTitle, @"(?!^Out)(Out\s+Of)", "out of" );
    }

    return WorkingTitle;

}
Explanation
  • Line 21: Here is our English list of words not to capitalize. We would have to change this for other languages.
  • Line 25: We get the current culture from the running thread so that ToTitleCase can do its job.
  • Line 30: ToLower first drops every letter to lower case, since ToTitleCase leaves all-upper-case words alone; ToTitleCase then upper cases the first letter of each word.
  • Line 32: We split the line on the spaces between the words into word tokens and put them in a list.
  • Line 34: We save off the first word because, regardless of what it is, it is already correct.
  • Line 36: We remove the first word so as not to process it.
  • Line 38: Using the Aggregate extension we accumulate each word token, adding a space in front of it. We are using the aggregate method in lieu of string.Join because it lets us check each word as it goes by, which string.Join alone can’t help us with.
  • Line 40: As the tokens (words) are handed to us, we check to see if they are in the list we set up in line 21. If so, we add a space in front and make the whole word lower case (Line 41); otherwise we add a space and just return the word.
  • Line 45: Handle any two-word “Out Of” issues, but ignore it if it starts the sentence, as in “Out of Africa”.
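
For comparison, the same pass over the words can be written with the index overload of Select feeding string.Join. This is only a sketch of an alternative, not a replacement for the method above: it assumes .Net 4 or later (for the string.Join overload that takes a sequence), uses the invariant culture instead of the thread's culture, skips the “Out Of” regex step, and the names TitleCaseSketch and ToTitle are my own.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class TitleCaseSketch
{
    // Small words kept lower case unless they start the title (English only).
    private static readonly HashSet<string> SmallWords = new HashSet<string>(
        new[] { "a", "an", "and", "any", "at", "from", "into", "of", "on", "or", "some", "the", "to" },
        StringComparer.OrdinalIgnoreCase);

    public static string ToTitle(string title)
    {
        if (string.IsNullOrEmpty(title)) return title;

        // Assumption: invariant culture instead of Thread.CurrentThread.CurrentCulture.
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        string[] words = textInfo.ToTitleCase(title.ToLowerInvariant())
                                 .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // The index overload of Select tells us which word comes first,
        // so the first word is never lower-cased.
        return string.Join(" ", words.Select((word, index) =>
            index > 0 && SmallWords.Contains(word) ? word.ToLowerInvariant() : word));
    }
}
```

TitleCaseSketch.ToTitle("ALL QUIET ON THE WESTERN FRONT") returns "All Quiet on the Western Front", and "Out OF AFRICA" comes back as "Out of Africa".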
Tests and Results

 

Console.WriteLine( "ALL QUIET ON THE WESTERN FRONT".asTitleCase() );
Console.WriteLine( "Bonfire OF THE Vanities".asTitleCase() );
Console.WriteLine( "The Out-of-Sync Child: Recognizing and Coping with Sensory Processing Disorder".asTitleCase() );
Console.WriteLine( "Out OF AFRICA".asTitleCase() );

/* Results
All Quiet on the Western Front
Bonfire of the Vanities
The Out-Of-Sync Child: Recognizing and Coping With Sensory Processing Disorder
Out of Africa
*/

C# Regex Linq: Extract an Html Node with Attributes of Varying Types


The premise of this article and its code sample is that one has an html node to parse and needs the node’s attributes accessible in a handy fashion. Using Regular Expressions with Linq we can achieve our goal and examine all attributes of the html node. I will show the steps to take and the pitfalls of using another methodology.

Data

<INPUT onblur=google&&google.fade&&google.fade() class=lst title='Google Search' value=TESTING maxLength=2048 size=55 name=q autocomplete='off' init='true'/>

Why Not Use XElement’s Attributes?

Because of the free-form text found in the html the following code throws an exception on the first attribute encountered:

string test = @"<INPUT onblur=google&&google.fade&&google.fade() class=lst title='Google Search' value=TESTING maxLength=2048 size=55 name=q autocomplete='off' init='true'>";

// Fails saying google is unexpected token!
var input = XElement.Parse( test )
                    .Attributes()
                    .Select( vl => new KeyValuePair<string, string>( vl.Name.ToString(), vl.Value.ToString() ) );

foreach ( KeyValuePair<string, string> item in input )
    Console.WriteLine( "Key: {0,15} Value: {1}", item.Key, item.Value );

Step 1: Regex

Our first step is to create a regular expression which can handle the node and its attributes. What is interesting about the regex pattern below is that it uses a conditional (an if clause) to discriminate whether the attribute's value is in quotes, single or double, and either way places the value into the captures collection.

(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
# -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!

The above will find a match on a node and place the tag name into the named capture Tag. Each attribute will then land in two named capture collections, Key and Value.

Regex Returns A Match…Now What?

We need to extract the items into a Dictionary of key value pairs. The following code works with the named match groups and their indexed captures and extracts all attributes:

var attributes = ( from Match mt in Regex.Matches( node, pattern, RegexOptions.IgnorePatternWhitespace )
                   select new
                   {
                       Name = mt.Groups["Tag"],
                       Attrs = ( from cpKey in mt.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                                 join cpValue in mt.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )
                   } ).First().Attrs;

What the above is doing is enumerating over all the matches; in this case there is only one. Then we work through all the keys in the “Key” captures array and marry each to the entry in the “Value” captures array on a one-to-one basis. Notice how we can join the two arrays on position thanks to the specialized Select overload which supplies the index value. Finally we express those combined items as key value pairs.
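
On .Net 4 and later the same position-by-position pairing can also be expressed with the Zip extension. This is a sketch of that alternative rather than the code used in this article; CapturePairing and its ToDictionary method are hypothetical names, and the match passed in is assumed to come from a pattern with “Key” and “Value” named groups like the one above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapturePairing
{
    // Pairs the n-th "Key" capture with the n-th "Value" capture of one match.
    public static Dictionary<string, string> ToDictionary(Match match)
    {
        IEnumerable<string> keys =
            match.Groups["Key"].Captures.Cast<Capture>().Select(c => c.Value);
        IEnumerable<string> values =
            match.Groups["Value"].Captures.Cast<Capture>().Select(c => c.Value);

        // Zip walks both sequences in step and stops at the shorter one.
        return keys.Zip(values, (k, v) => new KeyValuePair<string, string>(k, v))
                   .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }
}
```

Like the join, this relies on the two capture collections lining up one-to-one, and ToDictionary will throw if the same attribute name appears twice.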

Full Code and Result

string node = @"<INPUT onblur=google&amp;&amp;google.fade&amp;&amp;google.fade() class=lst title='Google Search' value=TESTING maxLength=2048 size=55 name=q autocomplete='off' init='true'/>";
string pattern =@"
(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
                     # -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!";

var attributes = ( from Match mt in Regex.Matches( node, pattern, RegexOptions.IgnorePatternWhitespace )
                   select new
                   {
                       Name = mt.Groups["Tag"],
                       Attrs = ( from cpKey in mt.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                                 join cpValue in mt.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )
                   } ).First().Attrs;


foreach ( KeyValuePair<string, string> kvp in attributes )
    Console.WriteLine( "Key {0,15}    Value: {1}", kvp.Key, kvp.Value );

/* Output:
Key          onblur    Value: google&amp;&amp;google.fade&amp;&amp;google.fade()
Key           class    Value: lst
Key           title    Value: Google Search
Key           value    Value: TESTING
Key       maxLength    Value: 2048
Key            size    Value: 55
Key            name    Value: q
Key    autocomplete    Value: off
Key            init    Value: true
*/

C# Linq How To Load a Winform Control Using Dynamic Linq Entities (Or any other Control)

This has been requested in the forums, since most examples in the Linq world are console applications. One can use Linq to load winform (or asp.net) controls very easily with the anonymous entities which are created on the fly thanks to Linq. This article shows how to load a DataGridView control, or any other, with such dynamic Linq items.

Steps

  1. Create a winform project/solution.
  2. Drag two controls to the surface of the form: a DataGridView and a BindingSource. We will accept the default names of dataGridView1 and bindingSource1.
  3. For demonstration purposes of this article we will create a struct for the simulated data which the Linq-to-Object operation will query off of:
    struct Data
    {
        public string Name { get; set; }
        public string Operation { get; set; }
        public string Description { get; set; }
        public DateTime DateStart { get; set; }
        public DateTime DateEnd { get; set; }
    
    }

    Note if you don’t create a new file for the above struct, place it below the Form1 partial class. Placing it above will screw up the design view when you try to edit.

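  4. Here is the code to load the DataGridView using Linq, which can be called from the constructor of the form after the InitializeComponent call.
    // bindingSource1 already exists as a designer field if it was dragged onto the form in step 2.
    // Uncomment the next line only if it was not:
    // BindingSource bindingSource1 = new BindingSource();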
    private void LoadGrid()
    {
        List<Data> dataListing = new List<Data>()
        {
            new Data() { Name = "Jabberwocky", Operation="Read", DateStart= DateTime.Now.AddDays(-2), DateEnd = DateTime.Now.AddDays(-2), Description="Process Started No errors"},
            new Data() { Name = "Space", Operation="Write", DateStart= DateTime.Now.AddDays(-2), DateEnd = DateTime.Now.AddDays(-1), Description="Final process remote allocation of 3000 items to main buffer."},
            new Data() { Name = "Stock Purchase", Operation="DataWarehousing", DateStart= DateTime.Now, DateEnd = DateTime.Now, Description="Shared data transport."}
        };
    
        var items = from dta in dataListing
            select new
            {
               OperationName = dta.Name,
               Start         = dta.DateStart.ToShortDateString(),
               End           = dta.DateEnd.ToShortDateString(),
               Operation     = dta.Operation,
               Description   = dta.Description
             };
    
        bindingSource1.DataSource = items;
        dataGridView1.DataSource  = bindingSource1;
    
        // Grid attributes
        dataGridView1.BorderStyle         = BorderStyle.Fixed3D;
        dataGridView1.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.AllCells;
    
    }
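
The grid builds its columns from the public properties of whatever type the query projects, so the names chosen in the select new block become the column names. The sketch below shows that shape without any UI. ProjectionShape and ColumnNames are names I made up, the sample row is invented, and the ToList call is my own addition so that the binding would see a fixed list rather than a deferred query.

```csharp
using System;
using System.Linq;

public static class ProjectionShape
{
    // Returns the public property names of the anonymous type produced by
    // the projection; a data-bound grid would use these as its columns.
    public static string[] ColumnNames()
    {
        // Invented sample row standing in for the Data struct.
        var source = new[] { new { Name = "Jabberwocky", DateStart = new DateTime(2011, 4, 2) } };

        // ToList() materializes the query before it is handed to a binding.
        var items = (from dta in source
                     select new
                     {
                         OperationName = dta.Name,
                         Start = dta.DateStart.ToShortDateString()
                     }).ToList();

        return items[0].GetType().GetProperties().Select(p => p.Name).ToArray();
    }
}
```

ColumnNames() reports OperationName and Start, the same names one would expect to see as column headers.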

Result

(Screenshot of the populated DataGridView.)


INI Files Meet Regex and Linq in C# to Avoid the WayBack Machine of Kernel32.dll

What if you are stuck having to deal with older technology such as INI files while using the latest and greatest C# and .Net available? This article discusses an alternate way to read INI files and extract the data from those dusty tomes into dictionaries. Once the data resides in the dictionaries we can easily get at it using the power of the indexer: section name first, followed by key name within the section. For example IniFile["TargetSection"]["TargetKey"] returns a string with the value of that key for that section of the ini file.

Note that all the code is gathered in one easy code section at the bottom of the article, so don’t feel you have to copy each section's code.

Overview

If you are reading this, chances are you know what INI files are and don’t need a refresher. You may have looked into using the Win32 Kernel32.dll method GetPrivateProfileSection to achieve your goals. Ack!  “Set the Wayback machine Sherman!” Thanks but no thanks.

Here is how to do this operation using Regular Expressions (kind of a wayback machine themselves, but very useful) and Linq to Object to get the values into a dictionary format so we can write this line of code to access the data within the INI file:

string myValue = IniFile["SectionName"]["KeyName"];

The Pattern

Let me explain the Regex Pattern. If you are not so inclined to understand the semantics of it skip to the next section.

string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
 (?<Section>[^\]]*)         # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
   (?!\[)                    # Stop capture groups if a [ is found; new section
   (?<Key>[^=]*?)            # Any text before the =, matched few as possible
   (?:=)                     # Get the = now
   (?<Value>[^\r\n]*)        # Get everything that is not an Line Changes
   (?:[\r\n]{0,4})           # MBDC \r\n
  )+                        # End Capture groups";

Our goal is to use named match groups. Each match will have its section name in the named group called “Section”, and all of the data, the key and value pairs, will be in the groups named “Key” and “Value” respectively. The trick to the above pattern is found in line eight: the negative lookahead (?!\[) stops the match when a new section is hit. Otherwise our key/values would bleed into the next section.
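
To see the bleeding for yourself, here is a small sketch that runs a one-line version of the pattern with and without the lookahead guard. The sample data, the class name LookaheadDemo and its methods are all invented for the illustration; the pattern is the one above with the comments and extra whitespace stripped.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Two tiny sections; newlines are \n only to keep the sample short.
    private const string Data = "[A]\nx=1\n[B]\ny=2\n";

    // Builds a one-line version of the INI pattern with or without the guard.
    private static string Pattern(bool useGuard)
    {
        string guard = useGuard ? @"(?!\[)" : string.Empty;
        return @"^\[(?<Section>[^\]]*)\][\r\n]*(" + guard +
               @"(?<Key>[^=]*?)=(?<Value>[^\r\n]*)[\r\n]{0,4})+";
    }

    // Section names found, one per match.
    public static string[] Sections(bool useGuard)
    {
        return Regex.Matches(Data, Pattern(useGuard), RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Groups["Section"].Value)
                    .ToArray();
    }

    // Key captures of the first match.
    public static string[] FirstMatchKeys(bool useGuard)
    {
        Match first = Regex.Match(Data, Pattern(useGuard), RegexOptions.Multiline);
        return first.Groups["Key"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }
}
```

With the guard, Sections(true) finds both A and B. Without it, Sections(false) finds only A, and the second key of that single match swallows the header of the next section ("[B]", a newline, then "y").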

The Data

Here is the data for your perusal.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";

We are interested in “Window Name” and “Directory”.

The Linq

Ok, if you thought the regex pattern was complicated, the Linq to Objects has some tricks up its sleeve as well. Our pattern creates a single match per section, with the accompanying key and value data in two separate named capture collections, and that presents a problem. We need to join the capture collections together, but there is no direct way to do that in a Linq join because the only link between them is indirect: the position (index number) within each collection.

How do we get the two collections to be joined?

Here is the code:

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
 select new
 {
  Section = m.Groups["Section"].Value,

  kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
     join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
     select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

  } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Explanation:

  • Line 1: Our end goal object is a Dictionary where the key is the Section name and the value is a sub-dictionary with all the keys and values found in that section.
  • Line 2: The regex needs IgnorePatternWhitespace because we have commented the pattern. It needs Multiline because the data spans multiple lines and we need ^ to match at the start of each individual line and not just at the beginning of the data.
  • Line 5: This is the easiest item, simply access the named capture group “Section” for the section name.
  • Line 7 (.Captures) : Each one of the keys and values is in the specialized Captures collection property off of its match group.
  • Line 7 (.Cast<Capture>) : Since the capture collection is a specialized list and not a true generic list such as List<string>, we call Cast<Capture>() to turn it into an IEnumerable<Capture>, so we can access the standard query operators, i.e. the extension methods which are available to IEnumerable<T>. Short answer: so we can call .Select.
  • Line 7 (.Select): Because the two lists have no direct way to associate their data, we create a new object that has a property holding the index number, along with the target data value. That will allow us to join it to the other list.
  • Line 7 (Lambda) : The lambda has two parameters. The first, a, is our actual regex Capture object; the second, i, is the index value which we need for the join. We then call new and create a new entity with two properties: the first is the text of the key, taken from the Capture's “Value” property, and the second is i, the index value.
  • Line 8 (Join) : We are going to join the data together using the direct properties of our new entity, but first we need to recreate the magic found in Line 7 for our Values capture collection. It is the same logic as the previous line so I will not delve into its explanation in detail.
  • Line 8 (on cpKey.i equals cpValue.i) : This is our association for the join on the new entities: where one index value i equals the other index value i, the key and the value belong together. This is the keystone of all we are doing.
  • Line 9 (new KeyValuePair) : Ok, we are now creating each individual linq projection item of the data as a KeyValuePair object. This could be swapped for a new anonymous entity, but I chose to use the KeyValuePair class.
  • Line 9 (ToDictionary) : We want to easily access these key value pairs in the future, so we place each pair's Key into the dictionary as the key and its Value as that key’s value.
  • Line 11 (ToDictionary) : Here is where we take the projection of the previous lines of code and create the end goal dictionary where the key name is the section and the value is the sub dictionary created in Line 9.
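
The index join described in the steps above can be tried in isolation on two plain string sequences, away from the regex. This is a minimal sketch under that assumption; IndexJoinDemo and Pair are illustrative names, not part of the article's code.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexJoinDemo
{
    // Joins two parallel sequences on their position, the same way the INI
    // query joins the Key and Value capture collections.
    public static Dictionary<string, string> Pair(IEnumerable<string> keys, IEnumerable<string> values)
    {
        return (from k in keys.Select((text, i) => new { text, i })
                join v in values.Select((text, i) => new { text, i }) on k.i equals v.i
                select new { Key = k.text, Value = v.text })
               .ToDictionary(p => p.Key, p => p.Value);
    }
}
```

Pair(new[] { "Window Name", "Directory" }, new[] { "Jabberwocky", @"C:\Logs" }) gives a dictionary where "Window Name" maps to "Jabberwocky". Because it is an inner join, any position present in only one of the two sequences is silently dropped.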

Whew…what is the result?

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs
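
One thing to keep in mind is that the dictionary indexer throws a KeyNotFoundException when a section or key is absent from the file. If that can happen with your data, a small helper along these lines avoids the exception. It is only a sketch; IniLookup, GetOrDefault and the fallback parameter are names I have invented for it.

```csharp
using System.Collections.Generic;

public static class IniLookup
{
    // Returns the value for section/key, or fallback when either is missing.
    public static string GetOrDefault(
        Dictionary<string, Dictionary<string, string>> ini,
        string section, string key, string fallback)
    {
        Dictionary<string, string> keys;
        string value;
        if (ini.TryGetValue(section, out keys) && keys.TryGetValue(key, out value))
            return value;
        return fallback;
    }
}
```

For example, IniLookup.GetOrDefault(InIFile, "Logging", "Directory", "none") returns the directory when it is present and "none" when the section or the key is missing.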

Summary

Thanks to the power of regular expressions and Linq we don’t have to use the old methods to extract and process the data. We can easily access the information using the newer structures. Hope this helps and that you may have learned something new from something old.

Code All in One Place

Here is all the code so you don’t have to copy it from each section above. Don’t forget to include using directives for System.Collections.Generic, System.Linq and System.Text.RegularExpressions to do it all.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";
string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
     (?<Section>[^\]]*)     # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
  (?!\[)                    # Stop capture groups if a [ is found; new section
  (?<Key>[^=]*?)            # Any text before the =, matched few as possible
  (?:=)                     # Get the = now
  (?<Value>[^\r\n]*)        # Get everything that is not an Line Changes
  (?:[\r\n]{0,4})           # MBDC \r\n
  )+                        # End Capture groups";

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
    select new
    {
        Section = m.Groups["Section"].Value,

        kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                 join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

    } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs

Linq Orderby a Better IComparer in C#

Sometimes IComparer falls short when one has a need to sort on different, for lack of a better term, data columns. Before writing an IComparer implementation for a sort, try using Linq’s orderby.

In the forums the user had data, in string lines, which looked like this

3 months ending 9/30/2007
9 months ending 9/30/2007
3 months ending 9/30/2008
9 months ending 9/30/2008

The user needed the leading month counts sorted first in ascending fashion and then the years sorted descending. Because the data was all in a string and needed differing sorts, he was having problems sorting with a custom IComparer class.

I recommended that he use regex to parse out the items and then use linq to sort. Here is the result. Note that I merged all the data into one string where each line is a true line.

string input =
@"3 months ending 9/30/2007
9 months ending 9/30/2007
3 months ending 9/30/2008
9 months ending 9/30/2008";

string pattern = @"(?<Total>\d\d?)(?:[^\d]+)(?<Date>[\d/]+)";

var items =
    from Match m in Regex.Matches( input, pattern )
    select new
    {
        Total = int.Parse( m.Groups["Total"].Value ),
        Date = DateTime.Parse( m.Groups["Date"].Value ),
        Full = m.Groups[0].Value
    };

var values = from p in items
             orderby p.Total, p.Date.Year descending
             select p;

foreach ( var itm in values )
    Console.WriteLine( itm.Full );

/* Outputs
3 months ending 9/30/2008
3 months ending 9/30/2007
9 months ending 9/30/2008
9 months ending 9/30/2007
*/
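The same two-level sort can also be written with the extension methods directly. What follows is only a sketch: the class and method names (PeriodSorter, Sort) are made up for illustration, and it assumes the dates are always month/day/year. It parses the month count as an integer so that a hypothetical "12 months" line sorts after the 9 month lines instead of before the 3 month lines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class PeriodSorter
{
    // Hypothetical helper: month count ascending, then year descending.
    public static List<string> Sort( IEnumerable<string> lines )
    {
        Regex pattern = new Regex( @"(?<Total>\d\d?)(?:[^\d]+)(?<Date>[\d/]+)" );

        return lines
            .Select( line => pattern.Match( line ) )
            .Where( m => m.Success )                 // skip lines that do not fit the pattern
            .Select( m => new
            {
                Total = int.Parse( m.Groups["Total"].Value, CultureInfo.InvariantCulture ),
                Date  = DateTime.Parse( m.Groups["Date"].Value, CultureInfo.InvariantCulture ),
                Full  = m.Value
            } )
            .OrderBy( p => p.Total )                 // first sort key, ascending
            .ThenByDescending( p => p.Date.Year )    // second sort key, descending
            .Select( p => p.Full )
            .ToList();
    }
}
```

Note that the second key has to go through ThenByDescending; calling OrderByDescending a second time would throw away the first sort instead of refining it.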

Regex To Linq to Dictionary in C#

This article demonstrates these concepts:

  1. Regex extraction of Key Value pairs and placing them into named capture groups.
  2. Linq extraction of the Key Value pairs extracted from the matches of Regex.
  3. Dictionary creation from Linq using the ToDictionary method.

I answered this on the MSDN forums. The user had data in key value pairs delimited by the pipe:

abc:1|bbbb:2|xyz:45|p:120

In each pair the key comes first, then a colon separator, then the value.

The need was to get the keys and values into a dictionary. The following code uses named regex capture groups, which Linq then reads to extract the keys and their values. Once that is done, the ToDictionary extension method is called within the Linq statement to create the dictionary on the fly. Here is the code:

string input = "abc:1|bbbb:2|xyz:45|p:120";
string pattern = @"(?<Key>[^:]+)(?:\:)(?<Value>[^|]+)(?:\|?)";

Dictionary<string, string> KVPs
    = ( from Match m in Regex.Matches( input, pattern )
      select new
      {
          key = m.Groups["Key"].Value,
          value = m.Groups["Value"].Value
       }
       ).ToDictionary( p => p.key, p => p.value );

foreach ( KeyValuePair<string, string> kvp in KVPs )
    Console.WriteLine( "{0,6} : {1,3}", kvp.Key, kvp.Value );

/* Outputs:
   abc :   1
  bbbb :   2
   xyz :  45
     p : 120
 */
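One thing to watch for: ToDictionary throws an ArgumentException if the same key shows up twice in the input. The sketch below is one possible way around that, assuming that "the last value wins" is acceptable for your data. The names PipeParser and Parse are hypothetical; the pattern is the same one used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeParser
{
    // Hypothetical helper: parses "key:value|key:value" and keeps the last value seen for a repeated key.
    public static Dictionary<string, string> Parse( string input )
    {
        string pattern = @"(?<Key>[^:]+)(?:\:)(?<Value>[^|]+)(?:\|?)";

        return Regex.Matches( input, pattern )
                    .Cast<Match>()
                    // Group the values under their key so a repeated key cannot reach ToDictionary twice.
                    .GroupBy( m => m.Groups["Key"].Value, m => m.Groups["Value"].Value )
                    .ToDictionary( g => g.Key, g => g.Last() );
    }
}
```

With the input "abc:1|bbbb:2|abc:9" this yields two entries, and the abc key holds 9. Swap Last() for First() if the first value should win instead.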

C# Dictionary<TKey, TValue> Tricks

(Post Updated 4/2/2011: Added extra example)

This article demonstrates the Tribal Knowledge of using a Dictionary to do the things that the Swiss Army Knife Hashtable used to do.

For .Net 3 through .Net 4.

I love the generic Dictionary collection so much, I think that Microsoft might charge me usage on it. Here I present some of the things that I have learned which I call the tribal knowledge of the class.

The first dictionary in .Net was the Hashtable class found in .Net 1. That class stores items by key and is the origin of the ubiquitous key value pair. With .Net 2 the generic Dictionary class was introduced: less upfront functionality, but the same concept of data held in key value pairs.

A common complaint is that the Dictionary cannot be sorted. So how does one sort? It helps to think of the Dictionary as two collections, one of keys and one of values. The hash table itself has no order to rearrange, so sorting is done on a copy of one of those collections. Once you remember that and understand it, the obtuseness of the Dictionary goes away…

Sorting

Ok, so we can sort…but one complaint or question that I have run into in the MSDN forums is that the Hashtable could be handed an IComparer while its cousin the Dictionary cannot. How does one get around that?

As mentioned, one must think of what a dictionary really exposes: two collections. The Dictionary has two properties, Keys and Values, which are shown to the world as a KeyCollection and a ValueCollection, and that, my friends, is the entry pass into the club. Both implement ICollection<T>, so either one can be copied into a List<T>. Bingo! With that we can do sorting! Here is the code snippet:

Dictionary<int, string> myDict = new Dictionary<int, string>()
            {
                { 2, "This" },
                { 1, "is" },
                { 5, "radio" },
                { 12, "clash" },
            };

List<string> song = new List<string>( myDict.Values );

song.Sort();

// This writes out: "clash is radio This"
Console.WriteLine( string.Join( " ", song.ToArray() ) );

We simply use the sorting which is found on the List<T>.
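The same List<T> trick works when the order should come from the keys rather than the values. This is a sketch only, and the names DictionarySorting and ValuesInKeyOrder are hypothetical: it copies the key value pairs into a list, sorts that list by key, and hands back the values in that order.

```csharp
using System;
using System.Collections.Generic;

public static class DictionarySorting
{
    // Hypothetical helper: returns the values ordered by their integer keys.
    public static List<string> ValuesInKeyOrder( Dictionary<int, string> source )
    {
        // A Dictionary enumerates as KeyValuePair items, so it can seed a List directly.
        List<KeyValuePair<int, string>> pairs = new List<KeyValuePair<int, string>>( source );

        // Sort the copy by key; the dictionary itself is left untouched.
        pairs.Sort( ( left, right ) => left.Key.CompareTo( right.Key ) );

        return pairs.ConvertAll( pair => pair.Value );
    }
}
```

For the myDict above (keys 2, 1, 5 and 12) this returns "is", "This", "radio", "clash" in that order.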

Enumerating over a Dictionary

Sometimes the need is there to run through a dictionary and show the keys and values. When enumerating over a Dictionary, each item returned is a KeyValuePair<TKey, TValue>. That object holds the key and the value.

Dictionary<string, string> ColumnValuesHash = new Dictionary<string, string>();
// ... load with values...
foreach (KeyValuePair<string, string> entry in ColumnValuesHash)
    Console.WriteLine("{0}: {1}", entry.Key, entry.Value);

Dictionary of Dictionaries

I have used this gem many times. Create a top level dictionary that will hold other dictionaries…it’s not as bad as it sounds.

Dictionary<int, string> myStringHash = new Dictionary<int, string>();

myStringHash.Add(41,      "Jabberwocky");
myStringHash.Add(8675309, "Jenny");

Dictionary< string, Dictionary<int, string>> myDicts = new Dictionary<string, Dictionary<int, string>>();

myDicts.Add("Test", myStringHash);

Console.WriteLine( myDicts["Test"][8675309] ); // Prints Jenny

Notice how easy it is to drill down via the indexers of each dictionary.
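The catch with chained indexers is that a missing key at either level throws a KeyNotFoundException. Below is a sketch of a safer lookup built on TryGetValue; the helper name NestedLookup.TryFind is my own invention and not part of the framework.

```csharp
using System;
using System.Collections.Generic;

public static class NestedLookup
{
    // Hypothetical helper: looks up outer[outerKey][innerKey] without throwing when a key is missing.
    public static bool TryFind( Dictionary<string, Dictionary<int, string>> outer,
                                string outerKey,
                                int innerKey,
                                out string value )
    {
        Dictionary<int, string> inner;

        // Only look in the inner dictionary if the outer key exists.
        if ( outer.TryGetValue( outerKey, out inner ) )
            return inner.TryGetValue( innerKey, out value );

        value = null;
        return false;
    }
}
```

Called with the myDicts above, TryFind( myDicts, "Test", 8675309, out name ) returns true and sets name to Jenny, while an unknown outer or inner key simply returns false.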

Dictionary Meets Linq and Gets Sorted

Here is how you can access the dictionary, sort it and enumerate it all in Linq. We will take our example from above and do it in Linq:

Dictionary<int, string> myDict = new Dictionary<int, string>()
                     {
                         { 2, "This" },
                         { 1, "is" },
                         { 5, "radio" },
                         { 12, "clash" },
                     };

var sorted = from item in myDict
             orderby item.Value ascending
             select item.Value;

// This writes out: "clash is radio This"
Console.WriteLine( string.Join( " ", sorted.ToArray() ) );
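Why does "This" land at the end even though it starts with a capital letter? The default string comparison is culture sensitive and does not simply put uppercase first. If the ordering needs to be explicit, OrderBy accepts a comparer. The sketch below (ValueOrdering and SortedValues are hypothetical names) shows two of the built-in StringComparer choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValueOrdering
{
    // Hypothetical helper: returns the dictionary's values ordered by the supplied comparer.
    public static string[] SortedValues( Dictionary<int, string> source, StringComparer comparer )
    {
        return source.Values
                     .OrderBy( value => value, comparer ) // the comparer decides how strings are ranked
                     .ToArray();
    }
}
```

With the myDict above, StringComparer.OrdinalIgnoreCase gives "clash is radio This", while StringComparer.Ordinal compares raw character codes and gives "This clash is radio" because the capital T ranks below every lowercase letter.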

Extra Example (Loading on the fly and the Linq extension SelectMany)

Dictionary<string, Dictionary<int, string>> TopLevel = new Dictionary<string, Dictionary<int, string>>()
{
    {
        "Dictionary 1",
        new Dictionary<int, string>()
                                    { // Sub "dictionary 1" contains two objects
                                        {41, "The Meaning of Life" },
                                        {90125, "Owner of a Lonely Heart"}
                                    }
    },

    {

        "Dictionary 2",
        new Dictionary<int, string>()
                                    { // Sub "Dictionary 2" contains one object
                                        {8675309, "Jenny!" }
                                    }

    }

};

Console.WriteLine(TopLevel["Dictionary 1"][41]);      // Prints out "The Meaning of Life"

//     Console.WriteLine(TopLevel["Dictionary 1"][8675309]); // Fails and would throw a KeyNotFoundException! That key is in Dictionary 2!

// Use SelectMany to walk the top level entries
// (each one a KeyValuePair<string, Dictionary<int, string>>),
// take each entry's value (the sub dictionary) and combine
// the contents of both sub dictionaries into one IEnumerable<KeyValuePair<int, string>>.
TopLevel.SelectMany( dict => dict.Value ) // Gets all the KeyValuePair<int, string> items from both sub dictionaries
        .ToList() // Convert from IEnumerable to List so we can use ForEach below
        .ForEach( kvp => Console.WriteLine( "{0} is {1}", kvp.Key, kvp.Value ) ); // Outputs the entries of the lowest dictionaries.

/* above outputs
41 is The Meaning of Life
90125 is Owner of a Lonely Heart
8675309 is Jenny!
*/
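Flattening this way loses track of which sub dictionary each pair came from. SelectMany has an overload that takes a second lambda with access to both the outer and the inner item, which lets the outer key travel along. This is a sketch with hypothetical names (Flattening, Describe) and an output format I picked only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Flattening
{
    // Hypothetical helper: one line of text per inner entry, prefixed with the outer key.
    public static List<string> Describe( Dictionary<string, Dictionary<int, string>> topLevel )
    {
        return topLevel
            .SelectMany( outer => outer.Value,        // the sub dictionary to flatten
                         ( outer, inner ) =>          // pairs each inner entry with its outer entry
                             string.Format( "{0} / {1} is {2}", outer.Key, inner.Key, inner.Value ) )
            .ToList();
    }
}
```

For the TopLevel dictionary above this produces three lines, one of which is "Dictionary 2 / 8675309 is Jenny!". The Dictionary does not promise any particular enumeration order, so do not rely on the order of the lines.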

Hope this helps!
