Archive for the ‘Linq-to-Object’ Category.

C# Regex Linq: Extract an Html Node with Attributes of Varying Types


The premise of this article and the code sample that follows is that one has an html node to parse and needs the parsed node’s attributes accessible in a handy fashion. Using Regular Expressions with Linq we can achieve our goal and examine all attributes of the html node. I will show the steps to take and the pitfalls of using other methodologies.

Data

<INPUT onblur=google&&google.fade&&google.fade() class=lst title='Google Search' value=TESTING maxLength=2048 size=55 name=q autocomplete='off' init='true'/>

Why Not Use XElement’s Attributes?

Because html tolerates free-form attribute text that XML does not, such as unquoted values and bare & characters, the following code throws an XmlException on the very first attribute encountered:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

string test = @"<INPUT onblur=google&&google.fade&&google.fade() class=lst title='Google Search' value=TESTING maxLength=2048 size=55 name=q autocomplete='off' init='true'>";

// Fails with an XmlException: the parser expects a quote but finds the 'g' of google.
var input = XElement.Parse( test )
                    .Attributes()
                    .Select( vl => new KeyValuePair<string, string>( vl.Name.ToString(), vl.Value ) );

foreach ( KeyValuePair<string, string> item in input )
    Console.WriteLine( "Key: {0,15} Value: {1}", item.Key, item.Value );
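A quick way to confirm the diagnosis is to feed XElement.Parse a few small fragments. In this console sketch the helper ParsesAsXml is hypothetical; it simply reports whether parsing succeeded. XML requires every attribute value to be quoted and every & to be escaped, so only the fully quoted, ampersand-free fragment parses. Even quoting the onblur value would not rescue the sample node.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: true if XElement.Parse accepts the fragment,
// false if it throws an XmlException.
static bool ParsesAsXml(string fragment)
{
    try
    {
        XElement.Parse(fragment);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

bool unquoted  = ParsesAsXml("<INPUT class=lst name=q/>");      // unquoted values
bool quoted    = ParsesAsXml("<INPUT class='lst' name='q'/>");  // quoted values
bool ampersand = ParsesAsXml("<INPUT onblur='a&&b()'/>");       // quoted, but a bare &

// prints: unquoted: False, quoted: True, ampersand: False
Console.WriteLine($"unquoted: {unquoted}, quoted: {quoted}, ampersand: {ampersand}");
```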

Step 1: Regex

Our first step is to create a regular expression that can handle the node and its attributes. What is interesting about the pattern below is that it uses a conditional group, (?(expression)yes|no), to test whether the attribute value is enclosed in quotes, single or double; either way the value is placed into the Value captures collection.

(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
# -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!

The above pattern matches a node and places the tag name into the named group Tag. Each attribute then lands in two named capture collections, Key and Value, in the order the attributes appear.
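The two ideas the pattern relies on, a conditional group that tests for a quote and a repeated group whose named captures pile up one entry per repetition, can be tried in isolation. The sketch below uses a cut-down pattern of our own (not the article's full pattern) on a short hypothetical node that mixes unquoted, single-quoted and double-quoted values.

```csharp
using System;
using System.Text.RegularExpressions;

// Cut-down attribute loop: the conditional (?(['"])yes|no) checks whether the value
// starts with a quote; both branches capture into the same named group "Value".
const string pattern = @"<(?<Tag>\w+)(\s+(?<Key>[^=\s]+)=(?(['""])['""](?<Value>[^'""]*)['""]|(?<Value>[^\s/>]*)))+";

Match m = Regex.Match("<a href=x.htm title='Two words' id=\"q\">", pattern);

Console.WriteLine($"Tag: {m.Groups["Tag"].Value}");

// A repeated named group keeps every capture, in order, in Group.Captures,
// so the i-th Key belongs with the i-th Value.
for (int i = 0; i < m.Groups["Key"].Captures.Count; i++)
    Console.WriteLine($"{m.Groups["Key"].Captures[i].Value} = {m.Groups["Value"].Captures[i].Value}");
```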

Regex Returns A Match…Now What?

We need to extract the items into a Dictionary of key value pairs. The following code works with the named groups’ captures and their indexes, and extracts all attributes:

var attributes = ( from Match mt in Regex.Matches( node, pattern, RegexOptions.IgnorePatternWhitespace )
                   select new
                   {
                       Name = mt.Groups["Tag"],
                       Attrs = ( from cpKey in mt.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                                 join cpValue in mt.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )
                   } ).First().Attrs;

The above enumerates over all the matches; in this case there is only one. Then we work through all the keys in the “Key” captures collection and marry each one to the entry at the same position in the “Value” captures, on a one-to-one basis. Notice how we can join the two collections on their position, thanks to the overload of Select that also supplies each element’s index. Finally we project the combined items into key value pairs and load them into a Dictionary.
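Because the Key and Value captures are produced in lockstep, the index join can also be written with Enumerable.Zip (available since .NET 4.0), which pairs two sequences by position. This is an alternative we are suggesting, not the approach used above; AttributesOf is a hypothetical helper and the pattern is a simplified one for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: pairs the n-th Key capture with the n-th Value capture via Zip.
static Dictionary<string, string> AttributesOf(string node)
{
    // Simplified attribute pattern (quoted or unquoted values), for illustration only.
    const string pattern = @"<(?<Tag>\w+)(\s+(?<Key>[^=\s]+)=(?(['""])['""](?<Value>[^'""]*)['""]|(?<Value>[^\s/>]*)))+";

    Match m = Regex.Match(node, pattern);
    var keys   = m.Groups["Key"].Captures.Cast<Capture>().Select(c => c.Value);
    var values = m.Groups["Value"].Captures.Cast<Capture>().Select(c => c.Value);

    // Zip walks both sequences in step, so no index or join is needed.
    return keys.Zip(values, (k, v) => new KeyValuePair<string, string>(k, v))
               .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}

var attrs = AttributesOf("<INPUT class=lst title='Google Search' name=q/>");
foreach (var kvp in attrs)
    Console.WriteLine($"{kvp.Key,15} = {kvp.Value}");
```

The trade-off: Zip silently stops at the shorter sequence, whereas the index join drops unmatched entries; with this pattern both collections always have the same length, so either works.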

Full Code and Result

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string node = @"<INPUT onblur=google&&google.fade&&google.fade() class=lst title='Google Search' value=TESTING maxLength=2048 size=55 name=q autocomplete='off' init='true'/>";
string pattern = @"
(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
                            # -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!";

var attributes = ( from Match mt in Regex.Matches( node, pattern, RegexOptions.IgnorePatternWhitespace )
                   select new
                   {
                       Name = mt.Groups["Tag"],
                       Attrs = ( from cpKey in mt.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                                 join cpValue in mt.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )
                   } ).First().Attrs;

foreach ( KeyValuePair<string, string> kvp in attributes )
    Console.WriteLine( "Key {0,15}    Value: {1}", kvp.Key, kvp.Value );

/* Output:
Key          onblur    Value: google&&google.fade&&google.fade()
Key           class    Value: lst
Key           title    Value: Google Search
Key           value    Value: TESTING
Key       maxLength    Value: 2048
Key            size    Value: 55
Key            name    Value: q
Key    autocomplete    Value: off
Key            init    Value: true
*/

C# Linq How To Load a Winform Control Using Dynamic Linq Entities (Or Any Other Control)

This has been requested in the forums, because most examples in the Linq world are console applications. One can use Linq to load winform (or asp.net) controls very easily with the dynamic entities (anonymous types) that Linq creates on the fly. This article shows how to load a DataGridView control, or any other data-bound control, with dynamic items from Linq.

Steps

  1. Create a winform project/solution.
  2. Drag two controls onto the surface of the form: a DataGridView and a BindingSource. We will accept the default names of dataGridView1 and bindingSource1.
  3. For demonstration purposes we will create a struct for the simulated data that the Linq-to-Objects query will run against:
    struct Data
    {
        public string Name { get; set; }
        public string Operation { get; set; }
        public string Description { get; set; }
        public DateTime DateStart { get; set; }
        public DateTime DateEnd { get; set; }
    
    }

    Note: if you don’t create a new file for the above struct, place it below the Form1 partial class. Placing it above will break the design view, because the designer expects the form class to be the first class in the file.

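  4. Here is the code to load the DataGridView using Linq; call LoadGrid from the form’s constructor after the InitializeComponent call.
    // bindingSource1 is already declared by the designer (step 2). Only if you did not drag one onto the form, declare it here:
    // BindingSource bindingSource1 = new BindingSource();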
    private void LoadGrid()
    {
        List<Data> dataListing = new List<Data>()
        {
            new Data() { Name = "Jabberwocky", Operation="Read", DateStart= DateTime.Now.AddDays(-2), DateEnd = DateTime.Now.AddDays(-2), Description="Process Started No errors"},
            new Data() { Name = "Space", Operation="Write", DateStart= DateTime.Now.AddDays(-2), DateEnd = DateTime.Now.AddDays(-1), Description="Final process remote allocation of 3000 items to main buffer."},
            new Data() { Name = "Stock Purchase", Operation="DataWarehousing", DateStart= DateTime.Now, DateEnd = DateTime.Now, Description="Shared data transport."}
        };
    
        var items = from dta in dataListing
            select new
            {
               OperationName = dta.Name,
               Start         = dta.DateStart.ToShortDateString(),
               End           = dta.DateEnd.ToShortDateString(),
               Operation     = dta.Operation,
               Description   = dta.Description
             };
    
        bindingSource1.DataSource = items;
        dataGridView1.DataSource  = bindingSource1;
    
        // Grid attributes
        dataGridView1.BorderStyle         = BorderStyle.Fixed3D;
        dataGridView1.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.AllCells;
    
    }
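The grid’s columns come from the public properties of the anonymous type the query projects. The console sketch below needs no WinForms: it swaps the Data struct for a hypothetical tuple list with three fields, and uses TypeDescriptor.GetProperties (System.ComponentModel) to list the properties data binding will see. The ToList() call is our optional addition; it turns the query into a List, an IList, which DataGridView.DataSource also accepts directly.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;

// Hypothetical sample rows standing in for the article's Data struct (trimmed to three fields).
var dataListing = new List<(string Name, string Operation, DateTime DateStart)>
{
    ("Jabberwocky", "Read",  DateTime.Now.AddDays(-2)),
    ("Space",       "Write", DateTime.Now.AddDays(-1)),
};

// Same shape of projection as LoadGrid, materialized with ToList().
var items = (from dta in dataListing
             select new
             {
                 OperationName = dta.Name,
                 Start         = dta.DateStart.ToShortDateString(),
                 Operation     = dta.Operation
             }).ToList();

// One property descriptor per public property of the anonymous type;
// descriptors like these are what data binding uses to generate grid columns.
PropertyDescriptorCollection columns = TypeDescriptor.GetProperties(items[0]);
foreach (PropertyDescriptor pd in columns)
    Console.WriteLine($"{pd.Name} ({pd.PropertyType.Name})");
```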

Result

[Screenshot: the resulting GridView]


INI Files Meet Regex and Linq in C# to Avoid the WayBack Machine of Kernel32.Dll

What if you are stuck having to deal with older technology such as INI files while using the latest and greatest C# and .Net available? This article discusses an alternate way to read INI files and extract the data from those dusty tomes while easily accessing the resulting data from dictionaries. Once the data resides in the dictionaries we can extract it using the indexer on the section name followed by the key name within the section, such as IniFile["TargetSection"]["TargetKey"], which returns the string value of that key in that section of the ini file.

Note: all the code is collected in one section at the bottom of the article, so don’t feel you have to copy each section’s code.

Overview

If you are reading this, chances are you know what INI files are and don’t need a refresher. You may have looked into using the Win32 Kernel32.dll function GetPrivateProfileSection to achieve your goals. Ack! “Set the Wayback machine Sherman!” Thanks but no thanks.

Here is how to do this operation using Regular Expressions (kind of a Wayback machine too, but very useful) and Linq to Objects to get the values into a dictionary format, so we can write this line of code to access the data within the INI file:

string myValue = IniFile["SectionName"]["KeyName"];

The Pattern

Let me explain the Regex Pattern. If you are not inclined to dig into its semantics, skip to the next section.

string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
 (?<Section>[^\]]*)         # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
   (?!\[)                    # Stop capture groups if a [ is found; new section
   (?<Key>[^=]*?)            # Any text before the =, matched few as possible
   (?:=)                     # Get the = now
   (?<Value>[^\r\n]*)        # Get everything up to the line break
   (?:[\r\n]{0,4})           # Match but don't capture the line break(s)
  )+                        # End Capture groups";

Our goal is to use named match groups. Each match will have its section name in the named group “Section”, and all of its data, the key and value pairs, in the groups “Key” and “Value” respectively. The trick in the above pattern is found in line eight, (?!\[): a negative lookahead that stops the key/value repetition as soon as the [ of a new section is reached. Without it, our key/values would bleed into the next section.
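To see the (?!\[) guard earn its keep, the sketch below runs the same pattern, written compactly without comments, against a tiny hypothetical two-section INI string, once as-is and once with the guard removed. Without the guard, the key/value loop runs straight through the [B] header, so the regex finds only one section, and one of its keys is the text "[B]" plus a newline plus "y".

```csharp
using System;
using System.Text.RegularExpressions;

// The article's section pattern and key/value loop, written without whitespace or comments.
const string sectionPart    = @"^((?:\[)(?<Section>[^\]]*)(?:\])(?:[\r\n]{0,}|\Z))";
const string guardedPairs   = @"((?!\[)(?<Key>[^=]*?)(?:=)(?<Value>[^\r\n]*)(?:[\r\n]{0,4}))+";
const string unguardedPairs = @"((?<Key>[^=]*?)(?:=)(?<Value>[^\r\n]*)(?:[\r\n]{0,4}))+";   // (?!\[) removed

string ini = "[A]\nx=1\n\n[B]\ny=2\n";

MatchCollection guarded   = Regex.Matches(ini, sectionPart + guardedPairs,   RegexOptions.Multiline);
MatchCollection unguarded = Regex.Matches(ini, sectionPart + unguardedPairs, RegexOptions.Multiline);

Console.WriteLine($"with guard: {guarded.Count} sections, without guard: {unguarded.Count}");

// Without the guard, section B's header and its first key become one key of section A.
foreach (Capture key in unguarded[0].Groups["Key"].Captures)
    Console.WriteLine("  key: " + key.Value.Replace("\n", "\\n"));
```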

The Data

Here is the data for your perusal.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";

We are interested in “Window Name” and “Directory”.

The Linq

OK, if you thought the regex pattern was complicated, the Linq to Objects code has some tricks up its sleeve as well. Our pattern creates a single match per section, with the accompanying key and value data in two separate named capture collections, and that presents a problem. We need to join the two capture collections together, but Linq has no direct key to join them on: the only link between a key and its value is their shared index in the respective collections.

How do we get the two collections to be joined?

Here is the code:

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
 select new
 {
  Section = m.Groups["Section"].Value,

  kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
     join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
     select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

  } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Explanation:

  • Line 1: Our end goal object is a Dictionary where the key is the Section name and the value is a sub-dictionary with all the keys and values found in that section.
  • Line 2: The regex needs IgnorePatternWhitespace because we have commented the pattern. It needs Multiline because we are spanning multiple lines and need ^ to match at the start of each line, not just at the beginning of the string.
  • Line 5: This is the easiest item, simply access the named capture group “Section” for the section name.
  • Line 7 (.Captures) : Each of the keys and values is held in the Captures collection of its named group.
  • Line 7 (.Cast<Capture>) : Since CaptureCollection is a specialized, non-generic collection rather than a true generic list such as List<string>, we call Cast<Capture>() to turn it into an IEnumerable<Capture>, so we can access the standard query operators, i.e. the extension methods available to IEnumerable<T>. Short answer: so we can call .Select.
  • Line 7 (.Select): Because the two lists have no direct way to associate their data, we create a new object that carries the index number along with the target data value. That will allow us to join it to the other list.
  • Line 7 (Lambda) : The lambda has two parameters: a is the actual regex Capture object, and i is its index, which we need for the join. We then create a new entity with two properties: Value, the key text taken from the Capture’s Value property, and i, the index.
  • Line 8 (Join) : We are going to join the data together using the direct properties of our new entity, but first we need to recreate the magic found in Line 7 for our Values capture collection. It is the same logic as the previous line, so I will not explain it in detail.
  • Line 8 (on cpKey.i equals cpValue.i) : This is our association for the join on the new entities: matching one index i to the other index i is what pairs each key with its value. This is the keystone of all we are doing.
  • Line 9 (new KeyValuePair) : Now we project each joined item as a KeyValuePair object. This could be replaced with an anonymous type, but I chose to use the KeyValuePair struct.
  • Line 9 (ToDictionary) : We want easy access to these key value pairs later, so each pair’s Key becomes a dictionary key and its Value becomes that key’s value.
  • Line 11 (ToDictionary) : Here is where we take the projection of the previous lines of code and create the end goal dictionary where the key name is the section and the value is the sub dictionary created in Line 9.
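The keystone, Select’s index overload plus a join on that index, works on any two sequences, not only regex captures. A minimal sketch with two plain string arrays standing in for one section’s Key and Value capture collections (the sample values are taken from the article’s data):

```csharp
using System;
using System.Linq;

// Stand-ins for the Key and Value capture collections of one section.
string[] keys   = { "Window X Pos", "Window Name" };
string[] values = { "0", "Jabberwocky" };

// Select((item, index) => ...) tags each element with its position;
// the join then pairs elements whose positions match.
var pairs = (from k in keys.Select((a, i) => new { a, i })
             join v in values.Select((b, i) => new { b, i }) on k.i equals v.i
             select new { Key = k.a, Value = v.b })
            .ToDictionary(p => p.Key, p => p.Value);

Console.WriteLine(pairs["Window Name"]); // Jabberwocky
```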

Whew…what is the result?

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs
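Two practical tweaks, both our own suggestions rather than part of the article’s code: the nested indexer throws KeyNotFoundException when a section or key is missing, and the default string keys are case-sensitive. The sketch below keeps the article’s query (with the pattern written compactly on one line) but passes StringComparer.OrdinalIgnoreCase to both ToDictionary calls and adds a TryGetValue-based lookup. ReadIni and GetValue are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the article's query with a compact copy of its pattern,
// but both dictionaries compare names with StringComparer.OrdinalIgnoreCase.
static Dictionary<string, Dictionary<string, string>> ReadIni(string text)
{
    const string pattern = @"^((?:\[)(?<Section>[^\]]*)(?:\])(?:[\r\n]{0,}|\Z))((?!\[)(?<Key>[^=]*?)(?:=)(?<Value>[^\r\n]*)(?:[\r\n]{0,4}))+";

    return (from Match m in Regex.Matches(text, pattern, RegexOptions.Multiline)
            select new
            {
                Section = m.Groups["Section"].Value,
                Kvps = (from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select((a, i) => new { a.Value, i })
                        join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select((b, i) => new { b.Value, i }) on cpKey.i equals cpValue.i
                        select new KeyValuePair<string, string>(cpKey.Value, cpValue.Value))
                       .ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase)
            })
           .ToDictionary(itm => itm.Section, itm => itm.Kvps, StringComparer.OrdinalIgnoreCase);
}

// Hypothetical helper: null instead of KeyNotFoundException for a missing section or key.
static string? GetValue(Dictionary<string, Dictionary<string, string>> sections, string section, string key) =>
    sections.TryGetValue(section, out var keys) && keys.TryGetValue(key, out var value) ? value : null;

string data = "[WindowSettings]\nWindow Name=Jabberwocky\n\n[Logging]\nDirectory=C:\\Rosetta Stone\\Logs\n";
var ini = ReadIni(data);

Console.WriteLine(ini["windowsettings"]["WINDOW NAME"]);                // Jabberwocky
Console.WriteLine(GetValue(ini, "Logging", "Missing") ?? "(not set)"); // (not set)
```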

Summary

Thanks to the power of regular expressions and Linq we don’t have to use the old methods to extract and process the data. We can easily access the information using the newer structures. Hope this helps and that you may have learned something new from something old.

Code All in One Place

Here is all the code so you don’t have to copy it from each section above. Don’t forget the using directives for System.Text.RegularExpressions, System.Linq and System.Collections.Generic to make it all compile.

string data = @"[WindowSettings]
Window X Pos=0
Window Y Pos=0
Window Maximized=false
Window Name=Jabberwocky

[Logging]
Directory=C:\Rosetta Stone\Logs
";
string pattern = @"
^                           # Beginning of the line
((?:\[)                     # Section Start
     (?<Section>[^\]]*)     # Actual Section text into Section Group
 (?:\])                     # Section End then EOL/EOB
 (?:[\r\n]{0,}|\Z))         # Match but don't capture the CRLF or EOB
 (                          # Begin capture groups (Key Value Pairs)
  (?!\[)                    # Stop capture groups if a [ is found; new section
  (?<Key>[^=]*?)            # Any text before the =, matched few as possible
  (?:=)                     # Get the = now
  (?<Value>[^\r\n]*)        # Get everything up to the line break
  (?:[\r\n]{0,4})           # Match but don't capture the line break(s)
  )+                        # End Capture groups";

Dictionary<string, Dictionary<string, string>> InIFile
= ( from Match m in Regex.Matches( data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline )
    select new
    {
        Section = m.Groups["Section"].Value,

        kvps = ( from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select( ( a, i ) => new { a.Value, i } )
                 join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select( ( b, i ) => new { b.Value, i } ) on cpKey.i equals cpValue.i
                 select new KeyValuePair<string, string>( cpKey.Value, cpValue.Value ) ).ToDictionary( kvp => kvp.Key, kvp => kvp.Value )

    } ).ToDictionary( itm => itm.Section, itm => itm.kvps );

Console.WriteLine( InIFile["WindowSettings"]["Window Name"] ); // Jabberwocky
Console.WriteLine( InIFile["Logging"]["Directory"] );          // C:\Rosetta Stone\Logs