Archive for the ‘XML’ Category.

WCF: Creating Custom Headers, How To Add and Consume Those Headers

When creating a C# WCF service (.Net 3.0 and above), it can be valuable to identify the clients (consumers) that the web service supports. This article demonstrates, in C# and config Xml, how to have clients identify themselves by passing pertinent information within the SOAP message’s header. The web service then reads that information and processes the request accordingly.

Client Identifies Itself

The goal here is to have the client provide some sort of information which the server can use to determine who is sending the message. The following C# code will add a header named `ClientId`:

var cl = new ActiveDirectoryClient();

var eab = new EndpointAddressBuilder(cl.Endpoint.Address);

eab.Headers.Add( AddressHeader.CreateAddressHeader("ClientId",       // Header Name
                                                   string.Empty,     // Namespace
                                                    "OmegaClient")); // Header Value
cl.Endpoint.Address = eab.ToEndpointAddress();

// Now do an operation provided by the service.
cl.ProcessInfo("ABC");

That code adds an endpoint header named `ClientId`, with a value of `OmegaClient` and no namespace, which is inserted into the SOAP header of the messages sent through that endpoint.

Custom Header in Client’s Config File

There is an alternate way to add a custom header: specify it in the client’s Xml config file as part of the endpoint, so that all messages sent through that endpoint carry it:

<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
    </startup>
    <system.serviceModel>
        <bindings>
            <basicHttpBinding>
                <binding name="BasicHttpBinding_IActiveDirectory" />
            </basicHttpBinding>
        </bindings>
        <client>
          <endpoint address="http://localhost:41863/ActiveDirectoryService.svc"
              binding="basicHttpBinding" bindingConfiguration="BasicHttpBinding_IActiveDirectory"
              contract="ADService.IActiveDirectory" name="BasicHttpBinding_IActiveDirectory">
            <headers>
              <ClientId>Console_Client</ClientId>
            </headers>
          </endpoint>
        </client>
    </system.serviceModel>
</configuration>

The above config file is from a .Net 4.5 client.

Server Identifies Client Request

Finally, the web service reads the custom header to distinguish between WCF clients and processes each request accordingly.

var opContext = OperationContext.Current; // If this is null, is this code in an async block? If so, extract it before the async call.

var rq = opContext.RequestContext; 

var headers = rq.RequestMessage.Headers;

int headerIndex = headers.FindHeader("ClientId", string.Empty);

var clientString = (headerIndex < 0) ? "UNKNOWN" : headers.GetHeader<string>(headerIndex);
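
To try the lookup without hosting a service, the same header search can be run against a message built in memory. This is only a sketch, assuming a reference to System.ServiceModel; the `ReadClientId` helper, the `urn:demo` action and the in-memory messages are illustrative assumptions and not part of the original service.

```csharp
using System;
using System.ServiceModel.Channels;

public static class ClientIdDemo
{
    // Hypothetical helper: the same FindHeader/GetHeader logic as the service code above.
    public static string ReadClientId( MessageHeaders headers )
    {
        int headerIndex = headers.FindHeader( "ClientId", string.Empty );
        return (headerIndex < 0) ? "UNKNOWN" : headers.GetHeader<string>( headerIndex );
    }

    public static void Main()
    {
        // "urn:demo" is a made-up action used only for this illustration.
        Message withHeader = Message.CreateMessage( MessageVersion.Default, "urn:demo" );
        withHeader.Headers.Add(
            MessageHeader.CreateHeader( "ClientId", string.Empty, "OmegaClient" ) );

        Message withoutHeader = Message.CreateMessage( MessageVersion.Default, "urn:demo" );

        Console.WriteLine( ReadClientId( withHeader.Headers ) );
        Console.WriteLine( ReadClientId( withoutHeader.Headers ) );
    }
}
```

Wrapping the lookup in a helper that takes `MessageHeaders` keeps it independent of `OperationContext.Current`, which sidesteps the null-context issue mentioned in the comment above when the value is needed after an async call.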

.Net Regex: Can Regular Expression Parsing be Faster than XmlDocument or Linq to Xml?

Most of the time one needs the power of an xml parser, whether it is the XmlDocument or Linq to Xml, to manipulate and extract data. But what if I told you that in some circumstances regular expressions might be faster?

Conventional development thinking has branded regex processing as slow, and the thought of using regex on xml might seem counterintuitive. In a continuation of earlier articles, I again want to dispel those thoughts and provide a real world example where Regular Expression parsing is not only on par with other tools in the .Net world but sometimes faster. The results of my speed test may surprise you, and hopefully show that regular expressions are not as slow as believed.

See: Are C# .Net Regular Expressions Fast Enough for You?

Real World Scenario

A developer on the MSDN forums needed the ability to count URLs in multiple xml files. (See the actual post, count the urls in xml file, on Msdn.) The poster received three distinct replies: one to use XmlDocument, another with a Linq to Xml solution, and I chimed in with the regular expression method. The poster took the XmlDocument method and marked it as the answer, but could he have done better?

I thought so…

So I took the three replies, distilled them down to their core processing, wrapped them in a similar IO extraction layer and timed them. I created 48 xml files containing over one hundred thousand urls, for a total of 13 MB on disk. I then ran all the tests in release mode to get the results. (See the Setup section below for a gist repository of the code.)

Real World Result

There are five test runs; each test name is the technology plus the user as found on the original msdn post. In the original post the slowest and fastest times were highlighted in red. Remember, XmlDoc is the one the user chose as the answer.

Test 1
Regex           found 116736 urls in 00:00:00.1843576
XmlLinq_Link_FR found 116736 urls in 00:00:00.2662190
XmlDoc_Hasim()  found 116736 urls in 00:00:00.3534628

Test 2
Regex           found 116736 urls in 00:00:00.2317883
XmlLinq_Link_FR found 116736 urls in 00:00:00.2792730
XmlDoc_Hasim()  found 116736 urls in 00:00:00.2694969

Test 3
Regex           found 116736 urls in 00:00:00.1646719
XmlLinq_Link_FR found 116736 urls in 00:00:00.2333891
XmlDoc_Hasim()  found 116736 urls in 00:00:00.2625176

Test 4
Regex           found 116736 urls in 00:00:00.1677931
XmlLinq_Link_FR found 116736 urls in 00:00:00.2258825
XmlDoc_Hasim()  found 116736 urls in 00:00:00.2590841

Test 5
Regex           found 116736 urls in 00:00:00.1668231
XmlLinq_Link_FR found 116736 urls in 00:00:00.2278445
XmlDoc_Hasim()  found 116736 urls in 00:00:00.2649262

 

Wow! Regex performed better in every run, even on the first run when there was no caching of the files! Note that the time is Hours : Minutes : Seconds, and regex’s fastest time is 164 milliseconds to parse 48 files. Regex’s uncached first run of 184 milliseconds is still better than the other two’s best times, although its slowest run (about 232 milliseconds in Test 2) is slightly slower than the best Linq to Xml time (about 226 milliseconds).

How was this all done? Let me show you.

Setup

Ok, what magic or trickery have I played? All tests are run in a C# .Net 4 Console application in release mode. I have created a public Gist (Regex vs Xml) repository of the code and data, which is actually a valid Git repository for anyone who may want to add their own tests, but let me detail what I did here on the blog as well.

The top level operation found in Main looks like this, where I run the tests five times:

Enumerable.Range( 1, 5 )
            .ToList()
            .ForEach( tstNumber =>
            {
                Console.WriteLine( "Test " + tstNumber );
                Time( "Regex", RegexFindXml );
                Time( "XmlLinq_Link_FR", XmlLinq_Link_FR );
                Time( "XmlDoc_Hasim()", XmlDoc_Hasim );
                Console.WriteLine( Environment.NewLine );
            } );

The generic Time method looks like this; it dutifully runs the target work and reports the result as “[name] found [count] urls in [time]”:

public static void Time<T>( string what, Func<T> work )
{
    var sw = Stopwatch.StartNew();
    var result = work();
    sw.Stop();
    Console.WriteLine( "\t{0,-15} found {1} urls in {2}", what, result, sw.Elapsed );
}

In the msdn post the different methods had differing ways of finding and opening each xml file, so I made them all adhere to the way I open the files and sum the URL counts. Here is the snippet:

return Directory.EnumerateFiles( @"D:\temp", "*.xml" )
            .ToList()
            .Sum( fl =>
            {
                // Per-file counting logic for each contender goes here; it returns an int.
            } );

Contender  –  XML Document

This is the one the poster marked as the answer and used; I dutifully copied it to the best of my ability.

public static int XmlDoc_Hasim()
{
    return Directory.EnumerateFiles( @"D:\temp", "*.xml" )
                .ToList()
                .Sum( fl =>
                {
                    XmlDocument doc = new XmlDocument();
                    doc.LoadXml( System.IO.File.ReadAllText( fl ) );

                    if (doc.ChildNodes.Count > 1) // ChildNodes[1] is read below, so at least two nodes are needed
                        if (doc.ChildNodes[1].HasChildNodes)
                            return doc.ChildNodes[1].ChildNodes.Count;

                    return 0;

                } );

}

I used the Sum extension method, which differs a little from the original summing code, but using the same extension in every test brings them closer in line.

Contender – Linq to Xml

Of the other two attempts, I felt this one was the more robust because it actually handled the xml namespace. Sadly, it appeared to be ignored by the original poster. Here is his code:

public static int XmlLinq_Link_FR()
{
    XNamespace xn = "http://www.sitemaps.org/schemas/sitemap/0.9";

    return Directory.EnumerateFiles( @"D:\temp", "*.xml" )
                    .Sum( fl => XElement.Load( fl ).Descendants( xn + "loc" ).Count() );

}

Contender – Regular Expression

Finally, here is the speed test winner. I came up with the pattern by looking at the xml: it appeared one didn’t need to match the actual url, just the two preceding tags and any possible whitespace between them. That is the key to regex: a good pattern can achieve fast results.

public static int RegexFindXml()
{
    string pattern = @"(<url>\s*<loc>)";

    return Directory.EnumerateFiles( @"D:\temp", "*.xml" )
                    .Sum( fl => Regex.Matches( File.ReadAllText( fl ), pattern ).OfType<Match>().Count() );

}
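
As a quick sanity check that the pattern and the namespace-aware query agree, both can be run over one small in-memory document. This is a minimal sketch; the three-entry sitemap and the `CountCheck` class are made up for illustration and are not part of the timed tests.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class CountCheck
{
    // Made-up sitemap with three entries; the second has whitespace between the tags.
    public const string Sample =
@"<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
<url><loc>http://example.com/a</loc><changefreq>weekly</changefreq></url>
<url>
  <loc>http://example.com/b</loc></url>
<url><loc>http://example.com/c</loc></url>
</urlset>";

    // Same idea as RegexFindXml: match only the two opening tags.
    public static int CountWithRegex( string xml )
    {
        return Regex.Matches( xml, @"<url>\s*<loc>" ).Count;
    }

    // Same idea as XmlLinq_Link_FR: count the namespaced loc elements.
    public static int CountWithLinq( string xml )
    {
        XNamespace xn = "http://www.sitemaps.org/schemas/sitemap/0.9";
        return XElement.Parse( xml ).Descendants( xn + "loc" ).Count();
    }

    public static void Main()
    {
        Console.WriteLine( "Regex: {0}, Linq: {1}",
            CountWithRegex( Sample ), CountWithLinq( Sample ) ); // Regex: 3, Linq: 3
    }
}
```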

XML 1 (Shortened)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.linkedin.com/directory/companies/internet-web2.0-startups-social-networking/barcelona.html</loc><changefreq>weekly</changefreq></url>
<url><loc>http://www.linkedin.com/directory/companies/internet-web2.0-startups-social-networking/basel.html</loc><changefreq>weekly</changefreq></url>
<url><loc>http://www.linkedin.com/directory/companies/internet-web2.0-startups-social-networking/bath.html</loc><changefreq>weekly</changefreq></url>
<url><loc>http://www.linkedin.com/directory/companies/computer-networking/sheffield.html</loc><changefreq>weekly</changefreq></url>
<url><loc>http://www.linkedin.com/directory/companies/computer-networking/singapore.html</loc><changefreq>weekly</changefreq></url>
<url><loc>http://www.linkedin.com/directory/companies/computer-networking/slough.html</loc><changefreq>weekly</changefreq></url>
<url><loc>http://www.linkedin.com/directory/companies/computer-networking/slovak-republic.html</loc><changefreq>weekly</changefreq></url>
</urlset>

XML 2 (Shortened)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.linkedin.com/groups/gid-2431604</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2430868</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/Wireless-Carrier-Reps-Past-Present-2430807</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2430694</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2430575</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2431452</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2432377</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2428508</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2432379</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2432380</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2432381</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2432383</loc><changefreq>monthly</changefreq></url>
<url><loc>http://www.linkedin.com/groups/gid-2432384</loc><changefreq>monthly</changefreq></url>
</urlset>

Summary

It really comes down to the right tool for the right situation, and in this one regex did really well. Regex is not good for most xml parsing needs, but in certain scenarios it really shines. If the xml were malformed or the namespace wrong, the parsers would have their own unique problems, which could lead to a bad count. All the technologies had to do some upfront loading, and that is key to how they performed. Regex is optimized to handle large data efficiently and, as long as the pattern is spot on, it can be really quick.
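
The parser-side problems mentioned above can be seen on a tiny document. The following is a sketch under stated assumptions: the two-entry sample and the `NamespacePitfall` class are hypothetical, and the malformed input is simply the sample with its closing tag cut off.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class NamespacePitfall
{
    public const string Sample =
        "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>" +
        "<url><loc>http://example.com/a</loc></url>" +
        "<url><loc>http://example.com/b</loc></url>" +
        "</urlset>";

    // Forgetting the namespace compiles and runs, but finds nothing.
    public static int CountWithoutNamespace( string xml )
    {
        return XElement.Parse( xml ).Descendants( "loc" ).Count();
    }

    // The regex pattern does not care about namespaces or well-formedness.
    public static int CountWithRegex( string xml )
    {
        return Regex.Matches( xml, @"<url>\s*<loc>" ).Count;
    }

    // A parser rejects malformed input outright.
    public static bool IsWellFormed( string xml )
    {
        try { XElement.Parse( xml ); return true; }
        catch (XmlException) { return false; }
    }

    public static void Main()
    {
        string malformed = Sample.Replace( "</urlset>", "" );

        Console.WriteLine( CountWithoutNamespace( Sample ) ); // 0
        Console.WriteLine( CountWithRegex( Sample ) );        // 2
        Console.WriteLine( IsWellFormed( malformed ) );       // False
        Console.WriteLine( CountWithRegex( malformed ) );     // 2
    }
}
```

Note that the regex quietly counting entries in a malformed file cuts both ways: it is forgiving, but it will not tell you the file was broken.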

My thought is: don’t dismiss regular expression parsing out of hand; learning it can pay off in some unique text parsing situations.


C#: Adding CData Sections to an Existing Node using XmlDocument and XDocument to handle HTML code or other problematic characters

There are situations where the data within Xml nodes needs special handling because it contains special characters or html-type tags. To handle that, one can place the data into CData sections. This article shows how to do that in .Net using C# for both the XmlDocument and the XDocument.

XMLDocument Example

string myXml =
@"<?xml version='1.0' encoding='utf-8'?>
<WorkingSet>
 <Data>
 </Data>
</WorkingSet>";

XmlDocument doc1 = new XmlDocument();
doc1.LoadXml( myXml );

XmlNode target = doc1.SelectSingleNode( "WorkingSet/Data" );

if (target != null)
    target.AppendChild( doc1.CreateCDataSection( "<h1>Hello</h1>" ) );

XDocument Example

XDocument doc = XDocument.Parse( myXml, LoadOptions.SetLineInfo );

XElement dataNode = doc.Descendants( "Data" ).First();

dataNode.Add ( new XCData( "<h1>Hello</h1>" ));

Console.WriteLine( doc.ToString() );

Results

<?xml version="1.0" encoding="utf-8"?>
<WorkingSet>
 <Data><![CDATA[<h1>Hello</h1>]]></Data>
</WorkingSet>
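
Reading the value back is the other half of the story. The sketch below, which assumes a trimmed-down version of the document above and a hypothetical `CDataReadBack` class, shows that the element’s `Value` returns the markup as plain text while the node itself stays an `XCData`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CDataReadBack
{
    // Builds the Data element with a CData child, as in the XDocument example.
    public static XElement BuildData()
    {
        XDocument doc = XDocument.Parse( "<WorkingSet><Data></Data></WorkingSet>" );
        XElement dataNode = doc.Descendants( "Data" ).First();
        dataNode.Add( new XCData( "<h1>Hello</h1>" ) );
        return dataNode;
    }

    public static void Main()
    {
        XElement data = BuildData();

        Console.WriteLine( data.Value );                            // <h1>Hello</h1>
        Console.WriteLine( data.Nodes().OfType<XCData>().Count() ); // 1
        Console.WriteLine( data );  // <Data><![CDATA[<h1>Hello</h1>]]></Data>
    }
}
```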

Tribal Knowledge: C# XDocument Copy Xml But Remove the Pesky DocType

In another of my series of Tribal Knowledge articles, this one discusses the basics of loading an XDocument and creating a different document from that original.

There may be a need to remove the document type from the original XDocument in C#, or to do a basic copy; both are presented here.

How-To

Here is the Xml in a classic before:

<?xml version='1.0' encoding='utf-8'?>
<!-- Generator: AVG Magician 1.0.0, AVG Exports Plug-In . AVG Version: 2.00 Build 8675309)  -->
<!DOCTYPE svg PUBLIC '-//W3C//DTD AVG 1.1//EN' 'http://www.w3.org/Graphics/AVG/1.1/DTD/avg11.dtd'[]>
<avg version='1.1' id='Layer_1' x='0px' y='0px' xml:space='preserve'>
    <rect x='100.143' y='103.714' fill='none' width='87.857' height='12.143' />
</avg>

and this is what we want to achieve:

<?xml version='1.0' encoding='utf-8'?>
<avg version="1.1" id="Layer_1" x="0px" y="0px" xml:space="preserve">
    <rect x="100.143" y="103.714" fill="none" width="87.857" height="12.143" />
</avg>

Since we only care about the avg node, which is the root, we will simply get that node and append it to our new clone. Here is the full code:

string xml = @"<?xml version='1.0' encoding='utf-8'?>
<!-- Generator: AVG Magician 1.0.0, AVG Export Plug-In . AVG Version: 2.00 Build 8675309)  -->
<!DOCTYPE svg PUBLIC '-//W3C//DTD AVG 1.1//EN' 'http://www.w3.org/Graphics/AVG/1.1/DTD/avg11.dtd'[]>
<avg version='1.1' id='Layer_1' x='0px' y='0px' xml:space='preserve'>
    <rect x='100.143' y='103.714' fill='none' width='87.857' height='12.143' />
</avg>";

XDocument loaded = XDocument.Parse( xml, LoadOptions.SetLineInfo );

XDocument clone = new XDocument( new XDeclaration( "1.0", "utf-8", "yes"),
    loaded.LastNode
    );

Console.WriteLine( clone );

The above achieves the after Xml which we seek: there is no DocType because we didn’t add it, and the first node, the comment line, is gone as well. I hope this little example helps.
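
If a copy is not needed, a different technique is to prune the loaded document in place rather than clone it. This is a minimal sketch of that alternative, not the approach used above; the `DocTypeStripper` class is hypothetical and the sample uses a bare `<!DOCTYPE avg>` so that no external DTD is involved.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DocTypeStripper
{
    // Removes the DOCTYPE and any top-level comments from the parsed document.
    public static XDocument Strip( string xml )
    {
        XDocument doc = XDocument.Parse( xml );

        if (doc.DocumentType != null)
            doc.DocumentType.Remove();

        // Only the document's direct children are inspected, not comments inside the root.
        doc.Nodes().OfType<XComment>().Remove();

        return doc;
    }

    public static void Main()
    {
        string xml = @"<?xml version='1.0' encoding='utf-8'?>
<!-- a generator comment -->
<!DOCTYPE avg>
<avg version='1.1'><rect x='1' y='2' /></avg>";

        Console.WriteLine( Strip( xml ) );
    }
}
```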


C# XML Parsing Extracting Values and using XML to Linq with XDocument

On my blog the most accessed articles are the basic ones on XML, which has surprised me. So in this article I will focus on the new kid on the block, the Linq-derived XDocument. In the following example I will load the Xml, enumerate over the child nodes to extract specific values, and then display them.

I am loading the xml directly from a string, but you can load it from other locations using the Load method of XDocument.

string xml = @"<?xml version='1.0' encoding='UTF-8'?>
<widgets>
    <widget>
        <url>~/Portal/Widgets/ServicesList.ascx</url>
        <castAs>ServicesWidget</castAs>
        <urlType>ascx</urlType>
        <parameters>
            <PortalCategoryId>3</PortalCategoryId>
        </parameters>
    </widget>
    <widget>
        <url>www.omegacoder.com</url>
        <castAs>ServicesWidget</castAs>
        <urlType>htm</urlType>
        <parameters>
            <PortalCategoryId>41</PortalCategoryId>
        </parameters>
    </widget>
</widgets>";

XDocument loaded = XDocument.Parse( xml );

var widgets = from x in loaded.Descendants( "widget" )
              select new
              {
                  URL = x.Descendants( "url" ).First().Value,
                  Category = x.Descendants( "PortalCategoryId" ).First().Value
              };

foreach ( var wd in widgets )
    Console.WriteLine( "Widget at ({0}) has a category of {1}", wd.URL, wd.Category );

/* Outputs:

Widget at (~/Portal/Widgets/ServicesList.ascx) has a category of 3
Widget at (www.omegacoder.com) has a category of 41

*/
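
One caveat with `First().Value` is that it throws when a widget lacks the element. A hedged variation, assuming that missing values should be tolerated, uses the explicit conversion operators on XElement, which return null for a missing element. The `SafeWidgets` class and the trimmed sample are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SafeWidgets
{
    // Returns one line per widget; missing elements become placeholders instead of exceptions.
    public static string Describe( string xml )
    {
        var lines = from w in XDocument.Parse( xml ).Descendants( "widget" )
                    let url = (string) w.Element( "url" ) ?? "(no url)"
                    let category = (int?) w.Descendants( "PortalCategoryId" ).FirstOrDefault()
                    select string.Format( "{0} -> {1}", url,
                        category.HasValue ? category.Value.ToString() : "none" );

        return string.Join( Environment.NewLine, lines );
    }

    public static void Main()
    {
        // The second widget deliberately has no parameters section.
        string xml = @"<widgets>
    <widget>
        <url>~/Portal/Widgets/ServicesList.ascx</url>
        <parameters><PortalCategoryId>3</PortalCategoryId></parameters>
    </widget>
    <widget>
        <url>www.omegacoder.com</url>
    </widget>
</widgets>";

        Console.WriteLine( Describe( xml ) );
        // ~/Portal/Widgets/ServicesList.ascx -> 3
        // www.omegacoder.com -> none
    }
}
```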

XPath meets Linq To XML in C#

There are two technologies which I feel that every coder should learn regardless of the language one programs in…those are Regular Expressions and XPath.

XPath is a quirkier language than Regular Expressions, but it is the lingua franca for working with xml documents. By specifying an XPath query which mirrors the target nodes, one can get nodes from an xml document to be processed in code.

A forum poster had xml data with a missing node, using a car example. He wanted all the parent car nodes which had three tires (child nodes) and did not have a fourth: three-wheel cars versus four-wheel cars.

The following C# shows how to use XPath to retrieve the nodes needed. Note that an xml editor with XPath support can help figure out the stranger derivations.

using System.Xml.Linq;
using System.Xml.XPath;

...

string xml= @"<?xml version='1.0' encoding='UTF-8'?>
<Main>
<cars>
    <car name='Twingo'>
        <wheel1>abc</wheel1>
        <wheel2>def</wheel2>
        <wheel3>ghi</wheel3>
    </car>
    <car name='quattro'>
        <wheel1>some info</wheel1>
        <wheel2>more info</wheel2>
        <wheel3>blur</wheel3>
        <wheel4>We have four tires</wheel4>
    </car>
    <car name='Triumph'>
        <wheel1>some info</wheel1>
        <wheel2>more info</wheel2>
        <wheel3>blur</wheel3>
    </car>
</cars>
</Main>";

XDocument loaded = XDocument.Parse( xml );

IEnumerable<XElement> list1
   = loaded.XPathSelectElements( "//cars/car[not(wheel4)]" );

foreach ( XElement el in list1 )
    Console.WriteLine( el );

/* Outputs
<car name="Twingo">
  <wheel1>abc</wheel1>
  <wheel2>def</wheel2>
  <wheel3>ghi</wheel3>
</car>
<car name="Triumph">
  <wheel1>some info</wheel1>
  <wheel2>more info</wheel2>
  <wheel3>blur</wheel3>
</car>*/
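
For comparison, the same filter can be written as a plain Linq to Xml query without XPath. This is a sketch with a shortened version of the car data; the `ThreeWheelers` class and its method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ThreeWheelers
{
    public const string Sample =
        "<Main><cars>" +
        "<car name='Twingo'><wheel1/><wheel2/><wheel3/></car>" +
        "<car name='quattro'><wheel1/><wheel2/><wheel3/><wheel4/></car>" +
        "<car name='Triumph'><wheel1/><wheel2/><wheel3/></car>" +
        "</cars></Main>";

    // The XPath query from the article, projected to the car names.
    public static string[] NamesViaXPath( XDocument doc )
    {
        return doc.XPathSelectElements( "//cars/car[not(wheel4)]" )
                  .Select( c => (string) c.Attribute( "name" ) )
                  .ToArray();
    }

    // The equivalent Linq to Xml filter: cars with no wheel4 child element.
    public static string[] NamesViaLinq( XDocument doc )
    {
        return doc.Descendants( "car" )
                  .Where( c => c.Element( "wheel4" ) == null )
                  .Select( c => (string) c.Attribute( "name" ) )
                  .ToArray();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse( Sample );
        Console.WriteLine( string.Join( ", ", NamesViaXPath( doc ) ) ); // Twingo, Triumph
        Console.WriteLine( string.Join( ", ", NamesViaLinq( doc ) ) );  // Twingo, Triumph
    }
}
```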

Replace Xml Node with Raw Xml in .Net

In this post I will simulate a user editing a raw XmlNode from an existing xml document. Say for example we extract the node to edit, with subnodes, and place that in a text box. The user edits the text adding whatever is needed, but keeping with the original nodes, and then signals they are done. Upon that signal we take that raw fragment from the textbox and insert it into the document. Here is the code:

   1: string xmlInitial =
   2: @"<?xml version='1.0'?>
   3: <Rules>
   4:     <OpenBalances function='ReOrderFifo'>
   5:         <column name='SecurityID' used='True'/>
   6:         <column name='COL2' used='False'>#@#</column>
   7:         <column name='COL3' used='False'>#@#</column>
   8:     </OpenBalances>
   9:     <ClosedBalances/>
  10: </Rules>";
  11:
  12: string xmlCreatedByTheUser =
  13: @"<OpenBalances function='ReOrderFifo' iAmNew='true'>
  14:     <column name='SecurityID' used='True'/>
  15:     <column name='COL2' used='False'>#@#</column>
  16:     <column name='COL3' used='False'>#@#</column>
  17: </OpenBalances>";
  18:
  19:     XmlDocument originalXml = new XmlDocument();
  20:
  21:     string targetNode = "descendant::*[name(.) ='OpenBalances']";
  22:
  23:     originalXml.LoadXml(xmlInitial);
  24:
  25:     // Simulate the selection of the subnode
  26:     // for the user to edit in the first nodes
  27:     // Rules.
  28:     XmlNode editNode = originalXml.SelectSingleNode(targetNode);
  29:
  30:     // Get a fragment and slide the changed data into it.
  31:     XmlDocumentFragment fragment = originalXml.CreateDocumentFragment();
  32:     fragment.InnerXml = xmlCreatedByTheUser;
  33:
  34:     // Replace the contents of the editNode with the user fragment.
  35:     editNode.ParentNode.ReplaceChild(fragment, editNode);
  36:
  37:     Console.WriteLine(originalXml.OuterXml);
  • Line 01: This is our original Xml which we will use.
  • Line 12: This is the simulated change by the user. The user adds one attribute iAmNew.
  • Line 21: We will use this Xpath to extract the node to work on for the user.
  • Line 23: We load the initial Xml into the document.
  • Line 28: Simulated extraction of the node to display to the user.
  • Line 31: It’s important that we create the Xml fragment from our original XmlDocument. We could not prune another XmlDocument or create a fragment on the fly; it must come from the original document.
  • Line 32: Simulated user change and loading from a TextBox.
  • Line 35: Do the replacement here.
  • Line 37: Output the Xml.

Console Output:

   1: <?xml version="1.0"?>
   2: <Rules>
   3:    <OpenBalances function="ReOrderFifo" iAmNew="true">
   4:        <column name="SecurityID" used="True" />
   5:        <column name="COL2" used="False">#@#</column>
   6:        <column name="COL3" used="False">#@#</column>
   7:    </OpenBalances>
   8:    <ClosedBalances />
   9: </Rules>
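
The same replacement can also be done with Linq to Xml, where no document fragment is needed. The following is a minimal sketch of that different technique, not the XmlDocument approach shown above; the `ReplaceWithLinq` class is hypothetical and the Xml is a shortened version of the rules document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ReplaceWithLinq
{
    // Swaps the first OpenBalances element for the user-edited raw xml.
    public static string Replace( string original, string edited )
    {
        XDocument doc = XDocument.Parse( original );
        XElement target = doc.Descendants( "OpenBalances" ).First();

        // XElement.Parse turns the raw text into a node that ReplaceWith can attach.
        target.ReplaceWith( XElement.Parse( edited ) );

        return doc.ToString( SaveOptions.DisableFormatting );
    }

    public static void Main()
    {
        string original =
            "<Rules><OpenBalances function='ReOrderFifo'>" +
            "<column name='SecurityID' used='True'/></OpenBalances><ClosedBalances/></Rules>";

        string edited =
            "<OpenBalances function='ReOrderFifo' iAmNew='true'>" +
            "<column name='SecurityID' used='True'/></OpenBalances>";

        Console.WriteLine( Replace( original, edited ) );
    }
}
```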

Add Attribute to XmlDocument in .Net

Here is an example of adding an attribute to an XmlDocument in C# and .Net. The code below reads in Xml; where there are nodes that do not contain an ID attribute, we add that attribute using the name as the value.

 1: public static string xmlAcct =
 2: @"<?xml version='1.0' encoding='utf-8'?>
 3: <Accounts>
 4: <acct acct='aex113' country_code='us' name='abcde' />
 5: <acct acct='aex114' name='eeaad' country_code='us' />
 6: <acct acct='aex115' country_code='us' name='eoo' id='eoo9' />
 7: </Accounts>
 8: ";
 9:
 10: public static void AddAttribute()
 11: {
 12:
 13:     XmlDocument originalXml = new XmlDocument();
 14:
 15:
 16:     originalXml.LoadXml(xmlAcct);
 17:
 18:     XmlNodeList accts
 19:        = originalXml.SelectNodes("descendant::*[name(.) ='acct']");
 20:     XmlNode temp;
 21:     XmlNode name;
 22:     XmlAttribute attr;
 23:
 24:     foreach (XmlNode current in accts)
 25:     {
 26:         temp = current.SelectSingleNode("@id");
 27:         if (temp == null)
 28:         {
 29:             name = current.SelectSingleNode("@name");
 30:             if (name != null)
 31:             {
 32:                 attr = originalXml.CreateAttribute("id");
 33:                 attr.InnerText = name.InnerText;
 34:                 current.Attributes.Append(attr);
 35:             }
 36:         }
 37:
 38:     }
 39:
 40:     Console.WriteLine(originalXml.OuterXml);
 41:
 42:
 43: }
  • Line 01: Create the test Xml.
  • Line 16: Load the test Xml.
  • Line 19: Get all the account nodes using Xpath.
  • Line 24: Work through each account node to add an attribute if ID does not exist.
  • Line 27: When null we need to add the ID attribute.
  • Line 29: We will use the name as the ID.
  • Line 32: We have to create the attribute from the XmlDocument that owns the current node. Very important, for we can’t just slap any old attribute on; it has to be created by the same document as the node it is appended to.
  • Line 40: Display the changes.
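
For readers working with XDocument instead, the same fix-up is shorter because attributes do not need to be created from an owning document. This is a sketch of that alternative under the same rule (use the name as the ID when ID is missing); the `AddIdAttribute` class is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AddIdAttribute
{
    // Adds id='<name>' to every acct element that has a name but no id.
    public static XDocument AddMissingIds( string xml )
    {
        XDocument doc = XDocument.Parse( xml );

        foreach (XElement acct in doc.Descendants( "acct" ))
        {
            XAttribute name = acct.Attribute( "name" );
            if (acct.Attribute( "id" ) == null && name != null)
                acct.SetAttributeValue( "id", name.Value );
        }

        return doc;
    }

    public static void Main()
    {
        string xml = @"<Accounts>
<acct acct='aex113' country_code='us' name='abcde' />
<acct acct='aex115' country_code='us' name='eoo' id='eoo9' />
</Accounts>";

        Console.WriteLine( AddMissingIds( xml ) );
    }
}
```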