Getting the HTML Document from IE using C#

This article shows how to extract the HTML of the page currently loaded in each open Internet Explorer (IE) window from a C# program.

  1. Create a console application in any version of Visual Studio, targeting .NET 1.x, 2.0, 3.0, or 3.5.
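  2. Add two COM object references, which allow us to manipulate IE: Microsoft Internet Controls (the SHDocVw namespace) and Microsoft HTML Object Library (the mshtml namespace).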


  3. The code sample below fully qualifies the COM types, so it needs no using directives for SHDocVw or mshtml; just add the code as is. It does rely on using System; and using System.IO; for Console and Path.
  4. Then find the instances of IE and extract the document:
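// ShellWindows enumerates every open shell window: Windows Explorer as well as IE.
SHDocVw.ShellWindows shellWindows
    = new SHDocVw.ShellWindowsClass();

string filename;

foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
    // The host executable's name tells IE windows apart from Explorer windows.
    filename
        = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();

    if (filename.Equals("iexplore"))
    {
        Console.WriteLine("Web Site   : {0}", ie.LocationURL);

        mshtml.IHTMLDocument2 htmlDoc
             = ie.Document as mshtml.IHTMLDocument2;

        // body can be null (for example, while the page is still loading).
        string html = ( htmlDoc != null && htmlDoc.body != null )
                          ? htmlDoc.body.outerHTML : null;

        // Clamp the length so pages shorter than 40 characters do not throw.
        Console.WriteLine("   Document Snippet: {0}",
             ( ( html != null ) ? html.Substring(0, Math.Min(40, html.Length))
                                : "***Failed***" ));
        Console.WriteLine("{0}{0}", Environment.NewLine);
    }
}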
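
The snippet above prints only the first 40 characters of the body. As a rough sketch of how the whole document could be captured instead, the program below wraps the same loop in a complete console application and writes each page's HTML to a file. Several things here are assumptions rather than part of the original sample: the class name IeHtmlDump, the output file naming in the temp folder, the use of IHTMLDocument3.documentElement to reach the root element, and the [STAThread] attribute on Main, which is a commonly recommended precaution when automating apartment-threaded COM objects and may or may not resolve E_FAIL errors on a given machine.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical class name; requires the same two COM references as the article.
class IeHtmlDump
{
    // Assumption: run COM automation on a single-threaded apartment.
    [STAThread]
    static void Main()
    {
        // ShellWindowsClass assumes the interop types are not embedded,
        // as with the older framework versions the article targets.
        SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
        int index = 0;

        foreach (SHDocVw.InternetExplorer ie in shellWindows)
        {
            if (ie == null) continue;

            // Skip Windows Explorer windows; keep only iexplore.exe hosts.
            string exeName = Path.GetFileNameWithoutExtension(ie.FullName);
            if (!string.Equals(exeName, "iexplore", StringComparison.OrdinalIgnoreCase))
                continue;

            mshtml.IHTMLDocument3 doc;
            try
            {
                doc = ie.Document as mshtml.IHTMLDocument3;
            }
            catch (COMException ex)
            {
                // Report the failure and move on to the next window.
                Console.WriteLine("Skipping {0}: {1}", ie.LocationURL, ex.Message);
                continue;
            }

            if (doc == null || doc.documentElement == null) continue;

            // outerHTML of the root element covers <html>...</html>,
            // but not the DOCTYPE declaration.
            string html = doc.documentElement.outerHTML;

            // Hypothetical output location: one file per IE window in the temp folder.
            string outFile = Path.Combine(Path.GetTempPath(), "ie_page_" + index + ".html");
            File.WriteAllText(outFile, html);
            Console.WriteLine("{0} -> {1} ({2} chars)", ie.LocationURL, outFile, html.Length);
            index++;
        }
    }
}
```

The markup returned by outerHTML reflects IE's current DOM, so it can differ from the source the server originally sent.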

The output lists the URL of each open IE window, followed by a short snippet of its document.


11 Comments

  1. Netanel KL says:

    I’ve been trying to use this method to get the documents but all I get is this exception
    “Error HRESULT E_FAIL has been returned from a call to a COM component.”
    on this line:
    “mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;”
    any ideas?

  2. omegaman says:

    When I wrote this example I was using VS2005 with .NET 2. If you are on a different version, there may be issues.

  3. Netanel KL says:

    You were right.
    I switched back from framework 3 and it fixed it.
    This article has helped me a lot,
    Thanks!

  4. Dobermann says:

    How can I get the cookie from a web site?

  5. Dobermann says:

    htmlDoc
    sorry =)

  6. Holiwsh says:

    Is it possible to do this for Firefox?

  7. William L. Scheffer says:

    I am currently running VS2005 for .NET 2.0.50727 SP2 and keep getting the same HRESULT E_FAIL message as user Netanel KL did.

    My references are as follows:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    using System.IO;
    using SHDocVw;
    using mshtml;

    Any ideas?

    Thanks

    Bill

  8. Steffen says:

    You should put the [STAThread] attribute on your method ;-)

  9. Sachin says:

    Thank you .. this helped

  10. Gabriel says:

    Hey guys, I’m having the same problem as Netanel KL:

    “Error HRESULT E_FAIL has been returned from a call to a COM component.”
    on this line:
    “mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;”

    But even if I switch to other versions of the .NET Framework, it doesn’t work.

    Can anyone help me?
