Getting the HTML Document from IE using C#

Posted by OmegaMan on July 10, 2007

Category: How To

You need to extract the HTML from the page currently open in Internet Explorer (IE). This article shows how to do that from C#.

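  1. Create a console application in any version of Visual Studio, targeting .NET 1.x, 2.0, 3.0, or 3.5.
  2. Add two COM object references, which allow us to manipulate IE: Microsoft Internet Controls (the SHDocVw namespace) and Microsoft HTML Object Library (the mshtml namespace).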

    image

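  3. Note that the code sample below uses fully qualified names for the COM types, so it needs no using directives for SHDocVw or mshtml. It does rely on System (Console, Environment) and System.IO (Path), which must be in scope.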
  4. Then find the instances of IE and extract the document:
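// Enumerate every open shell window (Windows Explorer as well as IE).
SHDocVw.ShellWindows shellWindows
    = new SHDocVw.ShellWindowsClass();

string filename;

foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
    // FullName is the path of the executable hosting the window.
    filename
        = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();

    // Skip Windows Explorer windows; keep only iexplore.exe.
    if (filename.Equals("iexplore"))
    {
        Console.WriteLine("Web Site   : {0}", ie.LocationURL);

        mshtml.IHTMLDocument2 htmlDoc
             = ie.Document as mshtml.IHTMLDocument2;

        // Guard against a missing body and against pages whose
        // markup is shorter than 40 characters.
        string snippet = "***Failed***";
        if (htmlDoc != null && htmlDoc.body != null)
        {
            string html = htmlDoc.body.outerHTML;
            snippet = html.Substring(0, Math.Min(40, html.Length));
        }

        Console.WriteLine("   Document Snippet: {0}", snippet);
        Console.WriteLine("{0}{0}", Environment.NewLine);
    }
}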

Here is a screen-shot of the output:

image
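
For reference, the snippet from step 4 could be dropped into a complete console program along the following lines. This is only a sketch, not the original project: the helper names Truncate and IsInternetExplorer are made up for illustration, and the [STAThread] attribute is included because a commenter below suggests it in response to the E_FAIL error, not because the original sample is known to have used it. It assumes both COM references from step 2 have been added.

```csharp
using System;
using System.IO;

public class Program
{
    // Hypothetical helper: returns at most maxLength characters of text.
    // Unlike Substring(0, n), it does not throw when the text is shorter.
    public static string Truncate(string text, int maxLength)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }

    // Hypothetical helper: true when the window is hosted by iexplore.exe.
    // ShellWindows also lists Windows Explorer windows, which we skip.
    public static bool IsInternetExplorer(string fullName)
    {
        string name = Path.GetFileNameWithoutExtension(fullName);
        return string.Equals(name, "iexplore", StringComparison.OrdinalIgnoreCase);
    }

    // Assumption: run the COM calls on a single-threaded apartment thread.
    [STAThread]
    public static void Main()
    {
        // "new SHDocVw.ShellWindows()" creates the same COM object as
        // ShellWindowsClass and also compiles when interop types are embedded.
        SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindows();

        foreach (SHDocVw.InternetExplorer ie in shellWindows)
        {
            // Defensive checks before touching the window.
            if (ie == null || !IsInternetExplorer(ie.FullName)) continue;

            Console.WriteLine("Web Site   : {0}", ie.LocationURL);

            mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;

            string snippet = (htmlDoc != null && htmlDoc.body != null)
                ? Truncate(htmlDoc.body.outerHTML, 40)
                : "***Failed***";

            Console.WriteLine("   Document Snippet: {0}", snippet);
            Console.WriteLine();
        }
    }
}
```

Keeping the string handling in small static helpers means it can be checked without an IE window open; only Main touches the COM objects.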


10 Comments

  1. Netanel KL says

    I’ve been trying to use this method to get the documents, but all I get is this exception:
    “Error HRESULT E_FAIL has been returned from a call to a COM component.”
    on this line:
    “mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;”
    Any ideas?

  2. omegaman says

    When I did this example it was VS2005 for .Net 2. If you are on a different version…there may be issues.

  3. Netanel KL says

    You were right.
    I switched back from framework 3 and it fixed it.
    This article has helped me a lot.
    Thanks!

  4. Dobermann says

    How can I get a cookie from the web site?

  5. Dobermann says

    htmlDoc
    sorry =)

  6. Holiwsh says

    Is it possible to make this work for Firefox?

    • omegaman says

      No.

  7. William L. Scheffer says

    I am currently running VS2005 for .NET 2.0.50727 SP2 and keep getting the same HRESULT E_FAIL message that user Netanel KL did.

    My references are as follows:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    using System.IO;
    using SHDocVw;
    using mshtml;

    Any ideas?

    Thanks

    Bill

  8. Steffen says

    You should add the [STAThread] attribute to your method 😉

  9. Sachin says

    Thank you .. this helped
