Getting the HTML Document from IE using C#

Posted by OmegaMan at July 10, 2007

Category: How To

You need to extract the HTML from the web page currently open in Internet Explorer (IE). This article shows how to do that from C#.

  1. Create a console application in any version of Visual Studio, targeting .NET Framework 1.x, 2.0, 3.0, or 3.5.
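  2. Add two COM object references, which allow us to manipulate IE: Microsoft Internet Controls (the SHDocVw namespace) and Microsoft HTML Object Library (the mshtml namespace).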
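  3. Note that the code sample below fully qualifies the COM types (SHDocVw and mshtml), so it needs no using directives for them; just add the code as is. It does rely on using System; and using System.IO; for Console, Environment, and Path.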
  4. Then find the instances of IE and extract the document:
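// Requires "using System;" and "using System.IO;" (Console, Environment, Path).
SHDocVw.ShellWindows shellWindows
    = new SHDocVw.ShellWindowsClass();

string filename;

// ShellWindows also lists Windows Explorer windows, so filter by executable name.
foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
    filename
        = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();

    if (filename.Equals("iexplore"))
    {
        Console.WriteLine("Web Site   : {0}", ie.LocationURL);

        mshtml.IHTMLDocument2 htmlDoc
             = ie.Document as mshtml.IHTMLDocument2;

        // Guard against a missing document or body, and against pages
        // shorter than 40 characters (Substring would otherwise throw).
        string html = ( htmlDoc != null && htmlDoc.body != null )
                          ? htmlDoc.body.outerHTML : null;

        Console.WriteLine("   Document Snippet: {0}",
             ( html != null ) ? html.Substring(0, Math.Min(40, html.Length))
                              : "***Failed***");
        Console.WriteLine("{0}{0}", Environment.NewLine);
    }
}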

Here is a screen-shot of the output:

image
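
Several commenters report an E_FAIL HRESULT on the ie.Document line, and one suggests marking the method with [STAThread]. The sketch below wraps the same loop in a complete console program with that attribute and catches COMException per window, so one failing window does not stop the scan. Treat it as a starting point under those assumptions, not a confirmed fix: the article does not establish that [STAThread] resolves the error, and the Truncate helper is a hypothetical name introduced here for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

class Program
{
    // Hypothetical helper (not from the article): returns at most
    // maxLength characters of text, or an empty string for null input.
    static string Truncate(string text, int maxLength)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }

    [STAThread] // run Main on a single-threaded apartment thread, as a commenter suggests
    static void Main()
    {
        // Same COM types as the article: SHDocVw and mshtml references are required.
        SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();

        foreach (SHDocVw.InternetExplorer ie in shellWindows)
        {
            try
            {
                string filename = Path.GetFileNameWithoutExtension(ie.FullName);
                if (!string.Equals(filename, "iexplore", StringComparison.OrdinalIgnoreCase))
                    continue; // skip Windows Explorer windows

                mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;
                string html = (htmlDoc != null && htmlDoc.body != null)
                    ? htmlDoc.body.outerHTML
                    : null;

                Console.WriteLine("Web Site   : {0}", ie.LocationURL);
                Console.WriteLine("   Document Snippet: {0}",
                    html != null ? Truncate(html, 40) : "***Failed***");
            }
            catch (COMException ex)
            {
                // Report the HRESULT (for example E_FAIL) and move on to the next window.
                Console.WriteLine("   COM error 0x{0:X8}: {1}", ex.ErrorCode, ex.Message);
            }
        }
    }
}
```

Catching the exception inside the loop also shows which window raised the error, which may help narrow down whether a particular page or tab is involved.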

11 Comments

  1. Netanel KL says

    I’ve been trying to use this method to get the documents but all I get is this exception
    “Error HRESULT E_FAIL has been returned from a call to a COM component.”
    on this line:
    “mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;”
    any ideas?

  2. omegaman says

    When I did this example it was VS2005 for .Net 2. If you are on a different version…there may be issues.

  3. Netanel KL says

    You were right.
    I switched back from framework 3 and it fixed it.
    This article has helped me a lot,
    Thanks!

  4. Dobermann says

    How can I get the cookie from a web site?

  5. Dobermann says

    htmlDoc
    sorry =)

  6. Holiwsh says

    Is it possible to do this for Firefox?

    • omegaman says

      No.

  7. William L. Scheffer says

    I am currently running VS2005 for .NET 2.0.50727 SP2 and keep getting the same HRESULT E_FAIL message as user Netanel KL did.

    My references are as follows:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    using System.IO;
    using SHDocVw;
    using mshtml;

    Any ideas?

    Thanks

    Bill

  8. Steffen says

    You should put the [STAThread] attribute on your method 😉

  9. Sachin says

    Thank you .. this helped

  10. Gabriel says

    Hey guys I’m having the same problem as Netanel KL {

    “Error HRESULT E_FAIL has been returned from a call to a COM component.”
    on this line:
    “mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;”

    }

    But even if I switch the .NET Framework back to other versions, it doesn’t work.

    Can anyone help me?
