This article shows how to use C# to read a specific web page and get its contents.
 
The first step is to include System.Net. Then we will use HttpWebRequest and HttpWebResponse to begin the process, targeting this blog's special page, Page 58, which contains special characters:
 
string target            = @"http://www.omegacoder.com/?p=58";
HttpWebRequest request   = (HttpWebRequest)WebRequest.Create(target);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
  • Line 01: We will use a throwaway target on this blog!
  • Line 02: We create our request, which starts the process.
  • Line 03: GetResponse sends the request; the response will be used to extract the data.
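
The three lines above rely on a using directive for System.Net and never dispose the response. As a rough sketch only, here is the same setup as a complete program. The class name RequestSketch and the helper CreateRequest are hypothetical names, not part of the original article, and on newer .NET versions WebRequest.Create is marked obsolete in favor of HttpClient, so this may compile with a warning.

```csharp
using System;
using System.Net;

static class RequestSketch
{
    // Hypothetical helper: builds the request object for a URL.
    // Nothing is sent to the server until GetResponse() is called.
    public static HttpWebRequest CreateRequest(string url)
    {
        return (HttpWebRequest)WebRequest.Create(url);
    }

    static void Main()
    {
        HttpWebRequest request = CreateRequest(@"http://www.omegacoder.com/?p=58");

        try
        {
            // HttpWebResponse is IDisposable; the using block releases the connection.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("{0} {1}", (int)response.StatusCode, response.StatusDescription);
            }
        }
        catch (WebException ex)
        {
            // GetResponse() throws on network failures and on HTTP error statuses.
            Console.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```

Wrapping the response in a using block is a design choice added here; the article's shorter form works for a quick test but leaves cleanup to the garbage collector.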

With that, we can now extract the web page:

using (Stream responseStream = response.GetResponseStream())
    using (StreamReader htmlStream = new StreamReader(responseStream, Encoding.UTF8))
       Console.WriteLine(htmlStream.ReadToEnd());
  • Line 01: Extract the primary stream from the response. Stream and StreamReader are in System.IO.
  • Line 02: Using StreamReader, we specify UTF-8 decoding because the page might contain non-ASCII characters. Encoding is in System.Text.
  • Line 03: We will simply display it…you may want to place it into a target string for processing.
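
For the "target string" variation mentioned in the last bullet, a minimal sketch might look like the following. The class PageReaderSketch and the method ReadAllText are hypothetical names introduced for illustration, and Main feeds in sample bytes through a MemoryStream as a stand-in for response.GetResponseStream() so that it runs without a network connection.

```csharp
using System;
using System.IO;
using System.Text;

static class PageReaderSketch
{
    // Hypothetical helper: decodes an entire stream as UTF-8 and returns it as a string.
    // Disposing the StreamReader also closes the underlying stream.
    public static string ReadAllText(Stream source)
    {
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Stand-in for response.GetResponseStream(): sample bytes, no network needed.
        byte[] sample = Encoding.UTF8.GetBytes("<p>XXXXXX ÅÅÅÅÅÅÅ  cccc</p>");

        string html = ReadAllText(new MemoryStream(sample));
        Console.WriteLine(html);
    }
}
```

With a live response, the call would presumably be ReadAllText(response.GetResponseStream()), leaving the page text in a string for further processing.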

In the example below, we process each line, looking for a special marker on the page, XXXXX:

string line;

using (Stream responseStream = response.GetResponseStream())
    using (StreamReader htmlStream = new StreamReader(responseStream, Encoding.UTF8))
       while ((line = htmlStream.ReadLine()) != null)
          if (Regex.IsMatch(line, "XXXXX"))
             Console.WriteLine(line.Trim(new char[] { ' ', '\t' }));
  • Line 05: We will read each line individually for the regex processing.
  • Line 06: If there is a match, we have our line. Remember Regex is in System.Text.RegularExpressions.

The following is output, with the special characters intact:

<p>XXXXXX ÅÅÅÅÅÅÅ  cccc</p>
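
The line-by-line marker search can also be packaged so that the matches are collected rather than printed. This is only a sketch under stated assumptions: MarkerScanSketch and FindMatchingLines are hypothetical names, and the sample page in Main is made-up text run through a MemoryStream in place of the real response stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class MarkerScanSketch
{
    // Hypothetical helper: returns every line that matches the pattern,
    // with leading and trailing spaces and tabs removed.
    public static List<string> FindMatchingLines(Stream source, string pattern)
    {
        Regex marker = new Regex(pattern);
        List<string> matches = new List<string>();

        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (marker.IsMatch(line))
                {
                    matches.Add(line.Trim(' ', '\t'));
                }
            }
        }

        return matches;
    }

    static void Main()
    {
        // Made-up sample page standing in for the downloaded HTML.
        string page = "<html>\n\t<p>XXXXXX ÅÅÅÅÅÅÅ  cccc</p>\n</html>";
        Stream source = new MemoryStream(Encoding.UTF8.GetBytes(page));

        foreach (string match in FindMatchingLines(source, "XXXXX"))
        {
            Console.WriteLine(match);
        }
    }
}
```

Building the Regex once outside the loop avoids re-parsing the pattern for every line, which is a small change from the static Regex.IsMatch call used in the article.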