Friday, November 16, 2012

Capture an Entire Web Page in a C# Console Application

It's fairly simple to incorporate a browser object into your .NET Windows Forms application.
Just by adding the WebBrowser control you get the ability to display rich HTML files or browse to websites from your application.
Most programmers also notice the DrawToBitmap method, which lets you capture the browser window and save it as an image.

A few problems almost always arise. Here are their solutions:

I can't find the DrawToBitmap function in IntelliSense

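DrawToBitmap is part of the WebBrowserBase class. The WebBrowser class inherits from WebBrowserBase and so inherits the DrawToBitmap method. The method is hidden from IntelliSense (the documentation describes it as not supported by the control), but code that calls it still compiles.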
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowserbase.drawtobitmap.aspx



DrawToBitmap does not work correctly or produces a blank image

A common mistake is to write your capture code this way:
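A minimal sketch of that mistake, assuming a hypothetical form class, URL, and output path (none of these names come from the original project):

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

// Hypothetical form used only to illustrate the mistake.
public class NaiveCaptureForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Size = new Size(1024, 768) };

    public NaiveCaptureForm()
    {
        Controls.Add(browser);

        // Navigate is asynchronous: it returns before the page has loaded.
        browser.Navigate("http://example.com/"); // placeholder URL

        // Mistake: drawing right away captures the control before it has any content.
        using (var bitmap = new Bitmap(browser.Width, browser.Height))
        {
            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, browser.Width, browser.Height));
            bitmap.Save("capture.png", ImageFormat.Png); // placeholder path
        }
    }
}
```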


In order to capture the page, you must wait for it to finish loading.
The way to do this is to subscribe to the browser's DocumentCompleted event with a WebBrowserDocumentCompletedEventHandler.
This way an event fires when the web page finishes loading, and you are free to capture the image.
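As a rough sketch of this approach, the capture can be moved into the event handler. The class name, handler name, URL, and output path below are placeholders chosen for illustration, not names from the original project:

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

// Hypothetical form: captures only after the document has loaded.
public class WaitingCaptureForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Size = new Size(1024, 768) };

    public WaitingCaptureForm()
    {
        Controls.Add(browser);

        // Subscribe before navigating so the event cannot be missed.
        browser.DocumentCompleted +=
            new WebBrowserDocumentCompletedEventHandler(OnDocumentCompleted);
        browser.Navigate("http://example.com/"); // placeholder URL
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The page is loaded at this point, so the control has something to draw.
        using (var bitmap = new Bitmap(browser.Width, browser.Height))
        {
            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, browser.Width, browser.Height));
            bitmap.Save("capture.png", ImageFormat.Png); // placeholder path
        }
    }
}
```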

The capture doesn't capture the whole web page.

The DrawToBitmap function only captures what is currently displayed in the browser control itself.
In order to capture the whole web page you must make sure that all of it is visible.
The way I do it is by resizing the browser to match the page's scrollable area, which almost always matches the size of the page itself.
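One way to sketch the resize step is with the document body's ScrollRectangle property. The helper class and method names are hypothetical, and the sketch assumes the browser is not docked or anchored in a way that overrides its size:

```csharp
using System.Drawing;
using System.Windows.Forms;

// Hypothetical helper: call it from the DocumentCompleted handler.
public static class PageSizing
{
    // Resizes the browser to the full scrollable size of the loaded page
    // and returns the resulting size.
    public static Size ResizeToPage(WebBrowser browser)
    {
        HtmlElement body = browser.Document != null ? browser.Document.Body : null;
        if (body == null)
        {
            // Nothing loaded (or not an HTML document): leave the size unchanged.
            return browser.Size;
        }

        Rectangle page = body.ScrollRectangle;
        browser.Size = new Size(page.Width, page.Height);
        return browser.Size;
    }
}
```

The returned size can then be used for both the Bitmap dimensions and the target rectangle passed to DrawToBitmap.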

How can I capture the page without the scroll bars?

This one is a simple fix.
Just set the browser's ScrollBarsEnabled property to false.

Putting it all together, you get the following code:
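A minimal sketch that combines the steps above (wait for DocumentCompleted, resize to the page, hide the scroll bars, then draw). The class name, URL, and output path are assumptions made for illustration and may differ from the original project:

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

// Hypothetical form that saves a full-page capture once the page has loaded.
public class CaptureForm : Form
{
    private const string CaptureUrl = "http://example.com/"; // placeholder URL
    private const string OutputPath = "capture.png";         // placeholder path

    private readonly WebBrowser browser = new WebBrowser
    {
        ScrollBarsEnabled = false,      // keep scroll bars out of the image
        Size = new Size(1024, 768)      // initial size; replaced after load
    };

    public CaptureForm()
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate(CaptureUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Grow the control to the full scrollable size of the page.
        Rectangle page = browser.Document.Body.ScrollRectangle;
        browser.Size = new Size(page.Width, page.Height);

        using (var bitmap = new Bitmap(page.Width, page.Height))
        {
            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, page.Width, page.Height));
            bitmap.Save(OutputPath, ImageFormat.Png);
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new CaptureForm());
    }
}
```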



Those were the basics of capturing, or converting, a web page to an image in C# Windows Forms. But what if you want to capture a web page from a console application?

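First we'll need to add some references that a console project doesn't include by default (System.Windows.Forms and System.Drawing among them), along with the matching using directives: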

using System.Windows.Forms;
using System.Web;
using System.Drawing;
using System.Threading; 



The next thing you'll probably notice is that when you try to run the same code that worked on a Windows Form, you get a "the current thread is not in a single-threaded apartment" exception.
A Google search will reveal a simple solution: run the browser on a separate thread and set that thread's apartment state to single-threaded apartment (STA). This should stop the exception you're seeing, but it will not fix all the problems.
The DocumentCompleted event we used will still not fire.

Since we are using an ActiveX component (the WebBrowser), the fix is to create an STA thread that also pumps a message loop.

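        public void Capture()
        {
            // The WebBrowser control wraps an ActiveX component, so it must be
            // created and used on an STA thread that pumps messages.
            var th = new Thread(() =>
            {
                browser = new WebBrowser();
                browser.DocumentCompleted += wb_DocumentCompleted;
                browser.Navigate(CaptureURL);
                // Start a message loop on this thread so DocumentCompleted can fire.
                Application.Run();
            });
            // The apartment state must be set before the thread starts.
            th.SetApartmentState(ApartmentState.STA);
            th.Start();
        }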
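For context, here is a sketch of how the pieces might fit into a complete console program. It is not the CodePlex project's code: the URL, output path, and handler name are placeholders, and ending the message loop with Application.ExitThread after saving is my own assumption about how to let the program exit.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Threading;
using System.Windows.Forms;

public static class Program
{
    private const string CaptureUrl = "http://example.com/"; // placeholder URL
    private const string OutputPath = "capture.png";         // placeholder path

    public static void Main()
    {
        var th = new Thread(() =>
        {
            var browser = new WebBrowser
            {
                ScrollBarsEnabled = false,
                Size = new Size(1024, 768)
            };
            browser.DocumentCompleted += OnDocumentCompleted;
            browser.Navigate(CaptureUrl);

            // Pump messages until the handler calls Application.ExitThread.
            Application.Run();
            browser.Dispose();
        });
        th.SetApartmentState(ApartmentState.STA);
        th.Start();
        th.Join(); // keep the console process alive until the capture is done
    }

    private static void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var browser = (WebBrowser)sender;

        // Grow the control to the full scrollable size of the page.
        Rectangle page = browser.Document.Body.ScrollRectangle;
        browser.Size = new Size(page.Width, page.Height);

        using (var bitmap = new Bitmap(page.Width, page.Height))
        {
            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, page.Width, page.Height));
            bitmap.Save(OutputPath, ImageFormat.Png);
        }

        // Stop the message loop so the STA thread (and the program) can finish.
        Application.ExitThread();
    }
}
```

Joining the thread in Main is a design choice: without it the program still runs, since the STA thread is a foreground thread, but the join makes the "capture, then exit" order explicit.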


To sum it all up, I've created a project on CodePlex that saves a screen capture of a web page from a console application: https://capturewebpage.codeplex.com/
Download link is: https://capturewebpage.codeplex.com/downloads/get/533683
Feel free to download the project or leave comments.

5 comments:

  1. Thanks for your great help.
    But it doesn't work in WinForm applications using IE 8.
    Although it creates and saves a tall bitmap, it only saves the top part (the displayed part) of the web page, not the whole web page.
    Do you have any solution for this problem, please?
    Thanks again.

  2. It is very good, but you should explain more about how to use it.

    Replies
    1. I tried to make it as simple as I could. Where exactly do you recommend that I explain more?
      To show how to actually use it, I posted a full solution on CodePlex. The link is above.
      If you like you can also privately contact me at admin@codebetweenthelines.com
