Started writing a simple web crawler in Python for downloading a website


I have been having trouble looking things up in the C standard library, so I decided to download the C library reference from cplusplus.com. I just wrote a simple script that downloads the front page. The saved copy is currently missing the site's styling, because the script only fetches the HTML and not the stylesheets it links to, but it has the basic contents. I’ll need to do some work before it gives the best result.

I still need to add code to recursively download everything, but it’s a start.

Here’s my Python script.

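#! python3
import shutil
import urllib.request

URL = 'http://www.cplusplus.com/reference/clibrary/'

# Download the page and save the raw bytes to test.html.
# The with statement closes both the response and the file, even on error.
with urllib.request.urlopen(URL) as response, open('test.html', 'wb') as f:
    shutil.copyfileobj(response, f)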
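
For the recursive part, the first step is finding which links on a page to follow. Here is a minimal sketch of that step, using only the standard library. The names LinkCollector and same_site_links are placeholders I made up for illustration, and the sample HTML is invented, not taken from the real page. It only collects links from a string of HTML; it does not download anything.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Invented sample markup, just to exercise the function.
sample = ('<a href="cstdio/">cstdio</a> '
          '<a href="http://example.com/">elsewhere</a> '
          '<a href="#top">top</a>')
base = 'http://www.cplusplus.com/reference/clibrary/'
print(same_site_links(sample, base))
```

A full crawler would presumably keep a queue of links still to fetch and a set of URLs already visited, so that it does not download the same page twice or wander off the site.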

5 thoughts on “Started writing a simple web crawler in Python for downloading a website”

  1. I was curious if you ever thought of changing the
    page layout of your blog? It's very well written; I love what you've got to say.
    But maybe you could add a little more in the way of content so people
    could connect with it better. You've got an awful lot of text for only having one or two images.
    Maybe you could space it out better?
