Python script for getting stats of a website


I have been interested for some time in a particular website, codereview.SE. It’s a website for getting your code reviewed, and I have been learning a lot from it. It’s still a test website, so I wanted to find out whether it is growing or not. To do that, I decided to write a Python script that collects the stats shown on its front page each day, so that I can plot graphs from them once I have many days of data.

Here’s the Python script:

#! python3
import urllib.request
import datetime

FILE_NAME = 'data_file.txt'
CURRENT_URL = 'http://codereview.stackexchange.com/'

def today_date():
    return datetime.date.today().strftime('%d-%m-%Y')

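def already_written():
    """Return True if the first line of the file holds today's date."""
    # 'a+' creates the file if it does not exist yet; seek(0) rewinds for reading
    with open(FILE_NAME, 'a+') as f:
        f.seek(0)
        first_line = f.readline()
    # rstrip also works when the first line has no trailing newline
    return today_date() == first_line.rstrip('\n')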

def parse(line):
    """Separate the stat name and its number; return [number, name]."""
    temp = [0, '']
    braces = False  # True while we are inside an HTML tag
    for c in line:
        if c == '<':
            braces = True
        elif c == '>':
            braces = False
        elif braces or c in (' ', ',', '%'):
            continue  # skip tag contents and number formatting
        elif c.isdigit():
            # build the number up digit by digit
            temp[0] = temp[0] * 10 + int(c)
        else:
            temp[1] += c

    return temp

def write_stats():
    '''This writes the stats into the file'''
    # Fetch the page first, so a network error cannot leave the file truncated
    with urllib.request.urlopen(CURRENT_URL) as url_handle:
        page_lines = [line.decode('utf-8', errors='replace').rstrip('\r\n')
                      for line in url_handle]

    with open(FILE_NAME, 'r') as f:
        data = f.readlines()

    with open(FILE_NAME, 'w') as f:
        # The first line always holds the date of the latest update
        f.write(today_date() + '\n')
        f.writelines(data[1:])
        f.write(today_date() + ',')

        for temp_line in page_lines:
            if 'stats-value' in temp_line and 'label' in temp_line:
                temp = parse(temp_line)
                f.write(str(temp[0]) + ',')

        f.write('\n')

def main():
    if not already_written():
        write_stats()

if __name__ == "__main__":
    main()
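
The character-by-character loop in parse can also be written with regular expressions. The following is only a sketch: parse_with_re is a name I made up, and the sample line of markup is hypothetical (the real front-page HTML may look different). For lines with well-formed tags it should return the same pair as parse.

```python
import re

def parse_with_re(line):
    """Return [number, name] from one line of HTML, mirroring parse()."""
    # Drop everything between '<' and '>' (the tags themselves)
    text = re.sub(r'<[^>]*>', '', line)
    # Join all digits that remain into a single number (0 if there are none)
    digits = ''.join(re.findall(r'\d', text))
    number = int(digits) if digits else 0
    # Whatever is left, minus digits, spaces, commas and percent signs, is the name
    name = re.sub(r'[\d ,%]', '', text)
    return [number, name]

# Hypothetical markup, used only to illustrate the idea
sample = '<div class="stats-value">9,094</div><div class="stats-label">questions</div>'
result = parse_with_re(sample)
print(result)
```

I kept the original loop in the script because it has no extra dependencies and is easy to step through, but the regex version is shorter.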

This stores the five stats shown on the front page of the website in a text file, data_file.txt, in the following format:

23-08-2013
dd-mm-yyyy, questions, answers, %answered, users, visitors/day

22-08-2013,9079,15335,88,26119,7407,
23-08-2013,9094,15354,88,26167,7585,

The above is the data I currently have in my file. The first line holds the date of the last update; the script checks it to see whether data has already been added today, which prevents duplicate rows. I added the second line manually so that I don’t forget which column is which, and so that I can use the names as titles for the graphs I’ll make to track the website’s growth.
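
Reading this format back is straightforward. Here is a minimal sketch of how it could be done with the standard csv module; read_stats and COLUMNS are names I made up for illustration, and the sample text is just the file contents shown above.

```python
import csv
import datetime
import io

# Column names taken from the manually added header line
COLUMNS = ['questions', 'answers', '%answered', 'users', 'visitors/day']

def read_stats(lines):
    """Return a list of (date, {column: value}) pairs from data-file lines."""
    rows = []
    for fields in csv.reader(lines):
        # The trailing comma on each data row produces an empty last field
        fields = [field.strip() for field in fields if field.strip()]
        if len(fields) != len(COLUMNS) + 1:
            continue  # the date-only first line and blank lines
        try:
            day = datetime.datetime.strptime(fields[0], '%d-%m-%Y').date()
            values = [int(field) for field in fields[1:]]
        except ValueError:
            continue  # the header line has six fields but is not numeric
        rows.append((day, dict(zip(COLUMNS, values))))
    return rows

SAMPLE = """23-08-2013
dd-mm-yyyy, questions, answers, %answered, users, visitors/day

22-08-2013,9079,15335,88,26119,7407,
23-08-2013,9094,15354,88,26167,7585,
"""

rows = read_stats(io.StringIO(SAMPLE))
for day, stats in rows:
    print(day, stats['questions'], stats['users'])
```

With the real file, the same function could be called as read_stats(open('data_file.txt')).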

I have found that the matplotlib library is really good for producing graphs. I’ll be using it to chart the growth, once I have some more data.
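
A rough sketch of what that plotting step might look like is below. It is not final code: plot_stat and daily_change are hypothetical names, and I am assuming the Agg backend so that the graph is written to an image file instead of being shown in a window. The numbers in the example are the question counts from the two rows above.

```python
import datetime

def daily_change(values):
    """Return the difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def plot_stat(days, values, title, out_path):
    """Save a line graph of one stat over time to out_path."""
    # Imported here so daily_change can be used even without matplotlib installed
    import matplotlib
    matplotlib.use('Agg')  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(days, values, marker='o')
    ax.set_title(title)
    ax.set_xlabel('date')
    ax.set_ylabel(title)
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    fig.savefig(out_path)
    plt.close(fig)

days = [datetime.date(2013, 8, 22), datetime.date(2013, 8, 23)]
questions = [9079, 9094]
growth = daily_change(questions)
print(growth)
```

Calling plot_stat(days, questions, 'questions', 'questions.png') would then write the graph to questions.png (a file name I picked for the example). Two points do not make much of a graph, which is why I want more data first.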
