I have been interested for some time in a particular website: codereview.SE. It's a site for getting your code reviewed, and I have been learning a lot from it. It's still a test website, so I wanted to find out whether it is growing. To do that, I decided to write a Python script that grabs the stats from its front page, so that I can collect them over many days and plot graphs.
Here's the Python script:
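import datetime
import urllib.request

FILE_NAME = 'data_file.txt'
CURRENT_URL = 'http://codereview.stackexchange.com/'

def today_date():
    """This returns today's date as dd-mm-yyyy"""
    return datetime.date.today().strftime('%d-%m-%Y')

def already_written():
    """This checks whether today's stats are already in the file"""
    # 'a+' creates a missing file but starts at the end, so rewind first.
    with open(FILE_NAME, 'a+') as f:
        f.seek(0)
        first_line = f.readline()
    return today_date() == first_line.strip()

def parse(line):
    """This separates the stat-name and associated number"""
    temp = [0, '']
    braces = False
    for c in line:
        if c == '<':
            braces = True
        elif c == '>':
            braces = False
        elif braces is True or c in [' ', ',', '%']:
            continue  # skip anything inside a tag, plus separators
        elif c.isdigit():
            temp[0] *= 10
            temp[0] += int(c)
        else:
            temp[1] += c
    return temp

def write_stats():
    '''This writes the stats into the file'''
    with open(FILE_NAME, 'r') as f:
        data = f.readlines()
    # Download before reopening in 'w' mode, so a failed fetch cannot wipe the file.
    with urllib.request.urlopen(CURRENT_URL) as url_handle:
        page = url_handle.read().decode('utf-8', errors='replace')
    with open(FILE_NAME, 'w') as f:
        f.write(today_date() + '\n')  # first line: date of the last update
        f.writelines(data[1:])        # keep the header and the older rows
        f.write(today_date() + ',')
        for line in page.splitlines():
            if 'stats-value' in line and 'label' in line:
                temp = parse(line)
                f.write(str(temp[0]) + ',')
        f.write('\n')

def main():
    if not already_written():
        write_stats()

if __name__ == "__main__":
    main()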
This stores the five stats shown on the site's front page in a text file, data_file.txt, in the following format:
dd-mm-yyyy, questions, answers, %answered, users, visitors/day
The above is the data currently in my file. The first line records the date of the last update, so the script can check whether today's stats have already been added; that prevents duplicate rows. I added the second line by hand so that I don't forget which column is which, and so that I can reuse the names as titles for the graphs that will show the site's growth.
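To show how that layout can be read back later, here is a rough sketch. The function name load_stats is made up for this illustration, and it assumes the file looks exactly as described: a date marker on line 1, the hand-written header on line 2, and one comma-separated row per day after that.

```python
import csv
from datetime import datetime


def load_stats(path='data_file.txt'):
    """Return (titles, rows) read from the stats file.

    Assumed layout (hypothetical helper, not part of the script):
    line 1 is the date of the last update, line 2 is the header,
    and every later line is one day's row.
    """
    with open(path, newline='') as f:
        lines = [line for line in f.read().splitlines() if line.strip()]
    if len(lines) < 2:
        return [], []  # no header yet, so nothing to load

    titles = [t.strip() for t in lines[1].split(',')]
    rows = []
    for fields in csv.reader(lines[2:]):
        # Rows end with a trailing comma, so drop the empty last field.
        fields = [x.strip() for x in fields if x.strip()]
        day = datetime.strptime(fields[0], '%d-%m-%Y').date()
        rows.append([day] + [int(x) for x in fields[1:]])
    return titles, rows
```

Calling load_stats() gives back the column names plus one list per day, with the date first and the five numbers after it, which is the shape a plotting function would want.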
I have found that the matplotlib library is really good for producing graphs. I'll use it to chart the growth, once I have some more data.
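As a first idea of what the plotting might look like, here is a minimal sketch. The names daily_change and plot_stat and the output file growth.png are placeholders of my own, and treating "growth" as the difference between consecutive samples is just one possible reading. It assumes matplotlib is installed and that the dates and values have already been loaded into two lists.

```python
def daily_change(values):
    """Difference between each sample and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def plot_stat(days, values, title, out_path='growth.png'):
    """Save a line chart of one stat over time (needs matplotlib)."""
    # Imported here so daily_change still works without matplotlib.
    import matplotlib
    matplotlib.use('Agg')  # render straight to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(days, values, marker='o')
    ax.set_title(title)
    ax.set_xlabel('date')
    ax.set_ylabel(title)
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    fig.savefig(out_path)
    plt.close(fig)
```

Passing a list of datetime.date objects as days works because matplotlib can plot dates directly; the header names from the second line of the file would serve as title.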