I have been interested for some time in a particular website: codereview.SE. It’s a website for getting your code reviewed, and I have been learning a lot from it. It’s still a test website, so I wanted to find out whether it is growing or not. For that I decided to write a Python script that grabs the stats from its front page, so that I can collect them over many days and plot graphs.
Here’s the Python script:
#! python3
import urllib.request
import datetime

FILE_NAME = 'data_file.txt'
CURRENT_URL = 'http://codereview.stackexchange.com/'


def today_date():
    return datetime.date.today().strftime('%d-%m-%Y')


def already_written():
    # 'a+' creates the file if it does not exist yet;
    # seek(0) moves back to the start so the first line can be read.
    with open(FILE_NAME, 'a+') as f:
        f.seek(0)
        first_line = f.readline()
        # [:-1] drops the trailing newline before comparing
        if today_date() == first_line[:-1]:
            return True
        return False


def parse(line):
    """This separates the stat-name and associated number"""
    temp = [0, '']  # [number, stat-name]
    braces = False
    for c in line:
        if c == '<':
            braces = True
        elif c == '>':
            braces = False
        elif braces is True or c in [' ', ',', '%']:
            # skip everything inside a tag, plus separators
            continue
        elif c.isdigit():
            temp[0] *= 10
            temp[0] += int(c)
        else:
            temp[1] += c
    return temp


def write_stats():
    '''This writes the stats into the file'''
    with open(FILE_NAME, 'r') as f:
        data = f.readlines()
    with open(FILE_NAME, 'w') as f:
        url_handle = urllib.request.urlopen(CURRENT_URL)
        # first line: date of this run, replacing the old one
        f.write(today_date() + '\n')
        f.writelines(data[1:])
        f.write(today_date() + ',')
        for line in url_handle:
            # line is bytes; this strips the b' prefix and the \r\n' suffix of its repr
            temp_line = str(line)[2:-5]
            if 'stats-value' in temp_line and 'label' in temp_line:
                temp = parse(temp_line)
                # only the number is stored, not the stat-name
                f.write(str(temp[0]) + ',')
        f.write('\n')


def main():
    if not already_written():
        write_stats()


if __name__ == "__main__":
    main()
This will store the five stats shown on the front page of the website in a text file, data_file.txt, in the following format:
23-08-2013
dd-mm-yyyy, questions, answers, %answered, users, visitors/day
22-08-2013,9079,15335,88,26119,7407,
23-08-2013,9094,15354,88,26167,7585,
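Reading this format back is straightforward. Here is a minimal sketch of how it could be done with the standard csv module. The function name load_stats and the SAMPLE string are my own placeholders (the sample is just the file contents shown above), and it assumes the file always has the date line and the title line at the top.

```python
import csv
import io

# Sample contents copied from the format shown above (stand-in for data_file.txt).
SAMPLE = """23-08-2013
dd-mm-yyyy, questions, answers, %answered, users, visitors/day
22-08-2013,9079,15335,88,26119,7407,
23-08-2013,9094,15354,88,26167,7585,
"""


def load_stats(f):
    """Return (titles, rows) from an open data file (hypothetical helper).

    titles: column names taken from the hand-written second line.
    rows: one list per day, the date string followed by the stats as ints.
    """
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)            # line 1: date of the last run, not data
    titles = next(reader)   # line 2: manually added column names
    rows = []
    for record in reader:
        # drop the empty field left by the trailing comma
        fields = [x for x in record if x]
        if fields:
            rows.append([fields[0]] + [int(x) for x in fields[1:]])
    return titles, rows


titles, rows = load_stats(io.StringIO(SAMPLE))
print(titles)
print(rows[-1])
```

With the real file, the same function should work on open('data_file.txt') instead of the io.StringIO wrapper.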
That is all the data I currently have in my file. The first line records the date of the last run, so the script can check whether today’s data has already been added; that takes care of duplicates in the file. I added the second line by hand so that I don’t forget which column is which, and so that I can use the names as titles for the graphs I’ll make to track the growth of the website.
I have found that the matplotlib library is really good for producing graphs. I’ll be using it to plot the growth, obviously once I have some more data.
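As a rough idea of what the plotting step might look like, here is a sketch under a few assumptions: the rows have already been read into lists like the ones below, and the helper names to_series and plot_stat are made up for illustration. The matplotlib import sits inside plot_stat so the rest runs even where matplotlib is not installed.

```python
import datetime


def to_series(rows, column):
    """Split rows like ['22-08-2013', 9079, ...] into dates and one stat column."""
    dates = [datetime.datetime.strptime(r[0], '%d-%m-%Y').date() for r in rows]
    values = [r[column] for r in rows]
    return dates, values


def plot_stat(rows, column, title, out_file):
    """Save a line graph of one stat over time (hypothetical helper)."""
    # Imported here so the rest of the sketch works without matplotlib.
    import matplotlib
    matplotlib.use('Agg')  # render to a file, no display needed
    import matplotlib.pyplot as plt

    dates, values = to_series(rows, column)
    fig, ax = plt.subplots()
    ax.plot(dates, values, marker='o')
    ax.set_title(title)
    ax.set_xlabel('date')
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    fig.savefig(out_file)
    plt.close(fig)


# The two days of data collected so far.
rows = [
    ['22-08-2013', 9079, 15335, 88, 26119, 7407],
    ['23-08-2013', 9094, 15354, 88, 26167, 7585],
]
dates, questions = to_series(rows, 1)
print(questions[-1] - questions[0])  # questions added between first and last day

# Uncomment to write the graph to a file (needs matplotlib):
# plot_stat(rows, 1, 'questions', 'questions.png')
```

Two points make for a dull graph, which is exactly why it needs more days of data first.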