Automating tasks in Python – Mass renaming of folders inside zip files


Python is proving useful to me for avoiding repetitive manual tasks. I recently used it to rename a lot of folders: around 700 folders across 28 zip files that I needed to rename and re-zip. That would have been really cumbersome without Python.

The folders’ names were like  0001 – abc 1 , except that the two numbers didn’t match. The first number determined the sort order, and I had to change it to match the second number. Doing that by hand for 700 folders would have been a real pain. Luckily I knew Python and wrote a script for it.
Here’s the Python script that I used.

import zipfile
import os
import shutil

CUR_DIR = os.getcwd()
FILES_IN_CUR_DIR = os.listdir(CUR_DIR)

def get_new_name(original_name):
    '''Builds the new name by replacing the leading number with
    the number found later in the name.'''
    num = 0
    started = False
    # Skip the old leading number and separator (first 7 characters)
    # and read the first run of digits that follows.
    for c in original_name[7:]:
        if not c.isdigit():
            if started:
                break
            continue
        started = True
        num *= 10
        num += int(c)
    # Zero-pad the new number and keep everything after the old
    # 4-digit prefix (the separator and the rest of the name).
    return str(num).zfill(4) + original_name[4:]

def extract_files(zip_f, cur_path):
    with zipfile.ZipFile(zip_f, 'r') as zip_file:
        zip_file.extractall(cur_path)

def rename_files(cur_path):
    for files in os.listdir(cur_path):
        new_name = get_new_name(files)
        os.rename(os.path.join(cur_path, files),
                  os.path.join(cur_path, new_name))

def main():    
    files = [f for f in FILES_IN_CUR_DIR
             if f.endswith('.zip')]

    for f in files:
        cur_path = os.path.join(CUR_DIR, f[:-4])
        print('Extracting ' + f)
        extract_files(f, cur_path)
        print('Started renaming')
        rename_files(cur_path)
        print('Removing zip file ', f)
        os.remove(os.path.join(CUR_DIR, f))
        print('Writing back to ' + f)
        shutil.make_archive(cur_path, 'zip', cur_path)
        print('\n')

if __name__ == "__main__":
    main()

This is just the first version, written to get the job done. It leaves behind the extracted folders, which I had to delete myself, but that was a single select-and-delete. Not a big thing. I plan to take care of the temporary folders too; the updated version should appear in my github -> general -> Python33 -> Utilities in a few days.
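A later version could avoid the leftover folders by extracting into a temporary directory that cleans itself up. Here is a minimal sketch of that idea (the function name and the rename callback are my own, not from the script above):

```python
import os
import shutil
import tempfile
import zipfile

def rezip_renamed(zip_path, rename):
    """Extract zip_path into a temporary directory, rename each
    top-level entry with rename(), and write the archive back
    in place. The temporary directory is deleted automatically."""
    base = zip_path[:-4]  # strip the '.zip' extension
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        for name in os.listdir(tmp):
            os.rename(os.path.join(tmp, name),
                      os.path.join(tmp, rename(name)))
        os.remove(zip_path)
        shutil.make_archive(base, 'zip', tmp)
    # nothing left behind: tmp is removed when the with-block exits
```

With this approach there is no manual select-and-delete step afterwards.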


Python script for getting stats of a website


I have been interested for some time in a particular website – codereview.SE. It’s a site for getting your code reviewed, and I have been learning a lot from it. It’s still a test site, so I wanted to find out whether it is growing or not. I decided to write a Python script that collects the stats on its front page, so that over many days I can use them to plot graphs.

Here’s the Python script

#! python3
import urllib.request
import datetime

FILE_NAME = 'data_file.txt'
CURRENT_URL = 'http://codereview.stackexchange.com/'

def today_date():
    return datetime.date.today().strftime('%d-%m-%Y')

def already_written():
    # 'a+' creates the file if it doesn't exist yet; seek back
    # to the start so the first line can be read.
    with open(FILE_NAME, 'a+') as f:
        f.seek(0)
        first_line = f.readline()
        return today_date() == first_line[:-1]

def parse(line):
    """This separates the stat-name and associated number"""
    temp = [0, '']
    braces = False
    for c in line:
        if c == '<':
            braces = True
        elif c == '>':
            braces = False
        elif braces is True or c in [' ', ',', '%']:
            continue
        elif c.isdigit():
            temp[0] *= 10
            temp[0] += int(c)
        else:
            temp[1] += c

    return temp

def write_stats():
    '''This writes the stats into the file'''
    with open(FILE_NAME, 'r') as f:
        data = f.readlines()

    with open(FILE_NAME, 'w') as f:
        url_handle = urllib.request.urlopen(CURRENT_URL)

        f.write(today_date() + '\n')
        f.writelines(data[1:])
        f.write(today_date() + ',')

        for line in url_handle:
            # Each line is bytes; str() gives "b'...\r\n'", so slice
            # off the b'' wrapper and the trailing escaped newline.
            temp_line = str(line)[2:-5]
            if 'stats-value' in temp_line and 'label' in temp_line:
                temp = parse(temp_line)
                f.write(str(temp[0]) + ',')

        f.write('\n')

def main():
    if not already_written():
        write_stats()

if __name__ == "__main__":
    main()

This will store the five stats shown on the front page of the website into a text file (data_file.txt) in the following format:

23-08-2013
dd-mm-yyyy, questions, answers, %answered, users, visitors/day

22-08-2013,9079,15335,88,26119,7407,
23-08-2013,9094,15354,88,26167,7585,

The above is the data currently in my file. The first line records the last date on which data was added, which prevents duplicate entries for the same day. The second line was added manually so I don’t forget which column is which; I also plan to use those names as titles for the graphs that will show the website’s growth.
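The character-by-character parse() above could also be written with a regular expression. A sketch that pulls out just the number (the function name and the pattern are my assumptions about the page markup, which puts the figure between HTML tags and may include commas):

```python
import re

def parse_stat(line):
    """Pull the first number (commas allowed) out of an HTML
    snippet like '<div class="stats-value">9,079</div>'."""
    match = re.search(r'>([\d,]+)', line)
    if match is None:
        return None
    return int(match.group(1).replace(',', ''))
```

For example, parse_stat('<div class="stats-value">9,079</div>') gives 9079.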

I have found that the matplotlib library is really good for producing graphs, so I’ll be using it to chart the growth – obviously after I have some more data.
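Once a few more rows accumulate, the plotting could look roughly like this (a sketch; the function names are mine, and the column order follows the format shown above, with the question count in the second column):

```python
def read_stats(file_name):
    """Return (dates, questions) from the data file, skipping the
    date-stamp first line and the manually added header line."""
    dates, questions = [], []
    with open(file_name) as f:
        for line in f:
            parts = line.strip().split(',')
            # data rows have six comma-separated fields and a
            # numeric second column; everything else is skipped
            if len(parts) < 6 or not parts[1].isdigit():
                continue
            dates.append(parts[0])
            questions.append(int(parts[1]))
    return dates, questions

def plot_questions(file_name='data_file.txt'):
    """Plot the question count against the date."""
    import matplotlib.pyplot as plt  # optional dependency
    dates, questions = read_stats(file_name)
    plt.plot(dates, questions)
    plt.xlabel('date')
    plt.ylabel('questions')
    plt.show()
```

The same read_stats() helper can be reused for the other columns (answers, users, visitors/day).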

Making hierarchy of folders using Python


Suppose you want to make many folders (a.k.a. directories) for some purpose. Maybe you need to set them up on a relative’s computer? Maybe you want many folders with the same pattern in their names? Maybe someone has a habit of deleting files or formatting his hard drive, and you have the responsibility of recreating a basic folder hierarchy?

I previously posted about deleting files, so I thought I’d add a script for creating folders too.

The hierarchy of directories that I am making is shown below. Each indented entry is a subdirectory of the unindented one above it.

./project_euler
    ./001_050
    ./051_100
    ...
    ./401_450
./codechef
    ./easy
    ./medium
    ./hard
./spoj
./functions
./utilities

I made this script mainly to learn Python, but it can be reused to create the same set of directories again and again. There are one-liner alternatives on Unix/Linux, but on Windows this might be the easiest way.

#! python3
import os

TOP_LEVEL = ('spoj', 'functions', 'utilities', '_testing')
EULER_HIGHEST = 450
CODECHEF_FOL = ('easy', 'medium', 'hard')

def safe_make_folder(i):
    '''Makes a folder and its parents if not present'''
    try:
        os.makedirs(i)
    except OSError:
        # the folder already exists; nothing to do
        pass

def make_top_level():
    '''Makes folders with no subdirectories'''
    for i in TOP_LEVEL:
        safe_make_folder(i)

def make_euler_folders():
    '''Makes euler and its subdirectories'''
    for j in (os.path.join('project_euler', '{:03}_{:03}'.format(i, i + 49))
              for i in range(1, EULER_HIGHEST, 50)):
        safe_make_folder(j)

def make_codechef_folders():
    '''Makes codechef and its subdirectories'''
    for i in CODECHEF_FOL:
        safe_make_folder(os.path.join('codechef', i))

def main():
    make_top_level()
    make_euler_folders()
    make_codechef_folders()

if __name__ == "__main__":
    main()

The script’s docstrings should make it self-explanatory. Any questions? Just ask in the comments.
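Incidentally, on Python 3.2 and later the try/except in safe_make_folder can be replaced by the exist_ok flag of os.makedirs; a minimal sketch:

```python
import os

def safe_make_folder(path):
    """Make a folder and any missing parents; do nothing if it
    already exists (exist_ok needs Python 3.2+)."""
    os.makedirs(path, exist_ok=True)
```

Calling it twice with the same path is then harmless, with no exception handling needed.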

Opening websites using Python


Perhaps you need to open a set of websites to do some work – maybe facebook, twitter and other social networking sites – but you don’t want to click through many bookmarks or type each address into your browser. Then this solution is for you. You can write different scripts for opening different sets of websites, decide their order, and even adjust the time between them.

Just change the websites in URLS, placing each inside quotes (‘xyz_website’) and separating them with commas. Put the delay in seconds you want between page openings into the DELAY variable and you are good to go.

#! python3
import webbrowser
import time

URLS = [
    'http://codereview.stackexchange.com/',
    'http://programmers.stackexchange.com/',
    'http://stackoverflow.com/'
    ]
DELAY = 2

def main():
    for url in URLS:
        # new=2 asks the browser to open the URL in a new tab
        webbrowser.open(url, new=2)
        time.sleep(DELAY)

if __name__ == "__main__":
    main()

Started writing a simple web crawler in Python for downloading a website


I have been having trouble looking things up in the C standard library, so I thought to download the C library reference from cplusplus.com. So far I have written a simple script that downloads the front page; it currently misses the site’s styling, but it gets the basic contents. I’ll need to do some work before it gives the best result.

I still need to add code to download everything recursively, but it’s a start.

Here’s my Python script.

#! python3
import urllib.request

filehandle = urllib.request.urlopen(
    'http://www.cplusplus.com/reference/clibrary/')

# the response yields bytes, so write the file in binary mode
with open('test.html', 'w+b') as f:
    for line in filehandle:
        f.write(line)

filehandle.close()
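The recursive part will need a way to find the links on each page. The standard library’s html.parser can do that; here is a rough sketch of the link extraction step (the class and function names are mine, and the crawl/download loop itself is still to be written):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

def find_links(html):
    """Return all href targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Feeding the downloaded page into find_links() would give the next set of URLs to fetch; relative links would still need to be resolved against the page’s address (urllib.parse.urljoin can do that).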