About

Welcome to Panela, Matt Harrison's take on mostly Open Source, Linux, Python, innovation in those areas, other buzzwords and Dick Proenneke. It comes complete with the illustrations as needed. Note the opinions expressed here are merely my opinions and not the opinions of my employer.

Boilerplate for maintainable, distributable, testable python code

posted 2007.09.04 Tue

There's a difference between code you know you'll be using later and code you're going to throw away. There's also a difference between being lazy and taking the time to follow best practices. Below are two command line programs that do the same thing. One is 3 lines long and the other is 100. (Of course the example is contrived, and usually the logic will be more than 3 lines. To actually cat a file I would just use the command line cat.) Because the actual logic is so small, the longer version doubles as boilerplate for starting Python command line programs.

The extra length needs to buy us something. Here's what I think it gives me:

  • Order - I know where items belong
  • Isolated Functionality - No (or limited) execution when importing/reusing
  • Logging - rather than dealing with prints that clutter the screen
  • Testing - I've got doctest built in, but I have a unittest file as well that gives decent coverage
  • Documentation - Illustrates module and function docs as well as doctests
  • Command line docs - use of optparse
  • __main__ - Illustrates use of the main statement (with an argument list parameter for reusability)

So, lazyweb: am I a PEP 8-following geek who just complicates things? Should boilerplate code include more? Less?

I'm asking because I'm teaching two intro to Python classes soon, and I thought I might use this as an example since it covers a lot of ground.

Throwaway Code

import sys

for line in open(sys.argv[1]):
    print line,

More complicated, yet testable, maintainable

#! /usr/bin/env python
"""
canonicalcat is an example of writing a python program that perhaps
you want to distribute or will be maintained over time.  As such one
should add documentation and this is an example of module level
documentation.  If one were to import canonicalcat and type
`help(canonicalcat)` it would spit this out
"""

# These are the imports from the python stdlib 
import sys
import logging
import optparse

# Some like to separate additional libraries from the standard ones


# file meta data
__version__ = "0.1"
__author__ = "matt harrison"
__license__ = "psf"


# GLOBAL ARGS/CONSTS
# Since python doesn't have constants, we can emulate them
# using (naming) conventions.  Put them here


# GLOBAL INIT
logging.basicConfig(filename=".concat.log", level=logging.DEBUG)


def cat_file(fin, outfile=None):
    """
    Main logic to cat file

    >>> import StringIO
    >>> fout = StringIO.StringIO()
    >>> cat_file("small.txt", fout)
    >>> lines = fout.getvalue()
    >>> lines
    'foo\nbar\n'
    
    """
    logging.debug("CONCAT: %s", fin)
    if outfile is None:
        fout = sys.stdout
    elif isinstance(outfile, str):
        fout = open(outfile, 'w')
    else:
        fout = outfile
        
    for line in open(fin):
        fout.write(line)


def _test():
    import doctest
    doctest.testmod()


def main(prog_args=None):
    """
    A main function.  Rather than putting this logic into the if
    __name__ statement below, creating a main function allows other
    programs to use main logic.  It also allows for testing, by
    passing in args rather than monkey patching sys.argv (which won't
    be thread safe).

    >>> main(["canonicalcat.py", "small.txt"])
    foo
    bar
    
    """
    if prog_args is None:
        prog_args = sys.argv
        
    parser = optparse.OptionParser()
    parser.usage = """A python implementation of 'cat', default use is to
    provide a filename to cat"""
    parser.add_option("-o", "--output-file", dest="fileout",
                      help="specify file to cat to (default is stdout)")
    parser.add_option("-t", "--test", dest="test", action="store_true",
                      help="run doctests")

    opt, args = parser.parse_args(prog_args)

    if args[1:]:
        cat_file(args[1], opt.fileout)

    if opt.test:
        _test()
    elif not args[1:]:
        # no file and no --test: show usage instead of doing nothing
        parser.print_help()

    
if __name__ == "__main__":
    # when one "executes" a python program, its __name__ is
    # __main__, otherwise its __name__ will be the module name
    main()
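
The list above mentions a separate unittest file, which isn't shown in this post. Here is a minimal sketch of what one might look like. It is written for Python 3 (the listing above is Python 2) and uses a simplified stand-in for cat_file so that it runs on its own. The stand-in, the class name CatFileTest and the test names are all assumptions for illustration, not the original test file.

```python
import io
import os
import sys
import tempfile
import unittest


def cat_file(fin, outfile=None):
    """Stand-in for canonicalcat.cat_file (assumed behavior): copy the
    lines of the file named fin to outfile, a file-like object."""
    fout = sys.stdout if outfile is None else outfile
    with open(fin) as source:
        for line in source:
            fout.write(line)


class CatFileTest(unittest.TestCase):
    def setUp(self):
        # Create a small throwaway input file for each test
        handle, self.path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(handle, "w") as tmp:
            tmp.write("foo\nbar\n")

    def tearDown(self):
        os.remove(self.path)

    def test_copies_lines_to_file_object(self):
        fout = io.StringIO()
        cat_file(self.path, fout)
        self.assertEqual(fout.getvalue(), "foo\nbar\n")

    def test_missing_file_raises(self):
        # open() on a missing path raises an IOError/OSError subclass
        with self.assertRaises(IOError):
            cat_file(self.path + ".missing", io.StringIO())
```

Saved as, say, test_canonicalcat.py (a hypothetical name), this could be run with `python -m unittest test_canonicalcat`. In a real project the stand-in would be replaced by an import of the actual function.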


1. Grig Gheorghiu left...
2007.09.04 Tue 1:01 pm :: http://agiletesting.blogspot.com

I think you're right on. If you don't do all the testing- and maintenance-related stuff upfront, it will come back to bite you later on, when it will be much more painful. Great stuff!

Grig


2. Brad Fullenbach left...
2007.09.05 Wed 4:31 am

Hello.

Have you contacted google for a job recently?


3. Marius Gedminas left...
2007.09.05 Wed 11:16 am :: http://mg.pov.lt/blog

I think you're going a tiny bit over the top. I wouldn't include logging in the boilerplate, and I've come to the conclusion that large doctests embedded in docstrings obstruct rather than help reading the code. Really short usage examples are good, but anything longer should go into a separate test module.

My Python programs tend to have the shebang, the module docstring, def main(), optparse, and functions that do all the work. I usually forget about 'author' and other metadata globals and put all the info into the docstring. I don't use a boilerplate file; most of these things are so easy to remember that I type them on demand. There's one exception: optparse. I always have to refer to the documentation for its subtleties (in fact, I have a Tomboy note that I use as a cheat sheet).

Speaking of optparse, it's a bit strange how you use args[1:]. Usually when you use parser.parse_args() and don't pass sys.argv explicitly, you get back a list of arguments without the program name at the beginning. I'd suggest sticking to the idiom and passing sys.argv[1:] to parse_args().
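
A minimal sketch of the idiom described in this comment, assuming Python 3 and a stripped-down main that only parses and returns its arguments. The usage string and the return value are illustrative assumptions, not part of the original program.

```python
import optparse
import sys


def main(argv=None):
    """Parse options from argv, a list WITHOUT the program name."""
    if argv is None:
        argv = sys.argv[1:]  # drop the program name up front

    parser = optparse.OptionParser(usage="%prog [options] FILE")
    parser.add_option("-o", "--output-file", dest="fileout",
                      help="specify file to cat to (default is stdout)")
    opts, args = parser.parse_args(argv)

    # args now holds only positional arguments, so the filename
    # (if any) is args[0] rather than args[1]
    return opts, args
```

With this shape, a test can call `main(["-o", "out.txt", "small.txt"])` directly and look for the filename at args[0], with no program name to skip over.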