Advanced doit: Applications

Learning Objectives

  • Add the final step to compile the document.
  • Show how to import tasks.

We’ve gone over basic task definition and execution, and have seen how to define a simple workflow and run it with doit. However, we haven’t talked much about how the tasks are actually executed and what doit does. To motivate this discussion, let’s first define our final task, which compiles the document. Hopefully you’ve already downloaded pandoc; if not, head back to the home page and follow the link there.

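from doit.task import clean_targets

def task_pandoc():

    # -S (--smart) exists only in pandoc 1.x; with pandoc 2.x, drop it
    # and use the `smart` extension (markdown+smart) instead.
    cmd = 'pandoc -r markdown+yaml_metadata_block+startnum+fancy_lists'\
          ' -s -S %(dependencies)s -o %(targets)s'

    return {'actions': [cmd],
            'file_dep': [''],  # the markdown source document goes here
            'targets': ['Melee_data.csv.document.pdf'],
            'clean': [clean_targets]}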
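In task_pandoc, doit fills in `%(dependencies)s` and `%(targets)s` before running the command: each list is joined into a space-separated string and substituted with ordinary Python `%` formatting. The sketch below imitates that step so you can see the command pandoc actually receives. `expand_cmd` is a hypothetical helper, not part of doit, and `document.md` is a made-up name standing in for the markdown source.

```python
# Simplified imitation of how doit expands a command-action string.
# `expand_cmd` is a hypothetical helper, not part of doit's API.

def expand_cmd(cmd, file_dep, targets):
    """Fill %(dependencies)s and %(targets)s the way doit's cmd-actions do:
    each list is joined into one space-separated string."""
    subs = {'dependencies': ' '.join(file_dep),
            'targets': ' '.join(targets)}
    return cmd % subs


cmd = ('pandoc -r markdown+yaml_metadata_block+startnum+fancy_lists'
       ' -s -S %(dependencies)s -o %(targets)s')

# 'document.md' is a hypothetical name for the markdown source.
expanded = expand_cmd(cmd, ['document.md'], ['Melee_data.csv.document.pdf'])
print(expanded)
```

Because pandoc chooses its output format from the extension of the `-o` file when no format is given explicitly, changing the target's extension is enough to switch formats.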

This task compiles the markdown document into a nice PDF using LaTeX. Running doit after adding this task will have one of two results: either you’ll have the PDF document at the end, or you’ll get an error from pandoc. The error will occur if you don’t have LaTeX; it’s a big package, and we don’t want to make you install it just for this. So, what if we want the option to choose the file type for the pandoc output, to avoid using LaTeX?

There are two ways to do this. The first is doit’s built-in task-parameter functionality, which is quite verbose; the curious can find it in the doit documentation. The other way is to parse your parameters with argparse. This requires building a bit of scaffolding, but it also shows you the beginnings of how to write complex applications with doit.
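For comparison, here is roughly what the first option looks like in a dodo file. doit lets a task declare command-line options under a `params` key, and for command actions their values are substituted into the string just like `%(targets)s`. This is a sketch, and the source name `document.md` is hypothetical. Notice that `targets` cannot follow the parameter, so doit can no longer track the output file; that is part of why this route gets clumsy.

```python
# dodo.py: a sketch of doit's built-in task parameters.
# 'document.md' is a hypothetical name for the markdown source.

def task_pandoc():
    cmd = ('pandoc -r markdown+yaml_metadata_block+startnum+fancy_lists'
           ' -s -S %(dependencies)s -o Melee_data.csv.document.%(outfmt)s')
    return {'actions': [cmd],
            'file_dep': ['document.md'],
            # Declares an --outfmt option; doit substitutes its value
            # for %(outfmt)s in the command string above.
            'params': [{'name': 'outfmt',
                        'long': 'outfmt',
                        'default': 'pdf',
                        'help': 'output format, used as the file extension'}]}
```

You would then pick the format on the command line with `doit pandoc --outfmt html`.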

When we run our file with the doit command, it finds every function whose name starts with “task_”, calls it, and turns the dictionary it returns into a Task object. If we call the task functions ourselves, outside of this step, they just return plain old dictionaries, which isn’t what we want! To get around this, we will use doit’s dict_to_task function to create a decorator we can apply to our task functions.

import functools

from doit.task import dict_to_task

def make_task(task_dict_func):
    '''Wrapper to decorate functions returning pydoit
    `Task` dictionaries and have them return pydoit `Task`
    objects instead.'''
    @functools.wraps(task_dict_func)  # keep the original name and docstring
    def d_to_t(*args, **kwargs):
        ret_dict = task_dict_func(*args, **kwargs)
        return dict_to_task(ret_dict)
    return d_to_t

Okay, so what’s the point? Using this decorator, we can just define tasks as regular old functions, decorate them, and execute them with a loader whenever we want. That way, we can write our own infrastructure instead of relying on the doit command. We can write our own simple task loader by inheriting from the TaskLoader base class, like so:

from doit.cmd_base import TaskLoader
from doit.doit_cmd import DoitMain

def run_tasks(tasks, args, config=None):
    '''Given a list of `Task` objects, a list of arguments,
    and a config dictionary, execute the tasks.'''

    if not isinstance(tasks, list):
        raise TypeError('tasks must be of type list.')

    if config is None:
        config = {'verbosity': 0}

    class Loader(TaskLoader):
        @staticmethod
        def load_tasks(cmd, opt_values, pos_args):
            # Ignore the command-line details and hand doit our tasks.
            return tasks, config

    return DoitMain(Loader()).run(args)

If this all seems a little opaque, don’t be alarmed; this is most of what you need to start writing your own applications with doit. A TaskLoader is what reads a file or object and pulls tasks from it; we have overridden its load_tasks method to return the list of Task objects we pass in, along with the config dictionary. We then pass an instance of this Loader to DoitMain, which uses it to load the tasks and then runs the doit command given in args. You can read more about custom task loaders in the doit documentation.
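For an ordinary dodo file, the loader’s job is the discovery step described earlier: look through the module for callables named `task_*`, call them, and keep what they return under the rest of the name. The sketch below is a much-simplified, hypothetical stand-in (`collect_task_dicts` is not a doit function); doit’s real loader also handles generators and `DOIT_CONFIG` and converts each result into a Task.

```python
def collect_task_dicts(namespace):
    """Return {basename: task_dict} for every task_* callable in namespace.

    Hypothetical, simplified stand-in for doit's dodo-file discovery;
    it only handles functions that return a dict.
    """
    found = {}
    for attr in sorted(namespace):
        obj = namespace[attr]
        if attr.startswith('task_') and callable(obj):
            # The part after 'task_' becomes the task's basename.
            found[attr[len('task_'):]] = obj()
    return found


def task_gunzip_data():
    return {'actions': ['gunzip -c %(dependencies)s > %(targets)s'],
            'targets': ['Melee_data.csv'],
            'file_dep': ['Melee_data.csv.gz']}


def format_report():
    # No task_ prefix, so discovery skips it.
    return 'not a task'


# A plain dict stands in for a dodo module's namespace (e.g. vars(module)).
namespace = {'task_gunzip_data': task_gunzip_data,
             'format_report': format_report}
print(sorted(collect_task_dicts(namespace)))  # ['gunzip_data']
```

Writing our own loader simply replaces this search with a list we build ourselves.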

We’re now ready to create our application. Create a new file called (or whatever you’d like) and copy the above pieces of code into it. Then, copy the code from into it and decorate the tasks with make_task, like so:

@make_task
def gunzip_data():
    return {'actions': ['gunzip -c %(dependencies)s > %(targets)s'],
            'targets': ['Melee_data.csv'],
            'file_dep': ['Melee_data.csv.gz']}
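The `gunzip -c ... > ...` action relies on a Unix shell. If you need this step to work where `gunzip` isn’t available, one option is a Python action: doit passes the task’s `dependencies` and `targets` to an action function whose parameters use those names. `gunzip_action` below is a hypothetical sketch under that assumption, not part of the lesson’s code.

```python
import gzip
import shutil


def gunzip_action(dependencies, targets):
    """Decompress the task's single .gz dependency into its single target.

    Hypothetical Python-action version of
    'gunzip -c %(dependencies)s > %(targets)s'; doit supplies
    `dependencies` and `targets` because the parameters use those names.
    """
    (src,) = dependencies  # exactly one input; works for a list or a set
    (dst,) = targets       # exactly one output
    with gzip.open(src, 'rb') as fin, open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout)
```

To use it, put the function itself in the task’s action list in place of the command string: `'actions': [gunzip_action]`.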

Finally, we’ll add the argument parsing. Define a new function, main, import argparse, and add one argument, --outfmt. We’ll also modify the pandoc function to take the output format as an argument.

@make_task
def task_pandoc(outfmt='pdf'):

    cmd = 'pandoc -r markdown+yaml_metadata_block+startnum+fancy_lists'\
          ' -s -S %(dependencies)s -o %(targets)s'

    # pandoc picks the output format from the target's file extension.
    return {'actions': [cmd],
            'file_dep': [''],  # the markdown source document goes here
            'targets': ['Melee_data.csv.document.{fmt}'.format(fmt=outfmt)],
            'clean': [clean_targets]}

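def main():
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--outfmt', default='pdf',
                        help='pandoc output format (used as the file extension)')
    args = parser.parse_args()

    # Calling the decorated task functions gives us Task objects.
    tasks = [gunzip_data(),
             task_pandoc(outfmt=args.outfmt)]

    run_tasks(tasks, ['run'])

if __name__ == '__main__':
    main()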
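Because `parse_args` also accepts an explicit list of strings, you can check the argument handling without involving doit or pandoc at all. The sketch below pulls the parsing and the target-name pattern out into two hypothetical helpers, `parse_cli` and `pandoc_target`, mirroring the `--outfmt` option and the target used in task_pandoc.

```python
import argparse


def parse_cli(argv=None):
    """Parse options; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--outfmt', default='pdf',
                        help='pandoc output format, used as the file extension')
    return parser.parse_args(argv)


def pandoc_target(outfmt):
    """Target name built with the same pattern as task_pandoc."""
    return 'Melee_data.csv.document.{fmt}'.format(fmt=outfmt)


print(pandoc_target(parse_cli([]).outfmt))                  # Melee_data.csv.document.pdf
print(pandoc_target(parse_cli(['--outfmt', 'md']).outfmt))  # Melee_data.csv.document.md
```

Keeping the parsing in its own function also makes it easy to reuse from tests or other scripts.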

We can now run this script with a regular python interpreter, like so:

$ python --outfmt md

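This will execute the tasks. Unfortunately, we get an error when we run this script! The reason has to do with our download task, which uses a yield statement and doesn’t play well with the decorator: a function containing yield returns a generator rather than a dictionary, so dict_to_task has nothing to convert. So, we’re going to make this task more general by passing the URL in as an argument instead of having the task read the URL list itself.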

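import os

from doit.task import clean_targets
from doit.tools import run_once

@make_task
def task_download_data(URL, target=None):

    if target is None:
        target = os.path.basename(URL)

    def print_url(URL):
        print('File was retrieved from: {0}'.format(URL))

    # `-o target` (rather than -O) so a custom target name is respected.
    return {'name': 'download:{0}'.format(target),
            'actions': ['curl -L -o {0} {1}'.format(target, URL)],
            'targets': [target],
            'uptodate': [run_once],
            'clean': [clean_targets, (print_url, [URL])]}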

# ... moar code

def main():
    # ...
    for URL in DATA_URLS:
        tasks.append(task_download_data(URL))
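Why did the yield-based version break? Calling a function that contains `yield` doesn’t run its body; it returns a generator object, and any code that expects a dictionary (for instance, looking up its `'name'` key) fails on it. The sketch below contrasts the two styles. `download_all` and `download_one` are hypothetical names, and the URL is a placeholder.

```python
import inspect


def download_all(urls):
    """Old style: yield one sub-task dictionary per URL."""
    for url in urls:
        yield {'name': url.rsplit('/', 1)[-1],
               'actions': ['curl -OL {0}'.format(url)]}


def download_one(url):
    """New style: return a single dictionary for one URL."""
    return {'name': url.rsplit('/', 1)[-1],
            'actions': ['curl -OL {0}'.format(url)]}


urls = ['https://example.com/data/example.csv.gz']  # placeholder URL

print(inspect.isgeneratorfunction(download_all))  # True
print(type(download_all(urls)).__name__)          # generator
print(type(download_one(urls[0])).__name__)       # dict

try:
    download_all(urls)['name']  # what a dict-based converter would try
except TypeError as err:
    print('TypeError:', err)
```

Returning one dictionary per call, and looping over the URLs in main instead, sidesteps the problem.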

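The final modification we need to make is to add a name key to each task dictionary. When doit loads a dodo file, it takes task names directly from the function names (minus the task_ prefix), but dict_to_task expects the dictionary itself to supply one. Once that’s done, we’ll have a working doit application.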
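If you’d rather not write every name by hand, one option is a second decorator that borrows the function’s name, dropping a leading `task_` the way doit does for dodo files. `with_default_name` is a hypothetical helper, not part of doit; it only fills in a name when the dictionary lacks one, so explicit names such as the one in task_download_data are left alone.

```python
import functools


def with_default_name(task_dict_func):
    """Fill in task_dict['name'] from the function name if it is missing.

    Hypothetical helper: 'task_pandoc' -> 'pandoc'; 'gunzip_data' is kept as is.
    """
    @functools.wraps(task_dict_func)
    def wrapper(*args, **kwargs):
        task_dict = dict(task_dict_func(*args, **kwargs))  # copy, don't mutate
        default = task_dict_func.__name__
        if default.startswith('task_'):
            default = default[len('task_'):]
        task_dict.setdefault('name', default)
        return task_dict
    return wrapper


@with_default_name
def task_pandoc(outfmt='pdf'):
    return {'actions': ['echo building {0}'.format(outfmt)]}


print(task_pandoc()['name'])  # pandoc
```

In the application it would sit underneath the other decorator, with `@make_task` on the line above `@with_default_name`, so the name is filled in before dict_to_task sees the dictionary.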

To save some time and any headaches, a final, working form can be found here.