Advanced doit: Applications
===========================

.. rubric:: Learning Objectives

- Add the final step to compile the document.
- Show how to import tasks.

We've gone over basic task definition and execution, and have seen how to
define a simple workflow and run it with `doit`. However, we haven't talked
much about how the tasks are actually executed and what `doit` does. To
motivate this discussion, let's first define our final task, which compiles
the document. Hopefully you've already downloaded pandoc; if not, head back to
the home page and follow the link there.

.. code:: python

    from doit.task import clean_targets


    def task_pandoc():
        # pandoc 1.x also took -S/--smart here; pandoc 2.0 removed that flag,
        # and smart typography is on by default for markdown input.
        cmd = ('pandoc -r markdown+yaml_metadata_block+startnum+fancy_lists'
               ' -s %(dependencies)s -o %(targets)s')
        return {'actions': [cmd],
                'file_dep': ['Melee_data.csv.document.md'],
                'targets': ['Melee_data.csv.document.pdf'],
                'clean': [clean_targets]}

This task compiles the markdown document into a nice PDF using LaTeX. Running
`doit` after adding this task will have one of two results: either you'll have
the PDF document at the end, or you'll get an error from pandoc. The error
occurs if you don't have LaTeX installed; it's a big package, and we don't want
to make you install it just for this.

So, what if we want the option to choose the file type for the pandoc output,
to avoid using LaTeX? There are two ways to do this. The first is to use
doit's built-in task `arguments functionality `__. This method is quite
verbose, and is documented at the link for the curious. The other is to
specify your parameters with `argparse `__. This requires building a bit of
scaffolding, but it will show you the beginnings of how to write complex
applications with doit.

When we run our file with the `doit` command, it finds every function whose
name starts with `task_`, calls it, and turns the dictionaries it returns into
`Task` objects. If we call the task functions ourselves, outside of this step,
they just return regular old dictionaries, which isn't what we want!
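To make the discovery step concrete, here is a simplified, hypothetical sketch
of the idea. It is not doit's actual loader code: `collect_task_functions` is a
made-up helper that picks out the functions in a module namespace whose names
start with `task_`, and strips that prefix the way doit names tasks after their
functions. The last lines show the problem described above: calling a task
function ourselves gives back a plain `dict`, not a `Task`.

.. code:: python

    import inspect


    def collect_task_functions(namespace):
        """Hypothetical sketch of name-based task discovery (not doit's code)."""
        found = {}
        for attr_name, obj in namespace.items():
            # Only functions whose names carry the 'task_' prefix count
            if attr_name.startswith('task_') and inspect.isfunction(obj):
                found[attr_name[len('task_'):]] = obj
        return found


    def task_gunzip_data():
        return {'actions': ['gunzip -c %(dependencies)s > %(targets)s'],
                'targets': ['Melee_data.csv'],
                'file_dep': ['Melee_data.csv.gz']}


    def helper():  # ignored: the name lacks the 'task_' prefix
        return 42


    # Copy the module namespace, much as a loader would inspect a dodo.py module
    tasks = collect_task_functions(dict(globals()))
    print(sorted(tasks))                 # ['gunzip_data']

    # Calling the task function directly yields a plain dictionary
    print(type(tasks['gunzip_data']()))  # <class 'dict'>
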
To get around this, we will use the built-in `dict_to_task` function to create
a decorator we can apply to our task functions.

.. code:: python

    import functools

    from doit.task import dict_to_task


    def make_task(task_dict_func):
        '''Wrapper to decorate functions returning pydoit `Task` dictionaries
        and have them return pydoit `Task` objects
        '''
        @functools.wraps(task_dict_func)  # keep the wrapped function's name
        def d_to_t(*args, **kwargs):
            ret_dict = task_dict_func(*args, **kwargs)
            return dict_to_task(ret_dict)
        return d_to_t

Okay, so what's the point? Using this decorator, we can define tasks as regular
old functions, decorate them, and execute them with a loader whenever we want.
That way, we can write our own infrastructure instead of relying on the `doit`
command. We can write our own simple task loader by inheriting from the
`TaskLoader` base class, like so:

.. code:: python

    from doit.cmd_base import TaskLoader
    from doit.doit_cmd import DoitMain


    def run_tasks(tasks, args, config=None):
        '''Given a list of `Task` objects, a list of arguments, and a config
        dictionary, execute the tasks and return doit's exit code.
        '''
        if not isinstance(tasks, list):
            raise TypeError('tasks must be of type list.')
        if config is None:
            # Avoid sharing one mutable default dict between calls
            config = {'verbosity': 0}

        class Loader(TaskLoader):
            @staticmethod
            def load_tasks(cmd, opt_values, pos_args):
                # Ignore the usual dodo.py lookup and hand doit our own tasks
                return tasks, config

        return DoitMain(Loader()).run(args)

If this all seems a little opaque, don't be alarmed; this is most of what you
need to start writing your own applications with doit. The `TaskLoader` is what
parses a file or object and pulls tasks from it, and we have overridden its
`load_tasks` method to return the list of `Task` objects we pass in, along with
the config dictionary. We then pass this `Loader` to `DoitMain`, which uses it
to get the tasks and then runs the doit command given in `args` (for example,
`['run']`). You can read more `here `__.

We're now ready to create our application. Create a new file called `myapp.py`
(or whatever you'd like) and copy the above pieces of code into it. Then, copy
the code from `dodo.py` into it and decorate the tasks with `make_task`, like
so:

.. code:: python

    @make_task
    def task_gunzip_data():
        return {'actions': ['gunzip -c %(dependencies)s > %(targets)s'],
                'targets': ['Melee_data.csv'],
                'file_dep': ['Melee_data.csv.gz']}

Finally, we'll add the argument parsing. Define a new function `main`, import
`argparse`, and add one argument. We'll also modify the `task_pandoc` function
to take an argument.

.. code:: python

    @make_task
    def task_pandoc(outfmt='pdf'):
        # pandoc picks the output format from the target's file extension.
        # -S is omitted because pandoc >= 2.0 no longer accepts it.
        cmd = ('pandoc -r markdown+yaml_metadata_block+startnum+fancy_lists'
               ' -s %(dependencies)s -o %(targets)s')
        return {'actions': [cmd],
                'file_dep': ['Melee_data.csv.document.md'],
                'targets': ['Melee_data.csv.document.{fmt}'.format(fmt=outfmt)],
                'clean': [clean_targets]}


    def main():
        import argparse
        parser = argparse.ArgumentParser()
        parser.add_argument('--outfmt', default='pdf')
        args = parser.parse_args()

        tasks = []
        tasks.append(task_download_data())
        tasks.append(task_gunzip_data())
        tasks.append(task_plot_heatmap())
        tasks.append(task_build_markdown_file())
        tasks.append(task_pandoc(outfmt=args.outfmt))
        run_tasks(tasks, ['run'])


    if __name__ == '__main__':
        main()

We can now run this script with a regular Python interpreter, like so:

.. code:: bash

    $ python myapp.py --outfmt html

This will execute the tasks. (Pick any format other than `md`: with
`--outfmt md`, the target would be the markdown source file itself.)

Unfortunately, we get an error when we run this script! The reason has to do
with our download task: it uses a `yield` statement, so calling it returns a
generator rather than a single dictionary, and that doesn't play well with the
decorator. So, we're going to make this task more general by passing the URL in
as an argument instead of having the task access the URLs directly.

.. code:: python

    import os

    from doit.task import clean_targets
    from doit.tools import run_once


    @make_task
    def task_download_data(URL, target=None):
        if target is None:
            target = os.path.basename(URL)

        def print_url(URL):
            print('File was retrieved from: {0}'.format(URL))

        return {'name': 'download:{0}'.format(target),
                'actions': ['curl -OL {0}'.format(URL)],
                'targets': [target],
                'uptodate': [run_once],
                'clean': [clean_targets, (print_url, [URL])]}

    # ... moar code

    def main():
        # ...
        for URL in DATA_URLS:
            tasks.append(task_download_data(URL))

The final modification we need to make is to add a `name` to each of our task
dictionaries. When doit loads a `dodo.py` file, it takes the task names directly
from the function names; `dict_to_task` does not, so every dictionary we pass it
needs its own `'name'` entry. Once we're done there, we'll have a working doit
application. To save some time and any headaches, a final, working version can
be :download:`found here <_static/myapp.py>`.
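One way to supply the names, sketched here under the assumption that you want
doit's usual convention (function name minus the `task_` prefix), is a small
extra decorator that fills in `'name'` only when the dictionary doesn't already
set one. `with_default_name` is a hypothetical helper, not part of doit, and the
`task_pandoc` below is a trimmed-down copy for illustration.

.. code:: python

    import functools


    def with_default_name(task_dict_func):
        """Hypothetical helper (not part of doit): default the task dict's
        'name' to the function name without its 'task_' prefix."""
        @functools.wraps(task_dict_func)
        def wrapper(*args, **kwargs):
            # Copy so we never mutate a dict the task function might reuse
            task_dict = dict(task_dict_func(*args, **kwargs))
            default_name = task_dict_func.__name__
            if default_name.startswith('task_'):
                default_name = default_name[len('task_'):]
            # setdefault keeps any name the task already chose
            task_dict.setdefault('name', default_name)
            return task_dict
        return wrapper


    @with_default_name
    def task_pandoc(outfmt='pdf'):
        return {'actions': ['pandoc -s %(dependencies)s -o %(targets)s'],
                'file_dep': ['Melee_data.csv.document.md'],
                'targets': ['Melee_data.csv.document.{0}'.format(outfmt)]}


    print(task_pandoc(outfmt='html')['name'])  # pandoc

In `myapp.py` you would stack it under `make_task` (`@make_task` on top,
`@with_default_name` directly above the function), so the name is filled in
before `dict_to_task` sees the dictionary. Tasks that set their own name, such
as the download tasks, keep it, because `setdefault` never overwrites an
existing key.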