Dependencies and Task Status ============================ .. rubric:: Learning Objectives - Introduce task ``uptodate`` - Introduce file dependencies If you run ``doit`` again, you'll notice that you get the same output as before -- the dot and task name indicating that the task was run again. Preferably, we'd like the task not to run once it already has (the file might be large, for example). pydoit handles this two ways: through dependencies and the ``uptodate`` function. *File dependencies* are quite intuitive. The task names the files it depends on, and if any of those change, the task is rerun. Our download command already adds a bit of complexity though, because it does not depend on any files. This is where ``uptodate`` comes in. ``uptodate`` is another entry in the task dictionary. It is a list of booleans, callables (that is, functions), or strings specifying shell commands. If any of the ``uptodate`` items evaluates to ``False``, the task is out-of-date and run. For our download task, we are going to use a function which is included in the doit library, ``run_once``. As one might expect, this makes sure the task is run at least once. Let's add it to our ``dodo.py``. .. code:: python from doit.tools import run_once def task_download_data(): return {'actions': ['curl -OL https://s3.amazonaws.com/pydoit-intermediate/Melee_data.csv.gz'], 'targets': ['Melee_data.csv.gz'], 'uptodate': [run_once]} Now if we run ``doit`` again (twice more, actually), our output will change. .. code:: bash -- download_data The dash-dash indicates that the task was determined to be up to date, and was not executed. .. note:: By default, the task name will be taken from the function defining it. We can also define our own task names with the ``name`` entry in the task dictionary. doit's system for determining whether a task is up to date is actually somewhat complex. The `documentation `__ provides a more detailed list of conditions where a task is **not** up to date: - An uptodate item is (or evaluates to) ``False`` - A file is added to or removed from file\_dep - A file\_dep changed since last successful execution - A target path does not exist - A task has no file\_dep and uptodate item equal to ``True`` The last of these explains why we need to include the ``uptodate`` entry in our download task; it has no file dependencies, and so will always be considered out of date, even if the target exists. .. rubric:: Experimenting with uptodate What would happen if we changed ``uptodate`` to ``[True]``? How about ``[False]``? Now that we know more about how tasks are considered up to date, let's introduce a file dependency. The file we downloaded was a gzip archive, so we'll write a task to extract it. The command we would run in the shell might be: .. code:: bash $ gunzip Melee_data.csv.gz This would produce a file called ``Melee_data.csv``. We can see then that we have a *target* (``Melee_data.csv``), an *action* (running ``gunzip``), and a *file dependency* (``Melee_data.csv.gz``). Let's add the task to our ``dodo.py``. .. code:: python def task_gunzip_data(): return {'actions': ['gunzip -c %(dependencies)s > %(targets)s'], 'targets': ['Melee_data.csv'], 'file_dep': ['Melee_data.csv.gz']} On top of the file dependency, this task also introduces *automatic variables*. These are in the ``actions`` string, and are recognized by the task creator. This removes redundancy and saves us some code. When we run ``doit``, we get output showing that only the gunzip task was executed. .. code:: -- download_data . gunzip_data