Recently, I’ve been making a lot of progress on the lamprey transcriptome project, and that has involved a lot of IPython notebook work. While I’ll talk about lamprey in a later post, I first want to talk about a nice technical tidbit I came up with while trying to manage a large IPython notebook with lots of figures. This involved learning some more about the internals of matplotlib, as well as the usefulness of the with statement in python.
So first, some background!
matplotlib is the go-to plotting package for python. It has many weaknesses, and a whole series of posts could be (and have been) written about why we should use something else, but for now, its reach is long and it is widely used in the scientific community. It’s particularly useful in concert with IPython notebook, where figures can be embedded inline in cells. However, an important feature(?) of matplotlib is that it’s built around a state machine: when it comes to deciding which figure (and which other components) are currently being worked with, matplotlib keeps track of the current context globally. That allows you to just call plot() at any given time and have your figures pushed more or less where you’d like. It also means that you need to keep track of the current context, lest you end up drawing a lot of figures onto the same plot and producing a terrible abomination from beyond space and time itself.
IPython has a number of ways of dealing with this. In its inline mode, the default behavior is simply to create a new plotting context at the beginning of each cell and close it when the cell completes. This is convenient because the user doesn’t have to open and close figures manually, which saves a lot of coding time and boilerplate. It becomes a burden, however, in a large notebook with lots of figures, some of which you don’t want displayed automatically. We can turn off the automatic opening and closing of figures through the inline backend’s configuration, but then we’re stuck managing our own figure context. Suddenly, our notebooks aren’t nearly as clean and beautiful as they once were, being littered with ugly declarations of new figures and axes, calls to plt.show(), and other such not-pretty things. I like pretty things, so I sought out a solution. As it tends to do, python had one.
Enter context managers!
Some time ago, many a programmer ran into a similar problem with opening and closing files (well, and a lot of other use cases). To do things properly, we needed exception handling to make sure we called close() on our file pointers when something went wrong.
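For concreteness, the manual pattern looks roughly like this, shown next to the with-statement form that replaced it (the file is just a throwaway temp file for the sketch):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# The old way: try/finally guarantees close() runs even if write() raises.
f = open(path, "w")
try:
    f.write("hello\n")
finally:
    f.close()

# The with statement does the same bookkeeping for us: the file is
# closed when the block exits, whether normally or via an exception.
with open(path) as f:
    contents = f.read()

print(contents.strip(), f.closed)  # hello True
```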
To handle such cases, python introduced context managers and the with statement. From the docs:
A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code.
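As a toy illustration of that definition, here’s a context manager that just logs each step. The Tracer class is purely hypothetical; it shows the order in which python calls things and what ends up bound by as.

```python
class Tracer:
    """A toy context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")  # setup code goes here
        return "resource"            # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")   # teardown code goes here
        return False                 # don't suppress exceptions


tracer = Tracer()
with tracer as thing:
    tracer.events.append("body: " + thing)

print(tracer.events)  # ['enter', 'body: resource', 'exit']
```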
Though this completely washes out the ~awesomeness~ of context managers, it does sound about like what we want! In simple terms, context managers are just objects that implement the __enter__ and __exit__ methods. When you use the with statement on one of them, __enter__ is called, and that’s where we put our setup code; if it returns something, the result is bound to the name given after the as keyword. __exit__ is called when the with block is left (even if it is left because of an exception), and it contains the teardown code. For our purposes, we want to take care of the matplotlib context. Without further ado, let’s look at an example that does what we want:
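Here’s a minimal sketch along those lines. The class name FigManager and its parameter names (fn, exts, show, and so on) are hypothetical choices of mine, the Agg backend is assumed so it runs outside a notebook, and skipping the save when the block raised is my own design choice; treat it as one way to do it rather than a definitive version.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so nothing pops up
import matplotlib.pyplot as plt


class FigManager:
    """Hypothetical figure-managing context manager (names are illustrative)."""

    def __init__(self, fn="", exts=("svg", "png"), show=False,
                 nrows=1, ncols=1, figsize=(8, 6), **subplots_kwargs):
        # Most of the setup happens here: build the figure and axes up front.
        self.fig, self.ax = plt.subplots(nrows=nrows, ncols=ncols,
                                         figsize=figsize, **subplots_kwargs)
        self.fn = fn
        self.exts = exts
        self.show = show

    def __enter__(self):
        # Hand the axes (a single Axes or an array of them) to the with block.
        return self.ax

    def __exit__(self, exc_type, exc_value, traceback):
        # Save once per extension; matplotlib infers the format from it.
        # (Design choice: don't save a half-drawn figure if the block raised.)
        if self.fn and exc_type is None:
            for ext in self.exts:
                self.fig.savefig(f"{self.fn}.{ext}")
        if self.show:
            plt.show()
        # Tear down: close the figure and drop its axes so nothing lingers.
        plt.close(self.fig)
        for ax in list(self.fig.axes):
            self.fig.delaxes(ax)
        del self.ax
        del self.fig
        return False  # let any exception from the with block propagate


# Usage: plot into the managed axes; the file is written on exit and the
# figure is closed without ever being displayed.
outdir = tempfile.mkdtemp()
with FigManager(os.path.join(outdir, "example"), exts=("png",)) as ax:
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title("example")

print(os.listdir(outdir))  # ['example.png']
print(plt.get_fignums())   # []
```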
Let’s break this down. The __init__ method actually does most of our setup here; it takes some basic parameters to pass to plt.subplots, as well as some parameters for whether we want to show the plot and whether we want to save the result to file(s). The __enter__ method returns the axes objects. Finally, __exit__ saves the figure to the file name with the given extensions (matplotlib uses the extension to infer the file format), and shows the plot if necessary. It then calls plt.close() on the figure, deletes the axes objects from the figure, and calls del on both instances just to be sure. The three expected parameters to __exit__ are for exception handling, which is discussed in greater detail in the docs.
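A quick sketch of what those three parameters carry: they are all None on a normal exit and describe the exception otherwise, and the return value of __exit__ decides whether the exception keeps propagating. The Cleanup class is a hypothetical toy; for a figure manager, returning False is usually what you want, so errors in the plotting code aren’t silently swallowed.

```python
class Cleanup:
    """Toy context manager showing what __exit__ receives."""

    def __init__(self, suppress=False):
        self.suppress = suppress
        self.seen = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # All three are None if the block finished normally; otherwise they
        # describe the exception that is propagating out of the block.
        self.seen = exc_type
        # Returning a true value swallows the exception; False lets it propagate.
        return self.suppress


# Normal exit: nothing to report.
with Cleanup() as ok:
    pass
print(ok.seen)  # None

# Exception, not suppressed: __exit__ still runs, then the error propagates.
cm = Cleanup()
try:
    with cm:
        raise ValueError("bad data")
except ValueError:
    print("propagated; __exit__ saw", cm.seen.__name__)  # propagated; __exit__ saw ValueError

# Exception, suppressed: execution continues after the with block.
with Cleanup(suppress=True) as quiet:
    raise KeyError("missing")
print(quiet.seen.__name__)  # KeyError
```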
Here’s an example of how I used it in practice:
That’s taken directly out of the lamprey notebook where I first implemented this. I usually put an IPython FileLink in there, so that the resulting image can easily be viewed in its own tab for closer inspection.
The point is, all the normal boilerplate for handling figures is done in one line, and the code is much clearer and prettier! And of course, most importantly, the original goal of not automatically displaying figures is also taken care of.
I consider this yak shaved.