Knocking on the Future's Door
Our starting point is this Git repository.
In it, you'll find a directory called novice that contains a mixture of IPython Notebooks (.ipynb) and Markdown files (.md).
We use notebooks for our lessons on Python because that's what we teach with;
we use Markdown for things like our lesson on Git.
(We used to use HTML, but people thought Markdown would be simpler to edit, diff, and merge.)
Our Makefile turns this all into the notes you see online by converting the notebooks to Markdown, and then converting those Markdown files, and the files actually written in Markdown, into HTML. We convert notebooks to Markdown rather than converting them directly to HTML so that we only need to maintain one template file for our website (the one describing the Markdown-to-HTML conversion) rather than two. Our hope was that we could then convert either the Markdown or the generated HTML to LaTeX, and compile that to produce our PDF.
This ought to be simple.
IPython comes with a tool called nbconvert that uses another tool called pandoc to translate .ipynb files into other formats, and pandoc can be installed and used directly to translate Markdown to other formats as well.
Together, those tools get us most of what we want: most, but not all.
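The two-stage build described above can be sketched in Python. This is a minimal sketch, not our actual Makefile: the function name conversion_commands, the template name page.html, and the lesson paths are placeholders, and it assumes an installation where the converter is invoked as jupyter nbconvert (older IPython releases used ipython nbconvert). It only builds the command lines; something like subprocess.run would be needed to execute them.

```python
from pathlib import PurePosixPath


def conversion_commands(source, template="page.html"):
    """Return the commands (as argument lists) that turn one lesson into HTML.

    Notebooks get an extra first step that converts them to Markdown;
    files already written in Markdown go straight to pandoc.
    The template name is a placeholder, not the project's real file.
    """
    source = PurePosixPath(source)
    markdown = source.with_suffix(".md")
    html_page = source.with_suffix(".html")
    commands = []
    if source.suffix == ".ipynb":
        # Step 1: notebook -> Markdown.
        commands.append(["jupyter", "nbconvert", "--to", "markdown", str(source)])
    # Step 2: Markdown -> HTML, using the single shared template.
    commands.append([
        "pandoc", "-s", "-f", "markdown", "-t", "html",
        "--template=" + template, "-o", str(html_page), str(markdown),
    ])
    return commands


# Hypothetical lesson paths, for illustration only.
for lesson in ["novice/python/lesson.ipynb", "novice/git/lesson.md"]:
    for command in conversion_commands(lesson):
        print(" ".join(command))
```

Because both kinds of lesson end with the same pandoc step, only one HTML template has to be maintained.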
For example, we want to clearly distinguish user input from computer output.
Notebook cells have this information, and the "Markdown" generated by nbconvert helpfully retains that information as a div with an appropriate class:
    <div class="in">
    <pre>weight_kg = 55
    print weight_kg</pre>
    </div>
    <div class="out">
    <pre>55
    </pre>
    </div>
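To make concrete where that information lives, here is a minimal sketch of how a notebook's code cells and their printed output could be mapped onto those two classes. It is not how nbconvert itself is implemented: the function name cells_to_divs is made up, the sketch assumes the version 4 notebook JSON layout (a top-level "cells" list whose code cells carry "source" and "outputs"), and it handles only plain stream output.

```python
import html


def _text(value):
    # The notebook format stores text either as one string or as a list of lines.
    return value if isinstance(value, str) else "".join(value)


def cells_to_divs(notebook):
    """Render code cells and their stream output as classed <div> blocks."""
    blocks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Markdown cells pass through some other way.
        source = html.escape(_text(cell.get("source", "")), quote=False)
        blocks.append('<div class="in">\n<pre>{}</pre>\n</div>'.format(source))
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                text = html.escape(_text(output.get("text", "")), quote=False)
                blocks.append('<div class="out">\n<pre>{}</pre>\n</div>'.format(text))
    return "\n".join(blocks)


# A hand-written stand-in for what json.load would return for a tiny notebook.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["weight_kg = 55\n", "print weight_kg"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["55\n"]}],
        }
    ]
}
print(cells_to_divs(notebook))
```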
We want the input and output blocks in lessons that are written in Markdown to have the same classes,
but there's no syntax in standard Markdown for putting classes on pre-formatted code blocks.
One hack is to use an extension in the Kramdown parser to wrap the block in a div:
    <div class="in" markdown="1">
    ~~~
    weight_kg = 55
    print weight_kg
    ~~~
    </div>
Another is to rely on its support for the "PHP Extra" dialect of Markdown and do this:
    ~~~
    weight_kg = 55
    print weight_kg
    ~~~
    {:class="in"}
which is less cluttered. The problem is, these classes aren't translated into LaTeX when we convert to PDF, so all of our pre-formatted blocks come out looking the same.
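One way to see what is being lost is to write the mapping out by hand. The sketch below, which is an illustration rather than part of our build, turns the classed div blocks in generated HTML into lstlisting environments from the LaTeX listings package, one style per class. The function name blocks_to_latex and the style names input and output are made up; the styles would need matching \lstdefinestyle declarations in the preamble, and how a step like this would slot into a pandoc-based build is left open.

```python
import html
import re

# Matches <div class="in|out"><pre>...</pre></div> blocks in generated HTML.
BLOCK = re.compile(
    r'<div class="(in|out)">\s*<pre>(.*?)</pre>\s*</div>',
    re.DOTALL,
)

# Hypothetical listings styles, one per class.
STYLES = {"in": "input", "out": "output"}


def blocks_to_latex(text):
    """Replace classed HTML code blocks with styled lstlisting environments."""
    def convert(match):
        style = STYLES[match.group(1)]
        # Undo HTML escaping and drop the trailing newline inside <pre>.
        code = html.unescape(match.group(2)).rstrip("\n")
        return "\\begin{lstlisting}[style=%s]\n%s\n\\end{lstlisting}" % (style, code)
    return BLOCK.sub(convert, text)


sample = (
    '<div class="in">\n<pre>weight_kg = 55\nprint weight_kg</pre>\n</div>\n'
    '<div class="out">\n<pre>55\n</pre>\n</div>'
)
print(blocks_to_latex(sample))
```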
As another example, our notes include a glossary (as every good set of notes should).
This is stored in gloss.md in the repository's root directory, and lessons (both notebooks and Markdown files) link to glossary entries like this:
...tell Git to make it a [repository](../../gloss.html#repository), which is...
which refers to an anchor in gloss.md that looks like this:
**repository**: A storage area where a [version control](#version-control) system...
These links are retained correctly in the generated HTML, but are translated into hyperlinks in the LaTeX rather than intra-document references.
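A minimal sketch of one possible workaround: rewrite the cross-file glossary links as same-document links before converting to LaTeX. It assumes the lessons and the glossary have first been concatenated into a single Markdown document in which the glossary anchors survive, and that the converter treats links of the form (#anchor) as internal references, which is our understanding of pandoc's behaviour rather than something we have verified for every case. The function name localize_glossary_links is made up.

```python
import re

# Matches Markdown links whose target is an anchor in gloss.html,
# such as [repository](../../gloss.html#repository).
GLOSSARY_LINK = re.compile(
    r'\[([^\]]+)\]\((?:\.\./)*gloss\.html#([A-Za-z0-9_-]+)\)'
)


def localize_glossary_links(markdown):
    """Rewrite links into gloss.html as same-document links: [text](#anchor)."""
    return GLOSSARY_LINK.sub(r'[\1](#\2)', markdown)


line = "...tell Git to make it a [repository](../../gloss.html#repository), which is..."
print(localize_glossary_links(line))
```

Links that already point inside the same file, like the version control link in the glossary entry above, are left untouched.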
We know how to fix these problems, and all the others I haven't bothered to enumerate, but we shouldn't have to. Nothing we're doing is particularly strange—we're hardly the first people in science to want to create a glossary—but we now have to spend several hours (at least) to do something that "ought" to work out of the box.
I can rhyme off half a dozen reasons why what we're trying to do is the "right" way, but most scientists would (quite rightly) respond, "Yeah, but it doesn't actually work." It comes back once again to Glass's Law and the initial productivity dip that comes with any new way of doing things.
If you'd like to help us solve this particular problem, we would appreciate your assistance. If there's a simpler way to accomplish what we want, we'd appreciate a pointer even more: after all, a problem avoided is better than a problem solved. But most of all, we'd like to see more people working to close the gap between what is and what should be.