Systematic Curriculum Design

This post originally appeared on the Software Carpentry website.

Executive summary: we'd appreciate your help organizing and motivating our material better.

One of the good things about traveling is that it gives me time to think. One of the bad things about thinking is that every time I do, I wind up with more work than I had when I started. For example, to organize and motivate our content, I'm using eight questions that scientists frequently ask:

  1. How can I manage this data?
  2. How can I process it?
  3. How can I tell if I've processed it correctly?
  4. How can I find and fix bugs when I haven't?
  5. How can I keep track of what I've done?
  6. How can I find and use other people's work?
  7. How can other people find and use mine?
  8. How can I do all these things faster?

On the other side of the equation I have a syllabus for the core Software Carpentry material, which includes:

  • the command-line shell (e.g., Bash)
  • version control
  • basic programming (variables, lists, loops, conditionals, and simple file I/O)
  • functions and libraries
  • databases (i.e., basic SQL queries)
  • matrix programming (e.g., MATLAB or NumPy)
  • quality assurance (defensive programming, testing, etc.)
  • dictionaries (or hashes, if you're a Perl programmer)
  • the development process (stepwise refinement, red-green-refactor, performance profiling)
  • web programming (by which we mean using web APIs, not providing services yourself)

To figure out how well we're helping scientists, we need to map their needs onto our content. Here's what I've come up with, one table per question, with the answers grouped by subject:

| Question | Subject | Answer |
|---|---|---|
| How can I manage this data? | The Shell | Use directories and sub-directories with meaningful names. |
| | | Use filenames that can easily be matched with wildcards. |
| | | Use filename extensions that indicate the type of data in the file. |
| | | Use text unless there's a powerful reason to use something else. |
| | Version Control | If it's megabytes or less, put it under version control. |
| | Basic Programming | Create and use data formats that are easy for programs to parse. |
| | Functions and Libraries | |
| | Databases | Store it in a relational database. |
| | | Store each atom of information in its own field. |
| | | Make sure each record has a unique key. |
| | | Make sure that information is never duplicated. |
| | | Use foreign keys and joins to combine information from different tables. |
| | Number Crunching | Represent it as a matrix, because that's easy to process. |
| | Quality | |
| | Sets and Dictionaries | Store it in a set or dictionary so that elements can be looked up by value rather than by position. |
| | Development | |
| | Web Programming | Format it as HTML (or XML, or some other widely-used format). |
| | | Separate content from presentation (e.g., use CSS for styling). |
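
The database answers above (one fact per field, a unique key per record, no duplication, foreign keys and joins) can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `site` and `reading` tables, their columns, and the sample values are made up for the example and are not part of our lesson material.

```python
import sqlite3

# Hypothetical schema: each fact is stored once, in its own field.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE site (
        site_id INTEGER PRIMARY KEY,      -- unique key for every record
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE reading (
        reading_id INTEGER PRIMARY KEY,
        site_id    INTEGER NOT NULL REFERENCES site(site_id),
        taken_on   TEXT NOT NULL,
        salinity   REAL NOT NULL
    );
""")

# Site names appear once; readings refer to them by key instead of repeating them.
conn.executemany(
    "INSERT INTO site (site_id, name) VALUES (?, ?)",
    [(1, "DR-1"), (2, "MSK-4")],
)
conn.executemany(
    "INSERT INTO reading (site_id, taken_on, salinity) VALUES (?, ?, ?)",
    [(1, "2012-02-08", 0.13), (1, "2012-02-10", 0.09), (2, "2012-02-09", 0.21)],
)

# A join recombines the information from the two tables.
rows = conn.execute("""
    SELECT site.name, reading.taken_on, reading.salinity
    FROM reading JOIN site ON reading.site_id = site.site_id
    ORDER BY site.name, reading.taken_on
""").fetchall()

for name, taken_on, salinity in rows:
    print(name, taken_on, salinity)
```

Because the site name lives only in `site`, renaming a site means changing a single row rather than hunting through every reading.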

| Question | Subject | Answer |
|---|---|---|
| How can I process it? | The Shell | Use Unix commands that manipulate lines of text. |
| | | Combine those commands using pipes and redirection. |
| | | Use loops to perform the same operations on many files. |
| | Version Control | |
| | Basic Programming | Write programs that use loops, file I/O, and string splitting to read data. |
| | | Use floating-point numbers unless you are sure all values (including calculated values) will always be integers. |
| | Functions and Libraries | Define functions to do simple operations, then combine those for more complicated effects. |
| | | Equivalently, describe what you would do in a language customized to your problem, then fill in the missing bits of code by creating functions. |
| | Databases | Write SQL queries to select, filter, aggregate, and sort data. |
| | | Use a general-purpose programming language for everything else. |
| | Number Crunching | Use a linear algebra package like NumPy. |
| | Quality | |
| | Sets and Dictionaries | Use algorithms that don't depend on the order of items. |
| | Development | Use the right data structures. |
| | Web Programming | Use an HTTP library to fetch it. |
| | | Use an XML or JSON library to parse it. |
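
Here is a rough sketch of the "Basic Programming" and "Functions and Libraries" answers above: a loop, file I/O, and string splitting to read data, built from small functions that are then combined. The comma-separated `name,value` layout, the function names, and the sample values are assumptions made up for the example; an in-memory `io.StringIO` stands in for a real file so that the snippet runs on its own.

```python
import io


def parse_line(line):
    """Split one 'name,value' line into a (name, float) pair."""
    name, value = line.strip().split(",")
    return name, float(value)  # floats, since readings are not guaranteed to be integers


def read_records(stream):
    """Parse every non-blank, non-comment line of an open text stream."""
    records = []
    for line in stream:
        if line.strip() and not line.startswith("#"):
            records.append(parse_line(line))
    return records


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Stand-in for open("readings.csv"): a hypothetical file with one header comment.
sample = io.StringIO("# site,salinity\nDR-1,0.13\nDR-1,0.09\nMSK-4,0.21\n")

records = read_records(sample)
average = mean([value for _, value in records])
print(len(records), "records, mean salinity", round(average, 4))
```

Each function does one simple thing, so each can be checked separately before they are combined.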

| Question | Subject | Answer |
|---|---|---|
| How can I tell if I've processed it correctly? | The Shell | |
| | Version Control | |
| | Basic Programming | Test your programs with small data sets whose results can be checked by hand. |
| | Functions and Libraries | |
| | Databases | Build queries in small steps. |
| | | Run queries against small data sets whose output can be checked manually. |
| | Number Crunching | Compare a program's output to analytic results, experimental results, simplified test cases, and previous programs. |
| | | Use tolerances when comparing results. |
| | Quality | Create simple data sets for which the right answer can be calculated by hand. |
| | | Compare the results produced by the new program to results produced by older programs. |
| | Sets and Dictionaries | |
| | Development | Make code testable by dividing it into functions, and then replacing some functions with others for testing purposes. |
| | Web Programming | |
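
Two of the answers above, "create simple data sets for which the right answer can be calculated by hand" and "use tolerances when comparing results", fit in a few lines of Python. The `running_total` function is a made-up example, and the tolerance of `1e-9` is an arbitrary choice for illustration rather than a recommendation.

```python
import math


def running_total(values):
    """Return the cumulative sums of a sequence of numbers."""
    totals = []
    current = 0.0
    for value in values:
        current += value
        totals.append(current)
    return totals


# A data set small enough to work out by hand: 0.1, 0.1 + 0.2, 0.1 + 0.2 + 0.3.
expected = [0.1, 0.3, 0.6]
actual = running_total([0.1, 0.2, 0.3])

# Exact comparison is fragile because of floating-point round-off...
exact_match = actual == expected

# ...so compare each pair of values within a relative tolerance instead.
close_match = len(actual) == len(expected) and all(
    math.isclose(a, e, rel_tol=1e-9) for a, e in zip(actual, expected)
)

print("exact:", exact_match, "within tolerance:", close_match)
```

The exact comparison fails even though the program is right, which is why the tolerance matters.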

| Question | Subject | Answer |
|---|---|---|
| How can I find and fix bugs when I haven't? | The Shell | |
| | Version Control | |
| | Basic Programming | |
| | Functions and Libraries | |
| | Databases | |
| | Number Crunching | |
| | Quality | Write test cases that fail when the bug is present, but pass when the bug is fixed. |
| | | Add assertions to programs to check their internal consistency. |
| | | Use a debugger. |
| | Sets and Dictionaries | |
| | Development | Write tests. |
| | Web Programming | |
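
As a small illustration of the "Quality" answers above, the sketch below combines assertions that check a function's internal consistency with a test case that would fail if a particular bug came back. Both the `normalize` function and the bug (an earlier version that divided by the number of values instead of their sum) are hypothetical, invented for the example.

```python
def normalize(values):
    """Scale values so that they sum to 1.0."""
    # Preconditions: fail early, close to the cause of the problem.
    assert len(values) > 0, "need at least one value"
    total = sum(values)
    assert total != 0, "values must not sum to zero"

    result = [v / total for v in values]

    # Postcondition: check the result's internal consistency.
    assert abs(sum(result) - 1.0) < 1e-9, "normalized values must sum to 1"
    return result


def test_normalize_uses_sum_not_count():
    # Regression test for the hypothetical bug: dividing by len(values)
    # would give [1/3, 1/3, 2/3] here, so this test fails if the bug returns.
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]


test_normalize_uses_sum_not_count()
print("regression test passed")
```

A test runner would normally discover and call `test_normalize_uses_sum_not_count`; it is called directly here only to keep the example self-contained.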

| Question | Subject | Answer |
|---|---|---|
| How can I keep track of what I've done? | The Shell | |
| | Version Control | Keep your work under version control. |
| | | Check in whenever you've completed a significant change. |
| | | Write meaningful check-in comments. |
| | Basic Programming | Put version control IDs in programs (and data files), and copy them forward to results. |
| | Functions and Libraries | Give functions meaningful names. |
| | | Group related functions and related definitions into modules. |
| | | Write docstrings to explain what functions and modules do and how to use them. |
| | Databases | Store queries in files (just like programs). |
| | Number Crunching | |
| | Quality | Turn bug fixes into assertions and test cases. |
| | | Use a coverage analyzer to see what code is and isn't being tested. |
| | Sets and Dictionaries | |
| | Development | |
| | Web Programming | Use meta headers in your HTML/XML data files. |

| Question | Subject | Answer |
|---|---|---|
| How can I find and use other people's work? | The Shell | |
| | Version Control | Get it from their version control repositories. |
| | Basic Programming | |
| | Functions and Libraries | Use the help function to read their documentation. |
| | Databases | |
| | Number Crunching | |
| | Quality | |
| | Sets and Dictionaries | |
| | Development | |
| | Web Programming | Ask them to use well-formed URLs. |
| | | And to format it according to well-defined machine-readable standards (e.g., XML or JSON). |

| Question | Subject | Answer |
|---|---|---|
| How can other people find and use mine? | The Shell | |
| | Version Control | Put your work in a publicly-accessible version control repository. |
| | Basic Programming | |
| | Functions and Libraries | Write docstrings to explain what functions and modules do and how to use them. |
| | Databases | Raise exceptions to signal errors so that other people can handle them as they think best. |
| | Number Crunching | |
| | Quality | |
| | Sets and Dictionaries | |
| | Development | |
| | Web Programming | Put it on the web at a stable URL. |
| | | Format it according to well-defined machine-readable standards (e.g., XML or JSON). |
| | | Include meta-data. |

| Question | Subject | Answer |
|---|---|---|
| How can I do all these things faster? | The Shell | Put commands in shell scripts so that they can be re-used. |
| | Version Control | |
| | Basic Programming | Use appropriate variable names so that people will waste less time trying to read programs. |
| | Functions and Libraries | Learn to recognize and use common design patterns. |
| | Databases | |
| | Number Crunching | Use a linear algebra package like NumPy. |
| | Quality | Design code for testing. |
| | | Write test cases before writing new code. |
| | Sets and Dictionaries | Use sets and dictionaries for sparse, irregular, or unordered data. |
| | Development | Use a profiler to figure out why code is slow before trying to optimize it. |
| | | Build code so that parts can be replaced easily. |
| | Web Programming | |
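
To make "use a profiler to figure out why code is slow before trying to optimize it" concrete, here is a minimal sketch using the `cProfile` and `pstats` modules from Python's standard library. The two de-duplication functions and the test data are invented for the example; the second function also happens to illustrate the "Sets and Dictionaries" answer. Timings will differ from machine to machine, so no output is shown.

```python
import cProfile
import io
import pstats


def slow_unique(items):
    """Remove duplicates using a list: every membership test scans the list."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def fast_unique(items):
    """Remove duplicates using a set for membership tests, preserving order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical workload: 20,000 values, only 500 of them distinct.
data = [i % 500 for i in range(20000)]

profiler = cProfile.Profile()
profiler.enable()
slow_result = slow_unique(data)
fast_result = fast_unique(data)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Both functions return the same answer, so the report is what shows where the time actually goes; that is the evidence to look at before rewriting anything.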

In parallel with this, a group of us have been working on a paper describing best practices for computational science. The list we've converged on is:

  1. Write programs for people, not computers.
    • Programs should not require their readers to hold more than a handful of facts in memory at once.
    • Names should be consistent, distinctive, and meaningful.
    • Code style and formatting should be consistent.
    • All aspects of software development should be broken down into tasks roughly an hour long.
  2. Automate repetitive tasks.
    • Rely on the computer to repeat tasks.
    • Save recent commands in a file for re-use.
    • Use a build tool to automate scientific workflows.
  3. Use the computer to record history.
    • Software tools should be used to track computational work automatically.
  4. Make incremental changes.
    • Work in small steps with frequent feedback and course correction.
  5. Use version control.
    • Use a version control system.
    • Everything that has been created manually should be put in version control.
  6. Don't repeat yourself (or others).
    • Every piece of data must have a single authoritative representation in the system.
    • Code should be modularized rather than copied and pasted.
    • Re-use code instead of rewriting it.
  7. Plan for mistakes.
    • Add assertions to programs to check their operation.
    • Use an off-the-shelf unit testing library.
    • Turn bugs into test cases.
    • Use a symbolic debugger.
  8. Optimize software only after it works correctly.
    • Use a profiler to identify bottlenecks.
    • Write code in the highest-level language possible.
  9. Document the design and purpose of code rather than its mechanics.
    • Document interfaces and reasons, not implementations.
    • Refactor code instead of explaining how it works.
    • Embed the documentation for a piece of software in that software.
  10. Conduct code reviews.
    • Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems.
    • Use an issue tracking tool.

As you can see, this list only partially overlaps the "Answers" column in the table above. That makes me nervous: when two independent attacks on a problem yield two different answers, the odds are good that neither of them is right. I trust the "best practices" list more than I do the breakdown of our existing material, which leaves me with some awkward choices. Changing the motivating questions would feel like moving the goalposts so that I can declare victory with the content I have, but on the other hand, maybe there is a better way to carve up the space of things scientists want to do that will give a better mapping. Or are there connections between our content and those motivating questions that I'm just missing? Or do we really have the wrong content, i.e., are we teaching what we know, rather than what would actually be most useful to scientists?

Stepping back for a moment, the real point of this exercise is to ensure that:

  1. we're teaching what's most useful to our learners;
  2. everything we teach makes sense, and is seen as useful, when it first appears; and
  3. learners see the connections between ideas and between ideas and their application.

What we should really do is go one step further and figure out how to tell whether our learners can actually do the things embodied in our eight questions. We should then work backward from that assessment to figure out what demonstrable skills they need to acquire, then what understanding they need in order to become proficient with those skills, and then see how that maps onto our best practices. We've made a start toward this with the "driver's license" exam described in an earlier post; if you'd like to help us follow through, please get in touch.