Empirical Software Engineering Papers

The best short introduction to empirical software engineering is Robert Glass's book Facts and Fallacies of Software Engineering, but it's twelve years old now, and the field has exploded since it was published. Steve McConnell's Code Complete: A Practical Handbook of Software Construction is slightly more up to date, and the anthology Making Software: What Really Works, and Why We Believe It is more recent still, but they are both too long and too dense for most people.

If all you want is a sense of what's out there, It Will Never Work in Theory is an infrequently updated blog of interesting new results. Some of my favorite entries are:

Let's Go to the Whiteboard: what programmers actually draw when they talk about programs, and why.
Understanding Broadcast Based Peer Review on Open Source Projects: explores the "toss it out there and hope someone catches it" model of code review used in many open source projects.
Usability Implications of Requiring Parameters in Objects' Constructors: a good introduction to the kinds of things that careful qualitative analysis can tell us about the usability of programming languages and their features.
Is Transactional Programming Actually Easier?: uses a quantitative approach to tackle the same kinds of usability questions.
An Experiment About Static and Dynamic Type Systems: another example of what careful, controlled experimentation can reveal about a language.
A Field Study of API Learning Obstacles: full of rich insights and practical implications relevant for anyone trying to improve the developer documentation of their products.
What Makes a Good Bug Report?: looks at what the bug reporter can and should do to get a faster and more useful response.
Does Adding Manpower Also Affect Quality?: its two main conclusions are that increased team size and linear growth are correlated with later periods of better product quality, while periods of accelerated team expansion are correlated with later periods of reduced software quality.
How, and Why, Process Metrics Are Better: found that if you want to predict how many bugs there are in a piece of code, you're better off looking at how the code was produced than at the code itself.
An Empirical Comparison of the Accuracy Rates of Novices using the Quorum, Perl, and Randomo Programming Languages: found that Perl is as hard to learn as a language with a randomly designed syntax. Many of the comments on the blog, and many of the emails sent to the paper's authors, were hostile; the lead author responded, and recently published a larger study using three different investigative techniques that bears out the results of the original paper. (I'm hoping to get them to repeat their study for Python, Perl, R, and MATLAB some time this year...)
The FCS1: A Language Independent Assessment of CS1 Knowledge: describes a concept inventory for basic programming concepts.
Halving Fail Rates using Peer Instruction: another landmark paper in CS education, this one showing that peer instruction improves retention in introductory programming classes.
A Decade of Research and Development on Program Animation: The Jeliot Experience: recapitulates a long-running computing education research project, and shows how both the authors' ideas and the field as a whole have evolved over the years.
Do Faster Releases Improve Software Quality?: explores what happened when Firefox shifted from occasional large releases to frequent small ones. Long story short, with shorter release cycles, users do not experience significantly more post-release bugs, and bugs are fixed faster, but users experience these bugs earlier during software execution (i.e., the program crashes earlier).
UML in Practice: an award-winning paper that explores why most programmers in industry don't use one of academics' favorite creations.
Reviews of Code Simplicity and The Essence of Software Engineering, two disappointing books that prove that many software engineering researchers still don't see what evidence has to do with anything.

I'd welcome pointers to other open-access papers reporting empirical studies that are relevant to what we teach. (Unfortunately, and ironically, the ACM and IEEE are among the most backward of professional societies when it comes to open-access publishing. As a result, a lot of really interesting work in this field currently languishes in unfindable obscurity behind their paywalls.)