I found pair programming pretty useful when starting a new project from scratch, when changes are likely to be interdependent. It is then better to work with, let's say, 1.5x the performance of a single developer on one thing at a time than to work separately and then try to reconcile the changes. Knowledge transfer is also very important at this stage (you get more people with the same vision of the fundamentals).
This generalizes to other cases where there is a "narrow front", i.e. when few things can be worked on in parallel without people stepping on each other's toes.
Even more generally, it seems there are three kinds of clear benefits:
1) Less change synchronization (fewer changes in progress at the same time).
2) Knowledge transfer (see @FeepingCreature's answer).
3) Immediate, detailed review - probably fewer defects.
There is also the matter of raw throughput: how much time is required to make a specific change, assuming the rest of the code stays the same and ignoring the cost of syncing with any changes made in parallel. A naive baseline is that a pair has the throughput of a single developer, since they're working on one change at a time. Fortunately, it can be way better, because one person can focus on the details of the code while the other handles the slightly bigger picture and next steps, looks up relevant facts in the documentation, etc. This eliminates a lot of context switching and limits the number of things each developer needs to keep in working memory. A lot of typos and other simple problems also get caught immediately, so there is less debugging to do. It's not so clear what all of this stuff adds up to.
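To make the throughput question a bit more concrete: if a pair finishes a change `s` times faster (in wall-clock time) than one developer working alone, the pair pays for 2/`s` as many person-hours. The sketch below only does that arithmetic; the function name is hypothetical and the speedup values are illustrative (1.5x echoes the rough figure above), not measurements.

```python
def pair_person_hour_ratio(speedup: float) -> float:
    """Person-hours a pair spends on one change, relative to a solo developer.

    `speedup` is how many times faster the pair finishes in wall-clock time
    (1.0 = the naive baseline of a single developer's throughput).
    Two people are paid for the whole (shorter) duration, hence 2 / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 2.0 / speedup


# Illustrative speedups, not measured values.
for s in (1.0, 1.5, 2.0):
    print(f"speedup {s:.1f}x -> {pair_person_hour_ratio(s):.2f}x person-hours")
```

Under this simple model, break-even in person-hours needs a 2x speedup; anything less is paid for in person-hours, and whether that is worth it depends on the synchronization, knowledge-transfer and review benefits, which the calculation ignores.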
I was able to find some studies on the topic, including a meta-analysis by Hannay et al. TL;DR: it depends on the situation (including how experienced the developers are and how complex the task is). It's clearly not a silver bullet, and generally it still seems to be a trade-off between person-hours spent and the quality of the produced software.
It's easy to play armchair statistician and contribute little, but I want to point out that the empirics cited here are effectively just anecdotes. The paper studies 13 pairs and 13 individuals on three assignments in one class at the University of Utah. Its estimate of relative time costs is only significant to ~1σ because development time has a variance of (if I backsolved correctly) 65%, which... seems about right. Still, it seems like borderline abuse of frequentist statistics to argue that a two-tailed p < 0.05 should be required to reject the hypothesis that pairs finish projects in half the wall-clock time of individuals (which is the null the analysis assumes).
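For a rough feel of how little 13 pairs vs. 13 individuals can resolve about time, here is a crude normal approximation. It assumes the 65% figure is a coefficient of variation (standard deviation divided by mean), that both groups have roughly the same mean and spread, and that samples are independent; the function names are hypothetical and this is not the test the paper actually used.

```python
import math


def relative_standard_error(cv: float, n1: int, n2: int) -> float:
    """Approximate standard error of the relative difference between two group means.

    Assumes both groups share the coefficient of variation `cv`
    (standard deviation / mean) and have similar means.
    """
    return cv * math.sqrt(1.0 / n1 + 1.0 / n2)


def z_score(relative_difference: float, cv: float, n1: int, n2: int) -> float:
    """How many standard errors a given relative difference in means amounts to."""
    return relative_difference / relative_standard_error(cv, n1, n2)


se = relative_standard_error(cv=0.65, n1=13, n2=13)
print(f"standard error of the relative difference: {se:.1%}")
# Under this normal approximation, a difference must exceed about
# 1.96 standard errors to be significant at two-tailed p < 0.05.
print(f"difference needed for p < 0.05: {1.96 * se:.1%}")
```

With these assumptions, the difference in mean time would have to be around half the mean before it cleared p < 0.05, so "no significant difference in time" is weak evidence either way at this sample size.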
That said, the author correctly identifies that quality matters significantly more than speed. The quality metric, however, is "assignment tests passed" in throwaway academic projects, which elides the question of which quality failures would or wouldn't be caught by the review/CI workflows that an industrial project would go through anyway.
So, finger to the wind, this study feels like it suggests that a pair spends 15% more person-hours (once they get used to each other) before turning in their schoolwork, and does 15% more of the work of the assignment than a student working alone. Consistent with the higher reported work-enjoyment numbers! Definitely a stronger showing than I would have guessed! But definitely not well summarized by "no significant result for time; significant improvement for quality".
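Spelling out that back-of-the-envelope estimate: 15% more person-hours split across two people means the pair finishes in about 57.5% of a solo student's wall-clock time, and if the pair also completes 15% more of the assignment, person-hours per unit of completed work come out roughly even. The 15% figures are the rough estimates above, the function name is hypothetical, and treating "tests passed" as a linear measure of work done is an assumption.

```python
def pair_vs_solo(person_hour_overhead: float, extra_work: float) -> dict[str, float]:
    """Compare a pair with a solo developer using rough relative figures (solo = 1.0).

    person_hour_overhead: extra person-hours the pair spends (0.15 = 15% more).
    extra_work: extra fraction of the assignment the pair completes (0.15 = 15% more).
    Treating completed work (e.g. tests passed) as linear output is an assumption.
    """
    person_hours = 1.0 + person_hour_overhead
    wall_clock = person_hours / 2.0  # both people work the whole time, side by side
    work_done = 1.0 + extra_work
    return {
        "person_hours": person_hours,
        "wall_clock": wall_clock,
        "person_hours_per_unit_work": person_hours / work_done,
    }


for name, value in pair_vs_solo(0.15, 0.15).items():
    print(f"{name}: {value:.3f}x solo")
```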
What am I missing here?