Today's post is in response to the post "Quantum without complications", which I think is a pretty good popular distillation of the basics of quantum mechanics. 

For any such distillation, there will be people who say "but you missed X important thing". The limit of appeasing such people is to turn your popular distillation into a 2000-page textbook (and then someone will still complain). 

That said, they missed something!

To be fair, the thing they missed isn't included in most undergraduate quantum classes. But it should be.[1]

Or rather, there is something that I wish they told me when I was first learning this stuff and confused out of my mind, since I was a baby mathematician and I wanted the connections between different concepts in the world to actually have explicit, explainable foundations and definitions rather than the hippie-dippie timey-wimey bullshit that physicists call rigor. 

The specific point I want to explain is the connection between quantum mechanics and probability. When you take a quantum class (or read a popular description like "Quantum without complications") there is a question that's in the air, always almost but not quite understood. At the back of your mind. At the tip of your tongue.

It is all around us. Even now, in this very room. You can see it when you look out your window or when you turn on your television. You can feel it when you go to work... when you go to church... when you pay your taxes. It is the world that has been pulled over your eyes to blind you from the truth. 

The question is this: 

The complex "amplitude" numbers that appear in quantum mechanics really feel like probabilities. But everyone tells us they're not probabilities. What the hell is going on?

If you are brave, I'm going to tell you about it. Buckle up, Neo.

Quantum mechanics 101

Let me recap the standard "state space" quantum story, as exemplified by (a slight reinterpretation of) that post. Note that (like in the "Quantum without complications" post) I won't give the most general or the most elegant story, but rather optimize for understandability:

  1. The correct way to model the state of the quantum universe is as a state vector[2] $|\psi\rangle \in \mathcal{H}$. Here the "bra-ket" notation is physics notation for "vector", with $|v\rangle$ representing that we view $v$ as a column vector and $\langle v|$ representing that we view it as a row vector (we'll expand this language in footnotes as we go along, since it will be relevant). The "calligraphic" letter $\mathcal{H}$ represents a complex Hilbert space, which is (more or less) a fancy term for a (complex) vector space that might be infinite-dimensional. Moreover:
    1. We assume the Hilbert space $\mathcal{H}$ has a basis $\{|s\rangle\}_{s \in S}$ of "pure states" (though different bases give more or less the same physics).
    2. We assume the state vector is a unit complex vector when expressed in the state basis: $|\psi\rangle = \sum_{s \in S} \psi_s |s\rangle$ (here note that the $\psi_s$ are complex numbers), so $\sum_{s \in S} |\psi_s|^2 = 1$[3].
    3. Whenever we think about quantum mechanics in this formulation, our Hilbert space actually depends on a "number of particles" parameter: $\mathcal{H} = \mathcal{H}_1^{\otimes n}$, where n is "the number of particles in the universe". In terms of the "pure states" basis, the basis of $\mathcal{H}_1^{\otimes n}$ is $S_1^n$. In other words, we have a set $S_1$ of "one-particle" pure states, and an n-particle pure state is an n-tuple of elements of $S_1$[4]. We think of these as "tuples of particle locations": so if we have single-particle states $S_1 = \{a, b, c\}$, then for n = 4, we have 4-particle states $|s_1, s_2, s_3, s_4\rangle$ and $\mathcal{H}$ is 81-dimensional. For example $\mathcal{H}$ contains the 4-particle state $|c, a, a, b\rangle$ corresponding to "first particle at c, second particle at a, third particle at a, fourth particle at b", and also linear combinations like $|c, a, a, b\rangle + |a, b, c, a\rangle$ (note that this last vector is not a unit vector, and needs to be normalized by $\frac{1}{\sqrt{2}}$ in order to be an allowable "state vector").
  2. Quantum states evolve in time, and the state of your quantum system at any time $t$ is fully determined by its state at time 0. This evolution is linear, given by the "evolution equation" $|\psi_t\rangle = U_t |\psi_0\rangle$. Moreover:
    1. The operators $U_t$ are unitary (a natural complex-valued analog of "orthogonal" real-valued matrices). Note that a complex matrix is unitary if and only if it takes unit vectors to unit vectors.
    2. Importantly: Evolution matrices tend to mix states, so if a system started out in a pure state $|s\rangle$, we don't in general expect it to stay pure at time $t$.
    3. As $t$ varies, the operator $U_t$ evolves exponentially. We can either model this by viewing the time parameter $t$ as discrete, and writing $U_t = U^t$ (as is done in the other blog post), or we can use continuous time and write $U_t = e^{-iHt}$, where $H$ is called the "Heisenberg matrix". Note that in order for $U_t$ to be unitary, $H$ must be Hermitian.
  3. When you model the interaction of a quantum system with an external "observer", there is a (necessarily destructive, but we won't get into this) notion of "measurement". You model measurements as a statistical process whose result is a probability distribution (in the usual sense, not any weird quantum sense) on some set of possible outcomes $O$. Here the probabilities $p_o$ are nonnegative numbers which depend on the state and must satisfy $\sum_{o \in O} p_o = 1$ when measured at some fixed quantum state $|\psi\rangle$.
    1. The most basic form of measurement, associated to the basis of pure states, returns one of the pure states $s \in S$. The probability that the state $|s\rangle$ is measured is the squared norm of the coordinate of $|\psi\rangle$ at pure state s. As a formula in bra-ket notation: $p_s = |\langle s | \psi \rangle|^2 = |\psi_s|^2$. Note that these are the squared norms of the coordinates of a unit complex vector -- thus they sum to one, as probabilities should. This is the main reason we want states to be unit vectors.
    2. A second kind of measurement is associated to a pair of complementary orthogonal complex projections $\Pi_0$ and $\Pi_1$ (as they are complementary and orthogonal, $\Pi_0 + \Pi_1 = \mathrm{Id}$ and $\Pi_0 \Pi_1 = 0$). The measurement then outputs one of two outcomes depending on whether it thinks $|\psi\rangle$ is in the image of $\Pi_0$ or of $\Pi_1$. (While $|\psi\rangle$ is of course a superposition, the measurement will always reduce the superposition information to a probability in an appropriate way).
    3. The above two scenarios have an obvious mutual generalization, associated with a collection of several orthogonal projections which sum to $\mathrm{Id}_{\mathcal{H}}$. (For the concretely minded, a small numerical sketch of this "state vector" recipe follows this list.)
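Here is that sketch: a minimal Python/numpy illustration of the recipe above. Everything in it is made up for illustration -- a 3-dimensional Hilbert space and an arbitrary random Hermitian matrix standing in for the "Heisenberg matrix" -- so treat it as a toy, not as any particular physical system.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
dim = 3  # a toy Hilbert space with three pure states

# An arbitrary Hermitian matrix playing the role of the "Heisenberg matrix" H.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2

def U(t):
    # Continuous-time evolution U_t = exp(-iHt); unitary because H is Hermitian.
    return expm(-1j * H * t)

# A state vector: a unit complex linear combination of pure states.
psi0 = np.array([1.0, 1.0j, 0.0])
psi0 = psi0 / np.linalg.norm(psi0)

# Evolve, then apply the "pure-state measurement": probabilities are the
# squared norms of the coordinates, and they sum to one.
psi_t = U(0.7) @ psi0
probs = np.abs(psi_t) ** 2
print(probs, probs.sum())  # probs.sum() == 1.0 up to floating point
```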

 Upshots

The important things to keep in mind from the above:

  • A state $|\psi\rangle = \sum_{s \in S} \psi_s |s\rangle$ is a complex linear combination of pure states, where the $\psi_s$ are complex numbers whose squared norms sum to 1.
  • States evolve linearly in time. Pure states tend to get "mixed" with time.
  • A measurement is a process associated to the interaction between the quantum state and a (usually macroscopic) observer. It returns a probability distribution on some set of outcomes that depends on the state $|\psi\rangle$: i.e., it converts a quantum phenomenon to a statistical phenomenon.
  • One standard measurement that is always available returns a distribution over pure states. Its probability of returning the pure state $s$ is the squared norm $|\psi_s|^2$ of the $s$-coordinate of $|\psi\rangle$.

Statistical mechanics 101

The process of measurement connects quantum mechanics with statistical mechanics. But even if I hadn't talked about measurement in the last section, anyone who has studied probability would see a lot of parallels between the last section and the notion of Markov processes.

Most people are intuitively familiar with Markov processes. A Markov process is a mathematical way of modeling some variable x that starts at some state s (which may be deterministic or already probabilistic) and undergoes a series of random transitions between states. Let me again give a recap:

  1. The correct way to model the state of the universe is as a probability distribution $p$, which models uncertain knowledge about the universe and is a function $p : S \to \mathbb{R}$ from a set of deterministic states $S$ to real numbers. These must satisfy:

    1. $p(s) \ge 0$ for each deterministic state $s$.
    2. $\sum_{s \in S} p(s) = 1$.

    We say that a probability distribution $p$ is deterministic if there is a single state $s$ with $p(s) = 1$. In this case we write $p = \delta_s$.

  2. Probability distributions evolve in time, and the state of your statistical system at any time is fully determined by its state at time 0. This evolution is linear, given by the "evolution equation" $p_t = M_t\, p_0$. In terms of probabilities, the matrix coefficient $(M_t)_{s' s}$ is the transition probability, and measures the probability that "if your statistical system was in state $s$ at time 0, it occupies state $s'$ at time $t$". In particular:
    1. The operators $M_t$ are Markovian (equivalent to the condition that each column is a probability distribution).
    2. Importantly: Evolution matrices tend to mix states, so if a system started out in a deterministic state $\delta_s$, we don't in general expect it to stay deterministic at time $t$.
    3. As $t$ varies, the operator $M_t$ evolves exponentially. We can either model this by viewing the time parameter $t$ as discrete, and writing $M_t = M^t$, or we can use continuous time and write $M_t = e^{tQ}$, where $Q$ is called the "rate matrix". (Again, a small numerical sketch follows this list.)
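And the corresponding sketch for the statistical picture, again with an entirely made-up rate matrix:

```python
import numpy as np
from scipy.linalg import expm

# A made-up rate matrix Q for a 3-state continuous-time Markov chain:
# off-diagonal entries are nonnegative jump rates and each column sums to 0,
# so that M_t = exp(tQ) is column-stochastic ("each column is a probability
# distribution", matching the convention in the list above).
Q = np.array([
    [-0.3,  0.1,  0.2],
    [ 0.2, -0.4,  0.1],
    [ 0.1,  0.3, -0.3],
])

def M(t):
    return expm(t * Q)

# Start in a deterministic state delta_s and watch it "mix" over time.
p0 = np.array([1.0, 0.0, 0.0])
for t in (0.0, 1.0, 10.0):
    pt = M(t) @ p0
    print(t, pt, pt.sum())  # pt stays a probability distribution
```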

There are a hell of a lot of similarities between this picture and the quantum picture, though of course we don't have to separately introduce a notion of measurement here: indeed, in the quantum context, measurement converts a quantum state to a probability distribution, but in statistics, you have a probability distribution from the start!

However, there are a couple key differences as well. The standard one that everyone notices is that in the quantum picture we used complex numbers and in the statistical picture, we used real numbers. But there's a much more important and insidious difference that I want to bring your attention to (and that I have been bolding throughout this discussion). Namely: 

The "measurement" translation from quantum to statistical states is not linear.

Specifically, the "pure-state measurement" probability vector associated to a quantum state $|\psi\rangle$ is quadratic in the vector $|\psi\rangle$ (with coordinates $p_s = |\psi_s|^2$).

This seems to dash the hopes of putting both the quantum and statistical pictures of the world on an equal footing, with perhaps some class of "mixed" systems interpolating between them. After all, while the dynamics in both cases are linear, there must be some fundamental nonlinearity in the relationship between the quantum and statistical worlds.

Right?

Welcome to the matrix

We have been lied to (by our quantum mechanics 101 professors. By the popular science magazines. By the well-meaning sci-fi authors). There is no such thing as a quantum state.

Before explaining this, let's take a step back and imagine that we have to explain probability to an intelligent alien from a planet that has never invented probability. Then here is one possible explanation you can give:

Probability is a precise measure of our ignorance about a complex system. It captures the dynamics of a "minimal bound" on the information we have about a set of "coarse" states in a subsystem S (corresponding to "the measurable quantities in our experimental setup") inside a large system U (corresponding to a maximally fine-grained description of the universe).[5]

Now whenever we do quantum mechanics, we also implicitly separate a "large" system into an "experimental setup" and an "environment". We think of the two as "not interacting very much", but notably measurement is inherently linked to thinking about the interaction of the system and its environment.

And it turns out that in the context of quantum mechanics, whenever you are studying a subsystem inside a larger environment (e.g. you're focusing on only a subset of all particles in the universe, an area of space, etc.), you are no longer allowed to use states.

Density matrices

Instead, what replaces the "state" or "wavefunction" from quantum mechanics is the density matrix, which is a "true state" of your system (incorporating the "bounded information" issues inherent with looking at a subsystem). This "true state" is a matrix, or a linear operator, $\rho : \mathcal{H} \to \mathcal{H}$. Note here a potential moment of confusion: in the old "state space" picture of quantum mechanics (that I'm telling you was all lies), the evolution operators $U_t$ were also matrices from $\mathcal{H}$ to $\mathcal{H}$. The density matrices happen to live in the same space, but they behave very differently and should by no means be thought of as the same "kind of object". In particular they are Hermitian rather than unitary.

Now obviously the old picture isn't wrong. If your system happens to be "the entire universe", then while I am claiming that you also have this new "density matrix evolution" picture of quantum mechanics, you still have the old "state vector" picture. You can get from one to the other via the following formula: $$\rho = |\psi\rangle \langle \psi|.$$

In other words, $\rho$ is the rank-1 complex projection matrix associated to your "old-picture" state $|\psi\rangle$.
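In code, with an arbitrary toy state (nothing here is specific to any real system), the construction and its basic properties look like this:

```python
import numpy as np

# An arbitrary unit state vector in a 3-state toy system.
psi = np.array([1.0, 1.0j, 1.0]) / np.sqrt(3)

# The density matrix rho = |psi><psi| is the outer product of psi with its conjugate.
rho = np.outer(psi, psi.conj())

# rho is Hermitian, a projection (rho^2 = rho), has trace 1 and rank 1.
assert np.allclose(rho, rho.conj().T)
assert np.allclose(rho @ rho, rho)
assert np.isclose(np.trace(rho).real, 1.0)
assert np.linalg.matrix_rank(rho) == 1

# The pure-state measurement probabilities |psi_s|^2 are just the diagonal
# of rho -- a linear function of rho, even though they are quadratic in psi.
assert np.allclose(np.diag(rho).real, np.abs(psi) ** 2)
```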

Now the issue with states is that there is no way to take a universe state $|\psi\rangle$ associated to a big system and convert it to a "system state" associated to a small or coarse subsystem. But there is a way to take the density matrix associated to the big system and "distill" the density matrix for the subsystem. It's called "taking a partial trace", and while it's easy to describe in many cases, I won't do this here for reasons of time and space (in particular, because I haven't introduced the necessary formalism to talk about system-environment separation and don't plan to do so).
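For the curious, here is a minimal sketch of that "distilling" operation in the simplest case -- a two-qubit universe in a Bell state, where we keep only the first qubit. This is just the standard partial-trace construction, not anything specific to this post, and it already shows why subsystems force density matrices on us: the result has rank 2, so no state vector reproduces it.

```python
import numpy as np

# Bell state (|00> + |11>)/sqrt(2) on a two-qubit "universe".
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_universe = np.outer(bell, bell.conj())   # 4x4, rank 1

# Partial trace over the second qubit: view the 4x4 matrix as a
# (2,2,2,2)-tensor indexed by (sys, env, sys', env') and sum env = env'.
rho_sys = np.einsum('ikjk->ij', rho_universe.reshape(2, 2, 2, 2))

print(rho_sys)                           # [[0.5, 0], [0, 0.5]]: maximally mixed
print(np.linalg.matrix_rank(rho_sys))    # 2 -- not of the form |psi><psi|
```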

Going back to the relationship between the quantum state and the density matrix: notice that the passage $|\psi\rangle \mapsto \rho = |\psi\rangle\langle\psi|$ is quadratic. I forgot to bold: it's quadratic.

What does this mean? Well first of all, this means that the "probability vector" associated to performing a measurement on the state $|\psi\rangle$ is now a linear function of the "improved" version of the state, namely the density matrix $\rho$ (for the pure-state measurement, the probabilities are just the diagonal entries: $p_s = \rho_{ss} = |\psi_s|^2$). This is a big deal! This means that we might be able to have a linear relationship with the "probability world" after all.

But does this mean that the linear evolution that quantum mechanics posits on the nice vector $|\psi\rangle$ turns into some quadratic mess? Luckily, the answer is "no". Indeed, the evolution remains linear. Namely, just from the formula $\rho = |\psi\rangle\langle\psi|$, we see the following identity is true for the "universal" state vector[6]: $$\rho_t = U_t\, \rho_0\, U_t^\dagger.$$ Now if you expand, you see that each entry of $\rho_t$ is linear in the entries of $\rho_0$. Thus evolution is given by a linear "matrix conjugation" operator $\mathrm{Ad}_{U_t} : \mathrm{Op}(\mathcal{H}) \to \mathrm{Op}(\mathcal{H})$, where "Op" denotes the vector space of operators from $\mathcal{H}$ to itself. Moreover, the evolution operators $\mathrm{Ad}_{U_t}$ are unitary[7]
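A quick numerical check of both claims (the Hamiltonian and state are again arbitrary random choices made up for illustration):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
dim = 3

# Arbitrary Hermitian H and the corresponding unitary U = exp(-iHt) at t = 0.5.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
U = expm(-1j * H * 0.5)

psi0 = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi0 = psi0 / np.linalg.norm(psi0)
rho0 = np.outer(psi0, psi0.conj())

# Evolving the state and then forming the density matrix agrees with
# conjugating the density matrix: rho_t = U rho_0 U^dagger.
psi_t = U @ psi0
assert np.allclose(np.outer(psi_t, psi_t.conj()), U @ rho0 @ U.conj().T)

# And conjugation is linear in rho: Ad_U(aX + bY) = a Ad_U(X) + b Ad_U(Y).
X = rng.normal(size=(dim, dim))
Y = rng.normal(size=(dim, dim))
a, b = 2.0, -0.3
assert np.allclose(U @ (a * X + b * Y) @ U.conj().T,
                   a * (U @ X @ U.conj().T) + b * (U @ Y @ U.conj().T))
```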

So what we've developed is a new picture: 

  • The "state" vector $|\psi\rangle$ turns into the "density" matrix $\rho = |\psi\rangle\langle\psi|$.
  • The "evolution" operator $U_t$ turns into the "conjugation" operator $\mathrm{Ad}_{U_t} : \rho \mapsto U_t\, \rho\, U_t^\dagger$.

So now comes the big question. What if instead of the "whole universe", we are only looking at the dynamics of the "limited information" subsystem? Turns out there are two options here, depending on whether the Hilbert space $\mathcal{H}_{\mathrm{sys}}$ associated with the subsystem is "coupled" (i.e., exchanges particles/energy/etc.) with the Hilbert space of the "environment" (a.k.a. the "rest of the universe").

  1. If $\mathcal{H}_{\mathrm{sys}}$ is uncoupled to its environment (e.g. we are studying a carefully vacuum-isolated system), then we still have to replace the old state vector picture $|\psi\rangle$ by a (possibly rank $>1$) density matrix $\rho$, but the evolution on the density matrix is still nice and unitary, given by $\rho_t = U_t\, \rho_0\, U_t^\dagger$.
  2. On the other hand, if the system is coupled to its environment, the dynamics is linear but no longer unitary (at least not necessarily). Instead of the unitary evolution, the dynamics on the "density matrix" space of operators is given by the "Lindbladian" evolution formula[8] (also called the "GKSL master equation", but that name sounds less cool); see the sketch just below this list.
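For concreteness, the standard form of the Lindblad/GKSL equation is $\dot\rho = -i[H, \rho] + \sum_k \left( L_k \rho L_k^\dagger - \tfrac{1}{2}\{L_k^\dagger L_k, \rho\} \right)$, where the "jump operators" $L_k$ encode the coupling to the environment. Here is the promised sketch: a single qubit with one dephasing jump operator (all parameters made up). The coherences decay while the populations and the trace are preserved, so the state leaves the rank-1 "wavefunction" world:

```python
import numpy as np

# A single qubit: diagonal Hamiltonian plus one dephasing jump operator.
sz = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.5 * sz                  # made-up energy splitting
gamma = 0.2                   # made-up dephasing rate
L = np.sqrt(gamma) * sz       # dephasing jump operator

def lindblad_rhs(rho):
    # -i[H, rho] + L rho L^dag - 1/2 {L^dag L, rho}
    comm = -1j * (H @ rho - rho @ H)
    diss = L @ rho @ L.conj().T - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L)
    return comm + diss

# Start in the pure state |+> = (|0> + |1>)/sqrt(2); integrate with a crude
# Euler scheme (fine for illustration).
psi = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
dt = 0.01
for _ in range(1000):
    rho = rho + dt * lindblad_rhs(rho)

print(np.diag(rho).real)    # populations: still [0.5, 0.5]
print(abs(rho[0, 1]))       # coherence: decayed toward 0, so rho is no longer rank 1
print(np.trace(rho).real)   # trace stays 1
```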

So at the end of the day we see two new things that occur when modeling any realistic quantum system:

  1. The relevant dynamics happens on the level of the density matrix. This makes the results of measurement linear in the state when viewed as a probability vector.
  2. The linear evolution matrix is not unitary. 

In fact, we can say more: the new dynamics interpolates between the unitary dynamics of "fully isolated quantum systems" and the Markovian dynamics of the stochastic evolution picture. Indeed, if the interaction between the system and its environment exhibits weak coupling and short correlation time (just words for now that identify a certain asymptotic regime, but note that most systems are like this macroscopically), then the Lindbladian dynamics becomes Markovian (at a suitable time step). Specifically, if there are $N$ states, the density matrix at any point in time has $N^2$ terms. In this asymptotic regime, all the dynamics reduces to the dynamics of the diagonal density matrices, the linear combinations of the $N$ matrices of the form $|s\rangle\langle s|$, though the different diagonal terms can get mixed. And on large timescales, this mixing is exactly described by a Markov process.
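A toy illustration of that last claim, under the simplest assumptions I can make: the Hamiltonian is diagonal in the pure-state basis (here just zero) and the environment only induces incoherent jumps between basis states. The populations of the density matrix then follow exactly the classical Markov rate equation:

```python
import numpy as np
from scipy.linalg import expm

# Made-up jump rates between the two basis states (Hamiltonian taken to be 0).
r01, r10 = 0.3, 0.7
L_up   = np.sqrt(r10) * np.array([[0, 0], [1, 0]], dtype=complex)  # |1><0|: jump 0 -> 1
L_down = np.sqrt(r01) * np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|: jump 1 -> 0
jumps = [L_up, L_down]

def lindblad_rhs(rho):
    out = np.zeros_like(rho)
    for L in jumps:
        out += L @ rho @ L.conj().T - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L)
    return out

# Integrate the Lindblad equation from a pure superposition state.
psi = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
dt, steps = 0.001, 5000
for _ in range(steps):
    rho = rho + dt * lindblad_rhs(rho)

# Classical Markov evolution of the diagonal with rate matrix Q.
Q = np.array([[-r10, r01], [r10, -r01]])
p = expm(dt * steps * Q) @ np.array([0.5, 0.5])

print(np.diag(rho).real)  # populations from the quantum evolution
print(p)                  # classical Markov populations: agree up to Euler error
```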

If you've followed me along this windy path, you are now awakened. You know three things:

  1. We live in the matrix. All that we can observe are matrix-shaped density matrices, and -- that is, unless you want to take the blue pill and live in a comfortable make-believe world which is, literally, "out of contact with reality [of the external environment]" -- there is no such thing as a quantum state.
  2. Statistics (and specifically Markov processes) are part of the same class of behaviors as unitary quantum evolution. In fact, statistical processes are the "default" for large systems.
  3. Realistic small systems which exist in an environment will exhibit a mix of probabilistic and quantum behaviors.

So can I brag to people that I've resolved all the "multiverse/decoherence" issues now?

Not really. Certainly, you can fully understand "measurement" in terms of these "corrected" quantum dynamics -- it's no longer a mystery (and has not been for a very long time). And you can design toy models where running dynamics on a "multiverse" exhibits a natural splitting into quantum branches and gives everything you want from decoherence. But the larger question of why and how different quantum "branches" decohere in our real, non-toy universe is still pretty hard and not a little mysterious. (I might write a bit more about this later, but I don't have any groundbreaking insights for you here.)

Who ordered that?

This is the famous apocryphal question asked by the physicist Isidor Isaac Rabi in response to the discovery of yet another elementary particle (the muon). So who ordered this matrix-flavored craziness, that the correct way to approach modeling quantum systems is by evolving a matrix (entries indexed by pairs of configurations) rather than just a single state?

In this case there actually is an answer: Liouville. Liouville ordered that. Obviously Liouville didn't know about quantum mechanics, but he did know about phase space[9]. Here I'm going to get a little beyond our toy "Quantum 101" and talk about wavefunctions (in a very, very hand-wavy way. Get it -- waves). Namely, something interesting happens when performing "quantization", i.e. passing from usual mechanics to quantum mechanics: weirdly, "space gets smaller". Indeed, knowing a bunch of positions of particles is not sufficient to know how they evolve in the classical world: you also need to know their velocities (or equivalently, momenta). So for example in single-particle classical physics in three dimensions, the evolution equation you get is not on single-particle "configuration space" $\mathbb{R}^3$, but on the space of (position, momentum) pairs, which is the 6-dimensional phase space $\mathbb{R}^3 \times \mathbb{R}^3$. In "wavefunction" quantum mechanics, your quantum state loses half of its dimension: the evolution occurs on wavefunctions over just the 3-dimensional configuration space. This is to some extent unavoidable: the uncertainty principle tells you that you can't independently set the position and the momentum of a particle, since position and momentum are actually two separate bases of the Hilbert space of wavefunctions. But on the other hand, like, classical physics exists. This means that in some appropriate "local/coarse-grained" sense, for a particle in a box separated (but entangled) from the environment of the rest of the universe, position and momentum are two meaningful quantities that can sort of co-occur.

Now there is a certain very natural and elegant quantum-classical comparison, called the "Wigner-Weyl transform", that precisely relates the space of operators on $L^2(\mathbb{R}^3)$ (or a more general configuration space) and functions on the phase space $\mathbb{R}^3 \times \mathbb{R}^3$ (or a more general phase space). Thus, when we think in the "density matrix" formalism, there is a natural translation of states and evolutions which (approximately) matches phase-space dynamics with density-matrix dynamics. So in addition to all the good properties of the density matrix formalism that I've (badly) explained above, we see a reasonable explanation for something else that was mysterious and nonsensical in the "typical" quantum story.

But don't worry. If you're attached to your old nice picture of quantum mechanics where states are wavefunctions and evolution is unitary and nothing interesting ever happens, there's always the blue pill. The wavefunction will always be there.

 

  1. ^

    Along with the oscillating phase expansion, basics on Lie groups, products, and the Wigner-Weyl transform. Oh and did I mention that an intro quantum class should take 3 semesters, not one?

  2. ^

    Often called a "wavefunction"

  3. ^

    In terms of the bra-ket notation, physicists write this requirement as $\langle \psi | \psi \rangle = 1$. The way you're supposed to read this notation is as follows:
    - If the "ket" $|v\rangle$ is a column vector of complex numbers $(v_1, \dots, v_n)^T$, then the same vector written as a "bra" $\langle v|$ means the row vector $(\bar{v}_1, \dots, \bar{v}_n)$. Here the notation $\bar{v}$ denotes "complex conjugate".
    - When we write a ket and a bra together, we're performing matrix multiplication. So $\langle v | w \rangle$ as above denotes "horizontal times vertical" vector multiplication (which is a dot product and gives a scalar) and $|v\rangle \langle w|$ denotes "vertical times horizontal" vector multiplication (which is an outer product and gives a matrix). A good heuristic to remember is that "stuff between two brackets is a scalar and stuff between two pipes is a matrix".

  4. ^

    There is often some discussion of distinguishable vs. indistinguishable particles, but it will not be relevant here and we'll ignore it.

  5. ^

    I initially wrote this in the text, but decided to replace with a long footnote (taking a page from @Kaarel), since it's not strictly necessary for what follows. 

    A nice way to make this precise is to imagine that in addition to our collection $S$ of "coarse states" which encode "information about the particular system in question", there is a much larger collection $U$ of "fine states" which we think of as encoding "all the information in the universe". (For convenience we assume both sets are finite.) For example perhaps the states of our system are 5-particle configurations, but the universe actually contains 100 particles (or more generally, our subsystem only contains coarse-grained information, like the average of a collection of particles, etc.). Given a state of the universe, i.e. a state of the "full/fine system", we are of course able to deterministically recover the state of our subsystem. I.e., we have a "forgetting information" map $F : U \to S$. In the case above of 5 particles in a 100-particle universe, the map F "forgets" all the particle information except the states of the first 5 particles. Conversely, given a "coarse" state $s \in S$, we have some degree of ignorance about the fine "full system" state that underlies it. We can measure this ignorance by associating to each coarse state $s$ a set $F^{-1}(s) \subseteq U$, namely its preimage under the forgetting map.

    Now when thinking of a Markov process, we assume that there is an "evolution" mapping $\Phi_t : U \to U$ that "evolves" a state of the universe to a new state of the universe in a deterministic way. Now given such an evolution on the "full system" states, we can try to think what "dynamics" it implies on subsystem states $s \in S$. To this end, we define the real number $(M_t)_{s' s}$ to be the average over $u \in F^{-1}(s)$ (universe states underlying $s$) of the indicator function $\mathbf{1}[F(\Phi_t(u)) = s']$. De-tabooing the word "probability", this is just the probability that a random "total" state underlying the coarse state s maps to a "total" state underlying s' after time t.

    Now in general, it doesn't have to be the case that on the level of matrices, we have the Markov evolution behavior: e.g. that $M_{2t} = M_t^2$. For example we might have chosen the evolution mapping to be an involution with $\Phi_t \circ \Phi_t = \mathrm{id}$, in which case $M_{2t}$ is the identity matrix (whereas $M_t$ might have been essentially arbitrary). However there is an inequality involving entropy (that I'm not going to get into -- but note that entropy is explainable to the alien as just a deterministic function on probability distribution "vectors") that for a given value of the single-transition matrix $M_t$, the least possible information you may have about the double-transition matrix $M_{2t}$ is in a suitable sense "bounded" by $M_t^2$. Moreover, there is a specific choice of "large system" dynamics, sometimes called a "thermal bath", which gives us time evolution that is (arbitrarily close to) $M_{kt} = M_t^k$. Moreover, any system containing a thermal bath will have no more information about multistep dynamics than a thermal bath. Thus in the limit of modeling "lack of information" about the universe, but conditional on knowing the single-time-step coarse transformation matrix $M_t$, it makes sense to "posit" that our k-step dynamics is $M_t^k$.
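    (A tiny illustration of the construction in this footnote, with an entirely made-up "universe" of 3 two-state particles, a forgetting map that keeps only the first particle, and an arbitrary permutation as the deterministic one-step evolution:)

    ```python
    import numpy as np
    from itertools import product

    # Toy "universe": 3 particles, each in one of 2 positions. The coarse system
    # only sees the first particle -- that is the forgetting map F.
    fine_states = list(product([0, 1], repeat=3))

    def F(u):
        return u[0]

    # A deterministic one-step evolution of the universe: an arbitrary fixed
    # permutation of the fine states.
    rng = np.random.default_rng(3)
    perm = rng.permutation(len(fine_states))

    def Phi(u):
        return fine_states[perm[fine_states.index(u)]]

    # Coarse transition matrix: M[s_new, s_old] is the fraction of fine states
    # underlying s_old that land, after one step, on a fine state underlying s_new.
    M = np.zeros((2, 2))
    for s_old in (0, 1):
        fiber = [u for u in fine_states if F(u) == s_old]
        for u in fiber:
            M[F(Phi(u)), s_old] += 1 / len(fiber)

    print(M)  # each column is a probability distribution
    ```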

  6. ^

    To prove the following formula holds, all we need is the identity $U^{-1} = U^\dagger$ for unitary matrices. Here the "dagger" notation is a matrix version of the ket-to-bra operation $|v\rangle \mapsto \langle v|$ and takes a matrix $A$ to its "complex conjugate transpose" $A^\dagger = \bar{A}^T$.

  7. ^

    Note that instead of all operators here, it would be sufficient to only look at the (real, not complex) subspace of Hermitian operators, which satisfy $A^\dagger = A$. In this case, lacking complex structure, evolution would no longer be unitary: it would be orthogonal instead.

  8. ^

    If you read the massive footnote about "explaining probability theory to an alien" above, you know that whenever we talk about probabilities we are making a secret implicit assumption that we are in the "worst-case" informational environment, where knowing dynamics on the "coarse" system being observed gives minimal information about the environment -- this can be guaranteed by assuming the environment contains a "thermal bath". The same story applies here: a priori, it's possible that there is some highly structured interaction between the system and the environment that lets us make a "more informative" picture of the evolution, that would depend on the specifics of system-environment interaction; but if we assume that interactions with the environment are "minimally informative", then any additional details about the rest of the universe get "integrated out" and the Lindbladian is the "true answer" to the evolution dynamics. 

  9. ^

    The history is actually a bit tangled here with the term attributed to various people -- it seems the first people to actually talk about phase space in the modern way were actually Ludwig Boltzmann, Henri Poincaré, and Josiah Willard Gibbs.

Comments

Treating the density matrix as fundamental is bad because you shouldn't explain with ontology that which you can explain with epistemology.

I've found our Agent Smith :) If you are serious, I'm not sure what you mean. Like there is no ontology in physics -- every picture you make is just grasping at pieces of whatever theory of everything you eventually develop

When you say there's "no such thing as a state," or "we live in a density matrix," these are statements about ontology: what exists, what's real, etc.

Density matrices use the extra representational power they have over states to encode a probability distribution over states. If we regard the probabilistic nature of measurements as something to be explained, putting the probability distribution directly into the thing we live in is what I mean by "explain with ontology."

Epistemology is about how we know stuff. If we start with a world that does not inherently have a probability distribution attached to it, but obtain a probability distribution from arguments about how we know stuff, that's "explain with epistemology."

In quantum mechanics, this would look like talking about anthropics, or what properties we want a measure to satisfy, or solomonoff induction and coding theory.

 

What good is it to say things are real or not? One useful application is predicting the character of physical law. If something is real, then we might expect it to interact with other things. I do not expect the probability distribution of a mixed state to interact with other things.

One person's "Occam's razor" may be description length, another's may be elegance, and a third person's may be "avoiding having too much info inside your system" (as some anti-MW people argue). I think discussions like "what's real" need to be done thoughtfully, otherwise people tend to argue past each other, and come off overconfident/underinformed.

To be fair, I did use language like this so I shouldn't be talking -- but I used it tongue-in-cheek, and the real motivation given in the above is not "the DM is a more fundamental notion" but "DM lets you make concrete the very suggestive analogy between quantum phase and probability", which you would probably agree with.

For what it's worth, there are "different layers of theory" (often scale-dependent), like classical vs. quantum vs. relativity, etc., where I think it's silly to talk about "ontological truth". But these theories are local conceptual optima among a graveyard of "outdated" theories that are strictly conceptually inferior to new ones: examples are heliocentrism (and Ptolemy's epicycles), the ether, etc.

Interestingly, I would agree with you (with somewhat low confidence) that in this question there is a consensus among physicists that one picture is simply "more correct" in the sense of giving theoretically and conceptually more elegant/ precise explanations. Except your sign is wrong: this is the density matrix picture (the wavefunction picture is genuinely understood as "not the right theory", but still taught and still used in many contexts where it doesn't cause issues).

I also think that there are two separate things that you can discuss.

  1. Should you think of thermodynamics, probability, and things like thermal baths as fundamental to your theory or incidental epistemological crutches to model the world at limited information?
  2. Assuming you are studying a "non-thermodynamic system with complete information", where all dynamics is invertible over long timescales, should you use wave functions or density matrices?

Note that for #1, you should not think of a density matrix as a probability distribution on quantum states (see the discussion with Optimization Process in the comments), and this is a bad intuition pump. Instead, the thing that replaces probability distributions in quantum mechanics is a density matrix.

I think a charitable interpretation of your criticism would be a criticism of #1 (putting limited-info dynamics -- i.e., quantum thermodynamics -- as primary to "invertible dynamics"). Here there is a debate to be had.

I think there is not really a debate in #2: even in invertible QM (no probability), you need to use density matrices if you want to study different subsystems (e.g. when modeling systems existing in an infinite, but not thermodynamic universe you need this language, since restricting a wavefunction to a subsystem makes it mixed). There's also a transposed discussion, that I don't really understand, of all of this in field theory: when do you have fields vs. operators vs. other more complicated stuff, and there is some interesting relationship to how you conceptualize "boundaries" - but this is not what we're discussing. So you really can't get away from using density matrices even in a nice invertible universe, as soon as you want to relate systems to subsystems.

For question #1 it is reasonable (though I don't know how productive) to discuss what is "primary". I think (but here I am really out of my depth) that people who study very "fundamental" quantum phenomena increasingly use a picture with a thermal bath (e.g. I vaguely remember this happening in some lectures here). At the same time, it's reasonable to say that "invertible" QM phenomena are primary and statistical phenomena are ontological epiphenomena on top of this. While this may be a philosophical debate, I don't think it's a physical one, since the two pictures are theoretically interchangeable (as I mentioned, there is a canonical way to get thermodynamics from unitary QM as a certain "optimal lower bound on information dynamics", appropriately understood).

Still, as soon as you introduce the notion of measurement, you cannot get away from thermodynamics. Measurement is an inherently information-destroying operation, and iiuc can only be put "into theory" (rather than being an arbitrary add-on that professors tell you about) using the thermodynamic picture with nonunitary operators on density matrices.

people who study very "fundamental" quantum phenomena increasingly use a picture with a thermal bath

Maybe talking about the construction of pointer states? That linked paper does it just as you might prefer, putting the Boltzmann distribution into a density matrix. But of course you could rephrase it as a probability distribution over states and the math goes through the same, you've just shifted the vibe from "the Boltzmann distribution is in the territory" to "the Boltzmann distribution is in the map."

Still, as soon as you introduce the notion of measurement, you cannot get away from thermodynamics. Measurement is an inherently information-destroying operation, and iiuc can only be put "into theory" (rather than being an arbitrary add-on that professors tell you about) using the thermodynamic picture with nonunitary operators on density matrices.

Sure, at some level of description it's useful to say that measurement is irreversible, just like at some level of description it's useful to say entropy always increases. Just like with entropy, it can be derived from boundary conditions + reversible dynamics + coarse-graining. Treating measurements as reversible probably has more applications than treating entropy as reversible, somewhere in quantum optics / quantum computing.

Thanks for the reference -- I'll check out the paper (though there are no pointer variables in this picture inherently).

I think there is a miscommunication in my messaging. Possibly through overcommitting to the "matrix" analogy, I may have given the impression that I'm doing something I'm not. In particular, the view here isn't a controversial one -- it has nothing to do with Everett or einselection or decoherence. Crucially, I am saying nothing at all about quantum branches.

I'm now realizing that when you say map or territory, you're probably talking about a different picture where quantum interpretation (decoherence and branches) is foregrounded. I'm doing nothing of the sort, and as far as I can tell never making any "interpretive" claims.

All the statements in the post are essentially mathematically rigorous claims which say what happens when you 

  • start with the usual QM picture, and posit that
  • your universe divides into at least two subsystems, one of which you're studying
  • one of the subsystems your system is coupled to is a minimally informative infinite-dimensional environment (i.e., a bath).

Both of these are mathematically formalizable and aren't saying anything about how to interpret quantum branches etc. And the Lindbladian is simply a useful formalism for tracking the evolution of a system that has these properties (subdivisions and baths). Note that (maybe this is the confusion?) subsystem does not mean quantum branch, or decoherence result. "Subsystem" means that we're looking at these particles over here, but there are also those particles over there (i.e. in terms of math, your Hilbert space is a tensor product $\mathcal{H} = \mathcal{H}_{\mathrm{sys}} \otimes \mathcal{H}_{\mathrm{env}}$).

Also, I want to be clear that we can and should run this whole story without ever using the term "probability distribution" in any of the quantum-thermodynamics concepts. The language to describe a quantum system as above (system coupled with a bath) is from the start a language that only involves density matrices, and never uses the term "X is a probability distribution of Y". Instead you can get classical probability distributions to map into this picture as a certain limit of these dynamics.

As to measurement, I think you're once again talking about interpretation. I agree that in general, this may be tricky. But what is once again true mathematically is that if you model your system as coupled to a bath then you can set up behaviors that behave exactly as you would expect from an experiment from the point of view of studying the system (without asking questions about decoherence).

There are some non-obvious issues with saying "the wavefunction really exists, but the density matrix is only a representation of our own ignorance". It's a perfectly defensible viewpoint, but I think it is interesting to look at some of its potential problems:

  1. A process or machine prepares either |0> or |1> at random, each with 50% probability. Another machine prepares either |+> or |-> based on a coin flip, where |+> = (|0> + |1>)/root2, and |-> = (|0> - |1>)/root2. In your ontology these are actually different machines that produce different states. In contrast, in the density matrix formulation these are alternative descriptions of the same machine. In any possible experiment, the two machines are identical.  Exactly how much of a problem this is for believing in wavefunctions but not density matrices is debatable - "two things can look the same, big deal" vs "but, experiments are the ultimate arbiters of truth, if experiment says they are the same thing then they must be and the theory needs fixing."
  2. There are many different mathematical representations of quantum theory. For example, instead of states in Hilbert space we can use quasi-probability distributions in phase space, or path integrals. The relevance to this discussion is that the quasi-probability distributions in phase space are equivalent to density matrices, not wavefunctions. To exaggerate the case, imagine that we have a large number of different ways of putting quantum physics into a mathematical language, [A, B, C, D....] and so on. All of them are physically the same theory, just couched in different mathematics language, a bit like say, ["Hello", "Hola", "Bonjour", "Ciao"...] all mean the same thing in different languages. But, wavefunctions only exist as an entity separable from density matrices in some of those descriptions.  If you had never seen another language maybe the fact that the word "Hello" contains the word "Hell" as a substring might seem to possibly correspond to something fundamental about what a greeting is (after all, "Hell is other people"). But its just a feature of English, and languages with an equal ability to greet don't have it. Within the Hilbert space language it looks like wavefunctions might have a level of existence that is higher than that of density matrices, but why are you privileging that specific language over others?
  3. In a wavefunction-only ontology we have two types of randomness, that is normal ignorance and the weird fundamental quantum uncertainty. In the density matrix ontology we have the total probability, plus some weird quantum thing called "coherence" that means some portion of that probability can cancel out when we might otherwise expect it to add together.  Taking another analogy (I love those), the split you like is [100ml water + 100ml oil] (but water is just my ignorance and doesn't really exist), and you don't like the density matrix representation of [200ml fluid total, oil content 50%]. There is no "problem" here per se but I think it helps underline how the two descriptions seem equally valid. When someone else measures your state they either kill its coherence (drop oil % to zero), or they transform its oil into water. Equivalent descriptions.

All of that said, your position is fully reasonable, I am just trying to point out that the way density matrices are usually introduced in teaching or textbooks does make the issue seem a lot more clear cut than I think it really is.

A process or machine prepares either |0> or |1> at random, each with 50% probability. Another machine prepares either |+> or |-> based on a coin flip, where |+> = (|0> + |1>)/root2, and |-> = (|0> - |1>)/root2. In your ontology these are actually different machines that produce different states.

I wonder if this can be resolved by treating the randomness of the machines quantum mechanically, rather than having this semi-classical picture where you start with some randomness handed down from God. Suppose these machines use quantum mechanics to do the randomization in the simplest possible way - they have a hidden particle in state |left>+|right>  (pretend I normalize), they mechanically measure it (which from the outside will look like getting entangled with it) and if it's on the left they emit their first option (|0> or |+> depending on the machine) and vice versa.

So one system, seen from the outside, goes into the state |L,0>+|R,1>, the other one into the state |L,0>+|R,0>+|L,1>-|R,1>. These have different density matrices. The way you get down to identical density matrices is to say you can't get the hidden information (it's been shot into outer space or something). And then when you assume that and trace out the hidden particle, you get the same representation no matter your philosophical opinion on whether to think of the un-traced state as a bare state or as a density matrix. If on the other hand you had some chance of eventually finding the hidden particle, you'd apply common sense and keep the states or density matrices different.

Anyhow, yeah, broadly agree. Like I said, there's a practical use for saying what's "real" when you want to predict future physics. But you don't always have to be doing that.

You are completely correct in the "how does the machine work inside?" question. As you point out that density matrix has the exact form of something that is entangled with something else.

I think it's very important to be discussing what is real, although as we always have a nonzero inferential distance between ourselves and the real, the discussion has to be a little bit caveated and pragmatic.

  1. A process or machine prepares either |0> or |1> at random, each with 50% probability. Another machine prepares either |+> or |-> based on a coin flip, where |+> = (|0> + |1>)/root2, and |-> = (|0> - |1>)/root2. In your ontology these are actually different machines that produce different states. In contrast, in the density matrix formulation these are alternative descriptions of the same machine. In any possible experiment, the two machines are identical.  Exactly how much of a problem this is for believing in wavefunctions but not density matrices is debatable - "two things can look the same, big deal" vs "but, experiments are the ultimate arbiters of truth, if experiment says they are the same thing then they must be and the theory needs fixing."

I like “different machines that produce different states”. I would bring up an example where we replace the coin by a pseudorandom number generator with seed 93762. If the recipient of the photons happens to know that the seed is 93762, then she can put every photon into state |0> with no losses. If the recipient of the photons does not know that the random seed is 93762, then she has to treat the photons as unpolarized light, which cannot be polarized without 50% loss.

So for this machine, there’s no getting away from saying things like: “There’s a fact of the matter about what the state of each output photon is. And for any particular experiment, that fact-of-the-matter might or might not be known and acted upon. And if it isn’t known and acted upon, then we should start talking about probabilistic ensembles, and we may well want to use density matrices to make those calculations easier.”

I think it’s weird and unhelpful to say that the nature of the machine itself is dependent on who is measuring its output photons much later on, and how, right?

Yes, in your example a recipient who doesn't know the seed models the light as unpolarised, and one who does as, say, H-polarised in a given run. But for everyone who doesn't see the random seed it's the same density matrix.

Let's replace that first machine with a similar one that produces a polarisation-entangled photon pair, |HH> + |VV> (ignoring normalisation). If you have one of those photons it looks unpolarised (essentially your "ignorance of the random seed" can be thought of as your ignorance of the polarisation of the other photon).

If someone else (possibly outside your light cone) measures the other photon in the HV basis then half the time they will project your photon into |H> and half the time into |V>, each with 50% probability. This 50/50 appears in the density matrix, not the wavefunction, so is "ignorance probability".

In this case, by what I understand to be your position, the fact of the matter is either (1) that the photon is still entangled with a distant photon, or (2) that it has been projected into a specific polarisation by a measurement on that distant photon. It's not clear when the transformation from (1) to (2) takes place (if it's instant, then in which reference frame?).

So, in the bigger context of this conversation,
OP: "You live in the density matrices (Neo)"
Charlie: "No, a density matrix incorporates my own ignorance so is not a sensible picture of the fundamental reality. I can use them mathematically, but the underlying reality is built of quantum states, and that randomness when I subject them to measurements is fundamentally part of the territory, not the map. Let's not mix the two things up."
Me: "Whether a given unit of randomness is in the map (i.e. ignorance) or the territory is subtle. Things that randomly combine quantum states (my first machine) have a symmetry over which underlying quantum states are being mixed that looks meaningful. Plus (this post), the randomness can move abruptly from the territory to the map due to events outside your own light cone (although the amount of randomness is conserved), so maybe worrying too much about the distinction isn't that helpful."

$\phi_t = U_t \phi_0 U_t^{-1}$.

I think you mean   here, not 

Your use of "pure state" is totally different to the standard definition (namely rank(rho)=1). I suggest using a different term.

To add: I think the other use of "pure state" comes from this context. Here if you have a system of commuting operators and take a joint eigenspace, the projector is mixed, but it is pure if the joint eigenvalue uniquely determines a 1D subspace; and then I think this terminology gets used for wave functions as well

Thanks - you're right. I have seen "pure state" referring to a basis vector (e.g. in quantum computation), but in QTD your definition is definitely correct. I don't like the term "pointer variable" -- is there a different notation you like?

I'd prefer "basis we just so happen to be measuring in". Or "measurement basis" for short.

You could use "pointer variable", but this would commit you to writing several more paragraphs to unpack what it means (which I encourage you to do, maybe in a later post).

Question: if I'm considering an isolated system (~= "the entire universe"), you say that I can swap between state-vector-format and matrix-format via $\rho = |\psi\rangle\langle\psi|$. But later, you say...

If $\mathcal{H}_{\mathrm{sys}}$ is uncoupled to its environment (e.g. we are studying a carefully vacuum-isolated system), then we still have to replace the old state vector picture $|\psi\rangle$ by a (possibly rank $>1$) density matrix $\rho$...

But if $\rho = |\psi\rangle\langle\psi|$, how could it ever be rank $>1$?

(Perhaps more generally: what does it mean when a state is represented as a rank $>1$ density matrix? Or: given that the space of possible $\rho$s is much larger than the space of possible $|\psi\rangle$s, there are sometimes (always?) multiple $|\psi\rangle$s that correspond to some particular $\rho$; what's the significance of choosing one versus another to represent your system's state?)

The usual story about where rank > 1 density matrices come from is when your subsystem is entangled with an environment that you can't observe. 

The simplest example is to take a Bell state, say 

|00> + |11>  (obviously I'm ignoring normalization) and imagine you only have access to the first qubit; how should you represent this state? Precisely because it's entangled, we know that there is no |Psi> in 1-qubit space that will work. The trace method alluded to in the post is to form the (rank-1) density matrix of the Bell state, and then "trace out" the second system; if you think of the density matrix as living in M_2 tensor M_2, this means applying the trace operator just to the right side of the tensor, i.e. mapping matrix units E_ij tensor E_kl to delta_kl E_ij and then extending by linearity.

You can check that for this example you get the (normalized) 2x2 identity matrix. 

You can think of this tracing out process as a quantum version of marginalization. To get a feel for it intuitively, it's useful to consider the following: suppose you are given access to an endless supply of one-of-a-Bell-pair qubits, and you make repeated measurements, what will you see? 

It's pretty clear that if you measure in the standard basis, you'll have a 50/50 chance of measuring |0> or |1>. This is the sort of thing a first-timer might pattern match to an equal superposition but that's not correct, no matter what basis you measure in you'll obtain 50/50--this is because 

conceptually: you're measuring half an entangled state so the whole point is it can't yield a given state with certainty under measurement

mathematically: the Bell state can be written as 

|xx> + |yy> for any orthogonal states |x>, |y>, there's nothing special about the standard basis. So in any measurement, the entangled state has an equal chance of both being x and both being y, so someone who can only see one qubit will see an equal chance of x and of y

formalism-ly (?): the rule for calculating measurement probabilities, |<x|y>|^2 = <x|(|y><y|)|x> where y is your state, and x is the state whose probability after measurement you wish to know, generalizes obviously to <x|rho|x> for any density matrix; in our case rho is a multiple of the identity, and all states are norm 1, so all potential measurement outcomes yield the same probability.

 

The point about the Lindbladian is that it's pretty generic for rank-1 states to evolve to higher rank mixed states; it's basically the same idea as decoherence: you entangle with the rest of the world, but then lose track of all the precise degrees of freedom, so you only really see a small subsystem of a large entangled state. 

Indeed it's true a given high rank density matrix can have multiple purifications--rank-1 states of which it is the traced out part corresponding to one subsystem--but that's to be expected, in this point of view: if we had perfect knowledge of the whole system, including everything our subsystem had ever become entangled with, we'd use a regular, pure state. The use of a mixed, higher rank density matrix corresponds to our loss of information to the "environment". And yes, the rank of the density matrix is related to the minimal dimension of the Hilbert space needed for an environment to purify your density matrix.

Actually, I have a little more to say:

Another way to think about higher-rank density matrices is as probability distributions over pure states; I think this is what Charlie Steiner's comment is alluding to. 

So, the rank-2 matrix from my previous comment, (1/2)I, can be thought of as

(1/2)|0><0| + (1/2)|1><1|, i.e., an equal probability of observing each of |0> and |1>. And, because (1/2)|x><x| + (1/2)|y><y| = (1/2)I for any orthonormal vectors |x>, |y>, again there's nothing special about using the standard basis here (this is mathematically equivalent to the argument I made in the above comment about why you can use any basis for your measurement). 

I always hated this point of view; it felt really hacky, and I always found it ugly and unmotivated to go from states |x> to projections |x><x| just for the sake of taking probability distributions. 

The thing above about entanglement and decoherence, IMO, is a more elegant and natural way to see why you'd come up with this formalism. To be explicit, suppose you have the state |0>, and there is an environment state that you don't have access to; say it also begins in state |0>, and initially everything is unentangled, so we begin in the state |00>. Then some unitary evolution happens that entangles us, say it takes |00> to the Bell state |00> + |11> (again ignoring normalization).

As we've seen, you should think of your state as being (1/2)(|0><0| + |1><1|), and now it's clear why this is the right framework for probabilistic mixtures of quantum states: it's entirely natural to think of your part of the now-entangled system as "an equal chance of |0> and |1>", and this indeed gives us the right density matrix. It also immediately implies that you are forced to also allow that it could be represented as "an equal chance of |+> and |->", where |±> = (|0> ± |1>)/√2, and so on. 

But it makes it clear why we have this non-uniqueness of representation, and where the missing information went: we don't just "have a probabilistic mixture of quantum states", we have a small part of a big quantum system that we can't see all of, so the best we can do is represent it (non-uniquely) as a probabilistic mixture of quantum states.
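As a sanity check (trivial, but it makes the non-uniqueness vivid):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus  = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

def proj(v):
    """The projection |v><v|."""
    return np.outer(v, v.conj())

# "Equal chance of |0> and |1>"  vs  "equal chance of |+> and |->"
rho_standard = 0.5 * proj(ket0) + 0.5 * proj(ket1)
rho_pm       = 0.5 * proj(plus) + 0.5 * proj(minus)

print(np.allclose(rho_standard, rho_pm))   # True: literally the same matrix
```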

Now, you aren't obliged to take this view (that the only reason we ever have uncertainty about our quantum state is this sort of decoherence process), but it's definitely a powerful idea. 

Yeah, this also bothered me. The notion of "probability distribution over quantum states" is not a good notion: the matrix I is both (|0\rangle \langle 0|+|1\rangle \langle 1|) and (|a\rangle \langle a|+|b\rangle \langle b|) for any other orthonormal basis. The fact that these should be treated equivalently seems totally arbitrary. The point is that density matrix mechanics is the notion of probability for quantum states, and can be formalized as such (dynamics of informational lower bounds given observations). I was sort of getting at this with the long "explaining probability to an alien" footnote, but I don't think it landed (and I also don't have the right background to make it precise).

Ahhh! Yes, this is very helpful! Thanks for the explanation.

Why view the density matrix as an operator instead of as a tensor? Like I think of it as kinda similar to a covariance matrix (except not mean-standardized and also with a separate dimension for each configuration instead of being restricted to one dimension for each way the configurations could vary), with the ordinary quantum states being kinda similar to mean vectors.

I think the reason is that in quantum physics we also have operators representing processes (like the Hamiltonian operator making the system evolve with time, or the position operator that "measures" position, or the creation operator that adds a photon), and the density matrix has exactly the same mathematical form as these other operators (apart from the fact that the density matrix needs to be normalized). 

But that doesn't really solve the mystery fully, because they could all just be called "matrices" or "tensors" instead of "operators". (Maybe it gets us halfway to an explanation, because all of the ones other than the density operator look like they "operate" on the system to make it change its state.)

Speculatively, it might have to do with the fact that some of these operators are applied to continuous variables (like position), where the matrix representation has infinite rows and infinite columns; maybe there is some technicality where, if you have an object like that, you have to stop using the word "matrix" or the maths police lock you up.

I feel like for observables it's more intuitive for them to be (0, 2) tensors (bilinear forms) whereas for density matrices it's more intuitive for them to be (2, 0) tensors. But maybe I'm missing something about the math that makes this problematic, since I haven't done many quantum calculations.

The way it works normally is that you have a state ρ, and it's acted on by some operator, a, which you can write as aρ. But this doesn't give a number, it gives a new state, like the old ρ but different. (For example, if a was the annihilation operator, the new state is like the old state but with one fewer photon.) This is how (for example) an operator acts on the state of the system to change that state. (It's a density matrix to density matrix map.)

In terms of dimensions, this is: (1,1) = (1,1) * (1,1)

(Two square matrices of size N multiply to give another square matrix of size N).

However, to get the expected outcome of a measurement on a particular state you take Tr(aρ), where Tr is the trace. The trace basically takes the "plug" at the left-hand side of a matrix and twists it around to plug it into the right-hand side. So overall what is happening is that the operators a and ρ each have shape (1,1), and what we do is:

Tr( (1,1) * (1,1)) = Tr( (1, 1) ) = number.

The "inward facing" dimensions of each matrix get plugged into one another because the matrices multiply, and the outward facing dimensions get redirected by the trace operation to also plug into one another. (The Trace is like matrix multiplication but on paper that has been rolled up into a cylinder, so each of the two matrices inside sees the other on both sides). The net effect is exactly the same as if they had originally been organized into the shapes you suggest of (2,0) and (0,2) respectively.

So if the two "ports" are called A and B your way of doing it gives:

(AB, 0) * (0, AB) = (0, 0), i.e. a number

The traditional way:

Tr( (A, B) * (B, A) ) = Tr( (A, A) ) = (0, 0), i.e. a number.

I haven't looked at tensors much but I think that in tensor-land this Trace operation takes the role of a really boring metric tensor that is just (1,1,1,1...) down the diagonal.

So (assuming I understand right) your way of doing it is cleaner and more elegant for getting the expectation value of a measurement. But the traditional system works more elegantly for applying an operator to a state to evolve it into another state.

Yes, applying a (0, 2) tensor to a (2, 0) tensor is like taking the trace of their composition if they were both regarded as linear maps.

Anyway for operators that are supposed to modify a state, like annihilation/creation or time-evolution, I would be inclined to model it as linear maps/(1, 1)-tensors like in the OP. It was specifically for observables that I meant it seemed most natural to use (0, 2) tensors.

It's a density matrix to density matrix map

I thought they were typically wavefunction to wavefunction maps, and they need some sort of sandwiching to apply to density matrices?

 I thought they were typically wavefunction to wavefunction maps, and they need some sort of sandwiching to apply to density matrices?

Yes, this is correct. My mistake, it does indeed need the sandwiching, like this: aρa†.

From your talk on tensors, I am sure it will not surprise you at all to know that the sandwich thing itself (mapping from operators to operators) is often called a superoperator.

I think the reason it is the way it is is that there isn't a clear line between operators that modify the state and those that represent measurements. For example, the Hamiltonian operator evolves the state with time. But taking the trace of the Hamiltonian operator applied to the state gives the expectation value of the energy.
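This dual role is easy to see numerically (a sketch with an arbitrary two-level H; expm is SciPy's matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# An arbitrary two-level Hamiltonian, playing both of its roles
H = np.array([[1.0, 0.3], [0.3, -1.0]], dtype=complex)
psi = np.array([1, 0], dtype=complex)
rho = np.outer(psi, psi.conj())

# Role 1: observable.  Tr(H rho) is the expected energy.
print(np.trace(H @ rho).real)            # 1.0 for this state

# Role 2: generator of time evolution.  rho(t) = e^{-iHt} rho e^{+iHt}.
t = 0.5
U = expm(-1j * t * H)
rho_t = U @ rho @ U.conj().T
print(np.trace(H @ rho_t).real)          # still 1.0: energy is conserved
```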

From your talk on tensors, I am sure it will not surprise you at all to know that the sandwich thing itself (mapping from operators to operators) is often called a superoperator.

Oh, it does surprise me; superoperators are a physics term, but I just know linear algebra and dabble in physics, so I didn't know that one. Like, I'd think of it as the functor over vector spaces that maps V to V ⊗ V*.

I think the reason it is the way it is is that there isn't a clear line between operators that modify the state and those that represent measurements. For example, the Hamiltonian operator evolves the state with time. But taking the trace of the Hamiltonian operator applied to the state gives the expectation value of the energy.

Hm, I guess it's true that we'd usually think of the matrix exponential as mapping |ψ⟩ to e^{-iHt}|ψ⟩, rather than as mapping ρ to e^{-iHt} ρ e^{iHt}. I guess it's easy enough to set up a differential equation for the latter, but it's much less elegant than the usual form.

In some papers people write density operators using an enhanced "double ket" Dirac notation, where e.g. density operators are written to look like |x>>, with two ">"s. They do this exactly because the differential equations look more elegant.

I think in this notation measurements look like <<m|, but I am not sure about that. The QuTiP software (which is very common in quantum modelling) uses something like this under the hood, where operators (e.g. density operators) are stored internally using 1d vectors, and the superoperators (maps from operators to operators) are stored as matrices.

So structuring the notation in other ways does happen, in ways that look quite reminiscent of your tensors (maybe the same).
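For what it's worth, the vectorized picture is easy to play with directly (a NumPy sketch using row-major flattening as the "double ket"; I'm not claiming this matches QuTiP's internal conventions):

```python
import numpy as np

# A state rho and a unitary U = exp(-i*theta*sigma_x), chosen arbitrarily
psi = np.array([1, 0], dtype=complex)
rho = np.outer(psi, psi.conj())
theta = 0.3
U = np.array([[np.cos(theta), -1j * np.sin(theta)],
              [-1j * np.sin(theta), np.cos(theta)]])

# Ordinary picture: the "sandwich" acting on the operator rho
rho_out = U @ rho @ U.conj().T

# Vectorized ("double ket") picture: flatten rho to a length-4 vector |rho>>,
# and the sandwich becomes one big matrix, U (x) conj(U), acting on it
rho_vec = rho.reshape(-1)            # |rho>>  (row-major flattening)
superop = np.kron(U, U.conj())       # the superoperator, stored as a matrix
rho_out_vec = superop @ rho_vec

print(np.allclose(rho_out_vec, rho_out.reshape(-1)))   # True
```

(With column-major flattening the superoperator would instead be conj(U) ⊗ U, which is part of why conventions differ between libraries.)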

You can switch back and forth between the two views, obviously, and sometimes you do, but I think the most natural reason is that the operators you get are trace-1 positive semidefinite matrices, and there's a lot of theory on PSD matrices waiting for you. Also, the natural maps on density matrices, the quantum channels (trace-preserving completely positive maps), have a pretty nice representation in terms of conjugation when you think of density matrices as matrices: \rho \mapsto \sum_i K_i \rho K_i^* for some operators K_i that satisfy \sum_i K_i^*K_i = I.

Obviously all of this translates to the (0,2) tensor view, but a lot of theory was already built for thinking of these as linear maps on matrix spaces (or C*-algebras, or whatever fancier generalizations mathematicians had already been looking at).
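As a concrete instance of such a channel (a sketch; amplitude damping is just one standard choice of Kraus operators, and gamma is arbitrary):

```python
import numpy as np

gamma = 0.3   # damping strength, arbitrary

# Kraus operators of the amplitude damping channel
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
kraus = [K0, K1]

# Completeness, sum_i K_i^* K_i = I, which makes the channel trace preserving
print(np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(2)))   # True

def channel(rho):
    """rho -> sum_i K_i rho K_i^*"""
    return sum(K @ rho @ K.conj().T for K in kraus)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_out = channel(np.outer(plus, plus.conj()))
print(np.trace(rho_out).real)                           # 1.0: trace preserved
print(np.all(np.linalg.eigvalsh(rho_out) >= -1e-12))    # True: still positive semidefinite
```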

The QM state space has a preferred inner product, which we can use to e.g. dualize a (0,2) tensor (i.e. a thing that takes two vectors and gives a number) into a (1,1) tensor (i.e. an operator). So we can think of it either way.

Oh I meant a (2, 0) tensor.

Same difference 

A couple things to add:

  1. Since every invertible square matrix can be decomposed as , you don't actually need a unitary assumption. You can just say that after billions of years, all but the largest Z-matrices have died out.
  2. There's another tie between statistics and quantum evolution called the Wick rotation. If you set t = -i\beta, then e^{-iHt} = e^{-\beta H}, so the inverse temperature is literally imaginary time! You can recover the Boltzmann distribution by looking at the expected number of particles in each state: \langle n_j \rangle \propto e^{-\beta \lambda_j}, where \lambda_j is the jth eigenvalue (the energy of the jth state).
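A quick numerical illustration of the Wick-rotated picture (a sketch; the 4-level Hamiltonian is just a random Hermitian matrix I'm making up):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# A random Hermitian "Hamiltonian" on a 4-dimensional space (purely illustrative)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2
beta = 2.0

# Wick-rotated evolution: e^{-iHt} at t = -i*beta is e^{-beta*H};
# normalizing by its trace (the partition function Z) gives the thermal state
rho_thermal = expm(-beta * H)
rho_thermal /= np.trace(rho_thermal)

# Occupation of each energy eigenstate |j>:  <j| rho |j>
evals, V = np.linalg.eigh(H)
populations = np.real(np.diag(V.conj().T @ rho_thermal @ V))

# ...which is exactly the Boltzmann distribution e^{-beta*lambda_j} / Z
boltzmann = np.exp(-beta * evals) / np.exp(-beta * evals).sum()
print(np.allclose(populations, boltzmann))    # True
```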