Ah, so I'm working at a level of generality that applies to all sorts of dynamical systems, including ones with no well-defined volume. As long as there's a conserved quantity, we can define the entropy as the log of the number of states with that value of the quantity. This is a univariate function of the conserved quantity, and temperature can be defined as the multiplicative inverse of its derivative.
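To make that definition concrete, here is a minimal sketch for a toy system I'm choosing purely for illustration: N two-level units, where the conserved quantity is the energy E = n * eps with n units excited. The names `entropy`, `temperature`, and `eps` are my own, and the derivative is estimated with a finite difference rather than computed exactly.

```python
import math

def entropy(n_excited, n_total):
    """S = log(number of microstates) with n_excited of n_total units excited."""
    return math.log(math.comb(n_total, n_excited))

def temperature(n_excited, n_total, eps=1.0):
    """T = 1 / (dS/dE), with dS/dE from a central difference in E = n * eps.

    Requires 1 <= n_excited <= n_total - 1 so both neighbours exist.
    """
    d_entropy = entropy(n_excited + 1, n_total) - entropy(n_excited - 1, n_total)
    d_energy = 2 * eps
    return d_energy / d_entropy

N = 1000  # illustrative system size (assumption)
for n in (100, 300, 450):
    print(n, temperature(n, N))
```

No volume appears anywhere here: the entropy is a function of the one conserved quantity, and the temperature rises as more of the units are excited (below half filling).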
You still need, in general, to specify which macroscopic variables are being held fixed when taking partial derivatives. Taking a derivative with volume held constant is different from taking one with pressure held constant, etc. It's not a universal fact that all such derivatives give temperature. The fact that we're talking about a thermodynamic system with some macroscopic quantities requires us to specify this, and we have various types of energy functions, related by Legendre transformations, defined by which conjugate pairs of thermodynamic quantities they are functions of.
By
I mean
for some constant that doesn't vary with time. So it's incompatible with Newton's law.
And I don't believe this proportionality holds, given what I demonstrated between the forms for what you get when applying this ansatz with versus with . Can you demonstrate, for example, that the two different proportionalities you get between and are consistent in the case of an ideal gas law, given that the two should differ only by a constant independent of thermodynamic quantities in that case?
Oh, the asymmetric formula relies on the assumption I made that subsystem 2 is so much bigger than subsystem 1 that its temperature doesn't change appreciably during the cooling process. I wasn't clear about that, sorry.
Since it seems like the non-idealized symmetric form would multiply one term by and the other term by , can you explain why the non-idealized version doesn't just reduce to something like Newton's law of cooling, then?
Here is some further discussion on issues with the law.
For an ideal gas, the root mean square velocity is proportional to √T. Scaling temperature up by a factor of 4 scales up all the velocities by a factor of 2, for example. This applies not just to the rms velocity but to the entire velocity distribution. The punchline is, looking at a video of a hot ideal gas is not distinguishable from looking at a sped-up video of a cold ideal gas, keeping the volume fixed.
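The scaling claim can be checked with a quick sampling sketch. I'm assuming units where the particle mass and Boltzmann's constant are both 1, so each velocity component is a Gaussian with standard deviation sqrt(T); the function names and sample sizes are just illustrative choices.

```python
import math
import random

def sample_speeds(temperature, n_particles, seed):
    """Draw speeds from a Maxwell-Boltzmann distribution (units: m = k_B = 1)."""
    rng = random.Random(seed)
    sigma = math.sqrt(temperature)  # per-component velocity spread
    speeds = []
    for _ in range(n_particles):
        vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
        speeds.append(math.sqrt(vx * vx + vy * vy + vz * vz))
    return speeds

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def median(values):
    """Middle element of the sorted list (good enough for a large sample)."""
    return sorted(values)[len(values) // 2]

cold = sample_speeds(1.0, 20000, seed=1)
hot = sample_speeds(4.0, 20000, seed=2)

# Both ratios should come out close to 2 if the whole distribution scales.
print(rms(hot) / rms(cold))
print(median(hot) / median(cold))
```

The median is included alongside the rms only to show that it is the whole distribution that stretches by the same factor, not just one moment of it.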
Continuing this scaling investigation, for a gas with collisions, slowing down the playback of a video has the effect of increasing the time between collisions, and as discussed, slowing down the video should look like lowering the temperature. And given a hard sphere-like collision of two particles, scaling up the velocities of the particles involved also scales up the energy exchanged in the collision. So, just from kinetic theory, we see that the rate of heat transfer between two gases must increase if the temperatures of both gases are increased by the same factor. This is what Newton's law of cooling says, and it is the opposite of what your proposed law says.
Here is a further oddity: your law predicts that an infinitely hot heat bath has a bounded rate of heat exchange with any system at a finite, non-zero temperature, which, similar to the above, doesn't agree with how one would understand it from the kinetic theory of gases.
I'm going to open up with a technical point: it is important, not only in general but particularly in thermodynamics, to specify what quantities are being held fixed when taking partial derivatives. For example, you use this relation early on:
.
This is a relationship at constant volume. Specifically, the somewhat standard notation would be
1/T = (∂S/∂U)_V,
where U is the internal energy. The change in internal energy at constant volume is equal to the heat transfer, so it reduces to the relationship you used.
That brings us to the lemma you wanted to use:
.
To get what you wanted, it has to actually be the derivative with constant volume on the right, but then there's a problem: it doesn't succeed in giving you the time derivative of V since .
Let's assume that problem with the lemma can somehow be fixed, though, for sake of discussion. There's another issue, which is that if the proportionality depends on thermodynamic variables, then you can have basically any relationship. For example, your heat equation:
.
If these proportionalities were and , it would actually give Newton's law of cooling. For an ideal gas, the equation of state means that the change in internal energy (which is just the heat transfer at constant volume) ought to be directly proportional to the temperature change, with no dependence on other thermodynamic variables (besides ) in the proportionality.
Now we have your formula for the derivative of temperature:
.
As a side note, I'm not sure how this is a heat capacity; it doesn't match any of the heat capacity formulas I remember. But the appearance of is notable; it makes it look a lot closer to Newton's law of cooling, and comparing it to the earlier equation for heat shows how the proposed proportionalities from the first lemma contain dependence on other thermodynamic variables. But you changed from and to and before this, so it's worth remembering that there should be a symmetric relationship between the two subsystems. Multiplying both of the inverse temperature terms by a single temperature produces an asymmetry in the time derivatives for the two subsystems.
This asymmetry in the temperature dependence would predict that one subsystem will heat faster than the other subsystem cools, which would tend to violate energy conservation. If we just imagine an ideal gas in two separate containers with an identical number of particles in each container, any temperature increase in one gas has to be exactly compensated by an identical magnitude temperature decrease in the other gas, since the internal energy is just proportional to temperature.
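As a sanity check on the bookkeeping, here is a small sketch of two equal ideal-gas containers exchanging heat. The exchange rule `k * (t2 - t1)` is a Newton-style rate that I'm assuming only for illustration, and the heat capacity, rate constant, and step size are arbitrary. The point is just that any rule which moves the same energy out of one gas and into the other keeps T1 + T2 fixed when the heat capacities match.

```python
def exchange_heat(t1, t2, heat_capacity=1.5, k=0.1, dt=0.01, steps=5000):
    """Euler-step two equal ideal gases exchanging heat at constant volume.

    heat_capacity is the same for both gases (equal particle number), so the
    internal energy of each is heat_capacity * T. All values are illustrative.
    """
    for _ in range(steps):
        heat = k * (t2 - t1) * dt      # energy moved from gas 2 to gas 1
        t1 += heat / heat_capacity     # gas 1 warms by exactly ...
        t2 -= heat / heat_capacity     # ... what gas 2 cools by
    return t1, t2

t1, t2 = exchange_heat(300.0, 400.0)
print(t1, t2, t1 + t2)
```

A law in which the two temperature changes have different magnitudes for equal heat capacities cannot be written in this form, which is the energy-conservation problem described above.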
So I argue that this proposed law does not hold up.
Note, though, that time reversal is still an anti-unitary operator in quantum mechanics in spite of the hand-waving argument failing when time reversal isn't a good symmetry. Even when time reversal symmetry fails, though, there's still CPT symmetry (and CPT is also anti-unitary).
I argue that counting branches is not well-behaved with the Hilbert space structure and unitary time evolution, and instead assigning a measure to branches (the 'dilution' argument) is the proper way to handle this. (See Wallace's decision-theory 'proof' of the Born rule for more).
The quantum state is a vector in a Hilbert space. Hilbert spaces have an inner product structure. That inner product structure is important for a lot of derivations/proofs of the Born rule, but in particular the inner product induces a norm. Norms let us do a lot of things. One of the more important things is we can define continuous functions. The short version is, for a continuous function, arbitrarily small changes to the input should produce arbitrarily small changes to the output. Another thing commonly used for vector spaces is linear operators, which are a kind of function that maps vectors to other vectors in a way that respects scalar multiplication and vector addition. We can combine the notion of continuous functions with linear operators and we get bounded linear operators.
While quantum mechanics contains a lot of unbounded operators representing observables (position, momentum, energy, etc.), bounded operators are still important. In particular, projection operators are bounded, and every self-adjoint operator, whether bounded or unbounded, has projection-valued measures. Projection-valued measures go hand-in-hand with the Born rule, and they are used to give the probability of a measurement falling on some set of values. There's an analogy with probability distributions. Sampling from an arbitrary distribution can in principle give an arbitrarily large number, and many distributions even lack a finite average. However, the probability of a sample from an arbitrary distribution falling in the interval [a,b] will always be a number between 0 and 1.
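The probability analogy can be made concrete with the standard Cauchy distribution, which has no finite mean but still assigns every interval a probability between 0 and 1. This is only a sketch of the classical analogy, not anything quantum; the function names are mine.

```python
import math

def cauchy_cdf(x):
    """CDF of the standard Cauchy distribution, which has no finite mean."""
    return 0.5 + math.atan(x) / math.pi

def prob_in_interval(a, b):
    """Probability that a sample falls in [a, b]."""
    return cauchy_cdf(b) - cauchy_cdf(a)

for a, b in [(-1.0, 1.0), (0.0, 1e6), (-1e12, 1e12)]:
    print((a, b), prob_in_interval(a, b))
```

The unbounded quantity (the sample itself) has a badly behaved average, while the bounded question ("did it land in this interval?") always has a well-behaved answer.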
If we are careful to ask only about probabilities instead of averages, or even just to only ask about averages when the quantity is bounded, we can do practically everything in quantum mechanics with bounded linear operators. The expectation values of bounded linear operators are continuous functions of the quantum state. And so now we get to the core issue: arbitrarily small changes to the quantum state produce arbitrarily small changes to the expectation value of any bounded operator, and in particular to any Born rule probability.
So what about branch counting? Let's assume for sake of discussion that we have a preferred basis for counting in, which is its own can of worms. For a toy model, if we have a vector like (1, 0, 0, 0, 0, 0, ....) that we count as having 1 branch and a vector like (1, x, x, x, 0, 0, ....) that we're going to count as 4 branches if x is an arbitrarily small but nonzero number, this branch counting is not a continuous function of the state. If you don't know the state with infinite precision, you can't distinguish whether a coefficient is actually zero or just some really small positive number. Thus, you can't actually practically count the branches: there might be 1, there might be 4, there might be an infinite number of branches. On the other hand, the Born rule measure changes continuously with any small change to the state, so knowing the state with finite precision also gives finite precision on any Born rule measure.
In short, arbitrarily small changes to the quantum state can result in arbitrarily large changes to branch counting.
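The toy model above is easy to put into a few lines of code. This sketch assumes the preferred basis is just the coordinate basis of a short list of amplitudes, and the helper names are my own.

```python
import math

def normalize(vec):
    """Rescale a list of amplitudes to unit norm."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in vec))
    return [c / norm for c in vec]

def count_branches(vec):
    """Naive branch count: number of exactly nonzero coefficients."""
    return sum(1 for c in vec if c != 0)

def born_probabilities(vec):
    """Born rule weights |c|^2 of the normalized state."""
    return [abs(c) ** 2 for c in normalize(vec)]

for x in (0.0, 1e-3, 1e-6, 1e-9):
    state = [1.0, x, x, x, 0.0, 0.0]
    print(x, count_branches(state), born_probabilities(state)[0])
```

The branch count jumps from 1 to 4 as soon as x is nonzero, no matter how small, while the Born weight of the first component moves away from 1 only by an amount of order x squared.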
I will amend my statement to be more precise:
Everett's proof that the Born rule measure (amplitude squared for orthogonal states) is the only measure that satisfies the desired properties has no dependence on tensor product structure.
Everett's proof that a "typical" observer sees measurements that agree with the Born rule in the long term uses the tensor product structure and the result of the previous proof.
I kind of get why Hermitian operators make sense here, but then we apply the measurement and the system collapses to one of its eigenfunctions. Why?
If I understand what you mean, this is a consequence of what we defined as a measurement (or what's sometimes called a pre-measurement). Taking the tensor product structure and density matrix formalism as a given, if the interesting subsystem starts in a pure state, the unitary measurement structure implies that the reduced state of the interesting subsystem will generally be a mixed state after measurement. You might find parts of this review informative; it covers pre-measurements and also weak measurements, and in particular talks about how to actually implement measurements with an interaction Hamiltonian.
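A minimal sketch of that statement, assuming the simplest possible setup: a two-level system in the state a|0> + b|1>, a two-level apparatus starting in |0>, and a CNOT-style unitary that flips the apparatus when the system is in |1>. The function names are hypothetical and this is not taken from the review.

```python
import math

def joint_before(a, b):
    """Amplitudes psi[s][m] for (a|0> + b|1>) tensor apparatus |0>."""
    return [[a, 0.0], [b, 0.0]]

def joint_after(a, b):
    """After the CNOT-style coupling: a|0>|0> + b|1>|1>."""
    return [[a, 0.0], [0.0, b]]

def reduced_density_matrix(psi):
    """Trace out the apparatus: rho[s][t] = sum_m psi[s][m] * conj(psi[t][m])."""
    return [[sum(psi[s][m] * complex(psi[t][m]).conjugate() for m in range(2))
             for t in range(2)] for s in range(2)]

def purity(rho):
    """Tr(rho^2); equals 1 for a pure state and is smaller for a mixed one."""
    return sum(rho[i][j] * rho[j][i] for i in range(2) for j in range(2)).real

a = b = 1 / math.sqrt(2)
print(purity(reduced_density_matrix(joint_before(a, b))))
print(purity(reduced_density_matrix(joint_after(a, b))))
```

The joint state stays pure the whole time because the coupling is unitary; only the reduced state of the system loses purity, and its off-diagonal terms vanish in the basis the apparatus was correlated with.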
I don't see how that relates to what I said. I was addressing why an amplitude-only measure that respects unitarity and is additive over branches has to use amplitudes for a mutually orthogonal set of states to make sense. Nothing in Everett's proof of the Born rule relies on a tensor product structure.
Why should (2,1) split into one branch of (2,0) and one branch of (0,1), not into one branch of (1,0) and one branch of (1,1)?
Again, it's because of unitarity.
As Everett argues, we need to work with normalized states to unambiguously define the coefficients, so let's define normalized vectors v1=(1,0) and v2=(1,1)/sqrt(2). (1,0) has an amplitude of 1, (1,1) has an amplitude of sqrt(2), and (2,1) has an amplitude of sqrt(5).
(2,1) = v1 + sqrt(2) v2, so we need M[sqrt(5)] = M[1] + M[sqrt(2)] for the additivity of measures. Now let's do a unitary transformation on (2,1) to get (1,2) = -1 v1 + 2 sqrt(2) v2 which still has an amplitude of sqrt(5). So now we need M[sqrt(5)] = M[2 sqrt(2)] + M[-1] = M[2 sqrt(2)] + M[1]. This can only work if M[2 sqrt(2)] = M[sqrt(2)]. If one wanted a strictly monotonic dependence on amplitude, that'd be the end. We can keep going instead and look at the vector (a+1, a) = v1 + a sqrt(2) v2, rotate it to (a, a+1) = -v1 + (a+1) sqrt(2) v2, and prove that M[(a+1) sqrt(2)] = M[a sqrt(2)] for all a. Continuing similarly, we're led inevitably to M[x] = 0 for any x. If we want a non-trivial measure with these properties, we have to look at orthogonal states.
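For anyone who wants to check the arithmetic, this sketch verifies the decompositions used above numerically. The helper names and the sample value of a are mine; v1 and v2 are the normalized vectors from the argument.

```python
import math

SQRT2 = math.sqrt(2)
v1 = (1.0, 0.0)
v2 = (1 / SQRT2, 1 / SQRT2)   # normalized but NOT orthogonal to v1

def combine(c1, c2):
    """Return c1 * v1 + c2 * v2 as a 2-component tuple."""
    return (c1 * v1[0] + c2 * v2[0], c1 * v1[1] + c2 * v2[1])

def norm(vec):
    """Euclidean length, i.e. the overall amplitude of the vector."""
    return math.hypot(vec[0], vec[1])

def close(u, w, tol=1e-12):
    """Componentwise comparison with a small tolerance."""
    return all(abs(x - y) < tol for x, y in zip(u, w))

print(close(combine(1, SQRT2), (2, 1)))         # (2,1) = v1 + sqrt(2) v2
print(close(combine(-1, 2 * SQRT2), (1, 2)))    # (1,2) = -v1 + 2 sqrt(2) v2
a = 3.7                                         # arbitrary sample value
print(close(combine(1, a * SQRT2), (a + 1, a)))
print(close(combine(-1, (a + 1) * SQRT2), (a, a + 1)))
print(norm((2, 1)), norm((1, 2)), math.sqrt(5))
```

The rotation preserves the overall amplitude sqrt(5) while changing the coefficient on v2, which is what forces M to be constant and then zero for a non-orthogonal decomposition.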
I guess I don't understand the question. If we accept that mutually exclusive states are represented by orthogonal vectors, and we want to distinguish mutually exclusive states of some interesting subsystem, then what's unreasonable with defining a "measurement" as something that correlates our apparatus with the orthogonal states of the interesting subsystem, or at least as an ideal form of a measurement?
Material properties such as thermal conductivity can depend on temperature. The actual calculation of thermal conductivity of various materials is very much outside of my area, but Schroeder's "An Introduction to Thermal Physics" has a somewhat similar derivation showing the thermal conductivity of an ideal gas being proportional to √T based off the rms velocity and mean free path (which can be related to average time between collisions).