The constant bound isn't that relevant, and not just because of the in-principle unbounded size; it also doesn't constrain the induced probabilities in the second coding scheme much at all. It's an upper bound on the maximum difference in length, so the weightings in coding scheme B can still differ in relative length by a ton, leading to wildly different priors.
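To make the arithmetic concrete (toy numbers, purely illustrative): an additive gap in code lengths, even one comfortably inside the constant bound, becomes a multiplicative gap of 2^gap between the induced priors.

```python
# Toy numbers (hypothetical): a bounded additive gap in code lengths still means
# a 2**gap multiplicative gap between the induced 2**(-length) priors.
len_A = 10  # bits scheme A assigns to some hypothesis
len_B = 40  # bits scheme B assigns to the same hypothesis (a "mere" 30-bit gap)

prior_A = 2.0 ** -len_A
prior_B = 2.0 ** -len_B

print(prior_A / prior_B)  # 2**30 ~ 1.07e9: roughly a billion-fold difference in prior weight
```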
And of the encoding schemes that remain on the table, virtually all of them will behave identically with respect to the description lengths they assign to "natural" versus "unnatural" optimization criteria.
I have no idea how you're getting to this; I'm not sure if you're claiming a formal result or just stating a hunch. I disagree that there's a neat correspondence between a system being physically realizable and its having a concise implementation as a TM. And even granting that point, I don't think that nearly all, or even most, of these physically realizable systems will behave identically, or even similarly, w.r.t. how they assign codes to "natural" optimization criteria.
If you parse this post as "attempting to impart a basic intuition that might let people (new to AI x-risk arguments) avoid certain classes of errors" rather than "trying to argue with the bleeding-edge arguments on x-risk", this post seems good
This seems reasonable in isolation, but it gets frustrating when the former is all Eliezer seems to do these days, with seemingly no attempt at the latter. When all you do is retread these dunks on "midwits" and show apathy/contempt for engaging with newer arguments, it makes it look like you don't actually have an interest in being maximally truth-seeking but instead like you want to just dig in and grandstand.
From what little engagement there is with novel criticisms of their arguments (like Nate's attempt to respond to Quintin/Nora's work), it seems like there's a cluster of people here who don't understand and don't particularly care about understanding some objections to their ideas and instead want to just focus on relitigating arguments they know they can win.
I can reason as follows: There is a 0.5 chance that it is Heads. Let P represent the actual, unknown state of the outcome of the toss (Heads or Tails); and let Q represent the other state. If Q, then anything follows. For example, Q implies that I will win $1 billion. Therefore the value of this bet is at least $500,000,000, which is 0.5 * $1,000,000,000, and I should be willing to pay that much to take the bet.
This doesn't go through: what you have are two separate propositions, "H -> (T -> [insert absurdity here])" and "T -> (H -> [insert absurdity here])" [1], and actually deriving a contradiction from the consequent requires proving which antecedent obtains, which you can't do since neither is a theorem.
The distinction with logical uncertainty, then, is supposedly that there you do already have a proof of the analogue of H or of T, so you can detach the consequent, i.e. derive that one of H or T leads to a contradiction.
You don't really have those two conditionals either, unless you can prove NOT(H AND T) - i.e. can you definitively rule out the coin landing both heads and tails? But that's kinda pedantic.
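A quick Lean sketch of the structure I'm pointing at (H, T, C are placeholder propositions; nothing here is specific to coins):

```lean
variable (H T C : Prop)

-- The two separate conditionals, each of which only needs ¬(H ∧ T):
example (hnot : ¬(H ∧ T)) : H → (T → C) :=
  fun hH hT => absurd (And.intro hH hT) hnot

example (hnot : ¬(H ∧ T)) : T → (H → C) :=
  fun hT hH => absurd (And.intro hH hT) hnot

-- "T implies anything" only detaches once we actually have a proof of H:
example (hnot : ¬(H ∧ T)) (hH : H) : T → C :=
  fun hT => absurd (And.intro hH hT) hnot
```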
What are your thoughts on KL-div after the unembed softmax as a metric?
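For concreteness, here's roughly the metric I mean (a sketch assuming a PyTorch-style setup, where `clean_logits` and `patched_logits` are whatever the two runs produce after the unembedding, just before the softmax):

```python
import torch
import torch.nn.functional as F

def kl_after_unembed(clean_logits: torch.Tensor, patched_logits: torch.Tensor) -> torch.Tensor:
    """KL(clean || patched) over the vocab at each position.

    Inputs are raw logits of shape (..., d_vocab), i.e. already projected
    through the unembedding; the softmax happens here.
    """
    log_p = F.log_softmax(clean_logits, dim=-1)    # clean next-token distribution
    log_q = F.log_softmax(patched_logits, dim=-1)  # patched/ablated distribution
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)
```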
These results hold only if you assume risk aversion is entirely explained by a concave utility function; if you don't assume that, then the surprising constraints on your preferences don't apply.
IIRC that's the whole point of the paper - not that utility functions are in fact constrained in this way (they're not), but that if you assume risk aversion can only come from diminishing marginal value of money (as many economists do), then you end up in weird places, so maybe you should rethink that assumption.
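To spell out the kind of calibration argument I have in mind (a loose sketch from memory, assuming a differentiable concave u and an agent who turns down a 50/50 lose-L/gain-G bet, with G > L, at every wealth level w - it may not match the paper's exact setup):

```latex
\tfrac{1}{2}\,u(w+G) + \tfrac{1}{2}\,u(w-L) \le u(w)
  \;\Longrightarrow\; u(w+G) - u(w) \le u(w) - u(w-L)
```

By concavity, G·u'(w+G) ≤ u(w+G) − u(w) and u(w) − u(w−L) ≤ L·u'(w−L), so u'(w+G) ≤ (L/G)·u'(w−L) at every w: marginal utility has to fall off geometrically, which is where the weird conclusions about large-stakes bets come from.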
I think what I'm getting at is more general than resources specifically: it's the degree of freedom in the problem description that lets you frame anything as technically optimizing something at a distance. E.g. in 'Utility Maximization = Description Length Minimization' you can take any system, find its long-term and long-distance effects on some other region of space-time, and then find a coding scheme under which those particular states have the shortest descriptions. The description length of the universe will, by construction, get minimized. Obviously this just corresponds to one of those (to us) very unnatural-looking "utility functions" over universe-histories or w/e.
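A toy version of the construction I mean (everything here is made up for illustration):

```python
# Toy construction: whatever states a system in fact steers some region into, we
# can build a valid prefix-free code that makes exactly those states short, so the
# system "minimizes description length" under that code by fiat.
from math import ceil, log2

def gerrymandered_lengths(states, favored):
    """Code lengths: favored states live under prefix '0', the rest under '1',
    so the favored ones get (roughly) the shortest possible codewords."""
    favored = set(favored)
    others = [s for s in states if s not in favored]
    bits = lambda n: ceil(log2(n)) if n > 1 else 0  # index bits needed within each group
    return {s: 1 + (bits(len(favored)) if s in favored else bits(len(others)))
            for s in states}

# Whatever end state the system actually produces, declare it "favored" and the
# system comes out minimizing expected description length, by construction.
states = ["pile of rocks", "paperclips", "diamond", "grey goo"]
print(gerrymandered_lengths(states, favored={"pile of rocks"}))
# -> {'pile of rocks': 1, 'paperclips': 3, 'diamond': 3, 'grey goo': 3}
```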
If we're first fixing the coding scheme then this seems to me to be equivalent to constraining the kinds of properties we're allowing as viable targets of optimization
I guess one way of looking at it: I don't think it makes sense to talk about a system as being an optimizer/not an optimizer intrinsically. It's a property of a system relative to a coding scheme/set of interesting properties/resources; everything is an optimizer relative to some encoding scheme. And all of the actual, empirical scariness of AI comes from how close the encoding scheme that by definition makes it an optimizer is to our native encoding scheme - as you point out, they'll probably have some overlap, but I don't think that by itself is scary.
Thanks, I feel like I understand your perspective a bit better now.
Re: your "old" frame: I agree that the fact we're training an AI to be useful from our perspective will certainly constrain its preferences a lot, such that it'll look like it has preferences over resources we think in terms of/won't just be representable as a maximally random utility function. I think there's a huge step from that though to "it's a optimizer with respect to those resources" i.e there are a lot of partial orderings you can put over states where it broadly has preference orderings we like w.r.t resources without looking like a maximizer over those resources, and I don't think that's necessarily scary. I think some of this disagreement may be downstream of how much you think a superintelligence will "iron out wrinkles" like preference gaps internally though which is another can of worms
Re: your new frame: I think I agree that looking like a long-term/long-distance planner is much scarier. Obviously this implicitly assumes we're restricting to some interesting set of resources, because otherwise we can reframe any myopic maximizer as long-term and vice versa. But this is going round in circles a bit - typing this out, I think the main crux for me is what I said in the previous point: there's too big a leap from "looks like it has preferences over this resource and makes long-term plans" to "is a hardcore optimizer of said resource". Maybe this is just a separate issue though; I'm not sure I have any local disagreements here.
Re: your last point, thanks - I don't think I have a problem with this; I think I was just misunderstanding the intended scope of the post.
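(The toy sketch I mentioned under the "old frame" point - bundles and numbers are made up - a preference relation that's clearly "pro-resource" but has gaps, so it isn't the preference order of any single real-valued utility maximizer, since a utility function forces every pair to be comparable:)

```python
# Toy sketch: strict Pareto dominance over (money, free_time) bundles. It robustly
# "likes money", yet many pairs are left incomparable (preference gaps).
from itertools import combinations

def prefers(a, b):
    """Strict Pareto dominance: at least as good on every coordinate, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

bundles = [(10, 1), (5, 5), (1, 10), (10, 10)]  # (money, free_time)

for a, b in combinations(bundles, 2):
    if prefers(a, b):
        print(f"{a} preferred to {b}")
    elif prefers(b, a):
        print(f"{b} preferred to {a}")
    else:
        print(f"{a} vs {b}: no preference (a gap)")
```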
Thanks, I think that's a good distinction - I guess I have like 3 issues if we roll with that though
I think you've said something along the lines of one or two of these points in your links, sorry! Not expecting this to be super novel to you, half just helpful for me to get my own thoughts down explicitly
The actual result here looks right to me, but it kinda surfaces a lot of my confusion about how people in this space use coherence theorems, and reinforces my sense that they get misused.
You say:
This ties to a common criticism: that any system can be well-modeled as a utility maximizer, by simply choosing the utility function which rewards whatever the system in fact does. As far as I can tell, that criticism usually reflects ignorance of what coherence says
My sense of how this conversation goes is as follows:
"Utility maximisers are scary, and here are some theorems that show that anything sufficiently smart/rational (i.e. a superintelligence) will be a utility maximiser. That's scary"
"Literally anything can be modelled as a utility maximiser. It's not the case that literally everything is scary, so something's wrong here"
"Well sure, you can model anything as a utility maximiser technically, but the resource w.r.t which it's being optimal/the way its preferences are carving up state-space will be incredibly awkward/garbled/unnatural (in the extreme, they could just be utility-maximizing over entire universe-histories). But these are unnatural/trivial. If we add constraints over the kind of resources it's caring about/kinds of outcomes it can have preferences over, we constrain the set of what can be a utility-maximiser a lot. And if we constrain it to smth like the set of resources that we think in terms of, the resulting set of possible utility-maximisers do look scary"
Does this seem accurate-ish? If so, it feels like this last response is true but also kind of vacuously so, and it kind of undercuts the scariness of the coherence theorems in the first place. As in, it seems much more plausible that a utility-maximiser drawn from this constrained set will be scary, but then where's the argument that we're sampling from this subset when we make a superintelligence? It feels like there's a weird motte-and-bailey going on where people flip-flop between the very unobjectionable "it's representable as a utility-maximiser" implied by the theorems, and the murky, un-argued-for "it'll look like a utility-maximiser 'internally', or relative to some constrained set of possible resources such that it seems scary to us".
Also, on the actual theorem you outline here - it looks right, but isn't assuming utilities assigned to outcomes, such that the agent is trying to maximise over them, kind of begging most of the question that coherence theorems are after? I.e. the starting data is usually a set of preferences, with the actual work being proving that this, along with some assumptions, yields a utility function over outcomes. This also seems to be why you don't have to use anything like Dutch-book arguments etc., as you point out - but only because you've kind of skipped over the step where they're used.
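(For reference, the kind of statement I have in mind - the standard VNM shape, from memory, so don't hold me to the exact axiom list: the starting data is a preference relation ⪰ over lotteries satisfying completeness, transitivity, continuity and independence, and the *conclusion* is the utility function:)

```latex
p \succeq q \;\iff\; \sum_{o} p(o)\,u(o) \;\ge\; \sum_{o} q(o)\,u(o),
\qquad u \text{ unique up to positive affine transformation}
```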
I feel like this could branch out into a lot of small disagreements here but in the interest of keeping it streamlined:
I agree with all of this, and wasn't gesturing at anything related to it, so I think we're talking past each other. My point was simply that two UTMs, even with not-very-large prefix encodings, can wind up with extremely different priors - but I don't think that's too relevant to your main point.
I think I disagree with almost all of this. You can fix some gerrymandered extant physical system right now that ends up looking like a garbled world-history optimizer, and I doubt that it would take on the order of length ~2^10^80 to specify it. But granting that these systems would in fact have astronomical prefixes, I think this is a ponens/tollens situation: if these systems actually have a huge prefix, that tells me that the encoding schemes of some physically realisable systems are deeply incompatible with mine, not that the systems which are out there right now aren't physically realisable.
I imagine an objection is that these physical systems are not actually world-history optimizers and are going to be much more compressible than I'm making them out to be, so your argument goes through. In which case I'm fine with this; it just seems like a differing definition of what counts as two schemes acting "virtually identically" w.r.t. optimization criteria. If your argument is valid but bounds this similarity loosely enough to include e.g. random chunks of a rock floating through space, then I'm happy to concede that - it seems quite trivial and not at all worrying from the original perspective of bounding the kinds of optimization criteria an AI might have.