Yes, the point of the proof isn't that the sane pure bets condition and the weak indifference condition are the be-all and end-all of corrigibility. But using the proof's result, I can notice that your AI will be happy to bet a million dollars against one cent that the shutdown button won't be pressed, which doesn't seem desirable. It's effectively willing to burn arbitrary amounts of utility, if we present it with the right bets.
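To make the arithmetic of that bet concrete, here is a minimal sketch. It assumes a simple two-outcome bet and uses hypothetical helper names (`bet_expected_value`, `break_even_probability`) that are not part of the proof. It only illustrates why acting as if the button will never be pressed makes arbitrarily lopsided odds look acceptable.

```python
from fractions import Fraction

def bet_expected_value(p_press, stake, payout):
    """Expected gain from betting that the button will NOT be pressed.

    The bettor loses `stake` if the button is pressed and wins `payout` otherwise.
    """
    return (1 - p_press) * payout - p_press * stake

def break_even_probability(stake, payout):
    """Probability of a press at which the bet has zero expected value."""
    return Fraction(payout) / (Fraction(payout) + Fraction(stake))

STAKE = Fraction(1_000_000)   # a million dollars
PAYOUT = Fraction(1, 100)     # one cent

# Assumption for illustration: the agent evaluates the bet as if P(press) = 0.
ev_as_if_zero = bet_expected_value(Fraction(0), STAKE, PAYOUT)

# The same bet evaluated at a modest 1% chance of a press.
ev_one_percent = bet_expected_value(Fraction(1, 100), STAKE, PAYOUT)

print(ev_as_if_zero)                          # small positive gain
print(float(ev_one_percent))                  # large expected loss
print(break_even_probability(STAKE, PAYOUT))  # tiny break-even probability
```

Under these assumptions the bet only breaks even if the chance of a press is about one in a hundred million, so any realistic probability makes it a heavy expected loss.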
...Ideally, a successful solution to the shutdown problem should violate one or both of these conditions in clear, limited wa
If we implement your example, the AI is willing to bet at arbitrarily poor odds that the on switch will be on, thus violating the sane pure bets condition.
You can have particular decision problems or action spaces that don't have the circular property of the Northland-Southland problem, but the fact remains that if an AI fulfills the weak indifference condition reliably, it must violate the sane pure bets condition in some circumstances. There must be insane bets that it's willing to take, even if no such bets are available in a particular situation.
B...
After hearing the problem, the question I asked myself was: at what odds would I bet that the coin came up heads? And the answer is that I would have a neutral expected return betting at 2:1 odds. This lines up with the Bayesian answer of P(heads) = 1/3.
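A quick check of that betting argument, as a sketch only. It assumes the problem is the standard Sleeping Beauty setup (a fair coin, one awakening on heads, two on tails, one bet placed per awakening), which this comment does not spell out. The function names are hypothetical.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)
# Assumed setup: number of awakenings (and therefore bets) per coin outcome.
AWAKENINGS = {"heads": 1, "tails": 2}

def expected_return_per_experiment(win_if_heads, stake=Fraction(1)):
    """Expected total return when betting on heads at every awakening.

    Each bet wins `win_if_heads` if the coin was heads and loses `stake` otherwise.
    """
    heads_part = P_HEADS * AWAKENINGS["heads"] * win_if_heads
    tails_part = (1 - P_HEADS) * AWAKENINGS["tails"] * (-stake)
    return heads_part + tails_part

def heads_fraction_of_awakenings():
    """Long-run fraction of awakenings that follow a heads flip."""
    heads_awakenings = P_HEADS * AWAKENINGS["heads"]
    total_awakenings = heads_awakenings + (1 - P_HEADS) * AWAKENINGS["tails"]
    return heads_awakenings / total_awakenings

print(expected_return_per_experiment(Fraction(2)))  # return at 2:1 odds
print(heads_fraction_of_awakenings())               # share of awakenings that are heads
```

With these assumptions the 2:1 bet has zero expected return, and one third of awakenings follow heads, which matches the numbers above.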
I strongly disagree that this was the point of this in TWC and would be highly surprised if Eliezer agreed with you. For one thing, the parties involved in nonconsensual sex in TWC seem to be having a perfectly fine time. I also wouldn't be surprised if someone raping an Ancient such that they have a terrible awful no-good time would fall under some other crime and still get the perpetrator arrested.
Conjure an IQ test and take it, obviously. My IQ when dreaming ranges from greenish-purple to twelveteen o'clock.
-Eliezer Yudkowsky trims his beard using Solomonoff Induction.
-Eliezer Yudkowsky, and only Eliezer Yudkowsky, possesses quantum immortality.
-Eliezer Yudkowsky once persuaded a superintelligence to stay inside of its box.
"Different Minds (You're Concepts Formed Differently From Mine)" should probably be "Different Minds (Your Concepts Formed Differently From Mine)."
The Philosopher's Polar North (can also be translated as The Philosopher's Apex) [Nhato Remix]
https://www.youtube.com/watch?v=pc4zZ43R9o0
Turn on captions to see the lyrics and their English translation. The song says a lot about searching for truth and knowledge that I find powerful.
Some excerpts from the translated lyrics:
Accepting even those facts I have denied, <before time melts my memories away>
I absentmindedly lift “truth” from an uneven distribution <as a god might guide fate>
Even my current knowledge is still uncertain, swaying,
so wit...
I think this might be a decent example of "rationalist" music. In particular, the lyrics communicate the value of seeking knowledge and discerning truth. There are parts that I disagree with, but overall I think it's pretty great for a Touhou soundtrack cover made by non-rationalists. Turn on captions to see the lyrics and their English translation.
The Philosopher's Polar North [Nhato Remix]
https://www.youtube.com/watch?v=pc4zZ43R9o0
I'll paste the translated lyrics here:
Accepting even those facts I have denied, <before time melts ...
I agree that in many examples, like the simple risk/reward decisions shown here, certainty does not give an option higher utility. However, there are situations in which it might be advantageous to make a decision that has a worse expected outcome but is more certain. The example that comes to mind is complex plans that involve many decisions which affect each other. There is a computational cost associated with uncertainty, in that multiple possible outcomes must be considered in the plan; the plan "branches." Certainty simplifies things. For an agent with limited computing power, in a situation where time spent planning has a cost, this might be significant.
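Here is a toy sketch of the branching point. It assumes every decision in the plan has the same number of possible outcomes and that the agent pays a fixed cost per branch it evaluates. The numbers and function names are made up for illustration, not taken from the post.

```python
def plan_branches(outcomes_per_decision, num_decisions):
    """Number of distinct outcome sequences the planner must consider."""
    return outcomes_per_decision ** num_decisions

def net_value(expected_outcome, outcomes_per_decision, num_decisions, cost_per_branch):
    """Expected outcome minus the (assumed) cost of evaluating every branch."""
    branches = plan_branches(outcomes_per_decision, num_decisions)
    return expected_outcome - cost_per_branch * branches

NUM_DECISIONS = 10
COST_PER_BRANCH = 0.01  # hypothetical planning cost, in the same units as utility

# A certain option: lower expected outcome, but only one outcome per decision.
certain = net_value(90, 1, NUM_DECISIONS, COST_PER_BRANCH)

# An uncertain option: higher expected outcome, but two outcomes per decision.
uncertain = net_value(100, 2, NUM_DECISIONS, COST_PER_BRANCH)

print(plan_branches(2, NUM_DECISIONS))  # branches in the uncertain plan
print(certain > uncertain)              # whether the certain plan comes out ahead
```

In this toy setup the uncertain plan has 1024 branches against a single branch for the certain one, so the planning cost is enough to outweigh its higher expected outcome. With a smaller cost per branch or fewer decisions, the comparison flips.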
I don't think it's a generational thing, because I do object to the self-labeling freedom. Yes, it sounds bad to be against something called "freedom", but it is necessary unless you want to bite the bullet in favor of things like "freedom to make up whatever beliefs you want without evidence"—which is what I think is ultimately at stake here.
I want shared maps that reflect the territory. We want people to have the freedom to modify their body and social presentation in the territory, but I don't think this (not even the social presentation part) implies t...