All of Ben123's Comments + Replies

Ben123

The other examples given at other safety levels are also bad, but it is worth noting that GPT-4's and Claude-2's responses to this were, if anything, worse, since they flat-out refuse to play along and instead say 'I am a large language model.' In GPT-4's case, this was despite an explicit system instruction I had put in to never say that.

 

I tried with GPT-4 several times, and it played along correctly, though one response started with "As a New Yorker-based AI..."

Ben123

To clarify, those links are just generally about the ethical implications of MWI. I don't think I've seen the inequality argument before!

Ben123

Related LessWrong discussions: "Ethics in many worlds" (2020), "Living in Many Worlds" (2008), and some others. MWI ethics are also covered in this 80,000 Hours podcast episode. Mind-bending stuff.

Shmi
Hmm, none talk about aversion to inequality being an important consideration for an MWI believer.
Ben123

They had to give you a toaster instead

Looks like this link is broken.

Ben123

Does the inner/outer distinction complicate the claim that all current ML systems are utility maximizers? The gradient descent algorithm performs a simple kind of optimization during the training phase, but once the model is trained and in production, it doesn't seem obvious that the "utility maximizer" lens is always helpful for understanding its behavior.
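For concreteness, here's a toy sketch of the distinction I have in mind (my own illustration with made-up data, not anything from the post): during training there is an explicit optimizer at work, but the deployed artifact is just a frozen input-output mapping.

```python
# Toy sketch: linear regression trained by gradient descent.
# The optimization happens only in the training loop below;
# the deployed `model` just applies fixed weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
for _ in range(500):                        # training: an explicit optimizer
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    w -= 0.1 * grad                         # loss-minimizing update step

def model(x):
    """Deployment: no loss, no optimizer, just the frozen weights."""
    return x @ w

print(model(np.array([1.0, 0.0, 0.0])))     # roughly 2.0
```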

Ben123

You could read the status game argument the opposite way: maybe status-seeking causes moral beliefs without justifying them, in the same way that it can distort our factual beliefs about the world. If we can debunk moral beliefs by finding them to be only status-motivated, then the status explanation can actually assist rational reflection on morality.

Also, the quote from The Status Game conflates purely moral beliefs with factual beliefs in a way that IMO weakens its argument. It's not clear that many of the examples of crazy value systems would survive full logical and empirical information.

Wei Dai
The point I was trying to make with the quote is that many people are not motivated to do "rational reflection on morality" or examine their value systems to see if they would "survive full logical and empirical information". In fact they're motivated to do the opposite, to protect their value systems against such reflection/examination. I'm worried that alignment researchers are not worried enough that if an alignment scheme causes the AI to just "do what the user wants", that could cause a lock-in of crazy value systems that wouldn't survive full logical and empirical information.
Ben123

I think the agent should take the bet, and the double counting is actually justified. Epistemic status: Sleep deprived.

The number of clones that end up betting along with the agent is an additional effect of its decision that EDT-with-updating is correctly accounting for. Since "calculator says X" is evidence that "X = true", selecting only the clones that saw "calc says X" gives you better odds. What seems like a superfluous second update is really an essential step: computing the number of clones in each branch.
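To make the counting concrete, here's a rough numerical sketch (my own illustration; the payoffs, the 99%-reliable calculator, and the 0.5 prior on X are all assumed for the sake of the example):

```python
# Rough sketch: numbers and payoffs are hypothetical, chosen for illustration.
N = 1000          # total number of clones
r = 0.99          # assumed calculator reliability
prior = 0.5       # assumed prior on X
w, l = 1.0, 1.0   # hypothetical per-clone payoff for winning / losing the bet

# Clones whose decision is correlated with mine, i.e. those who also saw
# "calc says X", counted separately in each branch:
clones_if_true = r * N           # X true:  0.99 * N saw the correct output
clones_if_false = (1 - r) * N    # X false: 0.01 * N saw "calc says X" anyway

# The posterior shows up once as my credence in X and again inside the clone
# counts above; that is the "double count", and both uses track a genuine
# dependence on whether X is true.
posterior = prior * r / (prior * r + (1 - prior) * (1 - r))   # = 0.99
ev = posterior * clones_if_true * w - (1 - posterior) * clones_if_false * l
print(ev)   # 0.99*990 - 0.01*10 = 980.0 > 0, so take the bet
```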

Consider this modification: All N clones bet if... (read more)

Nora Belrose
Yeah, I think this is right. It seems like the whole problem arises from ignoring the copies of you who see "X is false." If your prior on X is 0.5, then really the behavior of the clones that see "X is false" should be exactly analogous to yours, and if you're going to be a clone-altruist you should care about all the clones of you whose behavior and outcomes you can easily predict. I should also point out that this whole setup assumes that there are 0.99N clones who see one calculator output and 0.01N clones who see the opposite, but that's really going to depend on what exact type of multiverse you're considering (quantum vs. inflationary vs. something else) and what type of randomness is injected into the calculator (classical or quantum). But if you include both the "X is true" and "X is false" copies then I think it ends up not mattering.
Ben123

Is that a general solution? What about this: "Give me five dollars or I will perform an action, the disutility of which will be equal to twice that of you giving me five dollars, multiplied by the reciprocal of the probability of this statement being true."
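Spelling out the expected-utility arithmetic behind that threat (my own gloss; write $u$ for the disutility of handing over the five dollars and $p$ for the probability that the statement is true):

$$\mathbb{E}[\text{disutility of refusing}] = p \cdot \frac{2u}{p} = 2u > u,$$

so the $p$ cancels, and a naive expected-utility calculation says to pay no matter how small you judge $p$ to be.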

A1987dM
Well, I'd rather lose twenty dollars than be kicked in the groin very hard, and the probability of you succeeding in doing that, given that you are close enough to me and trying to do so, is greater than 1/2, so...