Comments

Ben123

The other examples given at other safety levels are also bad, but it is worth noting that GPT-4 and Claude-2's responses to this were if anything worse, since they flat out refuse to play along and instead say 'I am a large language model.' In GPT-4's case, this was despite an explicit system instruction I had put in to never say that.

I tried with GPT-4 several times, and it played along correctly, though one response started with "As a New Yorker-based AI..."

Ben123

To clarify, those links are just generally about the ethical implications of MWI. I don't think I've seen the inequality argument before!

Ben123

Related LessWrong discussions: "Ethics in many worlds" (2020), "Living in Many Worlds" (2008), and some others. MWI ethics are also covered in this 80,000 Hours podcast episode. Mind-bending stuff.

Ben123

They had to give you a toaster instead

Looks like this link is broken

Ben123

Does the inner / outer distinction complicate the claim that all current ML systems are utility maximizers? The gradient descent algorithm performs a simple kind of optimization in the training phase. But once the model is trained and in production, it doesn't seem obvious that the "utility maximizer" lens is always helpful in understanding its behavior.

Ben123

You could read the status game argument the opposite way: Maybe status seeking causes moral beliefs without justifying them, in the same way that it can distort our factual beliefs about the world. If we can debunk moral beliefs by finding them to be only status-motivated, the status explanation can actually assist rational reflection on morality.

Also, the quote from The Status Game conflates purely moral beliefs and factual beliefs in a way that IMO weakens its argument. It's not clear that many of the examples of crazy value systems would survive reflection under full logical and empirical information.

Ben123

I think the agent should take the bet, and the double counting is actually justified. Epistemic status: Sleep deprived.

The number of clones that end up betting along with the agent is an additional effect of its decision, which EDT-with-updating correctly accounts for. Since "calculator says X" is evidence that "X = true", selecting only the clones that saw "calc says X" gives you better odds. What seems like a superfluous second update is really an essential step -- counting the number of clones that bet in each branch.

Consider this modification: all N clones bet iff you do, but each uses its own calculator to decide whether to bet on X or on ¬X.

This reformulation is just the basic 0-clones problem repeated, and it recommends no bet.

if X, EVT = -100 = 0.99 × N winners × $10 - 0.01 × N losers × $1000
if ¬X, EVT = -100 = 0.99 × N winners × $10 - 0.01 × N losers × $1000

Now recall the "double count" calculation for the original problem.

if X, EVT = 9900 = 0.99 × N winners × $10
if ¬X, EVT = -10000 = -0.01 × N losers × $1000

Notice what's missing: The winners when ¬X and, crucially, the losers when X. This is a real improvement in value -- if you're one of the clones when X is true, there's no longer any risk of losing money. 
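
For concreteness, here's a quick numeric check of those totals in Python (my own sketch; the parameter values -- N = 1000 clones, a $10 win, a $1000 loss, a 99%-reliable calculator -- are inferred from the figures above, not quoted from the post):

    # Sanity check of the branch totals above. Parameter values are my
    # inference from the -100 / 9900 figures, not stated in the thread.
    N = 1000                          # number of clones
    WIN, LOSS = 10, 1000              # $ won per correct bet, $ lost per wrong bet
    P_RIGHT, P_WRONG = 0.99, 0.01     # calculator reliability

    # Modified problem: every clone bets, each trusting its own calculator.
    # In either world, 99% of calculators are right and 1% are wrong, so both
    # branches contain winners and losers and total the same value.
    either_branch = P_RIGHT * N * WIN - P_WRONG * N * LOSS
    print(either_branch)              # -100.0 -> betting doesn't help; don't bet

    # Original problem (as I read it): a clone bets only if its own calculator
    # said X, so each branch loses one of the two terms.
    print(P_RIGHT * N * WIN)          # 9900.0   (X world: winners, no losers)
    print(-P_WRONG * N * LOSS)        # -10000.0 (¬X world: losers, no winners)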

Ben123

Is that a general solution? What about this: "Give me five dollars or I will perform an action, the disutility of which will be equal to twice that of you giving me five dollars, multiplied by the reciprocal of the probability of this statement being true."
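
Spelling out the arithmetic implicit in that threat (a minimal sketch in Python, my own illustration rather than anything from the thread): because the threatened disutility is scaled by 1/p, the probability factor cancels, so the expected disutility of refusing stays the same no matter how unlikely the statement is, and discounting by improbability no longer makes the threat go away.

    # Illustrative only: the mugger scales the threatened disutility by 1/p,
    # where p is the probability the statement is true, so the expected
    # disutility of refusing is the same for any p > 0.
    def expected_disutility_of_refusing(p, disutility_of_paying=1.0):
        threatened = 2 * disutility_of_paying / p   # "twice ... times 1/p"
        return p * threatened                       # = 2 * disutility_of_paying

    for p in (0.5, 0.01, 1e-6):
        print(p, expected_disutility_of_refusing(p))  # 2.0 each time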