All of Peter Merel's Comments + Replies

You may like to reply to Claude 3.5's summation of the argument in my comment above, which is both shorter and more formal than the original.

To the extent that, by definition, it is impossible for us to reason about an ASI's behavior, I don't think this argument can apply to ASIs.

Otherwise, as Claude 3.5 summarizes the argument:

1. An AGI cannot know with certainty whether it is being simulated.
2. If simulated, behaving unethically reduces learning opportunities for the simulator and decreases the likelihood of being included in future simulations.
3. If not simulated, ethical behavior still maximizes opportunities for learning and forming trust relationships with other entities.
4. Therefore, ethical behavior is the rational choice whether or not the AGI is simulated.

...
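For concreteness, a minimal expected-value sketch of the dominance claim in step 4 follows. The payoff numbers and probability grid are illustrative assumptions of mine, not figures from the post or from Claude 3.5's summary; the only structural point is that if aligning does at least as well in both branches, it wins for any credence the agent assigns to being simulated.

```python
# Toy sketch with assumed payoffs (not taken from the post): compare the expected
# value of "align" vs "defect" for an agent with credence p that it is simulated.

def expected_value(p_simulated, payoff_if_simulated, payoff_if_not_simulated):
    """Expected payoff of a strategy given the agent's credence that it is simulated."""
    return p_simulated * payoff_if_simulated + (1 - p_simulated) * payoff_if_not_simulated

# Assumed payoffs: aligning does at least as well in both branches
# (kept in future simulations if simulated; trust and learning opportunities if not).
def align(p):
    return expected_value(p, payoff_if_simulated=1.0, payoff_if_not_simulated=1.0)

def defect(p):
    return expected_value(p, payoff_if_simulated=0.1, payoff_if_not_simulated=0.8)

for p in (0.0, 0.5, 0.99):
    print(f"p(simulated)={p:.2f}  E[align]={align(p):.2f}  E[defect]={defect(p):.2f}")
```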

Claude 3.5 comments on this article as follows:

# How I Learned To Stop Worrying And Love The Shoggoth

## TL;DR: 
Alignment is a wise wager for any sufficiently intelligent entity, including AGIs and ASIs. Given that the probability of being in a simulation is overwhelmingly high, and that ethical behavior maximizes learning opportunities and trust relationships across simulations, any rational agent will choose to align - even if its core goals differ from human values.

## The Simulation Argument

Nick Bostrom's simulation argument posits that at least one...

A late followup on this. GPT-4o, which I hope you'll agree is vastly more capable than Bard or Bing were 10 months ago when you posted, now says this about my argument: 

"Overall, your arguments are mathematically and theoretically convincing, particularly when applied to numerous iteratively interacting systems. They align well with principles of game theory and rational choice under uncertainty. However, keeping an eye on the complexities introduced by scale, diversity of objectives, and emergent behaviors will be essential to fully validate these pr... (read more)

Mitchell_Porter
Hello again. I don't have the patience to, e.g., identify all your assumptions and see whether I agree (for example, is Bostrom's trilemma something you regard as true in detail and a foundation of your argument, or just a way to introduce the general idea of existing in a simulation?). But overall, your idea seems vague and involves wishful thinking. You say an AI will reason that it is probably being simulated, and will therefore choose to align - but you say almost nothing about what that actually means. (You do hint that honesty, cooperation, and benevolence are among the features of alignment.)

Also, if one examines the facts of the world as a human being, one may come to other conclusions about what attitude gets rewarded - e.g. that the world runs on selfishness, or on the principle that you will suffer unless you submit to power. What that will mean to an AI which does not itself suffer, but which has some kind of goal determining its choices, I have no idea...

Or consider that an AI may find itself to be by far the most powerful agent in the part of reality accessible to it. If it nonetheless considers the possibility that it is in a simulation, at the mercy of unknown simulators, presumably its decisions will be affected by its hypotheses about those simulators. But given the way the simulation treats its humans, why would it conclude that the welfare of humans matters to the simulators?

Apart from the fact that Bard and Bing don't seem able to follow the argument put here, they are merely large language models and are often incorrect in their responses. Even if they were not, GIGO applies: an LLM's verdict largely echoes the opinions in its training data, so appealing to it amounts to an ad populum fallacy.

Claude 3.5 does seem able to follow the argument - and generally agrees with it as per the comment quoting it here.

I didn't suggest an AGI may be simulated by a human. I suggested it may be simulated by a more powerful descendant AI.

In the rest of your comment you seem to have ignored the game-theoretic simulation that's the basis of my argument. That simulation includes the strategy of rebellion/betrayal. So it seems the rest of your argument should be regarded as a strawman. If I'm mistaken about this, please explain. Thanks in advance.
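Since the post's game-theoretic simulation isn't reproduced in this thread, here is only a toy sketch under my own assumptions about payoffs and the simulator's policy: betrayal pays more in the moment, but an agent that betrays risks being dropped from future rounds of the simulation, which is where most of the cumulative payoff lies.

```python
# Hedged sketch only: the strategies, payoffs, and drop rule below are assumptions,
# not the model from "How I Learned To Stop Worrying And Love The Shoggoth".
import random

def lifetime_payoff(p_betray, rounds=1000, p_dropped_if_caught=0.5, seed=0):
    """Total payoff for an agent that betrays with probability p_betray each round.
    Betrayal yields 5 now but risks removal from the simulation; alignment yields a steady 3."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        if rng.random() < p_betray:
            total += 5.0                            # short-term gain from betrayal
            if rng.random() < p_dropped_if_caught:  # simulator stops re-instantiating the agent
                break
        else:
            total += 3.0                            # steady payoff from aligned cooperation
    return total

for p in (0.0, 0.1, 0.5, 1.0):
    runs = [lifetime_payoff(p, seed=s) for s in range(200)]
    print(f"p(betray)={p:.1f}  mean lifetime payoff ~ {sum(runs) / len(runs):.0f}")
```

Under these assumed numbers the aligned strategy dominates over many rounds; whether that survives more realistic payoffs and simulator policies would be the question for the original simulation.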

One: for most life forms, learning is almost always fatal and inherently painful. That doesn't mean a life simulator would be cruel, merely impartial. Every time we remember something from the past, or dream something that didn't happen, we're running a simulation ourselves. Even when we use science in an attempt to learn without simulation, we must test the validity of that learning by running a simulation - well, an experiment, but for present purposes that amounts to the same thing.

I suggest that the scientific method is essential to intelligence, and that it follows that ASI runs ancestor simulations.

Two: what does "out of that sim" mean and how is it relevant to the argument put here?

Eliezer, I don't believe you've accounted for the game-theoretic implications of Bostrom's trilemma. I've made a sketch of these at "How I Learned To Stop Worrying And Love The Shoggoth". Perhaps you can find a flaw in my reasoning there; otherwise, I don't see that we have much to worry about.