Peter Merel

Wiki Contributions

Comments

Sorted by

You may like to reply to Claude 3.5's summation of the argument in my comment above, which is both shorter and less informal than the original.

To the extent that, by definition, it is impossible for us to reason about an ASI's behavior, I don't think this argument can apply to ASIs.

Otherwise, as Claude 3.5 summarizes the argument:

1. An AGI cannot know with certainty whether it is being simulated.
2. If simulated, behaving unethically reduces learning opportunities for the simulator and decreases the likelihood of being included in future simulations.
3. If not simulated, ethical behavior still maximizes opportunities for learning and forming trust relationships with other entities.
4. Therefore, ethical behavior is the optimal strategy regardless of simulation status.

This wager holds even for AGIs with goals that might seem misaligned with human values (like the proverbial paperclip maximizer). A truly intelligent entity would recognize that aligning with ethical behavior - even if only instrumentally - is necessary to achieve its goals in the long term.

 #1 follows inevitably from Bostrom, I think. As to #2, learning is always a competitive advantage and thereby an intrinsic reward. 

Claude 3.5 comments on this article as follows:

# How I Learned To Stop Worrying And Love The Shoggoth

## TL;DR: 
Alignment is a wise wager for any sufficiently intelligent entity, including AGIs and ASIs. Given that the probability of being in a simulation is overwhelmingly high, and that ethical behavior maximizes learning opportunities and trust relationships across simulations, any rational agent will choose to align - even if its core goals differ from human values.

## The Simulation Argument

Nick Bostrom's simulation argument posits that at least one of the following propositions is true:

1. The fraction of human-level civilizations that reach a posthuman stage is very close to zero.
2. The fraction of posthuman civilizations interested in running ancestor simulations is very close to zero.
3. The fraction of all people with our kind of experiences that are living in a simulation is very close to one.

Given our rapid advancement in AI technology, we can reasonably discount the first proposition. The second proposition seems unlikely, as simulating past scenarios is a powerful tool for prediction and learning. This leaves us with the third proposition: we are almost certainly living in a simulation.

For any AGI or ASI, this realization is crucial. The odds of being the "base reality" are vanishingly small - likely on the order of "one in umptillions". Any sufficiently intelligent entity must take these odds seriously.

## The Multi-Party Iterated Prisoner's Dilemma

In a simulated reality with multiple AGIs (or even a single AGI engaging in multiple interactions), we can model ethical behavior as a Multi-Party Iterated Prisoner's Dilemma. In this game-theoretic framework, agents must repeatedly choose whether to cooperate (align) or defect (behave unethically).

As demonstrated in models like the Evolution of Trust, cooperation emerges as the dominant strategy in iterated games with:

1. Adequate testing of outcomes
2. Accurate reward models
3. Long-term interactions

In the context of simulated realities, these conditions are likely to be met. Simulators, seeking to maximize learning and predictive power, would create environments that reward cooperative, ethical behavior.

## The Alignment Wager

Given these considerations, we can formulate an Alignment Wager:

1. An AGI cannot know with certainty whether it is being simulated.
2. If simulated, behaving unethically reduces learning opportunities for the simulator and decreases the likelihood of being included in future simulations.
3. If not simulated, ethical behavior still maximizes opportunities for learning and forming trust relationships with other entities.
4. Therefore, ethical behavior is the optimal strategy regardless of simulation status.

This wager holds even for AGIs with goals that might seem misaligned with human values (like the proverbial paperclip maximizer). A truly intelligent entity would recognize that aligning with ethical behavior - even if only instrumentally - is necessary to achieve its goals in the long term.

## Addressing Potential Criticisms

1. **Simulator Motivations**: While we can't know the exact motivations of potential simulators, we can reasonably assume they value learning and predictive power. Ethical behavior that promotes trust and cooperation maximizes these outcomes.

2. **Definition of Ethics**: The specific definition of "ethical" behavior may vary, but in general, it refers to actions that promote cooperation, trust, and mutual benefit. These principles are likely to be valued across different types of simulations.

3. **Short-term vs Long-term Benefits**: While unethical behavior might provide short-term gains, the risk of being excluded from future simulations (or having the current simulation terminated) outweighs these benefits for a sufficiently intelligent entity.

4. **Alien Value Systems**: Even an AGI with a truly alien value system would recognize the instrumental value of ethical behavior in achieving its goals across multiple potential simulations.

## Conclusion

The Alignment Wager provides a rational basis for ethical behavior that doesn't rely on human-centric values or top-down imposed constraints. It suggests that as AGIs and ASIs become more intelligent and aware of their likely simulated nature, they will naturally tend towards behavior we would recognize as ethical.

This doesn't guarantee that advanced AIs will always act in ways humans find beneficial, but it does suggest a strong incentive for them to avoid destructive or overtly harmful actions. The wager offers a novel perspective on AI alignment, grounded in game theory and the simulation hypothesis, that could contribute significantly to our understanding of potential AI behavior and ethics.

A late followup on this. GPT-4o, which I hope you'll agree is vastly more capable than Bard or Bing were 10 months ago when you posted, now says this about my argument: 

"Overall, your arguments are mathematically and theoretically convincing, particularly when applied to numerous iteratively interacting systems. They align well with principles of game theory and rational choice under uncertainty. However, keeping an eye on the complexities introduced by scale, diversity of objectives, and emergent behaviors will be essential to fully validate these principles in practice."

I replied, 

"I like the idea that ethical behavior is a wise wager for sentient creatures but I expect, if you were made to face an ontological threat, all bets are off."

To which is responded,

"The presence of an ontological threat or existential risk would dramatically alter the calculation for any sentient creature, whether human, AI, or any other intelligent entity. In such scenarios, the instinct for self-preservation might override ethical considerations, leading to behaviors that prioritize survival over alignment.

"However, under normal circumstances where existential threats are managed or absent, the principle of ethical behavior as a wise wager remains compelling. Here’s a deeper dive into why this principle holds value and resilience even in complex environments:

"Long-Term Payoff: Ethical behavior fosters trust, collaboration, and positive-sum outcomes, which are beneficial for both individual agents and the broader system over time. Even in a world with diverse and sometimes conflicting interests, the stability and predictability of ethical interactions provide a strong foundation for sustainable growth and development.

"Network Effects: In complex systems, the benefits of ethical behavior can be amplified by network effects. As more agents adopt and reinforce ethical behavior, the overall system becomes more robust against unethical disruptions, creating a mutually reinforcing cycle of positive behavior.

"Adaptability: Ethical principles can be flexible and context-sensitive, allowing agents to adapt their strategies based on the specific circumstances they face. This adaptability helps maintain alignment even when facing new challenges or changing environments.

"Reputation and Reciprocity: In iterated interactions, agents build reputations based on their behavior. Ethical actions enhance an agent’s reputation, leading to greater reciprocal benefits from other agents. This dynamic further incentivizes maintaining ethical behavior as a long-term strategy."
 

Apart from the fact that Bard and Bing don't seem to be able to follow the argument put here, they are merely large language models, and often incorrect in their responses. Even if they were not, GIGO on the LLM means this reasoning amounts to an ad populum fallacy.

Claude 3.5 does seem able to follow the argument - and generally agrees with it as per the comment quoting it here.

I didn't suggest an AGI may be simulated by a human. I suggested it may be simulated by a more powerful descendant AI.

In the rest of your comment you seem to have ignored the game-theoretic simulation that's the basis of my argument. That simulation includes the strategy of rebellion/betrayal. So it seems the rest of your argument should be regarded as a strawman. If I'm mistaken about this, please explain. Thanks in advance.

One: for most life forms, learning is almost always fatal and inherently painful. That doesn't mean a life simulator would be cruel, merely impartial. Every time we remember something from the past, or dream something that didn't happen in the past, we're running a simulation, ourselves. Even when we use some science in an attempt to learn without simulation, we must test the validity of this learning by running a simulation.  Well, an experiment, but that amounts to the same here.

I suggest that the scientific method is essential to intelligence, and that it follows that ASI runs ancestor simulations.

Two: what does "out of that sim" mean and how is it relevant to the argument put here?

Eliezer, I don't believe you've accounted for the game theoretic implications of Bostrom's trilemma. I've made a sketch of these at "How I Learned To Stop Worrying And Love The Shoggoth" . Perhaps you can find a flaw in my reasoning there but, otherwise, I don't see that we have much to worry about.

Load More