The goal of this prize is to generate ideas and plans for a long-term, positive future. The starting prize pool is $500 USD, though others are welcome to add to it.

We are going to pretend for this exercise that the technical part of AI Alignment is already solved. Your mission, should you choose to accept it, is to come up with a design for what you want this AI to be doing.

Relaxations:

  • The AI will be safe with respect to whatever you target. Example: if you choose to give an AI the sole goal of eliminating cancer, it will actually cure cancer, and not just kill all humans to stop cancers from developing.
  • You have some degree of influence, politically or otherwise. You can push your idea pretty far, but you’re not all-powerful. If people would normally really hate your idea, or otherwise have no interest at all in it, that’s a problem.

Constraints:

  • The target of alignment you choose has to be something that major political actors (NATO, China, etc.) would not oppose.
  • It has to be something that would be within the economic interests of AI companies to implement.
  • It needs to be something that would not provoke massive public backlash.
  • You can’t turn off or change the fundamental architecture once implemented. That is, once you give it a target to optimize for, you can’t change that target in the future.
  • No future intelligence (artificial or otherwise) will ever surpass it. It will continue to run until the stars fade and the Universe grows cold.

The winner will be selected by me and others at AI Safety Strategy, based on the following criteria.

Simplicity over complexity: The simpler and more effective strategies will be preferred over more complex ones. For example, an idea such as Extrapolated Human Volition will lose a lot of points for its added complexity. This is because we don’t know how robust a hypothetical alignment method might be, meaning aligning simpler objectives might be far easier than complex ones.

Multi-party interests: Ideas that are more likely to be backed by multiple interests will rank higher than ones that only please a few. Conversely, ideas that are likely to get opposition from multiple parties will be ranked much lower. All else being equal, an idea that pleases fewer parties but is opposed by almost no one will be ranked higher than an idea that pleases more parties but would be opposed by others. If given the choice, favor minimizing opposition over maximizing support.

Modern cultural compatibility: This means that, unfortunately, weirdness will be penalized, all else being equal. This doesn’t mean you should avoid a possible solution just because it’s weird. But if you can find a strategy that accomplishes roughly the same things while being less likely to be perceived by the public as weird, crazy, or dangerous, favor that instead. Cultural compatibility with wealthy, industrialized nations will be favored over compatibility with less developed or less powerful nations. This is unfortunately a reality right now, since AI is unlikely to be developed in Papua New Guinea. If you can find a strategy that is culturally compatible with most or all nations, that’s better.

Long-term future: This one will be the most subjective. I will try to imagine many different minds, with many different value sets, and will favor ideas that more minds with differing values would be at least moderately okay with (all else being equal). I will also consult with many others at AI Safety Support to gauge how pleasant or appalling a certain future sounds to them.

Uniqueness: Copies of other people’s ideas will not be accepted. Building on others’ ideas is allowed (but be sure to credit them). All else being equal, ideas that are more original will be favored over ones that are not.

Read this blog post to get a better idea of this exercise (if pay-walled, read it here).

This prize is sponsored by me and the organization AI Safety Strategy.

The current due date is August 1st, 2023. If not enough submissions are received, or if no submission is very promising, the date may be extended.

You can submit your entry as an EA/LW blog post and send it to me, or submit it here: https://app.impactmarkets.io/bounty/clix38oh200003e6s5dem1q3g
