For example, a bad moral argument might claim that even the simplest being "capable of arguing for equal rights for itself" deserves equal rights and personhood. That simplest being could be a small AI.

Or the simplest such "being" could be a rock with "I want equal rights" written on it.

Generally I think the Ideal World Benchmark could be useful for identifying some misaligned AIs. However, some misaligned AIs can already be identified simply by asking "What are your goals?", and I do not expect the Ideal World Benchmark to be significantly more robust to deception than that.

If tomorrow my boss claimed to be sent by a future version of myself that had obtained vast intelligence and power, and asked me what that version should do, I would want convincing proof before saying anything controversial.