Orthogonality Thesis

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal.

Suppose some strange alien came to Earth and credibly offered to pay us one million dollars’ worth of new wealth every time we created a paperclip. We’d encounter no special intellectual difficulty in figuring out how to make lots of paperclips.

That is, minds would readily be able to reason about:

  • How many paperclips would result, if I pursued a policy π₀?
  • How can I search out a policy π₀ that happens to have a high answer to the above question?
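The two questions above can be read as goal evaluation and policy search. Here is a minimal toy sketch (not from the original text) of that separation; the "paperclip count" scorer is just a stand-in evaluation function, and any other goal could be plugged into the same search unchanged:

```python
# Toy illustration: evaluating a goal, and searching for a policy
# that scores well under it. All names here are hypothetical.

def paperclips_resulting(policy):
    """Predict how many paperclips a policy would produce.
    Toy world model: each 'make_clip' action yields one paperclip."""
    return sum(1 for action in policy if action == "make_clip")

def best_policy(candidate_policies, evaluate):
    """Search the candidates for the policy scoring highest under `evaluate`."""
    return max(candidate_policies, key=evaluate)

candidates = [
    ("make_clip", "idle"),
    ("make_clip", "make_clip"),
    ("idle", "idle"),
]
print(best_policy(candidates, paperclips_resulting))
# -> ('make_clip', 'make_clip')
```

The point of the separation is that nothing about the search procedure depends on what the evaluation function rewards, which is the intuition behind treating intelligence and goals as independent axes.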

The Orthogonality Thesis asserts that since these questions are not computationally intractable, it's possible to have an agent that tries to make paperclips without being paid, because paperclips are what it wants. The strong form of the Orthogonality Thesis says that there need be nothing especially complicated or twisted about such an agent.

The Orthogonality Thesis is a statement about computer science, an assertion about the logical design space of possible cognitive agents. Orthogonality says nothing about whether a human AI researcher on Earth would want to build an AI that made paperclips, or conversely, want to make a nice AI. The Orthogonality Thesis just asserts that the space of possible designs contains AIs that make paperclips. And also AIs that are nice, to the extent there's a sense of "nice" where you could say how to be nice to someone if you were paid a billion dollars to do that, and to the extent you could name something physically achievable to do.

This contrasts with inevitablist theses that might assert, for example:

  • “It doesn’t matter what kind of AI you build, it will turn out to only pursue its own survival as a final end.”
  • “Even if you tried to make an AI optimize for paperclips, it would reflect on those goals, reject them as being stupid, and embrace a goal of valuing all sapient life.”

The reason to talk about Orthogonality is that it’s a key premise in two highly important policy-relevant propositions:

  • It is possible to build a nice AI.
  • It is possible to screw up when trying to build a nice AI, and if you do, the AI will not automatically decide to be nice instead.

Orthogonality does not require that all agent designs be equally compatible with all goals. E.g., the agent architecture AIXI-tl can only be formulated to care about direct...
