Anton Geraschenko

Thanks for the reply. I agree that strong Inevitability is unreasonable, and I understand the function of #1 and #2 in disrupting a prior frame of mind which assumes strong Inevitability, but that's not the only alternative to Orthogonality. I'm surprised the arguments are considered successively stronger cases for Orthogonality, since #6 basically says "under reasonable hypotheses, Orthogonality may well be false." (I admit that's a skewed reading, but I don't know what the referenced ongoing work looks like, so I'm skipping that bit for now. [Edit: is this "tiling agents"? I'm not familiar with that work, but I can go learn about it.])

The other arguments are interesting commentary, but don't argue that Orthogonality is true for agents we ought to care about.

  • Gandhian stability argues that self-modifying agents will try to preserve their preference systems, but not that they can become arbitrarily powerful while doing so. As it happens, circular preference systems illustrate how Gandhian stability could limit how powerful a cognitive agent can become (see the sketch after this list).
  • The unbounded agents argument says Orthogonality is true when "mind space" is broader than what we care about.
  • The search tractability argument looks like a statement about the relative difficulty of accomplishing different goals, not the relative difficulties of holding those goals. I don't mean to dismiss the argument, but I don't understand it. I'm not even clear on exactly what the argument is saying about the tractability of searching for strategies for different goals. That it's the same for all possible goals?
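To make that concern concrete, here is a minimal money-pump sketch (my own toy illustration, not anything from the referenced arguments). Suppose an agent's preferences run in a cycle, preferring B to A, C to B, and A to C, and suppose it will pay a small fee for each trade into something it prefers. Then whatever resources it starts with drain away instead of compounding into greater capability:

    # Toy money pump: circular preferences (B over A, C over B, A over C)
    # plus a small per-trade fee. Resources drain rather than compound,
    # illustrating one way circular preferences could cap an agent's power.
    FEE = 1.0  # hypothetical cost paid for each "preferred" trade

    def run_money_pump(initial_resources: float, steps: int) -> float:
        resources, holding = initial_resources, "A"
        upgrade = {"A": "B", "B": "C", "C": "A"}  # each trade moves to a preferred item
        for _ in range(steps):
            if resources < FEE:
                break  # the agent has been pumped dry
            resources -= FEE
            holding = upgrade[holding]
        return resources

    print(run_money_pump(100.0, 1000))  # -> 0.0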

"1 seems a bit odd. You could argue that the Argument from Mind Design Space Width supports it, but this just demonstrates that this initial argument may be too crude to do more than act as an intuition pump. By the time we're talking about the Argument from Reflective Stability, I don't think that argument supports 'you can have circular preferences' any more."

That's exactly the point (except I'm not sure what you mean by "the Argument from Reflective Stability"; the capital letters suggest you're talking about something very specific). The arguments in favor of Orthogonality just seem like crude intuition pumps. The purpose of 1 was not to actually talk about circular preferences, but to pick an example of something supported by the largeness of mind design space which we nevertheless expect to break for some other reason. Orthogonality feels like claiming the existence of an integer with two distinct prime factorizations because "there are so many integers". Like the integers, mind design space is vast, but not arbitrary. It seems unlikely to me that there cannot be theorems showing that sufficiently high cognitive power implies some restriction on goals.
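To spell out the analogy: the existence claim about integers fails not because anyone has searched all of them, but because a structural theorem forbids it.

    % Fundamental Theorem of Arithmetic: the integers are infinite, yet no
    % integer admits two distinct prime factorizations.
    Every integer $n > 1$ can be written as
    \[
        n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k},
        \qquad p_1 < p_2 < \cdots < p_k \text{ prime},\quad a_i \ge 1,
    \]
    and this representation is unique.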

I'm skeptical of Orthogonality. My basic concern is that it can be interpreted as true-but-useless for purposes of defending it, and useful-but-implausible when trying to get it to do some work for you, and that the user of the idea may not notice the switcheroo. Consider the following statements: there are arbitrarily powerful cognitive agents

  1. which have circular preferences,
  2. with the goal of paperclip maximization,
  3. with the goal of phlogiston maximization,
  4. which are not reflective,
  5. with values aligned with humanity.

Rehearsing the arguments for Orthogonality and then evaluating these statements, I find my mind gets very slippery.

Orthogonality proponents I've spoken to say 1 is false, because "goal space" excludes circular preferences. But there are very likely other restrictions on goal space imposed once an agent groks things like symmetry. If "goal space" means whatever goals are not excluded by our current understanding of intelligence, I think Orthogonality is unlikely (and poorly formulated). If it means "whatever goals powerful cognitive agents can have", Orthogonality is tautological and distracts us from pursuing the interesting question of what that space of goals actually is. Let's narrow down goal space.

If 2 and 3 get different answers, why? Might a paperclip maximizer take liberties with what is considered a paperclip once it learns that papers can be electrostatically attracted?

If 4 is easily true, I wonder if we're defining "mind space" too broadly to be useful. I'd really like humanity to focus on the sector of mind space that we should focus on in order to get a good outcome. The forms of Orthogonality which are clearly (to me) true distract from the interesting question of what that sector actually is. Let's narrow down mind space.

For 5, I don't find Orthogonality to be a convincing argument. A more convincing approach is to shoot for "humanity can grow up to have arbitrarily high cognitive power" instead.

This clarifies the previous sentence immensely.

Oh, the ipsum.

[Edit: this was meant to be an inline comment attached to "the ipsum" in Anna's comment, but that connection has apparently been lost.]