ec429 comments on Invisible Frameworks - Less Wrong

Post author: Eliezer_Yudkowsky, 22 August 2008 03:36AM

Comment author: ec429 23 September 2011 05:29:15AM 1 point

Roko is adopting a special and unusual metamoral framework in regarding "Most agents do X!" as a compelling reason to change one's utility function. Why might Roko find this appealing? Humans, for very understandable reasons of evolutionary psychology, have a universalizing instinct; we think that a valid argument should persuade anyone.

Perhaps this can be fixed. Suppose we define Q := "moral(X) := 'a supermajority of agents which accept Q consider X moral'". Then agents accepting Q cannot agree to disagree, and Q-based arguments are capable of convincing any Q-implementing agent.

On the other hand, the universe could stably be in a state in which agents which accept Q mostly believe moral(torture), in which case they all continue to do so. However, this is unsurprising; there is no way to force everyone to agree on what is "moral" (no universally compelling arguments), so why should Q-agents necessarily agree with us?
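To make the stability claim concrete, here is a toy sketch in Python. It is not part of the original argument and rests on assumptions of mine: beliefs about a single X are booleans, "supermajority" means at least two thirds, and all Q-agents update synchronously. The names `q_update` and `is_stable` are hypothetical.

```python
from fractions import Fraction

# Assumed threshold for "supermajority"; the comment does not fix a number.
SUPERMAJORITY = Fraction(2, 3)


def q_update(beliefs, threshold=SUPERMAJORITY):
    """One synchronous round of Q for a single X.

    `beliefs` is a list of booleans, one per Q-agent, saying whether that
    agent currently considers X moral. Under Q, moral(X) holds exactly when
    a supermajority of Q-agents consider X moral, so every agent adopts the
    same verdict.
    """
    share = Fraction(sum(beliefs), len(beliefs))
    verdict = share >= threshold
    return [verdict] * len(beliefs)


def is_stable(beliefs, threshold=SUPERMAJORITY):
    """True if applying Q leaves every agent's belief unchanged."""
    return q_update(beliefs, threshold) == list(beliefs)


if __name__ == "__main__":
    # Read X as "torture": both consensus states are stable under Q.
    print(is_stable([True] * 10))   # everyone believes moral(X)
    print(is_stable([False] * 10))  # no one believes moral(X)
    # A 9-to-1 split is not stable: the dissenter is pulled into line.
    print(q_update([True] * 9 + [False]))
```

In this toy model Q guarantees agreement among Q-agents but says nothing about which consensus they land on; that depends entirely on where the population starts.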

But what we are left with seems to be a strange loop through the meta-level, with the distinction that it loops through not only the agent's own meta-level but also the agent's beliefs about other Q-agents' beliefs.

However, I'm stripping out the bit about making instrumental values terminal, because I can't see the point of it (and of course it leads to the "drive a car!" problem). Instead we take Q as our only terminal value; the shared pool of things-that-look-like-terminal-values {X : Q asserts moral(X)} is in fact our first layer of instrumental values.

Also, I'm not endorsing the above as a coherent or effective metaethics. I'm just wondering whether it's possible that it could be coherent or effective. In particular, is it PA+1 or Self-PA? Does it exhibit the failure mode of the Type 2 Calculator? After all, the system as a whole is defined as outputting what it outputs, but individual members are defined as outputting what everyone else outputs and therefore, um, my head hurts.