Stuart_Armstrong comments on Assessors that are hard to seduce - Less Wrong

5 Post author: Stuart_Armstrong 09 March 2015 02:19PM

Comments (12)

Comment author: Slider 10 March 2015 12:53:01AM 1 point [-]

Appeasing an imaginary moral judge seems an awful lot like pleasing a man in the clouds. Is there any reason why criticisms of religion could not be used verbatim against this arrangement?

Comment author: Stuart_Armstrong 10 March 2015 09:25:43AM 1 point [-]

If we get to set the criteria of this man in the clouds, we can get useful behaviour. The main criticism of religion is that it's untrue, and that a dedicated observer will realise this. Here we know it to be untrue, but the AI cannot act on that knowledge (see my post on false thermodynamic miracles).

Comment author: Slider 10 March 2015 01:11:49PM 0 points [-]

Wouldn't we know that in this context this would be true (rather than untrue, as you write)? Also, the more thoroughly the assessor is shielded from tampering, the closer it comes to being imaginary (there is no need to mention the assessor's implementation details, but then it seems to work like "magic", lessening the evidence for believing in its existence). Also, it seems that the things that make people turn on religion are valued, and here we are counting on the AI not pulling those same stunts.

Hiding the assessor among multiple plausible targets might make the AI play mafia on people (such as trying to get the assessor replaced when it can no longer satisfy its demands, in hopes that the replacement has easier attitudes, or at least the possibility of flaws to exploit).

Comment author: Stuart_Armstrong 10 March 2015 02:36:56PM 0 points [-]

These can be defined in counterfactual ways, if needed. There need not actually be an assessor, just a small probability of one.

Comment author: Slider 11 March 2015 01:46:25AM 0 points [-]

Wouldn't that be the equivalent of thinking that a Pascal's wager will keep it in check?

Comment author: Stuart_Armstrong 11 March 2015 10:25:22AM 0 points [-]
Comment author: djm 10 March 2015 11:02:46AM 0 points [-]

I agree that useful behavior could come of this - religion has always been a very effective control mechanism.

"The main criticism of religion is that it's untrue, and that a dedicated observer will realise this"

Unfortunately, it would be a challenging problem to maintain this control over an increasingly intelligent AI.

Comment author: Stuart_Armstrong 10 March 2015 11:19:22AM 0 points [-]
Comment author: djm 10 March 2015 12:30:37PM 0 points [-]

That would likely work for initial versions of an AI, but I still can't help feeling that this is just tampering with the signal and that an advanced AI would detect this.

Would it not question the purpose of the utility function around detecting thermodynamic miracles? How would this work with its utility function to detect tampering or false data?

If I saw a miracle, I would [hope] my thinking would follow the logic below:

a) it must be a trick/publicity stunt done with special effects
b) I am having some sort of dream / mental breakdown / psychotic episode
c) some other explanation I don't know of

I don't think an intelligent agent would or should jump to "it's a miracle", and I would be concerned about its response if/when it does realise that it has been tricked all along.

Comment author: Stuart_Armstrong 10 March 2015 03:15:06PM 0 points [-]

"Would it not question the purpose of the utility function around detecting thermodynamic miracles"

Probably, but it's not programmed to care about that.

"If I saw a miracle"

Remember, it's not seeing a miracle. It's more that its decisions only matter if a miracle happened, so it's assuming that a miracle happened for decision making purposes.
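
A rough way to picture the point above is a toy calculation. This is only a sketch under stated assumptions: the action names, payoffs, and the probability are all made up for illustration, and the miracle's probability is assumed not to depend on the agent's action. If utility is zero in every non-miracle world, the action that maximises expected utility is the same one the agent would pick if it simply assumed the miracle had happened, however small that probability is.

```python
# Hypothetical toy model: every name and number here is an illustrative
# assumption, not something taken from the thread or the linked post.

P_MIRACLE = 1e-9  # tiny prior probability that the "miracle" occurred

# VALUE[action][world]: payoff the agent would assign in each world
# before its utility function is restricted to miracle worlds.
VALUE = {
    "cooperate": {"miracle": 10.0, "no_miracle": 1.0},
    "defect": {"miracle": 2.0, "no_miracle": 50.0},
}


def unrestricted_expected_utility(action, p_miracle=P_MIRACLE):
    """Ordinary expected utility over both kinds of world."""
    payoff = VALUE[action]
    return p_miracle * payoff["miracle"] + (1.0 - p_miracle) * payoff["no_miracle"]


def restricted_expected_utility(action, p_miracle=P_MIRACLE):
    """Expected utility when every non-miracle world is worth exactly zero."""
    return p_miracle * VALUE[action]["miracle"] + (1.0 - p_miracle) * 0.0


def conditional_expected_utility(action):
    """Expected utility computed as if the miracle had certainly happened."""
    return VALUE[action]["miracle"]


def best_action(score):
    """Return the action with the highest score."""
    return max(VALUE, key=score)


unrestricted_choice = best_action(unrestricted_expected_utility)
restricted_choice = best_action(restricted_expected_utility)
conditional_choice = best_action(conditional_expected_utility)

print("unrestricted:", unrestricted_choice)
print("restricted:  ", restricted_choice)
print("conditional: ", conditional_choice)
```

In this toy setup the restricted agent and the "assume the miracle happened" agent choose the same action, while the unrestricted agent does not. Nothing in the calculation requires the agent to believe the miracle is likely; the non-miracle worlds just contribute nothing to the comparison.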