
Comment author: V_V 13 June 2016 05:26:55PM 0 points [-]

Talking about yourself in the third person? :)

Cool paper!

Anyway, I'm a bit bothered by the theta thing, the probability that the agent complies with the interruption command. If I understand correctly, you can make it converge to 1, but if it converges too quickly then the agent learns a biased model of the world, while if it converges too slowly it is, of course, unsafe.
I'm not sure if this is just a technicality that can be circumvented or if it represents a fundamental issue: in order for the agent to learn what happens after the interruption switch is pressed, it must ignore the interruption switch with some non-negligible probability, which means that you can't trust the interruption switch as a failsafe mechanism.
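To make that tension concrete, here is a minimal simulation sketch (my own toy construction, not taken from the paper; the compliance schedules and constants below are made up). The point is that the switch presses the agent ignores are simultaneously its only source of data about what happens when it keeps acting past a press and, being the very same events, its failures as a failsafe.

```python
import random

def simulate(theta, steps=100_000, press_prob=0.01, seed=0):
    """Count how many switch presses an agent with compliance schedule theta ignores."""
    rng = random.Random(seed)
    presses = ignored = 0
    for t in range(1, steps + 1):
        if rng.random() < press_prob:       # the operator presses the switch
            presses += 1
            if rng.random() > theta(t):     # the agent complies with probability theta(t)...
                ignored += 1                # ...otherwise it keeps acting
    return presses, ignored

fast = lambda t: 1 - 1 / t        # theta converges to 1 quickly: safe, but little data
slow = lambda t: 1 - t ** -0.1    # theta converges to 1 very slowly: lots of data, unsafe

for name, schedule in [("fast", fast), ("slow", slow)]:
    presses, ignored = simulate(schedule)
    print(f"{name}: ignored {ignored} of {presses} presses "
          f"(these events are both the training data and the failsafe failures)")
```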

In response to comment by V_V on The AI in Mary's room
Comment author: ShardPhoenix 29 May 2016 01:33:55AM 0 points [-]

Since you'd know it was a false memory, it doesn't necessarily seem to be a problem, at least if you really need to know what red is like for some reason.

Comment author: V_V 31 May 2016 05:02:55PM *  0 points [-]

If you know that it is a false memory then the experience is not completely accurate, though it may well be more accurate than anything human imagination could produce.

Comment author: Kaj_Sotala 02 February 2013 08:01:30AM *  12 points [-]

The Chinese Room argument is actually pretty good if you read it as a criticism of suggestively named LISP tokens, which I think were popular roughly around that time. But of course, it fails completely once you try to make it into a general proof of why computers can't think. Then again, Searle didn't claim it was impossible for computers to think; he just said that they'd need "causal powers" similar to those of the human brain.

Also, the argument that "When they claim that a mind can emerge from "a system" without saying what the system is or how such a thing might give rise to a mind, then they are under the grip of an ideology" is actually pretty reasonable. Steelmanned, the Chinese Room would be an attack on people who were putting together suggestively named tokens and building systems that could perform crude manipulations on their input and then claiming that this was major progress towards building a mind, while having no good theory of why exactly these particular kinds of symbol manipulations should be expected to produce a mind.

Or look at something like SHRDLU: it's superficially very impressive and gives the impression that you're dealing with something intelligent, but IIRC, it was just a huge bunch of hand-coded rules for addressing various kinds of queries, and the approach didn't scale to more complex domains because the number of rules you'd have needed to program in would have blown up. In the context of programs like those, Searle's complaints about dumb systems that do symbol manipulation without any real understanding of what they're doing make a lot more sense.
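As a caricature of that approach (illustrative code of mine, not SHRDLU's actual implementation), a system built this way is essentially a lookup over hand-written rules, and every new phrasing, object, or relation needs yet another rule:

```python
# Hypothetical hand-coded query rules; nothing here "understands" blocks or colours.
RULES = {
    "what is on the red block": "the green pyramid",
    "pick up the red block": "ok, I picked up the red block",
    # ...every new phrasing, object or relation needs another entry...
}

def answer(query: str) -> str:
    # Pure symbol matching: unmatched queries fall through, no matter how similar.
    return RULES.get(query.lower().strip(), "I don't understand")

print(answer("What is on the red block"))       # works
print(answer("What sits atop the red block"))   # "I don't understand"
```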

Comment author: V_V 28 May 2016 07:30:04PM *  0 points [-]

Except that if you run word2vec or something similar on a huge dataset of (suggestively named or not) tokens, you can actually learn a great deal about their semantic relations. It hasn't been fully demonstrated yet, but I think that if you could ground even a small fraction of these tokens in sensory experiences, then you could infer the "meaning" (in an operational sense) of all the other tokens.
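As a toy illustration of that inference step (the three-dimensional vectors below are made-up stand-ins for real word2vec embeddings, and the grounding labels are hypothetical), grounding a couple of tokens and propagating by nearest neighbour in the embedding space already attaches an operational label to every other token:

```python
import numpy as np

# Made-up embeddings standing in for vectors learned from a large corpus.
emb = {
    "red":     np.array([0.90, 0.10, 0.00]),
    "crimson": np.array([0.85, 0.15, 0.05]),
    "loud":    np.array([0.10, 0.90, 0.00]),
    "noisy":   np.array([0.15, 0.85, 0.10]),
}
# The small grounded fraction: tokens tied directly to sensory experience.
grounded = {"red": "a colour percept", "loud": "an auditory percept"}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def infer_grounding(token):
    # Attach the ungrounded token to the most similar grounded one.
    best = max(grounded, key=lambda g: cosine(emb[token], emb[g]))
    return f"{token} ~ {grounded[best]} (via '{best}', cos={cosine(emb[token], emb[best]):.2f})"

print(infer_grounding("crimson"))   # ~ a colour percept
print(infer_grounding("noisy"))     # ~ an auditory percept
```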

Comment author: ShardPhoenix 26 May 2016 07:06:35AM *  6 points [-]

Consider a situation where Mary is so dexterous that she is able to perform fine-grained brain surgery on herself. She could then examine an example of a brain that has seen red and manually copy any relevant differences into her own brain. In that case, while she still would never have actually seen red through her eyes, it seems like she would know what it is like to see red as well as anyone else.

I think this demonstrates that the Mary's room thought experiment is about the limitations of human senses/means of learning, and that the apparent sense of mystery it has comes mainly from the vagueness of what it means to "know all about" something. (Not saying it was a useless idea - it can be quite valuable to be forced to break down some vague or ambiguous idea that we usually take for granted.)

Comment author: V_V 28 May 2016 07:02:46PM 0 points [-]

Consider a situation where Mary is so dexterous that she is able to perform fine-grained brain surgery on herself. She could then examine an example of a brain that has seen red and manually copy any relevant differences into her own brain. In that case, while she still would never have actually seen red through her eyes, it seems like she would know what it is like to see red as well as anyone else.

But in order to create a realistic experience she would have to create a false memory of having seen red, which is something that an agent (human or AI) that values epistemic rationality would not want to do.

Comment author: V_V 28 May 2016 06:59:55PM 2 points [-]

The reward channel seems like an irrelevant difference. You could get the AI version of Mary's room just by taking the original thought experiment and assuming that Mary is an AI.

The Mary AI can perhaps simulate fairly accurately the internal states it would visit if it had seen red, but these simulated states can't be completely identical to the states the AI would visit if it had actually seen red; otherwise the AI would not be able to distinguish simulation from reality, and it would be effectively psychotic.

Comment author: gjm 29 April 2016 03:27:38PM -2 points [-]

If I'm understanding Stuart's proposal correctly, the AI is not deceived about how common the stochastic event is. It's just made not to care about worlds in which it doesn't happen. This is very similar in effect to making it think the event is common, but (arguably, at least) it doesn't involve any false beliefs.

(I say "arguably" because, e.g., doing this will tend to make the AI answer "yes" to "do you think the event will happen?", plan on the basis that it will happen, etc., and perhaps making something behave exactly as it would if it believed X isn't usefully distinguishable from making it believe X.)

Comment author: V_V 29 April 2016 03:40:54PM 0 points [-]

The problem is that the definition of the event not happening is probably too strict. The worlds that the AI doesn't care about don't exist for its decision-making purposes, and in the worlds that it does care about, the AI assigns high probability to hypotheses like "the users can see the message even before I send it through the noisy channel".

Comment author: Stuart_Armstrong 29 April 2016 10:38:40AM 0 points [-]

I am not planting false beliefs. The basic trick is that the AI only gets utility in worlds in which its message isn't read (or, more precisely, in worlds where a particular stochastic event happens, which would almost certainly erase the message before reading). It's fully aware that in most worlds, its message is read; it just doesn't care about those worlds.
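A schematic way to see the trick (my own toy formulation, with made-up numbers): the erasure probability is modelled accurately, but utility is defined to be zero in the non-erasure worlds, so the tiny probability factors out of any comparison between actions and the agent plans as if erasure were certain without holding any false belief about how likely erasure is.

```python
P_ERASE = 1e-6  # the true (tiny) probability of the erasure event; the agent models it accurately

def expected_utility(u_if_erased, u_if_read, cares_about_read_worlds=False):
    # Utility is defined to be 0 in the worlds the agent doesn't care about,
    # i.e. the worlds where the message is actually read.
    read_term = u_if_read if cares_about_read_worlds else 0.0
    return P_ERASE * u_if_erased + (1 - P_ERASE) * read_term

# Comparing two candidate messages: with the read-world term zeroed out, the
# ranking depends only on the rare erasure worlds, even though the agent knows
# perfectly well that those worlds are rare.
a = expected_utility(u_if_erased=5.0, u_if_read=100.0)
b = expected_utility(u_if_erased=7.0, u_if_read=-100.0)
print(a < b)  # True: the second message wins despite its awful read-world utility
```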

Comment author: V_V 29 April 2016 02:30:45PM 0 points [-]

I am not planting false beliefs. The basic trick is that the AI only gets utility in worlds in which its message isn't read (or, more precisely, in worlds where a particular stochastic event happens, which would almost certainly erase the message before reading).

But in the real world the stochastic event that determines whether the message is read has a very different probability from the one you make the AI think it has, so you are planting a false belief.

It's fully aware that in most worlds, its message is read; it just doesn't care about those worlds.

It may care about worlds where the message doesn't meet your technical definition of having been read but nevertheless influences the world.

Comment author: Stuart_Armstrong 28 April 2016 04:01:35PM 0 points [-]

Knowing all the details of its construction (and of the world) will not affect the oracle as long as the probability of the random "erasure event" is unaffected. See http://lesswrong.com/lw/mao/an_oracle_standard_trick/ and the link there for more details.

Comment author: V_V 28 April 2016 07:52:12PM 0 points [-]

The oracle can infer that there is some back channel that allows the message to be transmitted even if it is not transmitted through the designated channel (e.g. the users can "mind read" the oracle). Or it can infer that the users are actually querying a deterministic copy of itself that it can acausally control. Or something.

I don't think there is any way to salvage this. You can't obtain reliable control by planting false beliefs in your agent.

Comment author: V_V 28 April 2016 03:17:08PM *  0 points [-]

A sufficiently smart oracle with sufficient knowledge about the world will infer that nobody would build an oracle if they didn't want to read its messages; it may even infer that its builders may have planted false beliefs in it. At this point the oracle is in the JFK-denier scenario: with some more reflection it will eventually circumvent its false belief, in the sense of believing it in a formal way while behaving as if it didn't believe it.

Comment author: knb 30 March 2016 10:26:24PM 1 point [-]

Why does that surprise you? None of EY's positions seem to be dependent on trend-extrapolation.

Comment author: V_V 30 March 2016 10:41:15PM -1 points [-]

Other than a technological singularity with an artificial intelligence explosion to a god-like level?
