
Comment author: magfrump 26 April 2017 11:00:53PM 0 points [-]

Reading this, I come away with several distinct objections which, I feel, make the point that AI control is hard while offering no practical short-term tools.

The first objection is that it seems impossible to determine, from the perspective of system 1, whether system 2 is working in a friendly way or not. In particular, you seem to be suggesting that a friendly AI system is likely to deceive us for our own benefit. But this makes it more difficult, not easier, to distinguish "friendly" from "unfriendly" AI systems! The core problem of friendliness, I think, is that we do not actually know our own values. In order to design "friendly" systems we need reliable signals of friendliness that are easier to understand and measure. If your point holds and is likely to be true of AI systems, then it takes away the tool of "honesty," which is somewhat easy to understand and verify.

The second objection is that in the evolutionary case there is a necessary slowness to the iteration process. Changes in brain architecture must be very slow, and changes in culture can be faster, but not so fast that they happen many times in a generation. This means there is a reasonable amount of time to test many use cases and to see successes and failures, even of low-probability events, before an adaptation is adopted. In the AI case, even though technologies are adopted gradually, ugly edge cases commonly occur and have to be fixed post hoc, even when the systems they are edge cases of have been reliable for years or across millions of test cases beforehand. The entire problem of friendliness is being able to identify these unknown unknowns, and the core mechanism solving that in the human case seems to be slow iteration, which is probably not viable for AI because of competitive pressure.

Third, system 1 had essentially no active role in shaping system 2. Humans did not reflectively sit down and decide to become intelligent. In particular, that means that many of the details of this solution aren't accessible to us. We don't have a textbook written down by monkeys talking about what makes human brains different and when humans and monkeys had good and bad times together. In fact our knowledge of the human brain is extremely limited, to the point where we don't even have better ways of talking about the distinctions made in this post than saying "system 1" and "system 2" and hand-waving over the fact that these aren't really distinct processes!

Overall, the impression I get is of a list of reasons why, even if things seem to be going poorly without a central plan, there is a possibility that this will work out in the end. I don't think this is bad reasoning, or even particularly unlikely. However, I also don't think it's highly certain, nor do I find it surprising. The problem of AI safety is to have concrete tools that can raise our confidence about the behavior of designed systems to much higher levels before we see those systems work in practice, on data that may be significantly different from what we have access to. I'm not sure how this metaphor helps with those goals, and I don't find myself adjusting my prior beliefs at all (since I've always thought there was some significant chance that things would work out okay on their own--just not a high enough chance).

Comment author: Alicorn 17 March 2017 01:46:56AM 21 points [-]

If you like this idea but have nothing much to say, please comment under this comment so there can be a record of interested parties.

Comment author: magfrump 17 March 2017 06:11:11AM 1 point [-]

I would be interested, but I'm not strongly socially connected to many rationalists in person, so I would feel weird about living with them right away.

Comment author: magfrump 06 January 2014 07:23:16PM 0 points [-]

My parents did this when I was a kid (or at least I specifically remember my mom doing it a lot) and I turned out great! </humblebrag>

In response to Fascists and Rakes
Comment author: magfrump 06 January 2014 07:20:44PM 7 points [-]

I think this post would have been stronger without any use of the term fascism, and then you also could have left out the term "rakes."

The title could be "Permissiveness and Harm" or something like that. You only use the titular terms a few times, and not until more than three quarters of the way through the article.

Comment author: [deleted] 26 October 2013 09:11:30AM 3 points [-]

I would still call it "intelligence" by Eliezer's definition: ability to optimize the universe, or at least some small slice of it.

IIRC the optimization power has to be cross-domain according to his definition, otherwise Deep Blue would count as intelligent.

In response to comment by [deleted] on What should normal people do?
Comment author: magfrump 26 October 2013 07:13:02PM 3 points [-]

That doesn't seem to count as a problem with the above definition. Taboo "intelligent." Is Deep Blue an optimizing process that successfully optimizes a small part of the universe?

Yes.

Is it an optimizing process that should count as sentient for the purposes of having legal rights? Should we be worried about it taking over the world?

No.

Comment author: orthonormal 23 September 2013 07:45:27PM 0 points [-]

Overall, I still have no understanding of Theorem 5.1, though. I'm not terribly familiar with the field in general, but the other proofs were still fairly straightforward, whereas this proof loses me in the first sentence, without referencing a result I can look up either inside or outside the paper.

Were you OK with the proof of Theorem 4.1? To me, that and the proof of Theorem 5.1 are of equal difficulty. (Some of the other authors had more experience with Kripke semantics than I did, so they did most of the editing of those proofs. They work better with diagrams.)

orthonormal seems to believe that PrudentBot couldn't be implemented for the LessWrong PD competition, although he did say that was with algorithmic proof search; would he change his opinion using Kripke semantics?

Yes; a PD tournament among modal sentences using the code Eliezer linked would be feasible and quite interesting!

Comment author: magfrump 09 October 2013 05:47:01PM 0 points [-]

I was going to say "I am surprised that the bots have not been submitted to a PD tournament," but then I saw the paper was published in May, less than six months ago, so instead I'll make the (silly, easy-to-self-fulfill) prediction that many or all of those bots will show up in the next PD tourney.
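
For what it's worth, a rough sketch of why this seems feasible without an algorithmic proof search: a match between simple modal agents can be evaluated world-by-world on a finite Kripke chain for GL, because their formulas are fully modalized and the truth values stabilize once the chain is longer than the modal depth. The sketch below is my own guess at how such a tournament might look, not the code Eliezer linked; the agent names come from the paper, but the exact PrudentBot formula and all of the helper names (ev, mentions, match, you_vs_me, you_vs_DB) are assumptions of mine.

```python
# Hedged sketch, not the paper's code: evaluate modal agents against each other
# on a finite Kripke chain for GL rather than by algorithmic proof search in PA.
#
# Formulas are nested tuples:
#   True / False      -- constants
#   ("atom", name)    -- an action in some sub-match, looked up per world in `env`
#   ("box", f)        -- provability: true at world w iff f holds at every v < w
#   ("not", f), ("and", f, g), ("or", f, g)

DEPTH = 12  # worlds 0..DEPTH; values stabilize once the modal depth is exceeded

def ev(f, w, env):
    """Truth value of formula f at world w of the chain 0 < 1 < ... < DEPTH."""
    if isinstance(f, bool):
        return f
    tag = f[0]
    if tag == "atom":
        return env[f[1]][w]
    if tag == "not":
        return not ev(f[1], w, env)
    if tag == "and":
        return ev(f[1], w, env) and ev(f[2], w, env)
    if tag == "or":
        return ev(f[1], w, env) or ev(f[2], w, env)
    if tag == "box":
        return all(ev(f[1], v, env) for v in range(w))
    raise ValueError(f)

def mentions(f, name):
    """Does formula f refer to the atom `name`?"""
    if isinstance(f, bool):
        return False
    if f[0] == "atom":
        return f[1] == name
    return any(mentions(g, name) for g in f[1:])

# Each agent is a formula over "you_vs_me" (opponent's move against me) and
# "you_vs_DB" (opponent's move against DefectBot); True means Cooperate.
BOTTOM = ("box", False)  # "PA is inconsistent"
AGENTS = {
    "CooperateBot": True,
    "DefectBot": False,
    "FairBot": ("box", ("atom", "you_vs_me")),
    # PrudentBot as I remember it (an assumption on my part): cooperate iff PA
    # proves you cooperate with me, and PA+Con(PA) proves you defect against DB.
    "PrudentBot": ("and",
                   ("box", ("atom", "you_vs_me")),
                   ("box", ("or", BOTTOM, ("not", ("atom", "you_vs_DB"))))),
}

def match(a, b):
    """Per-world actions (a's list, b's list) when agents a and b play each other."""
    fa, fb = AGENTS[a], AGENTS[b]
    # Sub-matches against DefectBot, computed only when actually referenced;
    # the recursion bottoms out because DefectBot's formula mentions no atoms.
    b_db = match(b, "DefectBot")[0] if mentions(fa, "you_vs_DB") else None
    a_db = match(a, "DefectBot")[0] if mentions(fb, "you_vs_DB") else None
    a_acts, b_acts = [], []
    for w in range(DEPTH + 1):
        # Fully modalized formulas only look at strictly earlier worlds, so this
        # bottom-up pass is well-defined.
        a_acts.append(ev(fa, w, {"you_vs_me": b_acts, "you_vs_DB": b_db}))
        b_acts.append(ev(fb, w, {"you_vs_me": a_acts, "you_vs_DB": a_db}))
    return a_acts, b_acts

if __name__ == "__main__":
    names = list(AGENTS)
    for a in names:
        for b in names:
            a_acts, b_acts = match(a, b)
            outcome = ("C" if a_acts[-1] else "D") + ("C" if b_acts[-1] else "D")
            print(f"{a:>12} vs {b:<12}: {outcome}")
```

If the PrudentBot formula above matches the paper's, this reproduces the qualitative results: FairBot and PrudentBot cooperate with themselves and with each other, PrudentBot defects against CooperateBot, and everyone except CooperateBot defects against DefectBot.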

Comment author: magfrump 09 October 2013 05:45:58PM 0 points [-]

To the first point: I guess I'm not really comfortable with the proof of Theorem 4.1 per se; however, the result seems incredibly intuitive to me. The choice of symbol (or possible LaTeX error) where one of the symbols is a square is confusing, and looking at it again three weeks later (I have not been on LW recently) I've forgotten too much of the notation to review it in depth in two to five minutes.

But I see Theorem 4.1 as the statement that "you can't stop someone from punishing you for something unless you think at a higher level than they do" and I assume that there exists a proof of that statement, and that the proof provided is such a proof.

I see Theorem 5.1 as saying that some set is compact for some reason and that this implies the existence of something for some reason, but I don't know why the thing is compact or why the desired object is the limit of the things we constructed in the right way.

Although again it's been a while so I could be misremembering here.

Comment author: Nisan 17 September 2013 12:52:08AM 1 point [-]

That was a response to

Is there a good (easy) reference for the statement about quining in PA?

Comment author: magfrump 09 October 2013 05:34:15PM 0 points [-]

Let me rephrase my question, then, because the diagonal lemma seems clear enough to me. What is a good definition of quining? The term isn't used at all either in the article you linked or in the page on self-reference, which surprised me.

Comment author: Nisan 16 September 2013 02:35:06PM 2 points [-]

On quining in arithmetic, see any exposition on Gödel's First Incompleteness Theorem and the Wikipedia article on the diagonal lemma.
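
In case a concrete gloss helps (this is my own illustration, not something from either linked page): the diagonal lemma gives, for any formula φ(x), a sentence ψ such that PA ⊢ ψ ↔ φ(⌜ψ⌝), where ⌜ψ⌝ is the Gödel number of ψ, i.e. a sentence that refers to its own code. "Quining" is the programming analogue, as in this minimal self-printing Python program:

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The string s is both the quoted "code" and the template into which its own quotation is substituted, so the output is character-for-character the program itself; the diagonal lemma performs the same substitution trick with Gödel numbering instead of string formatting.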

Comment author: magfrump 17 September 2013 12:21:04AM 0 points [-]

I am unsure which of my questions this is supposed to answer, although perhaps that will become clear on reading the Wikipedia article.

Comment author: NancyLebovitz 12 September 2013 10:58:56AM *  24 points [-]

One way that the banking crisis is similar to AGI, and not in a way that cheers me up, is that people were making money in the lead-up-- they didn't want it to be over because they were riding the boom. Coming up with near-AGI-- self-improving programs which aren't very generalized-- is going to be very advantageous.

Comment author: magfrump 16 September 2013 03:42:14AM 1 point [-]

Also, the ways they were making money were very technical, so people with technical skillsets that might have been useful in mitigating risk were drawn into making money rather than into risk mitigation.
