jsteinhardt comments on What can you do with an Unfriendly AI? - Less Wrong

16 Post author: paulfchristiano 20 December 2010 08:28PM


Comment author: jsteinhardt 21 December 2010 03:44:35AM 4 points [-]
Comment author: DSimon 28 December 2010 10:14:04PM *  0 points [-]

Indeed, provided that the automated proof checker is absolutely entirely utterly bug-free and exploit-free! When was the last time someone released a fairly complicated, mission-critical piece of software with an assertion that it was totally secure and bugless... and wasn't proven wrong afterwards?

Comment author: jimrandomh 28 December 2010 10:32:09PM 4 points [-]

Automated proof checking is actually very simple, and we already have bug-free and exploit-free proof checkers.
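To illustrate how simple the core of a proof checker can be, here is a minimal sketch of a checker for a Hilbert-style propositional system. The formula encoding, rule names, and `check_proof` function are all invented for this illustration; this is not any particular released verifier, just a demonstration that the trusted kernel can be a short, easily-audited loop.

```python
# Minimal proof-checker sketch (hypothetical, for illustration).
# Formulas are nested tuples; an implication is ('->', antecedent, consequent).
# A proof is a list of steps: ('axiom', formula) cites a given axiom,
# ('mp', i, j) applies modus ponens to earlier steps i and j.

def check_proof(axioms, steps):
    """Return the list of proved formulas, or raise ValueError on a bad step."""
    proved = []
    for i, step in enumerate(steps):
        kind = step[0]
        if kind == 'axiom':
            formula = step[1]
            if formula not in axioms:
                raise ValueError(f"step {i}: formula is not an axiom")
            proved.append(formula)
        elif kind == 'mp':
            # modus ponens: from A and A -> B, conclude B
            prem, impl = proved[step[1]], proved[step[2]]
            if impl[0] != '->' or impl[1] != prem:
                raise ValueError(f"step {i}: modus ponens does not apply")
            proved.append(impl[2])
        else:
            raise ValueError(f"step {i}: unknown rule {kind!r}")
    return proved

# Example: from axioms P and P -> Q, derive Q.
axioms = {'P', ('->', 'P', 'Q')}
proof = [('axiom', 'P'),
         ('axiom', ('->', 'P', 'Q')),
         ('mp', 0, 1)]
print(check_proof(axioms, proof)[-1])  # prints Q
```

The point is that the entire trusted codebase is this loop plus the axiom list: everything clever lives in the (untrusted) proof *search*, while the checker only verifies.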

Comment author: DSimon 28 December 2010 10:36:55PM 0 points [-]

I was not expecting this answer at all! Can you elaborate a little further? Even just a link to such a checker would be great.

Comment author: timtyler 28 December 2010 11:02:59PM 1 point [-]
Comment author: jsteinhardt 29 December 2010 02:47:48AM *  2 points [-]

But how do we know they are bug-free and exploit-free?

I'm under the impression that current software methodologies can at least empirically guarantee error rates like 1 per million lines of code [ETA: this is wrong, should be 1 per 10,000, see child]. Since an automated proof checker requires far fewer lines than this, we can probably guarantee no errors with something like probability 99.9%, which isn't perfect, but then again a proof of friendliness is unlikely to give much better guarantees, since it might be wrong.
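The back-of-the-envelope behind a figure like 99.9% can be made explicit. Under a crude independence assumption (each line has probability p of containing an error), a checker of n lines is error-free with probability (1 - p)^n; the line count and rates below are illustrative, not measurements:

```python
# Crude model: each line independently contains an error with probability p,
# so an n-line program is error-free with probability (1 - p)**n.
def p_error_free(n_lines, errors_per_line):
    return (1 - errors_per_line) ** n_lines

# Hypothetical 1,000-line proof checker at 1 error per million lines:
print(round(p_error_free(1000, 1e-6), 4))  # ~0.999

# Same checker at the corrected rate of 1 error per 10,000 lines:
print(round(p_error_free(1000, 1e-4), 4))  # ~0.9048
```

So the original 1-per-million rate gives roughly the 99.9% figure quoted here, while the corrected 1-per-10,000 rate (see the child comment) drops it to around 90% for the same program size.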

We can potentially add to this by not letting the AI know anything about the checker (this would be non-trivial, since the AI gets access to whatever information it wants about the outside world; I don't actually know how to solve this problem).

Comment author: Morendil 29 December 2010 02:52:40AM 2 points [-]

I'm under the impression that current software methodologies can at least empirically guarantee error rates like 1 per million lines of code.

Here's some evidence for you to update on.

Comment author: jsteinhardt 29 December 2010 03:45:30AM 1 point [-]

Thanks, that does bring my estimate down by about two orders of magnitude (to 1 error / 10,000 lines being the current standard for "very good" code). I think I mis-remembered the NASA one as 1 per million instead of 1 per 500,000 and thought that it was more typical than it was.

Still, 1 error per 10,000 seems to be definitely achievable today, and that rate can probably be pushed lower if we need it. Hopefully by the time we're dealing with AGI we will have better methodologies.