XiXiDu comments on What I would like the SIAI to publish - Less Wrong
That's clear. But let me restate what I'd like to ask. Given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), isn't the nonhazardous subset of all possible outcomes much larger than the subset in which the AGI works perfectly yet fails to hold before it can wreak havoc? Here is where this question stems from. Given my current knowledge about AGI, I believe that any AGI capable of dangerous self-improvement will be very sophisticated and will include a lot of restrictions. For example, I believe that any self-improvement can only be as efficient as the specifications of its output are detailed. If, for example, the AGI is built with the goal of producing paperclips, the design specifications of what a paperclip is will be used as the leveling rule by which to measure and quantify any improvement of the AGI's output. This means that, to be able to self-improve effectively up to a superhuman level, the design specifications will have to be highly detailed and will by definition include sophisticated restrictions. Therefore, to claim that any work on AGI will almost certainly lead to dangerous outcomes is to assert that any given AGI is likely to work perfectly well, subject to all restrictions except the one that makes it hold (spatiotemporal scope boundaries). I'm unable to arrive at that conclusion, as I believe that most AGIs will fail at extensive self-improvement, since that is the largest and most complicated part of an AGI's design parameters and therefore where failure is most likely. To put it bluntly, why is it more likely that contemporary AGI research will succeed at superhuman self-improvement (beyond learning) yet fail to limit the AGI, rather than vice versa?
As I see it, given the larger number of parameters needed to be able to self-improve in the first place, it is more likely that most AGI research will result in incremental steps towards human-level intelligence than in one huge step towards superhuman intelligence that fails on its scope boundary rather than on self-improvement.
What you are envisioning is not an AGI at all, but a narrow AI. If you tell an AGI to make paperclips, but it doesn't know what a paperclip is, then it will go and find out, using whatever means it has available. It won't give up just because you weren't detailed enough in telling it what you wanted.
Then I don't think that there is anyone working on what you are envisioning as 'AGI' right now. If a superhuman level of sophistication regarding the potential for self-improvement is already part of your definition, then there is no argument to be won or lost here regarding risk assessment of research on AGI. I do not believe this is reasonable or that AGI researchers share your definition. I believe that there is a wide range of artificial general intelligence that does not fit your definition yet deserves the term.
Who said anything about a superhuman level of sophistication? Human-level is enough. I'm reasonably certain that if I had the same advantages an AGI would have - that is, if I were converted into an emulation and given my own source code - then I could foom. And I think any reasonably skilled computer programmer could, too.
Debugging will be a PITA. Both ways.