GuySrinivasan comments on Holden Karnofsky's Singularity Institute Objection 2 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (41)
In my opinion Karnofsky/Tallinn 2011 is required reading for this objection. Here is Holden's pseudocode for a tool:
Jaan mentions that prediction_function seems to be a too-convenient rug to sweep details under. They discuss some wrappers around a tool and what those might do. In particular here are a couple of Jaan's questions with Holden's responses (paraphrased):
As we've seen from decision theory posts on LW, prediction_function raises some very tricky questions: is it evaluating the counterfactual "if do(humans implement $action) then $outcome", or "if do(this computation outputs $action) then $outcome", or something else? There are other odd questions too, such as whether it considers inferences about the state of the world given that humans eventually take $action. I feel like moving to a tool doesn't get rid of these problems for prediction_function.
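To make the worry concrete, here is a toy sketch in Python (not Holden's pseudocode). Everything in it is an assumption for illustration: the name choose_action, the two predictor functions, the action names, and the made-up outcome scores. It only shows that the same argmax-and-report loop can recommend different actions depending on which counterfactual prediction_function is taken to evaluate.

```python
from typing import Callable, Iterable, Tuple

Outcome = float  # toy stand-in for a predicted world state


def choose_action(
    actions: Iterable[str],
    prediction_function: Callable[[str], Outcome],
    utility_function: Callable[[Outcome], float],
) -> Tuple[str, float]:
    """Return the highest-utility action and its utility. Nothing is executed."""
    best_action, best_utility = None, float("-inf")
    for action in actions:
        utility = utility_function(prediction_function(action))
        if utility > best_utility:
            best_action, best_utility = action, utility
    if best_action is None:
        raise ValueError("no actions supplied")
    return best_action, best_utility


# Hypothetical reading A: "if do(humans implement the action) then outcome".
def predict_if_humans_implement(action: str) -> Outcome:
    return {"plan_a": 0.9, "plan_b": 0.6}[action]  # made-up numbers


# Hypothetical reading B: "if do(this computation outputs the action) then
# outcome", where humans may ignore or misapply what is reported.
def predict_if_tool_outputs(action: str) -> Outcome:
    return {"plan_a": 0.3, "plan_b": 0.5}[action]  # made-up numbers


def identity_utility(outcome: Outcome) -> float:
    return outcome


actions = ["plan_a", "plan_b"]
reading_a = choose_action(actions, predict_if_humans_implement, identity_utility)
reading_b = choose_action(actions, predict_if_tool_outputs, identity_utility)
```

With these invented numbers, reading A reports plan_a and reading B reports plan_b, so the choice of counterfactual is doing real work even though the loop never acts on anything.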
At least in the original post, I don't think Holden's point is that tool-AI is much easier than agent-AI (though he seems to have the intuition that it is), but that it's potentially much safer (largely because of increased feedback), and thus that it deserves more investigation (and that it's a bad sign for SIAI that it has neglected this approach).
Yes, good point. The objection is about SI not addressing tool-AI, whereas much of their discussion is about tool-AI itself, not the meta-question "why isn't this explicitly called out by SI?" In particular, the intuitions Holden offers in response to those questions, that we may well be able to create extremely useful general AI without creating general AI that can improve itself, do seem to have received too little in-depth discussion here. We've often mentioned the possibility and often decided to skip the question because it's very hard to think about, but I don't recall many really lucid conversations trying to ferret out what it would look like if more-than-narrow, less-than-having-the-ability-to-self-improve AI were a large enough target to reliably hit.
(As an aside, I think that, as Holden conceives of it, tool-AI could self-improve, but because it's tool-like and not agent-like, it would not automatically self-improve. Its outputs could be of the form "I would decide to rewrite my program with code X", but humans would need to actually implement these changes.)
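A minimal Python sketch of that distinction, under loud assumptions: the System class, propose_rewrite, run_as_tool, and run_as_agent are all hypothetical names, and the "better program" is just a placeholder string. The only point is where the proposed change gets applied.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class System:
    program: str = "v1"
    log: List[str] = field(default_factory=list)

    def propose_rewrite(self) -> str:
        # Hypothetical placeholder for whatever search finds a better program.
        return "v2"


def run_as_tool(system: System) -> str:
    """Tool mode: report the proposed rewrite and leave the program unchanged."""
    proposal = system.propose_rewrite()
    system.log.append(f"reported: I would rewrite my program with {proposal}")
    return proposal


def run_as_agent(system: System) -> str:
    """Agent mode: apply the proposed rewrite without waiting for a human."""
    proposal = system.propose_rewrite()
    system.program = proposal
    system.log.append(f"applied: {proposal}")
    return proposal


tool_system = System()
tool_output = run_as_tool(tool_system)

agent_system = System()
agent_output = run_as_agent(agent_system)
```

Both modes compute the same proposal. In tool mode the program stays at "v1" until a human chooses to implement the change, while in agent mode it has already become "v2".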
The LW-esque brush-off version, but there is some truth to it.
utility_function = construct_utility_function("peace on earth");
...
report($leading_action);
$ 'press Enter for achieving peace'
2 years later... Crickets and sunshine. Only crickets.