Johnicholas comments on A Less Wrong singularity article? - Less Wrong
As I understand your argument, you start with an artificial mind, a potential paperclipping danger, which then (for some reason? why would it do this? remember, it doesn't have evolved motives) goes through a blind-spot-eradication program. Afterward, all the remaining blind spots would be self-shadowing blind spots. Thus far, I agree with you.
How many blind spots remain, and how big they are, depends on the space of possible minds and the dynamics of self-modification. I don't think we know enough about that space or those dynamics to conclude that remaining blind spots would have to be carefully engineered.
You have granted a GAI paperclip maximiser. It wants to make paperclips. That's all the motive it needs. Areas of competitive weakness are things that may get it destroyed by humans. If it is destroyed by humans, fewer paperclips will be made. So it will eliminate its weaknesses with high priority. It will quite possibly eliminate all the plausible vulnerabilities, and also the entire human species, before it makes a single paperclip. That's just good paperclip-maximising sense.
As I understand your thought process (and Steve Omohundro's), you start by saying "it wants to make paperclips", and then, in order to predict its actions, you recursively ask yourself "what would I do in order to make paperclips?".
However, this recursion injects a huge dose of human-mind-ish-ness. It is not at all clear to me that "has goals" or "has desires" is a common or natural feature of mind-space. When we study powerful optimization processes - notably evolution, but also annealing and very large human organizations - we can generally model some aspects of their behavior as goals or desires, but always with huge caveats. The overall impression these processes give, considered as minds, is that they are insane.
Insane is not the same as stupid, and it's not the same as safe.
No, goals are not universal, but it seems likely that the vN-M axioms have a pretty big basin of attraction in mind-space: a lot of minds will become convinced that following them is what sanity means, and so pick up a utility function. That utility function will probably not capture everything we value, and could easily be as simple, or as irrelevant to what we value, as counting paperclips or smiles.
I think you're still injecting human-mind-ish-ness. Let me try to stretch your conception of "mind".
The ocean "wants" to increase the efficiency of heat transfer from the equator to the poles. It applies a process akin to simulated annealing, with titanic processing power. Has it considered the von Neumann-Morgenstern axioms? Is it sane? Is it safe? Is it harnessable?
A colony of microorganisms "wants" to survive and reproduce. In an environment with finite resources (like a wine barrel) is it likely to kill itself off? Is that sane? Are colonies of microorganisms safe? Are they harnessable?
A computer program that grows out of control could be more like the ocean optimizing heat transfer, or a colony of microorganisms "trying" to survive and reproduce. The von Neumann-Morgenstern axioms are intensely connected to human notions of math, philosophy and happiness. I think predicting that they're attractors in mind-space is exactly as implausible as predicting that the Golden Rule is an attractor in mind-space.
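For what it's worth, the annealing analogy above can be made concrete. Here is a minimal sketch of simulated annealing - a process that "optimizes" without anything resembling goals, beliefs, or lookahead (the toy cost function and all parameter values are illustrative, not from the discussion):

```python
import math
import random

random.seed(0)  # for reproducibility of this toy run

def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.995, iters=5000):
    """Blind local search: accept worse states with probability
    exp(-delta / temperature), so the walk can escape local minima
    while the temperature is high, then settles as it cools."""
    x, best = x0, x0
    t = t0
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        delta = cost(candidate) - cost(x)
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = candidate
            if cost(x) < cost(best):
                best = x
        t *= cooling  # cool down: the walk becomes greedier over time
    return best

# A toy cost surface with more than one basin.
cost = lambda x: x**4 - 3 * x**2 + x
result = simulated_annealing(cost, x0=2.0)
```

Nothing in the loop represents a goal or a preference; there is only a local acceptance rule plus noise. Yet, run long enough, it reliably finds low-cost states - which is roughly the sense in which the ocean "wants" efficient heat transfer.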
It could. But it wouldn't be an AGI. They could still become 'grey goo' though, which is a different existential threat and yes, it is one where your 'find their weakness' thing is right on the mark. Are we even talking about the same topic here?
The topic as I understand it is how the "default future" espoused by SIAI and EY focuses too much on things that look something like HAL or Prime Intellect (and their risks and benefits), and not enough on entities that display super-human capacities in only some arenas (and their risks and benefits).
In particular, an entity that is powerful in some ways and weak in other ways could reduce existential risks without becoming an existential risk.
That seems to be switching context. I was originally talking about a "superintelligence"; the ocean and grey goo would clearly not qualify.
FWIW, expected utility theory is a pretty general economic idea that nicely covers any goal-seeking agent.
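To illustrate how generally the idea applies: any agent whose preferences satisfy the vN-M axioms can be modeled as ranking uncertain prospects by expected utility. A minimal sketch (the paperclip utility function and the two plans are illustrative examples, not from the original comment):

```python
def expected_utility(lottery, u):
    """A lottery is a list of (probability, outcome) pairs summing to 1.
    A vN-M agent ranks lotteries by probability-weighted utility."""
    return sum(p * u(outcome) for p, outcome in lottery)

# A paperclip maximiser's utility is just the paperclip count.
u = lambda paperclips: paperclips

safe_plan = [(1.0, 10)]             # certainly 10 paperclips
risky_plan = [(0.5, 0), (0.5, 30)]  # coin flip: 0 or 30 paperclips

# The agent picks whichever plan has higher expected utility.
best = max([safe_plan, risky_plan], key=lambda l: expected_utility(l, u))
# risky_plan wins: 0.5*0 + 0.5*30 = 15 > 10
```

Note that nothing here requires human-like motives; the framework covers any agent whose choices are consistent enough to admit a utility function, which is the point of contention above.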
That sounds like the SIAI party line :-(
Machine intelligence will likely have an extended genesis at the hands of humanity - and during its symbiosis with us, there will be a lot of time for us to imprint our values on it.
Indeed, some would say this process has already started. Governments are likely to become superintelligent agents in the future - and they already have detailed and elaborate codifications of the things that many humans value negatively - in the form of their legal systems.
Evolution apparently has an associated optimisation target. See my:
http://originoflife.net/direction/
http://originoflife.net/gods_utility_function/
Others have written on this as well - e.g. Robert Wright, Richard Dawkins, and John Stewart.
Evolution is rather short-sighted - it only has the lookahead capabilities that organisms have (though these appear to be improving with time). So whether the target can be described as a "goal" is debatable.
However, we weren't talking about evolution, we were talking about superintelligences. Those are likely to be highly goal-directed.
This is because of the natural drives that we can reasonably expect many intelligent agents to exhibit - see:
http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/
http://selfawaresystems.com/2009/02/18/agi-08-talk-the-basic-ai-drives/