wedrifid comments on Not Taking Over the World - Less Wrong
Perhaps "Master, you now hold the ring, what do you wish me to turn the universe into?" isn't a question you have to answer all at once.
Perhaps the right approach is to ask yourself "What is the smallest step I can take that has the lowest risk of not being a strict improvement over the current situation?"
For example, are we less human or compassionate now that we have Google available than we were before that point?
Suppose an AI researcher, a year before the Google search engine was made available on the internet, had ended up with 'the ring'. Suppose the researcher asked the AI to develop, for the researcher's own private use, an internet search engine of the kind that existing humans might create with 1000 human hours of work (with suitable restrictions upon the AI on how to do this, including "check with me before implementing any part of your plan that affects anything outside your own sandbox"), and then to put itself on hold to await further orders once the engine had been created. If the AI then did create something like Google, without destroying the world, and did put itself fully on hold (not self-modifying, not doing anything outside the sandbox, nothing except waiting for a prompt) - would that researcher then be in a better position to make their next request of the AI? Would that have been a strict improvement on the researcher's previous position?
Imagine a series of milestones on the path to making a decision about what to do with the universe, and work backwards.
You want a human or group of humans who have their intelligence boosted to make the decision instead of you?
Ok, but you don't want them to lose their compassion, empathy or humanity in the process. What are the options for boosting and what does the AI list as the pros and cons (likely effects) of each process?
What is the minimum significant boost with the highest safety factor? And what person or people would it make sense to boost that way? AI, what do you advise are my best 10 options on that, with pros and cons?
Still not sure? Ok, I need a consensus of a few top non-boosted people - current AI researchers, wise people, smart people, etc. - on the people to be boosted and the boosting process, before I 'ok' it. The members of the consensus group should be people who won't go ape-shit, who can understand the problem, who're good at discussing things in groups and reaching good decisions, who'll be willing to cooperate (undeceived and uncoerced) if the problem is presented to them clearly, and who probably meet a number of other criteria that perhaps you can suggest (like: will they play politics and demand to be boosted themselves?). Who do you suggest? Options?
Basically, if the AI can be trusted to give honest, good advice without hiding an agenda that's different from your own expressed one, and if it can be trusted to not, in the process of giving that good advice, do anything external that you wouldn't do (such as slaying a few humans as guinea pigs in the process of determining options for boosting), then that's the approach I hope the researcher would take: delay the big decisions, in favour of taking cautious, minimally risky small steps towards a better capacity to make the big decisions correctly.
Mind you, those are two big "if"s.
You can get away with (in fact, strictly improve the algorithm by) using only the second of the two caution-optimisers there, so: "What is the smallest step I can take that has the lowest risk of not being a strict improvement over the current situation?"
Naturally, when answering the question you will probably consider small steps - and in the unlikely event that a large step is safer, so much the better!
Assuming the person making the decision is perfect at estimating risk.
However, since in all likelihood it won't be me creating the first-ever AI, but rather someone who may be reading this advice, I'd prefer to stipulate that they should go for small steps even if, in their opinion, there is some larger step that's less risky.
The temptation exists for them to ask, as their first step, "AI of the ring, boost me to god-like wisdom and powers of thought", but that has a number of drawbacks they may not think of. I'd rather my advice contain redundant precautions, as a safety feature.
"Of the steps of the smallest size that still advances things, which of those steps has the lowest risk?"
Another way to think about it is to take the steps (or give the AI orders) that can be effectively accomplished with the AI boosting itself by the smallest amount. Avoid, initially, making requests that the AI would need to massively boost itself to accomplish, if you can instead improve your decision-making position through requests the AI can handle at its current capacity.
Or merely aware of the same potential weakness that you are. I'd be overwhelmingly uncomfortable with someone developing a super-intelligence without awareness of their human limitations in risk assessment. (Incidentally, 'perfect' risk assessment isn't required. They make the most of whatever risk-assessment ability they have either way.)
I consider this a rather inferior solution - particularly inasmuch as it pretends to be minimizing two things at once. Since candidate steps will almost inevitably differ in size, risk only ever serves to break ties between equally small steps, so the assessment of lowest risk barely comes into play. An algorithm that almost never considers risk rather defeats the point.
If you must artificially circumvent the risk assessment algorithm - presumably to counter known biases - then perhaps make the "small steps" a question of satisficing rather than minimization.
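One way to cash that out, continuing the hypothetical 'Step' sketch above: treat 'small' as a threshold to satisfy rather than a quantity to minimise, so that risk is genuinely compared across every acceptably small candidate ('small_enough' here is an assumed, hypothetical threshold):

    def choose_satisficing(steps, small_enough):
        # Satisfice on size: any step at or below the threshold counts
        # as acceptably small. Then genuinely minimise risk across all
        # acceptable candidates, not just across exact-size ties.
        candidates = [s for s in steps if s.size <= small_enough]
        if not candidates:
            raise ValueError("no step is small enough")
        return min(candidates, key=lambda s: s.risk)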
Good point.
How would you word that?