Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer.
No - not at all. Perhaps you have read too much MIRI material, and not enough of the neuroscience and machine learning I referenced. An infant is not born with human 'terminal values'. It is born with some minimal initial reward learning circuitry to bootstrap learning of complex values from adults.
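To make the bootstrapping picture concrete, here is a toy Python sketch of the idea: a tiny hardwired reward signal anchors a learned reward model that is then shaped by adult feedback. Everything here (`innate_reward`, `LearnedRewardModel`) is a hypothetical illustration, not a claim about actual neural circuitry or any existing codebase.

```python
class LearnedRewardModel:
    """Toy model: complex values are learned, not innate.

    A minimal hardwired signal (innate_reward) bootstraps the process;
    adult/caregiver feedback then shapes a learned reward function far
    richer than the innate circuit alone.
    """

    def __init__(self, lr: float = 0.1) -> None:
        self.weight = 0.0  # the "learned values", crudely compressed
        self.lr = lr

    def innate_reward(self, observation: float) -> float:
        # Stand-in for the minimal initial circuitry (e.g. comfort/distress).
        return 1.0 if observation > 0 else -1.0

    def predict(self, observation: float) -> float:
        # Learned reward, initially dominated by the innate signal.
        return self.innate_reward(observation) + self.weight * observation

    def update(self, observation: float, adult_feedback: float) -> None:
        # Nudge the learned component toward the social feedback signal.
        error = adult_feedback - self.predict(observation)
        self.weight += self.lr * error * observation
```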
Stop thinking of AGI as some weird mathy program. Instead think of brain emulations - and then you have obvious answers to all of these questions.
Saying the phrase "safe sandbox sim" is much easier than making a virtual machine that can withstand a superhuman intelligence trying to get out of it.
You apparently didn't read my article or the links to the earlier discussion? We can easily limit the capability of minds by controlling their knowledge. A million smart evil humans is dangerous - but only if they have modern knowledge. If they have only, say, medieval knowledge, they are hardly dangerous. Also, they don't realize they are in a sim. And the point of the sandbox sims is to test architectures, reward learning systems, and most importantly - altruism. Designs that work well in these safe sims are then copied into less safe sims, and finally the real world.
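To be concrete about the staged promotion, here is a minimal Python sketch. It assumes an externally supplied `evaluate` function that scores a design's behavior in a given sim; `SimStage`, `promote_through_stages`, and the threshold value are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimStage:
    name: str
    knowledge_level: str  # e.g. "medieval" - caps what the agents can know
    fidelity: float       # how close the sim is to the real world


def promote_through_stages(
    design: object,
    stages: List[SimStage],
    evaluate: Callable[[object, SimStage], float],
    altruism_threshold: float = 0.95,
) -> bool:
    """Run a candidate agent design through progressively riskier sims.

    A design advances to the next (less safe) stage only if its measured
    altruism score in the current stage clears the threshold; any failure
    halts promotion before the design ever touches the real world.
    """
    for stage in stages:
        score = evaluate(design, stage)
        if score < altruism_threshold:
            return False  # failed in a safe sim; never deployed
    return True  # passed every stage; eligible for real-world testing
```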
Consider the orthogonality thesis - AI of any intelligence level can be combined with any values. Thus we can test values on young/limited AI before scaling up their power.
Sandbox sims can be made arbitrarily safe. This is the only truly practical, workable proposal to date. It is also the closest to what industry already uses. Thus it is the solution by default.
Even if your software is perfect, it can still work out that its world is artificial and find ways of blackmailing its captors.
Ridiculous nonsense. Many humans today are aware of the simulation argument. The Gnostics were aware of it, in some sense, 2,000 years ago. Do you think any of them broke out? Are you trying to break out? How?
If it's maximizing its own utility, which is necessary if you want it to behave anything like a child, what's to stop it from learning human greed and cruelty, and becoming an eternal tyrant?
Again, stop thinking we create a single AI program and then we are done. It will be a large-scale evolutionary process, with endless selection, testing, and refinement - something like the loop sketched below. We can select for super-altruistic moral beings - Buddha/Gandhi/Jesus level. We can take the human capability for altruism, refine it, and expand on it vastly.
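A toy Python sketch of that selection loop, assuming hypothetical `altruism_score` and `mutate` functions standing in for whatever the real sandbox evaluation and architecture-refinement steps would be:

```python
import random
from typing import Callable, List


def select_for_altruism(
    population: List[dict],
    altruism_score: Callable[[dict], float],
    mutate: Callable[[dict], dict],
    generations: int = 100,
    keep_fraction: float = 0.2,
) -> List[dict]:
    """Evolve a population of agent designs toward higher altruism."""
    for _ in range(generations):
        # Rank designs by how altruistically they behave in sandbox sims.
        ranked = sorted(population, key=altruism_score, reverse=True)
        survivors = ranked[: max(1, int(len(ranked) * keep_fraction))]
        # Refill the population with mutated copies of the survivors.
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(len(ranked) - len(survivors))
        ]
    return population
```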
For starters, you want to be able to prove formally that its goals will remain stable as it self-modifies…
Quixotic waste of time.
A dumb agent could also cause human extinction. "Kill all humans" is a computationally simpler task than creating a superintelligence. And it may be simpler by many orders of magnitude.
I seriously doubt that. Plenty of humans want to kill everyone (or, at least, large groups of people). Very few succeed. These agents would be a good deal less capable.