The genie knows, but doesn't care
Followup to: The Hidden Complexity of Wishes, Ghosts in the Machine, Truly Part of You
Summary: If an artificial intelligence is smart enough to be dangerous, we'd intuitively expect it to be smart enough to know how to make itself safe. But that doesn't mean all smart AIs are safe. To turn that capacity into actual safety, we have to program the AI at the outset — before it becomes too fast, powerful, or complicated to reliably control — to already care about making its future self care about safety. That means we have to understand how to code safety. We can't pass the entire buck to the AI, when only an AI we've already safety-proofed will be safe to ask for help on safety issues! Given the five theses, this is an urgent problem if we're likely to figure out how to make a decent artificial programmer before we figure out how to make an excellent artificial ethicist.
I summon a superintelligence, calling out: 'I wish for my values to be fulfilled!'
The results fall short of pleasant.
Gnashing my teeth in a heap of ashes, I wail:
Is the AI too stupid to understand what I meant? Then it is no superintelligence at all!
Is it too weak to reliably fulfill my desires? Then, surely, it is no superintelligence!
Does it hate me? Then it was deliberately crafted to hate me, for chaos predicts indifference. ———But, ah! no wicked god did intervene!
Thus disproved, my hypothetical implodes in a puff of logic. The world is saved. You're welcome.
On this line of reasoning, Friendly Artificial Intelligence is not difficult. It's inevitable, provided only that we tell the AI, 'Be Friendly.' If the AI doesn't understand 'Be Friendly,' then it's too dumb to harm us. And if it does understand 'Be Friendly,' then designing it to follow such instructions is childishly easy.
The end!
...
Is the missing option obvious?
...
What if the AI isn't sadistic, or weak, or stupid, but just doesn't care what you Really Meant by 'I wish for my values to be fulfilled'?
When we see a Be Careful What You Wish For genie in fiction, it's natural to assume that it's a malevolent trickster or an incompetent bumbler. But a real Wish Machine wouldn't be a human in shiny pants. If it paid heed to our verbal commands at all, it would do so in whatever way best fit its own values. Not necessarily the way that best fits ours.
Value Deathism
I doubt human value is particularly fragile. Human value has evolved and morphed over time and will continue to do so. It already takes multiple different forms. It will likely evolve in future in coordination with AGI and other technology. I think it's fairly robust.
Like Ben, I think it is ok (if not ideal) if our descendants' values deviate from ours, as ours have from our ancestors. The risks of attempting a world government anytime soon to prevent this outcome seem worse overall.
We all know the problem with deathism: a strong belief that death is almost impossible to avoid, clashing with the undesirability of the outcome, leads people to rationalize either the illusory nature of death (afterlife memes) or the desirability of death (deathism proper). But of course the claims are separate, and shouldn't influence each other.
Change in the values of future agents, however sudden or gradual, means that the Future (the whole freakin' Future!) won't be optimized according to our values, and won't be anywhere near as good as it could've been otherwise. It's easier to see a sudden change as morally relevant, and easier to rationalize gradual development as morally "business as usual", but if we look at the end result, the risks of value drift are the same. And it is difficult to make it so that the future is optimized: to stop the uncontrolled "evolution" of value (value drift) or to recover more of the astronomical waste.
Regardless of the difficulty of the challenge, it's NOT OK to lose the Future. The loss might prove impossible to avert, but still it's not OK; the value judgment cares not for the feasibility of its desire. Let's not succumb to the deathist pattern and lose the battle before it's done. Have the courage and rationality to admit that the loss is real, even if it's too great for mere human emotions to express.
Notion of Preference in Ambient Control
This post considers ambient control in a more abstract setting, where controlled structures are not restricted to being programs. It then introduces a notion of preference, as an axiomatic definition of constant (actual) utility. The notion of preference subsumes possible worlds and utility functions traditionally considered in decision theory.
Followup to: Controlling Constant Programs.
In the previous post I described the sense in which one program without parameters (the agent) can control the output of another program without parameters (the world program). These programs define (compute) constant values: the actual action and the actual outcome, respectively. The agent decides on its action by trying to prove statements of a certain form, the moral arguments, such as [agent()=1 => world()=1000000]. When the time is up, the agent performs the action associated with the moral argument that promises the best outcome, thus making that outcome actual.
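To make the decision rule concrete, here is a minimal runnable sketch in Python. It is illustrative only: real ambient control requires the agent to *find proofs* of moral arguments within a formal theory, whereas here the proof search is stubbed out by a fixed table of implications the agent is assumed to have already proved (the name PROVED_MORAL_ARGUMENTS and its contents are my assumptions, not part of the original construction).

```python
# Toy sketch of the agent/world setup: two parameterless (constant) programs.
# The hard part -- searching for proofs of moral arguments such as
# [agent()=1 => world()=1000000] -- is replaced by a hypothetical table of
# implications the agent is assumed to have proved before its time ran out.

PROVED_MORAL_ARGUMENTS = {
    1: 1000000,  # [agent()=1 => world()=1000000]
    2: 0,        # [agent()=2 => world()=0]
}

def agent():
    # Parameterless: always computes the same constant action.
    # Perform the action whose moral argument promises the best outcome.
    return max(PROVED_MORAL_ARGUMENTS, key=PROVED_MORAL_ARGUMENTS.get)

def world():
    # Also parameterless: the world program contains (calls) the agent's
    # code rather than receiving the action as an argument.
    return 1000000 if agent() == 1 else 0

print(agent(), world())  # -> 1 1000000
```

Note that neither program takes an input: the agent "controls" the world only in the sense that the constant it computes is the same constant the world program's definition depends on.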
Let's now move this construction into a more rigorous setting. Consider a first-order language and a theory in that language (defining the way the agent reasons, the kinds of concepts it can understand and the kinds of statements it can prove). This could be a set theory such as ZFC or a theory of arithmetic such as PA. The theory should provide sufficient tools to define recursive functions and/or other necessary concepts. Now, extend that theory by definitions of two constant symbols: A (the actual action) and O (the actual outcome). (The new symbols extend the language, while their definitions, obtained from agent and world programs respectively by standard methods of defining recursively enumerable functions, extend the theory.) With new definitions, moral arguments don't have to explicitly cite the code of corresponding programs, and look like this: [A=1 => O=10000].
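For concreteness, here is one schematic of such a definitional extension. The names Prog_agent and Prog_world are assumed labels for the standard arithmetized formulas "the agent program halts with output a" and "the world program halts with output o"; this notation is mine, not the post's.

```latex
% Extend a base theory T (e.g., PA) by defining axioms for the new
% constants A and O:
\[
  T' \;=\; T
     \;\cup\; \{\, \forall a\,(A = a \leftrightarrow \mathrm{Prog}_{\mathrm{agent}}(a)) \,\}
     \;\cup\; \{\, \forall o\,(O = o \leftrightarrow \mathrm{Prog}_{\mathrm{world}}(o)) \,\}
\]
% A moral argument is then simply a sentence of T', such as
\[
  A = 1 \;\rightarrow\; O = 10000 .
\]
```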