AlphaGo versus Lee Sedol

gjm

It seems to be a combination of all of these.

1. Training an AI to defer to one's eventual philosophical judgments and interim method of managing uncertainty (and not falling prey to marketing worlds, incorrect but persuasive philosophical arguments, etc.) seems really hard, and is made harder by the recursive structure in ALBA and by the fact that the first-level AI is sub-human in capacity yet has to handle being bootstrapped and training the next-level AI. What percent of humans can accomplish this task, do you think? (I'd argue that the answer is likely zero, but certainly very small.) How do the rest use your AI?
2. Assuming that deferring to humans on philosophy and managing uncertainty is feasible but costly, how many people could resist dropping this feature and the associated cost in favor of adopting some sort of straightforward utility maximization framework with a fixed utility function that they think captures most or all of their values, if that came as a suggestion from the AI with an apparently persuasive argument? If most people do this and only a few don't (and those few are also disadvantaged in the competition to capture the cosmic commons because they chose to carry these costs), that doesn't seem like much of a win.
3. This is tied in with 1 and 2, in that correct meta-philosophical understanding is needed to accomplish 1, and unreasonable philosophical certainty would cause people to fail step 2.
4. Even if the AIs keep deferring to their human users and don't end up short-circuiting their philosophical judgments, if the AI/human systems become very powerful while still having incorrect and strongly held philosophical views, that seems likely to cause disaster. We also don't have much reason to think that if we put people in such positions of power (for example, being able to act as a god in some simulation or domain of their choosing), most will eventually realize their philosophical errors and converge to correct views, or that the power itself wouldn't further distort their already error-prone reasoning processes.

Re 1:

For a working scheme, I would expect it to be usable by a significant fraction of humans (say, comparable to the fraction that can learn to write a compiler).

That said, I would expect almost no one to actually play the role of the overseer, even if a scheme like this one ended up being used widely. An existing analogy would be the human trainers who drive Facebook's M (at least in theory; I don't know how that actually plays out). The trainers are responsible for getting M to do what the trainers want, and the user trusts the trainers to do what t...
