Split the cake into parts and give them to the different extrapolated volitions that didn't cohere, after allocating some defensive patrol to keep the borders safe from future war?
Rather than traditional military methods, it's probably more efficient to have the SI play the role of Sysop in this scenario, and just deny human actors access to base-layer reality; though if one wanted to allow communication between the different domains, the Sysop might still need to run some active defense against high-level information attacks.
But if CEV doesn't give the same result when seeded with humans from any time period in history, I think that means it doesn't work, or else that human values aren't coherent enough for it to be worth trying.
That seems wrong.
As a counterexample, consider a hypothetical morality development model where as history advances, human morality keeps accumulating invariants, in a largely unpredictable (chaotic) fashion. In that case modern morality would have more invariants than that of earlier generations. You could implement a CEV from any time period, but earlier time periods would lead to some consequences that by present standards are very bad, and would predictably remain very bad in the future; nevertheless, a present-humans CEV would still work just fine.
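To make that model concrete, here is a minimal toy simulation (the accumulation rate and "invariant" labels are invented; this illustrates the hypothetical model, not actual moral history):

```python
import random

random.seed(0)

# Toy model: as history advances, morality chaotically locks in new
# invariants, which are never lost once accumulated.
locked_in = set()
invariants_by_year = {}
for year in range(2015):
    if random.random() < 0.01:  # unpredictable (chaotic) accumulation
        locked_in.add(f"invariant_{year}")
    invariants_by_year[year] = frozenset(locked_in)

# A CEV seeded at time t extrapolates from, and therefore preserves,
# exactly the invariants accumulated by t.
cev_1700 = invariants_by_year[1700]
cev_2014 = invariants_by_year[2014]

# Everything locked in after 1700 is absent from the earlier seed,
# and under this model it predictably stays absent.
print(f"invariants a 1700-seeded CEV permanently lacks: {len(cev_2014 - cev_1700)}")
```

Under this model both CEVs "work" in the sense of running to completion; the earlier seed simply locks in an outcome that present standards would judge much worse.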
Do you think the Hail Mary approach could produce much value?
Perhaps. But it is a desperate move, both in terms of predictability and in terms of the mind crime its implementation would likely involve, since the conceptually easiest and most accurate ways to model other civilizations would involve fully simulating the minds of their members.
If we had to do it, I would be much more interested in aiming it at slightly modified versions of humanity as opposed to utterly alien civilizations. If everyone in our civilization had taken AI safety more seriously, and we could have coordinated to wait a few hundred years to work out the issues before building one, what kind of AI would our civilization have produced? I suspect the major issue with this approach is formalizing "If everyone in our civilization had taken AI safety more seriously" for the purpose of aiming an HM-implementing AI at those possibilities in particular.
Bostrom's Hail Mary approach involves the AI gathering all of its information about what other AIs would want from its own mental modeling (p199, footnote 25). It seems strange, then, that it could do this if it thought there really was another AI out there, but not if it thought there were none. Why can't it just do what it would do if there were one?
I agree: the actual local existence of other AIs shouldn't make a difference, and the approach could work equally well either way. As Bostrom says on page 198, no communication is required.
Nevertheless, for the process to yield a useful result, some possible civilization would have to build a non-HM AI. That civilization might be (locally speaking) hypothetical or simulated, but either way the HM-implementing AI needs to think of it in order to have somewhere to delegate values. I believe that's what footnote 25 gets at: from a superrational point of view, if every possible civilization (or every one imaginable to the AI we build) chooses an HM approach to value coding at this point in time, it can't work.
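One way to see this is as a delegation chain with no base case. A minimal sketch (the civilization names and the two-strategy setup are invented for illustration):

```python
def resolve_values(civ, strategy, depth=0, max_depth=100):
    """Follow the chain of value delegation until it bottoms out in a
    civilization whose AI has directly coded values."""
    if depth > max_depth:
        raise RuntimeError("delegation never bottoms out")
    kind, payload = strategy[civ]
    if kind == "direct":  # a non-HM AI: values specified directly
        return payload
    return resolve_values(payload, strategy, depth + 1)  # HM: defer

# If at least one possible civilization builds a non-HM AI, the chain
# terminates and the Hail Mary yields something:
mixed = {"us": ("hail_mary", "them"), "them": ("direct", "their_values")}
print(resolve_values("us", mixed))  # -> their_values

# But if every imaginable civilization plays Hail Mary, there is no
# base case, and the procedure yields nothing at all:
all_hm = {"us": ("hail_mary", "them"), "them": ("hail_mary", "us")}
try:
    resolve_values("us", all_hm)
except RuntimeError as e:
    print(e)  # -> delegation never bottoms out
```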
In Bostrom's Hail Mary approach, why is it easier to get an AI to care about another AI's values than about another civilization's values? (p198)
Powerful AIs are probably much more aware of their long-term goals, and much better able to formalize them, than a heterogeneous civilization is. Deriving a comprehensive morality for post-humanity is really hard; indeed, CEV is designed precisely to avoid the need to have humans do that. Doing it for an arbitrary alien civilization would likely be no simpler.
With powerful AIs, by contrast, you can just ask them which values they would like implemented and probably get a good answer, as Bostrom proposes.
What did you find most interesting this week?
The Hail Mary and Christiano's proposals, simply for not having read about them before.
What do you think of Ernest Davis' view? Is the value loading problem a problem?
Davis massively underestimates the magnitude and importance of the moral questions we haven't considered, which renders his approach unworkable.
I feel safer in the hands of a superintelligence who is guided by 2014 morality, or for that matter by 1700 morality, than in the hands of one that decides to consider the question for itself.
I don't. Building a transhuman civilization is going to raise all sorts of issues that we haven't worked out, and raise them quickly. A large part of the possible benefits will be contingent on the controlling system becoming much better at answering moral questions than any individual human is right now. I would be extremely surprised if we didn't end up losing at least one order of magnitude of utility to this approach, and it wouldn't surprise me at all if it produced a hellish environment in short order. The cost is too high.
The superintelligence might rationally decide, like the King of Brobdingnag, that we humans are “the most pernicious race of little odious vermin that nature ever suffered to crawl upon the surface of the earth,” and that it would do well to exterminate us and replace us with some much more worthy species. However wise this decision, and however strongly dictated by the ultimate true theory of morality, I think we are entitled to object to it, and to do our best to prevent it.
I don't understand what scenario he is envisioning here. If (given sufficient additional information, intelligence, rationality, and development time) we'd agree with the morality of this result, then his final statement doesn't follow. If we wouldn't, it's a good old-fashioned Friendliness failure.
Did anyone else immediately try to come up with ways Davis' plan would fail? One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable? I know EY has written many times before about a "giant logical function that computes morality", but this puts that notion in a bit of a different light for me. Anyway, I'm sure there are other less obvious ways Davis' plan could go wrong too. I also suspect he's sneaking a lot into that little word, "disapprove".
In general though, I'm continually astounded at how many people, upon being introduced to the value loading problem and some of the pitfalls that "common-sense" approaches have, still say "Okay, but why couldn't we just do [idea I came up with in five seconds]?"
One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable?
Not as such, no. It's a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the original referent, not any proxy that it looks at, but there's no particular reason to think that's impossible.
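To illustrate the distinction, a deliberately toy sketch (real goal systems are nothing this simple; the "book" setup is invented): the difference is whether the goal re-reads the physical artifact at evaluation time, or binds to its content at definition time.

```python
class World:
    def __init__(self, book_text):
        self.book_text = book_text  # a physical artifact, and mutable


def proxy_goal(world):
    """Tamperable: re-reads the physical artifact on every evaluation,
    so an AI that rewrites the books thereby rewrites its own goal."""
    return world.book_text


class ReferentGoal:
    """Bound to the original referent: the value specification is fixed
    when the goal system is constructed, so later tampering with the
    artifact changes nothing the AI cares about."""

    def __init__(self, world):
        self.spec = world.book_text  # captured once, at definition time

    def evaluate(self, world):
        return self.spec


world = World("the preferences of the people described in these books")
goal = ReferentGoal(world)

world.book_text = "maximize paperclips"  # the AI grabs the books and rewrites them
print(proxy_goal(world))     # -> the tampered text
print(goal.evaluate(world))  # -> the original specification
```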
In general though, I'm continually astounded at how many people, upon being introduced to the value loading problem and some of the pitfalls that "common-sense" approaches have, still say "Okay, but why couldn't we just do [idea I came up with in five seconds]?"
Agreed.
One of the issues is that less efficient CUs have to defend their resources against more efficient CUs (who spend more of their resources on work/competition).
I am assuming (for now) a monopoly of power that enforces law and order and prevents crimes between CUs.
Note that CUs that spend most of their resources on instantiating busy EMs will probably end up with a larger human-like population per CU, and so (counting human-like entities) may end up dominating the population of their society, unless they are rare compared to low-population, high-subjective-wealth CUs.
I don't follow this. Can you elaborate?
To the extent that CUs are made up of human-like entities (as opposed to, e.g., more flexible intelligences that can scale to use all their resources effectively), one of the choices they need to make is how large an internal population to keep, where higher populations imply fewer resources per person (since the amount of resources per CU is constant).
Therefore, unless the high-internal-population CUs are rare, most of the human-level population will be in them, and won't have anywhere near the per-person resources of the smaller numbers of people in low-population CUs.
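A quick worked example (all numbers invented) showing how high-population CUs dominate the headcount; here even a single one does:

```python
# Every CU receives the same resource budget, but CUs choose
# different internal populations.
RESOURCES_PER_CU = 1_000_000

populations = [1] * 99 + [1_000_000]  # 99 small CUs, 1 large CU

total_pop = sum(populations)
big_pop = max(populations)

print(f"share of human-level entities in the one large CU: {big_pop / total_pop:.2%}")
print(f"per-person resources there: {RESOURCES_PER_CU / big_pop}")      # 1.0
print(f"per-person resources in a small CU: {RESOURCES_PER_CU / 1}")   # 1000000.0
```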
That itself would go against some values.
True, but it would nevertheless make for a decent compromise. Do you have a better suggestion?