Continuity and independence.
Continuity: Consider the scenario where each of the [LMN] bets refer to one (guaranteed) outcome, which we'll also call L, M and N for simplicity.
Let U(L) = 0, U(M) = 1, U(N) = 10**100
For a simple EU maximizer, you can then satisfy continuity by picking p=(1-1/10**100). A PESTI agent, OTOH, may just discard a (1-p) of 1/10**100, which leaves no other options to satisfy it.
The 10**100 value is chosen without loss of generality. For PESTI agents that still track probabilities of this magnitude, increase it until they don't.
Indepen...
Thus, I can never be more than one minus one-in-ten-billion sure that my sensory experience is even roughly correlated with reality. Thus, it would require extraordinary circumstances for me to have any reason to worry about any probability of less than one-in-ten-billion magnitude.
No. The reason not to spend much time thinking about the I-am-undetectably-insane scenario is not, in general, that it's extraordinarily unlikely. The reason is that you can't make good predictions about what would be good choices for you in worlds where you're insane and totally unable to tell.
This holds even if the probability for the scenario goes up.
I'll be there.
It's the most important problem of this time period, and likely human civilization as a whole. I donate a fraction of my income to MIRI.
Which means that if we buy this [great filter derivation] argument, we should put a lot more weight on the category of 'everything else', and especially the bits of it that come before AI. To the extent that known risks like biotechnology and ecological destruction don't seem plausible, we should more fear unknown unknowns that we aren't even preparing for.
True in principle. I do think that the known risks don't cut it; some of them might be fairly deadly, but even in aggregate they don't look nearly deadly enough to contribute much to the great filter....
This issue is complicated by the fact that we don't really know how much computation our physics will give us access to, or how relevant negentropy is going to be in the long run. In particular, our physics may allow access to (countably or more) infinite computational and storage resources given some superintelligent physics research.
For Expected Utility calculations, this possibility raises the usual issues of evaluating potential infinite utilities. Regardless of how exactly one decides to deal with those issues, the existence of this possibility does shift things in favor of prioritizing for safety over speed.
I used "invariant" here to mean "moral claim that will hold for all successor moralities".
A vastly simplified example: at t=0, morality is completely undefined. At t=1, people decide that death is bad, and lock this in indefinitely. At t=2, people decide that pleasure is good, and lock that in indefinitely. Etc.
An agent operating in a society that develops morality like that, looking back, would want to have all the accidents that lead to current morality to be maintained, but looking forward may not particularly care about how the rem...
That does not sound like much of a win. Present-day humans are really not that impressive, compared to the kind of transhumanity we could develop into. I don't think trying to reproduce entites close to our current mentality is worth doing, in the long run.
While that was phrased in a provocative manner, there /is/ an important point here: If one has irreconcilable value differences with other humans, the obvious reaction is to fight about them; in this case, by competing to see who can build an SI implementing theirs first.
I very much hope it won't come to that, in particular because that kind of technology race would significantly decrease the chance that the winning design is any kind of FAI.
In principle, some kinds of agents could still coordinate to avoid the costs of that kind of outcome. In practice, our species does not seem to be capable of coordination at that level, and it seems unlikely that this will change pre-SI.
True, but it would nevertheless make for a decent compromise. Do you have a better suggestion?
allocating some defense army patrol keeping the borders from future war?
Rather than use traditional army methods, it's probably more efficient to have the SI play the role of Sysop in this scenario, and just deny human actors access to base-layer reality; though if one wanted to allow communication between the different domains, the sysop may still need to run some active defense against high-level information attacks.
That seems wrong.
As a counterexample, consider a hypothetical morality development model where as history advances, human morality keeps accumulating invariants, in a largely unpredictable (chaotic) fashion. In that case modern morality would have more invariants than that of earlier generations. You could implement a CEV from any time period, but earlier time periods would lead to some consequences that by present standards are very bad, and would predictably remain very bad in the future; nevertheless, a present-humans CEV would still work just fine.
Perhaps. But it is a desperate move, both in terms of predictability and in terms of the likely mind crime that would result in its implementation, since the conceptually easiest and most accurate ways to model other civilizations would involve fully simulating the minds of their members.
If we had to do it, I would be much more interested in aiming it at slightly modified versions of humanity as opposed to utterly alien civilizations. If everyone in our civilization had taken AI safety more seriously, and we could have coordinated to wait a few hundred yea...
I agree, the actual local existence of other AIs shouldn't make a difference, and the approach could work equally either way. As Bostrom says on page 198, no communication is required.
Nevertheless, for the process to yield a useful result, some possible civilization would have to build a non-HM AI. That civilization might be (locally speaking) hypothetical or simulated, but either way the HM-implementing AI needs to think of it to delegate values. I believe that's what footnote 25 gets at: From a superrational point of view, if every possible civilization (or every one imaginable to the AI we build) at this point in time chooses to use an HM approach to value coding, it can't work.
Powerful AIs are probably much more aware of their long-term goals and able to formalize them than a heterogenous civilization is. Deriving a comprehensive morality for post-humanity is really hard, and indeed CEV is designed to avoid the need of having humans do that. Doing it for an arbitrary alien civilization would likely not be any simpler.
Whereas with powerful AIs, you can just ask them which values they would like implemented and probably get a good answer, as proposed by Bostrom.
The Hail Mary and Christiano's proposals, simply for not having read about them before.
Davis massively underestimates the magnitude and importance of the moral questions we haven't considered, which renders his approach unworkable.
I feel safer in the hands of a superintelligence who is guided by 2014 morality, or for that matter by 1700 morality, than in the hands of one that decides to consider the question for itself.
I don't. Building a transhuman civilization is going to raise all sorts of issues that we haven't worked out, and do so quickly. A large part of the possible benefits are going to be contingent on the controlling system be...
One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable?
Not as such, no. It's a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the ori...
To the extent that CUs are made up of human-like entities (as opposed to e.g. more flexible intelligences that can scale to effectively use all their resources), one of the choices they need to make is how large an internal population to keep, where higher populations imply less resources per person (since the amount of resources per CU is constant).
Therefore, unless the high-internal-population CUs are rare, most of the human-level population will be in them, and won't have resources of the same level as the smaller numbers of people in low-population CUs.
That still sounds wrong. You appear to be deciding on what to precompute for purely by probability, without considering that some possible futures will give you the chance to shift more utility around.
If I don't know anything ab... (read more)