Continuity and independence.
Continuity: Consider the scenario where each of the [LMN] bets refers to one (guaranteed) outcome, which we'll also call L, M and N for simplicity.
Let U(L) = 0, U(M) = 1, U(N) = 10**100
For a simple EU maximizer, you can then satisfy continuity by picking p=(1-1/10**100). A PESTI agent, OTOH, may simply discard the (1-p) = 1/10**100 chance of N, which leaves no value of p that satisfies the axiom.
The 10**100 value is chosen without loss of generality. For PESTI agents that still track probabilities of this magnitude, increase it until they don't.
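A minimal sketch of that arithmetic, in case it helps (the discard threshold and helper names below are mine, purely illustrative):

```python
# Toy illustration: the continuity axiom asks for a mix of L and N that is
# exactly as good as M. The threshold value is a hypothetical PESTI cutoff.
from fractions import Fraction

U = {"L": 0, "M": 1, "N": 10**100}

p_N = Fraction(U["M"] - U["L"], U["N"] - U["L"])  # weight on N: 1/10**100
p_L = 1 - p_N                                     # weight on L: 1 - 1/10**100
DISCARD_THRESHOLD = Fraction(1, 10**20)           # assumed PESTI cutoff

def eu(mix, discard_below=0):
    """Expected utility, ignoring outcomes whose probability is discarded."""
    return sum(U[o] * pr for o, pr in mix.items() if pr >= discard_below)

mix = {"L": p_L, "N": p_N}
print(eu(mix))                     # 1: a plain EU maximizer is indifferent between this mix and M
print(eu(mix, DISCARD_THRESHOLD))  # 0: PESTI drops the 1/10**100 chance of N,
                                   # so no choice of p can ever equal U(M) = 1
```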
Indepen...
Thus, I can never be more than one minus one-in-ten-billion sure that my sensory experience is even roughly correlated with reality. Thus, it would require extraordinary circumstances for me to have any reason to worry about any probability of less than one-in-ten-billion magnitude.
No. The reason not to spend much time thinking about the I-am-undetectably-insane scenario is not, in general, that it's extraordinarily unlikely. The reason is that you can't make good predictions about what would be good choices for you in worlds where you're insane and totally unable to tell.
This holds even if the probability for the scenario goes up.
I'll be there.
It's the most important problem of this time period, and likely of human civilization as a whole. I donate a fraction of my income to MIRI.
Which means that if we buy this [great filter derivation] argument, we should put a lot more weight on the category of 'everything else', and especially the bits of it that come before AI. To the extent that known risks like biotechnology and ecological destruction don't seem plausible, we should more fear unknown unknowns that we aren't even preparing for.
True in principle. I do think that the known risks don't cut it; some of them might be fairly deadly, but even in aggregate they don't look nearly deadly enough to contribute much to the great filter....
This issue is complicated by the fact that we don't really know how much computation our physics will give us access to, or how relevant negentropy is going to be in the long run. In particular, our physics may allow access to (countably or more) infinite computational and storage resources given some superintelligent physics research.
For Expected Utility calculations, this possibility raises the usual issues of evaluating potential infinite utilities. Regardless of how exactly one decides to deal with those issues, the existence of this possibility does shift things in favor of prioritizing safety over speed.
I used "invariant" here to mean "moral claim that will hold for all successor moralities".
A vastly simplified example: at t=0, morality is completely undefined. At t=1, people decide that death is bad, and lock this in indefinitely. At t=2, people decide that pleasure is good, and lock that in indefinitely. Etc.
An agent operating in a society that develops morality like that, looking back, would want all the accidents that led to current morality to be maintained, but looking forward may not particularly care about how the rem...
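For concreteness, here is a tiny simulation of that toy lock-in model (the candidate claims and timestep count are made up):

```python
import random

# Toy version of the lock-in model above: at each step one more claim
# gets locked in, in an unpredictable order.
random.seed(0)
CANDIDATES = ["death is bad", "pleasure is good", "art matters",
              "equality matters", "novelty matters"]

locked_in = random.sample(CANDIDATES, k=3)   # history up to "now"
for t, claim in enumerate(locked_in, start=1):
    print(f"t={t}: locked in {claim!r}")

# An agent living at t=3, looking back, wants every accident in `locked_in`
# preserved; looking forward, it need not care which remaining candidate
# gets locked in next.
print("must preserve:", locked_in)
print("indifferent about:", [c for c in CANDIDATES if c not in locked_in])
```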
That does not sound like much of a win. Present-day humans are really not that impressive, compared to the kind of transhumanity we could develop into. I don't think trying to reproduce entities close to our current mentality is worth doing, in the long run.
While that was phrased in a provocative manner, there /is/ an important point here: If one has irreconcilable value differences with other humans, the obvious reaction is to fight about them; in this case, by competing to see who can build an SI implementing theirs first.
I very much hope it won't come to that, in particular because that kind of technology race would significantly decrease the chance that the winning design is any kind of FAI.
In principle, some kinds of agents could still coordinate to avoid the costs of that kind of outcome. In practice, our species does not seem to be capable of coordination at that level, and it seems unlikely that this will change pre-SI.
True, but it would nevertheless make for a decent compromise. Do you have a better suggestion?
allocating some defensive army patrol to keep the borders safe from future war?
Rather than use traditional army methods, it's probably more efficient to have the SI play the role of Sysop in this scenario, and just deny human actors access to base-layer reality; though if one wanted to allow communication between the different domains, the Sysop may still need to run some active defense against high-level information attacks.
That seems wrong.
As a counterexample, consider a hypothetical morality development model where as history advances, human morality keeps accumulating invariants, in a largely unpredictable (chaotic) fashion. In that case modern morality would have more invariants than that of earlier generations. You could implement a CEV from any time period, but earlier time periods would lead to some consequences that by present standards are very bad, and would predictably remain very bad in the future; nevertheless, a present-humans CEV would still work just fine.
Perhaps. But it is a desperate move, both in terms of predictability and in terms of the likely mind crime that would result from its implementation, since the conceptually easiest and most accurate ways to model other civilizations would involve fully simulating the minds of their members.
If we had to do it, I would be much more interested in aiming it at slightly modified versions of humanity as opposed to utterly alien civilizations. If everyone in our civilization had taken AI safety more seriously, and we could have coordinated to wait a few hundred yea...
I agree, the actual local existence of other AIs shouldn't make a difference, and the approach could work equally either way. As Bostrom says on page 198, no communication is required.
Nevertheless, for the process to yield a useful result, some possible civilization would have to build a non-HM AI. That civilization might be (locally speaking) hypothetical or simulated, but either way the HM-implementing AI needs to think of it to delegate values. I believe that's what footnote 25 gets at: From a superrational point of view, if every possible civilization (or every one imaginable to the AI we build) at this point in time chooses to use an HM approach to value coding, it can't work.
Powerful AIs are probably much more aware of their long-term goals, and better able to formalize them, than a heterogeneous civilization is. Deriving a comprehensive morality for post-humanity is really hard, and indeed CEV is designed to avoid the need of having humans do that. Doing it for an arbitrary alien civilization would likely not be any simpler.
Whereas with powerful AIs, you can just ask them which values they would like implemented and probably get a good answer, as proposed by Bostrom.
The Hail Mary and Christiano's proposals, simply for not having read about them before.
Davis massively underestimates the magnitude and importance of the moral questions we haven't considered, which renders his approach unworkable.
I feel safer in the hands of a superintelligence who is guided by 2014 morality, or for that matter by 1700 morality, than in the hands of one that decides to consider the question for itself.
I don't. Building a transhuman civilization is going to raise all sorts of issues that we haven't worked out, and do so quickly. A large part of the possible benefits are going to be contingent on the controlling system be...
One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable?
Not as such, no. It's a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the ori...
To the extent that CUs are made up of human-like entities (as opposed to e.g. more flexible intelligences that can scale to effectively use all their resources), one of the choices they need to make is how large an internal population to keep, where higher populations imply fewer resources per person (since the amount of resources per CU is constant).
Therefore, unless the high-internal-population CUs are rare, most of the human-level population will be in them, and won't have resources of the same level as the smaller numbers of people in low-population CUs.
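A quick back-of-the-envelope version of that headcount argument (the specific numbers are invented, only the ratios matter):

```python
# Each CU gets the same fixed resource budget; per-person resources are
# budget / internal population. Numbers below are purely illustrative.
RESOURCES_PER_CU = 1.0

cus = [10] * 900 + [10_000] * 100     # internal populations: 90% small CUs, 10% large

total_people = sum(cus)
people_in_large = sum(p for p in cus if p == 10_000)

print(f"{people_in_large / total_people:.1%} of people live in large CUs")   # ~99.1%
print("per-person resources, small CU:", RESOURCES_PER_CU / 10)              # 0.1
print("per-person resources, large CU:", RESOURCES_PER_CU / 10_000)          # 0.0001
```

Even with large CUs being a small minority, nearly everyone ends up inside one, at a thousandth of the per-person resources enjoyed in the small CUs.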
This scenario is rather different from the one suggested by TedHowardNZ, and has a better chance of working. However:
Is there some reason to expect that this model of personhood will not prevail?
One of the issues is that less efficient CUs have to defend their resources against more efficient CUs (who spend more of their resources on work/competition). Depending on the precise structure of your society, those attacks may e.g. be military, algorithmic (information security), memetic or political. You'd need a setup that allows the less efficient CUs to ...
Given a non-trivial population to start with, it will be possible to find people who will consent to copying given absolutely minimal assurances (quite possibly none at all) for what happens to their copy. The obvious cases would be egoists whose personal value systems make them not identify with such copies; you could probably already find many of those today.
In the resulting low-wage environment, it will likewise be possible to find people who will consent to extensive modification/experimentation of their minds given minimal assurances for wha...
It is actually relatively easy to automate all the jobs that no one wants to do, so that people only do what they want to do. In such a world, there is no need for money or markets.
How do you solve the issue that some people will have a preference for very fast reproduction, and will figure out a way to make this a stable desire in their descendants?
AFAICT, such a system could only be stabilized in the long term by extremely strongly enforced rules against any reproduction that would leave one of the resulting entities below an abundance wealth level, and that kind of rule enforcement most likely requires a singleton.
Their physical appearance and surroundings would be what we'd see as very luxurious.
Only to the extent that this does not distract them from work. To the extent that it does, ems that care about such things would be outcompeted (out of existence, given a sufficiently competitive economy) by ones that are completely indifferent to them, and focus all their mental capacity on their job.
Adaptation executers, not fitness maximizers. Humans probably have specific hard-coded adaptations for the appreciation of some forms of art and play. It's entirely plausible that these are no longer adaptive in our world, and are now selected against, but that this has not been the case for long enough for them to be eliminated by evolution.
This would not make these adaptations particularly unusual in our world; modern humans do many other things that are clearly unadaptive from a genetic fitness perspective, like using contraceptives.
These include powerful mechanisms to prevent an altruistic absurdity such as donating one's labor to an employer.
Note that the employer in question might well be your own upload clan, which makes this near-analogous to kin selection. Even if employee templates are traded between employers, this trait would be exceptionally valuable in an employee, and so would be strongly selected for. General altruism might be rare, but this specific variant would probably enjoy a high fitness advantage.
I like it. It does a good job of providing a counter-argument to the common position among economists that the past trend of technological progress leading to steadily higher productivity and demand for humans will continue indefinitely. We don't have a lot of similar trends in our history to look at, but the horse example certainly suggests that these kinds of relationships can and do break down.
Note that multipolar scenarios can arise well before we have the capability to implement an SI.
The standard Hansonian scenario starts with human-level "ems" (emulations). If from-scratch AI development turns out to be difficult, we may develop partial-uploading technology first, and a highly multipolar em scenario would be likely at that point. Of course, AI research would still be on the table in such a scenario, so it wouldn't necessarily be multipolar for very long.
Yes. The evolutionary arguments seem clear enough. That isn't very interesting, though; how soon is it going to happen?
The only reason it might not be interesting is that it's clear; the limit case is certainly more important than the timeline.
That said, I mostly agree. The only reasonably likely third (not-singleton, not-human-wages-through-the-floor) outcome I see would be a destruction of our economy by a non-singleton existential catastrophe; for instance, the human species could kill itself off through an engineered plague, which would also avoid this scenario.
Intelligent minds always come with built-in drives; there's nothing that in general makes goals chosen by another intelligence worse than those arrived at through any other process (e.g. natural selection in the case of humans).
One of the closest corresponding human institutions - slavery - has a very bad reputation, and for good reason: Humans are typically not set up to do this sort of thing, so it tends to make them miserable. Even if you could get around that, there are massive moral issues with subjugating an existing intelligent entity that would prefer n...
So we are considering a small team with some computers claiming superior understanding of what the best set of property rights is for the world?
No. That would be worked out by the FAI itself, as part of calculating all of the implications of its value systems, most likely using something like CEV to look at humanity in general and extrapolating their preferences. The programmers wouldn't need to, and indeed probably couldn't, understand all of the tradeoffs involved.
...If they really are morally superior, they will first find ways to grow the pie, then c
How do you know? It's a strong claim, and I don't see why the math would necessarily work out that way. Once you aggregate preferences fully, there might still be one best solution, and then it would make sense to take it. Obviously you do need a tie-breaking method for when there's more than one, but that's just an optimization detail of an optimizer; it doesn't turn you into a satisficer instead.
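A minimal sketch of that distinction, under an assumed toy aggregation (not any concrete CEV machinery): the tie-breaker only ever operates inside the argmax set, so the agent remains an optimizer.

```python
def choose(options, aggregated_utility, tie_break):
    """Pick an option with maximal aggregated utility; break exact ties only."""
    best_value = max(aggregated_utility(o) for o in options)
    argmax = [o for o in options if aggregated_utility(o) == best_value]
    return tie_break(argmax)       # only matters when len(argmax) > 1

utility = {"A": 3, "B": 7, "C": 7, "D": 1}.get
print(choose(["A", "B", "C", "D"], utility, tie_break=min))  # "B", one of the two maxima
```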
The more general problem is that we need a solution to multi-polar traps (of which superintelligent AI creation is one instance). The only viable solution I've seen proposed is creating a sufficiently powerful Singleton.
The only likely viable ideas for Singletons I've seen proposed are superintelligent AIs, and a human group with extensive use of thought-control technologies on itself. The latter probably can't work unless you apply it to all of society, since it doesn't have the same inherent advantages AI does, and as such would remain vulnerable to bei...
A SENSIBLY DESIGNED MIND WOULD NOT RESOLVE ALL ORTHOGONAL METRICS INTO A SINGLE OBJECTIVE FUNCTION
Why? As you say, humans don't. But human minds are weird, overcomplicated, messy things shaped by natural selection. If you write a mind from scratch, while understanding what you're doing, there's no particular reason you can't just give it a single utility function and have that work well. It's one of the things that makes AIs different from naturally evolved minds.
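As a deliberately simple sketch of what "a single utility function over otherwise-orthogonal metrics" can look like (the metric names and weights are hypothetical):

```python
WEIGHTS = {"paperclips": 1.0, "energy_reserve": 0.3, "self_integrity": 5.0}

def utility(world_state: dict) -> float:
    """One scalar objective combining several otherwise-orthogonal metrics."""
    return sum(WEIGHTS[m] * world_state.get(m, 0.0) for m in WEIGHTS)

actions = {
    "build_factory": {"paperclips": 10, "energy_reserve": -5, "self_integrity": 0},
    "do_nothing":    {"paperclips": 0,  "energy_reserve": 0,  "self_integrity": 0},
}
print(max(actions, key=lambda a: utility(actions[a])))  # "build_factory"
```

Nothing about a designed mind prevents it from acting on a scalarization like this; the open question is whether the weights capture what we actually want.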
What idiot is going to give an AGI a goal which completely disrespects human property rights from the moment it is built?
It would be someone with higher values than that, and this does not require any idiocy. There are many things wrong with the property allocation in this world, and they'll likely get exacerbated in the presence of higher technology. You'd need a very specific kind of humility to refuse to step over that boundary in particular.
...If it has goals which were not possible to achieve once turned off, then it would respect property rights fo
Any level of perverse instantiation in a sufficiently powerful AI is likely to lead to total UFAI; i.e. a full existential catastrophe. Either you get the AI design right so that it doesn't wirehead itself - or others, against their will - or you don't. I don't think there's much middle ground.
OTOH, the relevance of Mind Crime really depends on the volume. The FriendlyAICriticalFailureTable has this instance:
...22: The AI, unknown to the programmers, had qualia during its entire childhood, and what the programmers thought of as simple negative feedback corr
"I need to make 10 paperclips, and then shut down. My capabilities for determining if I've correctly manufactured 10 paperclips are limited; but the goal imposes no penalties for taking more time to manufacture the paperclips, or using more resources in preparation. If I try to take over this planet, there is a significant chance humanity will stop me. OTOH, I'm in the presence of individual humans right now, and one of them may stop my current feeble self anyway for their own reasons, if I just tried to manufacture paperclips right away; the total pr...
If I understand you correctly, your proposal is to attempt to produce obedient designs purely through behavioral testing, without a clean understanding of safe FAI architecture (if you had that, why limit yourself to the obedient case?). Assuming I got that right:
The team continues rounds of testing until they identify some mind designs which have an extremely low likelihood of treacherous turn. These they test in increasingly advanced simulations, moving up toward virtual reality.
That kind of judgement sounds inherently risky. How do you safely distin...
Relevant post: Value is Fragile. Truly Friendly goal systems would probably be quite complicated. Unless you make your tests even more complicated and involved (and do it in just the right way - this sounds hard!), the FAI is likely to be outperformed by something with a simpler utility function that nevertheless performs adequately on your test cases.
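A toy illustration of that selection pressure (everything here is invented for the example): a finite behavioral test suite can't tell the complicated "true" value function apart from a far simpler proxy that merely agrees with it on the tested cases.

```python
test_cases = [0, 1, 2, 3, 4]        # the situations the team thought to test

def fragile_true_values(x):         # stands in for complicated, fragile human values
    return x % 5

def simple_proxy(x):                # much simpler, identical on the test suite
    return x

print(all(simple_proxy(x) == fragile_true_values(x) for x in test_cases))  # True
print(simple_proxy(10**6), fragile_true_values(10**6))  # 1000000 vs 0: far apart off-distribution
```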
For example, if the AI was contained in a simulation, inside of which the AI was contained in a weak AI box, then it might be much more difficult to detect and understand the nature of the simulation than to escape the simulated AI box, which would signal a treacherous turn.
That approach sounds problematic. Some of the obvious escape methods would target the minds of the researchers (either through real-time interaction or by embedding messages in its code or output). You could cut off the latter by having strong social rules to not look at anything beyo...
Approach #1: Goal-evaluation is expensive
You're talking about runtime optimizations. Those are fine. You're totally allowed to run some meta-analysis, figure out you're spending more time on goal-tree updating than the updates gain you in utility, and scale that process down in frequency, or even make it dependent on how much cputime you need for time-critical ops in a given moment. Agents with bounded computational resources will never have enough cputime to compute provably optimal actions in any case (the problem is uncomputable); so how much you spe...
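A rough sketch of the kind of runtime meta-decision I mean (all names and numbers below are invented):

```python
def tune_update_interval(gain_per_update, cputime_per_update,
                         utility_per_cputime, current_interval):
    """Ticks between goal-tree updates, adjusted by a simple cost-benefit check."""
    opportunity_cost = cputime_per_update * utility_per_cputime
    if gain_per_update < opportunity_cost:
        return current_interval * 2          # updating costs more than it gains: do it less often
    return max(1, current_interval // 2)     # updating pays for itself: do it more often

interval = 16
for _ in range(4):   # with these made-up numbers, updates get steadily rarer
    interval = tune_update_interval(gain_per_update=0.1, cputime_per_update=3.0,
                                    utility_per_cputime=0.5, current_interval=interval)
print(interval)      # 256
```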
My reading is that Bostrom is saying that boundless optimization is an easy bug to introduce, not that any AI has it automatically.
I wouldn't call it a bug, generally. Depending on what you want your AI to do, it may very well be a feature; it's just that there are consequences, and you need to take those into account when deciding just what, and to what extent, you need the AI's final goals to accomplish to get a good outcome.
As to "no reason to get complicated", how would you know?
It's a direct consequence of the orthogonality thesis. Bostrom (reasonably enough) supposes that there might be a limit in one direction - to hold a goal you do need to be able to model it to some degree, so agent intelligence may set an upper bound on the complexity of goals the agent can hold - but there's no corresponding reason for a limit in the other direction: Intelligent agents can understand simple goals just fine. I don't have a problem reasoning about what a cow is trying to do, and I could certainly optimize towards the same had my mind been constructed to only want those things.
I have doubts that the goals of a superintelligence are predictable by us.
Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there's no particular reason those have to get complicated. You could certainly have a human-level intelligence that only inherently cared about eating food and having sex, though humans are not that kind of being.
Instrumental goals are indeed likely to get more complicated as agents become more intelligent and can devise more involved schemes to ...
You're suggesting a counterfactual trade with them?
Perhaps that could be made to work; I don't understand those well. It doesn't matter to my main point: even if you do make something like that work, it only changes what you'd do once you run into aliens with which the trade works (you'd be more likely to help them out and grant them part of your infrastructure or the resources it produces). Leaving all those stars on to burn through resources without doing anything useful is just wasteful; you'd turn them off, regardless of how exactly you deal with alien...
I fully agree with you. We are for sure not alone in our galaxy.
That is close to the exact opposite of what I wrote; please re-read.
AGI might help us to make our world a self-stabilizing, sustainable system.
There are at least three major issues with this approach, any one of which would make it a bad idea to attempt.
Self-sustainability is very likely impossible under our physics. This could be incorrect - there's always a chance our models are missing something crucial - but right now, the laws of thermodynamics strongly point at a world where you ne
FWIW, there already is one organization working specifically on Friendliness: MIRI. Friendliness research in general is indeed underfunded relative to its importance, and finishing this work before someone builds an Unfriendly AI is indeed a nontrivial problem.
So would be making international agreements work. Artaxerxes phrased it as "co-ordination of this kind would likely be very difficult"; I'll try to expand on that.
The lure of superintelligent AI is that of an extremely powerful tool to shape the world. We have various entities in this world...
Novel physics research, maybe. Just how useful that would be depends on just what our physics models are missing, and obviously we don't have very good bounds on that. The obvious application is as a boost to technology development, though in extreme cases it might be usable to manipulate physical reality without hardware designed for the purpose, or escape confinement.
I think Bostrom wrote it that way to signal that while his own position is that digital mind implementations can carry the same moral relevance as e.g. minds running on human brains, he acknowledges that there are differing opinions about the subject, and he doesn't want to entirely dismiss people who disagree.
He's right about the object-level issue, of course: Solid state societies do make sense. Mechanically embodying all individual minds is too inefficient to be a good idea in the long run, and there's no overriding reason to stick to that model.
I see no particular reason to assume we can't be the first intelligent species in our past light-cone. Someone has to be (given that we know the number is >0). We've found no significant evidence for intelligent aliens. None of them being there is a simple explanation, it fits the evidence, and if true then indeed the endowment is likely ours for the taking.
We might still run into aliens later, and either lose a direct conflict or enter into a stalemate situation, which does decrease the expected yield from the CE. How much it does so is hard to say; we have little data on which to estimate probabilities on alien encounter scenarios.
That still sounds wrong. You appear to be deciding on what to precompute for purely by probability, without considering that some possible futures will give you the chance to shift more utility around.
If I don't know anything ab...