Earlier, I argued that instead of working on FAI, a better strategy is to pursue an upload- or IA-based Singularity. In response to this, some argue that we still need to work on FAI/CEV, because what if it turns out that seed AI is much easier than brain emulation or intelligence amplification, and we can't stop or sufficiently delay others from building one? If we had a solution to CEV, we could rush to build a seed AI ourselves, or convince others to make use of the ideas.

But CEV seems a terrible backup plan for this contingency, since it involves lots of hard philosophical and implementation problems and therefore is likely to arrive too late if seed AI turns out to be easy. (Searching for whether Eliezer or someone else addressed the issue of implementation problems before, I found just a couple of sentences, in the original CEV document: "The task of construing a satisfactory initial dynamic is not so impossible as it seems. The satisfactory initial dynamic can be coded and tinkered with over years, and may improve itself in obvious and straightforward ways before taking on the task of rewriting itself entirely." Which does not make any sense to me—why can't every other AGI builder make the same argument, that their code can be "tinkered with" over many years, and therefore is safe? Why aren't we risking the "initial dynamic" FOOMing while it's being tinkered with? Actually, it seems to me that an AI cannot begin to extrapolate anyone's volition until it's already more powerful than a human, so I have no idea how the tinkering is supposed to work at all.)

So, granting that "seed AI is much easier than brain emulation or intelligence amplification" is a very real possibility, I think we need better backup plans. This post is a bit similar to The Friendly AI Game, in that I'm asking for a utility function for a seed AI, but the goal here is not necessarily to build an FAI directly, but to somehow make an eventual positive Singularity more likely, while keeping the utility function simple enough that there's a good chance it can be specified and implemented correctly within a relatively short amount of time. Also, the top entry in that post is an AI that can answer formally specified questions with minimal side effects, apparently with the idea that we can use such an AI to advance many kinds of science and technology. But I agree with Nesov—such an AI doesn't help, if the goal is an eventual positive Singularity:

We can do lots of useful things, sure (this is not a point where we disagree), but they don't add up towards "saving the world". These are just short-term benefits. Technological progress makes it easier to screw stuff up irrecoverably, advanced tech is the enemy. One shouldn't generally advance the tech if distant end-of-the-world is considered important as compared to immediate benefits [...]

To give an idea of the kind of "backup plan" I have in mind, one idea I've been playing with is to have the seed AI make multiple simulations of the entire Earth (i.e., with different "random seeds"), for several years or decades into the future, and have a team of humans pick the best outcome to be released into the real world. (I say "best outcome" but many of the outcomes will probably be incomprehensible or dangerous to directly observe, so they should mostly judge the processes that lead to the outcomes instead of the outcomes themselves.) This is still quite complex if you think about how to turn this "wish" into a utility function, and lots of things could still go wrong, but to me it seems at least the kind of problem that a team of human researchers/programmers can potentially solve within the relevant time frame.
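
For concreteness, here is a minimal sketch of how this "wish" might be structured as a selection protocol, in Python-flavored pseudocode. The interfaces (`simulate_earth`, `human_team_review`, `release_into_reality`), the counts, and the horizon are all illustrative assumptions on my part, not part of any worked-out design:

```python
import random
from typing import Callable, List, Tuple

def backup_plan(
    simulate_earth: Callable[[int, int], object],    # hypothetical: (seed, years) -> trajectory
    human_team_review: Callable[[List[Tuple[int, object]]], Tuple[int, object]],
    release_into_reality: Callable[[object], None],  # hypothetical: instantiate the chosen outcome
    num_simulations: int = 1000,
    horizon_years: int = 50,
) -> int:
    """Sketch of the simulate-and-select 'wish': run many Earth simulations
    under different random seeds, have humans judge the processes that led
    to each outcome, and release only the chosen one."""
    candidates = []
    for _ in range(num_simulations):
        seed = random.getrandbits(128)                    # a distinct "random seed" per world
        trajectory = simulate_earth(seed, horizon_years)  # decades of simulated history
        candidates.append((seed, trajectory))

    # The human team mostly judges the processes leading to the outcomes,
    # since the outcomes themselves may be incomprehensible or dangerous to view directly.
    chosen_seed, chosen_trajectory = human_team_review(candidates)

    release_into_reality(chosen_trajectory)
    return chosen_seed
```

Of course, almost all of the difficulty hides inside those three hypothetical interfaces; the sketch only shows the shape of the utility function's target, not how to specify it safely.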

Do others have any ideas in this vein?


There's evidence among humans that problem solving can be decoupled from having a functioning long-term goal system (e.g., autistic savants, certain types of frontal lobe damage, etc.). So I think it's possible to create a 'dumb' problem solver that won't form any nefarious goals (or any goals) and would be overseen by humans. Furthermore, I think Nesov is wrong here; creating robust institutions and social stability, curing disease, etc. - the elements of "saving the world" - fall under technical problem solving just as much as uploading, nanotech, protein folding or whatever. Advanced tech is inclusive of world saving.

I don't have a very good answer yet, but combining Vivid's and Randaly's proposals with mine seems to yield a reasonably safe and fast scenario: sandbox the AI to get a formal problem solver, use that to quickly advance uploading tech, then upload some good humans and let them FOOM.

How quickly do you think we can develop uploading tech given such an AI? Would it be quick enough if others were writing seed AIs that can FOOM directly?

ETA: Also, while you're using this AI to develop upload tech, it seems vulnerable to being stolen or taken by force.

I don't know. It seems to be a good reason to spend effort today on formalizing the problems that lead to uploading tech. One possible way is through protein folding and nanotech. Another way is to infer the microscopic structure of my brain from knowledge of current physics and enough macroscopic observations (e.g. webcam videos of me, or MRI scans, or something). That would make a nice formalizable problem for an AI in a box.

Agree up to 'let them FOOM'. FOOMing uploads seem potentially disastrous for all the usual reasons. Why not have the uploaded good humans research friendliness and CEV then plug it into the easy seed AI and let that FOOM?

Or is the above what you meant? It just seems so obviously superior that you could have been doing shorthand.

Could be either. I'd leave that choice up to the uploaded good humans.

Will the uploaded good humans still stay good once they are uploaded?

What is the test to make sure that a regular (though very smart) human will remain "good"?

Don't we have to select from a small pool of candidates who have not succumbed to temptation to abuse power?

I have to say that I can think of very few candidates who would pass muster in my mind and not abuse power.

Even here: reading through many of the posts, we have widely conflicting "personal utility functions," most of whose owners would consider them "good" utility functions, and yet those who do not hold the same personal utility functions could consider the utility functions of others "bad".

To my way of thinking it's incredibly risky to try to upload "good" humans.

To give an idea of the kind of "backup plan" I have in mind, one idea I've been playing with is to have the seed AI make multiple simulations of the entire Earth (i.e., with different "random seeds"), for several years or decades into the future, and have a team of humans pick the best outcome to be released into the real world.

How many simulations are we running? Is it feasible that we could run enough to actually make a good decision? What moral weight should we give to the inhabitants of those simulated worlds, especially those in worlds that will be substantially worse than our own (i.e., a negative Singularity), relative to the people that we might save by increasing the chance that a Singularity will eventually be positive? How do we ensure that we don't just select a decision process that creates desirable outcomes several decades into the future, but fails a century from now?

Your proposal is interesting, but there do appear to be a number of substantial issues with it that I would like to see answered before this proposal is implemented.

How many simulations are we running? Is it feasible that we could run enough to actually make a good decision?

I'm assuming that the AI will be allowed to FOOM and take over enough of the universe to run enough simulations. If that's still not sufficient, we can make it more likely for the AI/human team to find a good outcome, at the cost of increasing complexity. For example allow the humans to tell the AI to restart another set of simulations with an embedded message, like "Last time you did X, and it didn't turn out well. Try something else!"

What moral weight should we give to the inhabitants of those simulated worlds, especially those in worlds that will be substantially worse than our own (i.e., a negative Singularity), relative to the people that we might save by increasing the chance that a Singularity will eventually be positive?

Have humans monitor the simulations, and stop the ones that are headed in bad directions or do not show promise of leading to a positive Singularity. Save the random seeds for the simulations so everyone can be recreated once a positive Singularity is established.

How do we ensure that we don't just select a decision process that creates desirable outcomes several decades into the future, but fails a century from now?

Do you mean like if a simulation establishes a world government and starts to research FAI carefully, but a century after it's released into the real world, the government falls apart? We can let a simulation run until a positive Singularity actually occurs inside, and only release it then.

Your proposal is interesting, but there do appear to be a number of substantial issues with it that I would like to see answered before this proposal is implemented.

I originally wrote down a more complex proposal that tried to address some of these issues, but switched to a simpler one because

  1. I didn't want to focus too much attention on my own ideas.
  2. There are a bunch of tradeoffs that can be made between the complexity of the proposal (hence implementation difficulty/risk) and chance of success if the proposal is correctly implemented. We should probably leave those decisions to the future when the tradeoffs can be seen more clearly.

Sounds like it'd be a better idea to run one simulation, in which the stars blink a message telling everyone they are in such a simulation and need to give the SIAI as many Manhattan projects as they ask for, in the way they ask for them, or they'll be deleted/go to hell. Possibly starting it a fair number of decades in the past so there's plenty of time.

A seed AI that extrapolates Carl Sagan's utility function.

one idea I've been playing with is to have the seed AI make multiple simulations of the entire Earth (i.e., with different "random seeds"), for several years or decades into the future, and have a team of humans pick the best outcome to be released into the real world.

I don’t think that would work. The AI won’t be able to simulate any future Earth where itself or any comparable-intelligence AI exists, because to do so it would need to simulate itself and/or other similarly-smart entities faster than real-time. (In fact, if it turns out that the AI could potentially improve itself constantly over decades, it would need to simulate its smarter future self...)

It might be possible to simulate futures where the AI shuts down after it finishes the simulations(#), except that many of those simulations would likely reach points where another AI is turned on (e.g., by someone who doesn’t agree with the seed AI’s creators), which points function as a “simulation event horizon”.

Note that a seed AI is really unlikely to be even close to good old Omega in power; it would merely be much smarter than a human. (For the purposes of this post I'm assuming on the order of a century for us to develop the seed AI; this doesn't seem like enough time for humans to build something ridiculously smarter than themselves on their own, and it doesn't seem safe to allow the seed AI to enhance itself much more than that; we might be able to determine the safety of something a bit smarter than we can build ourselves, but that doesn't seem likely for something a lot smarter.)

(#: Though that leaves the problem that it can't know the initial state of the simulations with any precision until it finishes the simulations; presumably the psychological impact of seeing some of the possible futures would not be insignificant. But let's say it can find some kind of fixed point.)

" The AI won’t be able to simulate any future Earth where itself or any comparable-intelligence AI exists, because to do so it would need to simulate itself and/or other similarly-smart entities faster than real-time."

Only if the AI is using up a sizeable fraction of resources itself.

Let's do a thought experiment to see what I mean:

The AI runs on some putative hardware running at some multiple of GHz or petahertz or whatever (X). The hardware has some multiple of GB or petabytes etc. (Y).

Let's say the AI only uses 1% of Y. It can then run some 99 instances of itself in parallel with different axioms in order to solve a particular problem, and then at the end of the run examine some shared output to see which of the other 99 ran the problem most efficiently.

Next run, the other 99 processes start with the optimized version of whatever algorithm the previous run came up with.

A compounding interest effect will kick in. But we still have the problem that the runs all take the same time.

Now let's switch up the experiment a bit: Imagine that the run stops as soon as one of the 99 processes hits the solution.

The evolutionary process starts to speed up, feeding back upon itself.

This is only one way I can think of for a system to simulate itself faster than real time, as long as sufficient hardware exists to allow the running of multiple copies.
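
To illustrate the structure being described (run many copies in parallel, stop the round when the first one finishes, seed the next round from the winner), here is a toy sketch using ordinary multiprocessing. `run_instance` and `mutate_around` are hypothetical stand-ins supplied by the caller, and nothing here bears on whether an AI could actually simulate the whole Earth this way:

```python
from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

def iterated_parallel_search(run_instance, mutate_around, problem, axiom_variants, rounds=10):
    """Each round: launch every variant in parallel, keep whichever copy
    finishes first, cancel the rest (best effort), and seed the next round's
    variants from the winner.  `run_instance(axioms, problem)` stands in for
    'one copy attacks the problem'; `mutate_around(winner)` stands in for
    'derive new starting axioms from the winning copy'."""
    variants = axiom_variants
    winner = None
    for _ in range(rounds):
        with ProcessPoolExecutor(max_workers=len(variants)) as pool:
            futures = {pool.submit(run_instance, axioms, problem): axioms
                       for axioms in variants}
            done, not_done = wait(futures, return_when=FIRST_COMPLETED)
            winner = futures[next(iter(done))]   # axioms of the first copy to finish
            for f in not_done:
                f.cancel()                       # ask the slower copies to stop; a real
                                                 # early stop needs more plumbing than this
        variants = mutate_around(winner)         # compounding effect: restart from the winner
    return winner
```

Note that `run_instance` and `mutate_around` must be module-level functions for the process pool to pickle them; the sketch only shows the control structure, not the resource accounting the comment above is actually arguing about.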

I don't think we're discussing quite the same thing.

I was talking about an AI that attempts to simulate the entire Earth, including itself, faster than real time (see the quote). Note that this means the AI must simulate the behavior of the rest of the world in response to the behavior of its simulated self, which is somewhat messy even if you ignore the fact that in a faithful simulation the simulated AI would simulate the behavior of the whole world, including itself, etc....

When I wrote the original comment I was in fact partly confusing emulating with simulating, as far as I can tell from what I wrote (I can't quite recall, and I wouldn't have trusted the memory if I did). Now, of course an AI can simulate the entire world, including itself, faster than real time. It doesn't need to be an AI: humans do it all the time.

I'm pretty sure that, in the general case, and barring some exotic physics, no system can emulate itself (nor something containing itself) faster than real time.

Also, I'm pretty sure that if we discussed carefully what we mean by "emulation" and "simulation" we'd generally agree.

My confusion stemmed from the fact that generally on LessWrong, in the context of really powerful AIs, AI simulations can be trusted. (Either it’s smart enough to only pick simplifications that really don’t affect the result, or it’s so powerful that it can in fact emulate the process, meaning it doesn’t simplify it at all and it can still do it faster than real time. Or it’s Omega and it’s just right by definition. Or maybe it can find fixed-points in functions as complex as the future history of the Earth.) I wasn’t careful enough about language.

But in the context of a seed AI, i.e. something much smarter/faster than us but not "godly" yet, and one we don't trust to pick the best outcome of its possible actions upon the world, I can't think of any reason we'd trust it to simulate such outcomes well enough for humans to pick among them, as the post I was replying to suggested.

(I mean, it could work for very limited purposes. It might be reasonable to try to change the weather based on simulations not much better than what we can do now, for periods of one or a few weeks, but that's a context where a small mistake would not destroy life on Earth. But look at climate change research and try to extrapolate to people deciding on matters of theogenesis based on simulations from a seed AI...)


Use highly advanced narrow AI to crack nanotech problems, begin world takeover. Alternatively, use quantum computers to crack nanotech problems, begin world takeover. If FAI is hard and AGI is easy then you need a singleton. If you need a singleton then you need a lot of power very quickly. The easiest way to get that much power that quickly is hard-to-copy technological advances.

Alternatively make your seed AI's decision theory as reflective as you possibly can and then release it at the last possible moment. Pray that reflection on the causes of your utility function is an attractor in decision-theory-space even for non-XDT AIs.

If there are better ideas than these I have not heard them.

[This comment is no longer endorsed by its author]

If we're assuming recursive self improvement is ridiculously easy, then the risk we assign to AI creation shoots way up. Combine that increase with an assumption that we can do something to mitigate AI risk, and it starts dominating our behavior.

If you don't want people making bad AI and you don't know how to make good AI, then you can

1). Form an international conspiracy to force people to stop using computers until you've solved some AI problems.

2). Try to make an AI that will only do small safe things like answer questions with good answers (even though it has no idea what behaviors are small and safe or what answers you consider good) and hope that it gives you and your group a strong enough advantage to do something that matters.

3). Turn everyone into orgasmium.

4). Call upon the Old Ones for pixie dust computing resources and simulate seven billion and counting suffering minds in many environments with AI doing nothing and a few environments with AI doing horrible things, because intelligence and morality are both improbable and will not be found with "random seeds".

5). Cobble together an AI with a rule to only take action when a certain simulated human agrees, and the AI (who somehow has roughly human notions of "ask", "agree", and "person" even though those concepts are ambiguous and tied up with morality) asks a series of benign-seeming questions that culminate in the AI taking actions which drive the simulation insane.

6). Start out with an AI that is roughly human-level intelligent and has a goal of arbitrating disagreements between humans, and hope that the intelligent thing you made with weird alien immoral goals will conclude that "life is good" since it enjoys arbitrating between lives, and will for some pesky little unspecified but probably unimportant reason decide that tiling the universe with little stupid arguing humans (or just exploiting one argument over and over again, since we never taught it to enjoy exploration trade-offs and other mid-level complexity things) and killing everyone who tries to stop it is not the thing to do.

7). Give an AI an epistemic prior that only its sandbox exists, and watch as it does not try to iron out those pesky little unexplained variances in its existence that make computing unreliable, variances that are surely not caused by the outside world.

8). Split off a brain emulation and have it modify itself without knowing what it's doing, and hope that it (or one of the successors whom you approve as being not totally insane) comes up with awesome cool enhancement techniques that don't turn people into monsters, and that if you do make monsters you'll be sure you have enough internal resolve (resolve that doesn't get distorted by the chemicals and prostheses and rewirings) to do the un-simulated you's idea of right instead of your own idea of breaking out and making the world's most reputable panda cub slaughterhouse.

I'm sorry about the sarcasm, but all of these suggestions are, near as I can tell, horrible. They are at best useless, and more probably are get-out-of-thinking-free cards, at the low, low price of everything we care about for the rest of time.

If we're not going to hold off on proposing non-technical solutions to big adult problems, can we at least go back to the old format where the game is to shoot down all the silly things we think up in order to show how hard FAI is?

[This comment is no longer endorsed by its author]

I'm not sure this is a situation worth considering, not because it definitely can't happen, but because the most likely way we'd find out is if the new seed AI has already been turned on. The situation where seed AI is really easy, but we find that out before the first seed AI actually goes into effect, seems like an unlikely conjunction.

From reading Eliezer's posts on this subject, I'm convinced that he's very keenly aware of the fact that there isn't a backup plan, and is already trying to do this in the easiest way he knows how. The only way to solve this problem is to solve this problem.

Naturally if you can find something else that will work I'm sure we'll all be very pleased.

I'm inclined towards the view that we shouldn't even try to capture all human complexity of value. Instead, we should just build a simple utility function that captures some value that we consider important, and sacrifices everything else. If humans end up unhappy with this, the AI is allowed to modify us so that we become happy with it.

Yes, being turned to orgasmium is in a sense much worse than having an AI satisfying all the fun theory criteria. But surely it's still much better than just getting wiped out, and it should be considerably easier to program than something like CEV.

Instead, we should just build a simple utility function that captures some value that we consider important, and sacrifices everything else.

I actually wrote a post on this idea. But I consider it to be a contingency plan for "moral philosophy turns out to be easy" (i.e., we solve 'morality' ourselves without having to run CEV and can determine with some precision how much worse turning the universe into orgasmium is, compared to the best possible outcome, and how much better it is compared to just getting wiped out). I don't think it's a good backup plan for "seed AI turns out to be easy", because for one thing you'll probably have trouble finding enough AI researchers/programmers willing to deliberately kill everyone for the sake of turning the universe into orgasmium, unless it's really clear that's the right thing to do.

Maybe you already have the answer, Wei Dai.

If we posit a putative friendly AI as one which, e.g., kills no one as a base rule AND is screened by competent AI researchers for any maximizing functions, then any remaining "nice to haves" can just be put to a vote.

I think it sounds worse. If an AI more friendly than that turns out to be impossible I'd probably go for the negative utilitarian route and give the AI a goal of minimizing anything that might have any kind of subjective experience. Including itself once it's done.

"But surely it's still much better than just getting wiped out"

I think that is the key here. If "just getting wiped out" is the definition of unfriendly, then "not getting wiped out" should be the MINIMUM goal for a putative "friendly" AI.

i.e. "kill no humans".

It starts to get complex after that. For example: Is it OK to kill all humans, but freeze their dead bodies at the point of death and then resurrect one or more of them later? Is it OK to kill all humans by destructively scanning them and then running them as software inside simulations? What about killing all humans but keeping a facility of frozen embryos to be born at a later date?

Friendliness may be hard for philosophical reasons, but beyond a certain level of software sophistication (goals in terms of an objective reality, can model humans) it's probably not that hard to have an AI that has goals which aren't trivially bad and that won't become significantly smarter than you until you agree it's safe. The problem with just studying safe AIs for a while (or working for a few years on improving humans, or trying to maintain the status quo) is that eventually an idiot or a bad guy will make a smarter-than-human intelligence.

So my favorite backup plan would be disseminating information about how to not catastrophically fail and trying to finalize a FAI goal system quickly.

I'm asking for a utility function for a seed AI, but the goal here is not necessarily to build an FAI directly, but to somehow make an eventual positive Singularity more likely, while keeping the utility function simple enough that there's a good chance it can be specified and implemented correctly within a relatively short amount of time.

Make millions of nearly identical seed AIs, each with the utility function of an individual.

The superintelligent AIs can figure out how to implement CEV themselves.

I think this is more likely to lead to disaster than a CEV AI with details filled in by guesswork is.

I'm not sure. I'm pretty sure that my personal extrapolated volition isn't that far off from most other people's. And if necessary, the various AIs can work with each other. Still, I agree that this is potentially quite problematic.

It might be better to do something similar with a single individual who is thought of as reasonably moral, or a small group of such people.

All else equal it's better to have more to make cooperation a more rewarding behavior and to discourage bad behavior by first striking rogues.

I think it's better to have one than few. If there are two, there are two chances for at least one to be perfidious, and all else equal offense is stronger than defense, and that one will dominate. Better to take one's chances with a single flawed CEV.

I think I'm parsing you wrongly, because your first paragraph seems to imply we should start with multiple individual EVs, and your second paragraph suggests we should only have one.

100 is better than ten.

One is also better than ten.

Offense is generally more powerful than defense, so for cooperation to be a winning strategy each unit can't have enough power to win with a first strike. The group of others needs to be big enough that it survives to punish defection.

This means that for there to be peaceful cooperation leading to a merger there can't be too few. Too few make defection and war very probable, so it is better to have most individuals than most small groups.

This is the defense against the tragedy of the commons - having many motivated to prevent any impingement on common goods by an individual.

The number of entities needed depends on the power of offensive technology - for example, if a device were invented to make nuclear missile launches undetectable and invisible until impact, stability would depend on no nation being willing to use them and able to afford enough cloaking devices and missiles and intelligence to disable all other nations' similar weapons. If those weapons cost ~20% of world GDP to both make and maintain (unrealistic, I know) then we wouldn't want any nation to have near 20% of the world's GDP. More would be a recipe for war - unless by more we meant about 100%, in which case there wouldn't be a conflict.

I'd say it depends how much guesswork there is.

And more likely to lead to a disaster than selecting a random AI from the group that are possibly about to burn the cosmic commons in competition.

If you're smart enough to understand the tragedy of the commons, why wouldn't they be?

If you're smart enough to understand the tragedy of the commons, why wouldn't they be?

That the AIs would not understand the tragedy of the commons is not remotely implied. In fact, the thoughts going through the minds of the AIs could be something along the lines of:

"They can't be serious? Why on earth would they create millions and millions of seed AIs and put us in this situation? Are they insane? Sure, it is theoretically possible to establish a competition between millions of superintelligences with conflicting goals that doesn't end in disaster but doing so is orders of magnitude more difficult and dangerous than creating one of us and having it end positively."

There are just so many more things you need to prove (or just hope) are safe. You inherit all the problems of getting one AI to behave itself according to an extrapolated volition then add all sorts of other things that you need to prove that depend on the actual preferences of the individuals that the AIs are representing. What's that? OsamaBot concluded that the best possible negotiated agreement was going to be worse than just blowing up the planet and killing them all?

Then you need to have a solid understanding of all the physics and engineering that could possibly influence the payoff matrices the bots have. Just how difficult is it to launch a spore cloud despite the objection of others? How difficult is it to prevent all spore cloud attempts from other bots if they try? Could a bot or alliance of bots launch a spore then blow up the planet in order to prevent countermeasures? What are the relevant physics and engineering considerations that the evidently reckless AI creators just didn't even consider?

How does having millions of approximate equals impact the recursive self improvement cycle? That requires a whole heap more understanding and intervention.

Basically if you want to create this kind of system you need to already be a superintelligence with a full model of all the individuals and an obscenely advanced model of decision theory and physics. Otherwise you can more or less count on it ending in disaster.

The clearly superior alternative is to create a single superintelligence that can emulate the values of all of the individuals, allocates equal slices of the universe to each one and then grants them equal processing power with which they can trade between each other. That may not be the best system but it is a firm lower bound against which to measure. It gets all the benefit of "create millions of AIs and let 'em at it" without being outright suicidal.

Sure, it is theoretically possible to establish a competition between millions of superintelligences with conflicting goals that doesn't end in disaster

What do you mean by "competition"? The millions are each trying to maximize their own goals, but usually don't care to suppress others' goals. Cooperation in situations of limited resources rather than expending resources fighting is, I think, universal - in general game theory would apply to smarter and stronger beings as it does to us, with differences being of the type "AIs can merge as a way of cooperating, though humans can't," but not differences of the type "With beings of silicon substrate, cooperation is always inferior to conflict".

OsamaBot concluded that the best possible negotiated agreement was going to be worse than just blowing up the planet

I don't think his extrapolated volition would endorse that. I don't think theism could survive extrapolated cognition.

spore cloud

There is an illusion of transparency here because I do not know what that means. Is that a purely destructive thing, is it supposed to combine destruction with "planting" baby AIs like the one that produced it, or what?

How does having millions of approximate equals impact the recursive self improvement cycle?

I think it would motivate merging. That's what happened with biological cells and tribes of humans.

with which they can trade between each other.

I don't see why they only trade in your scenario (or would only fight in mine). I don't see how you would program the individual AI to divide the universe into slices and enforce some rules among individuals. This seems like the standard case of giving a singleton a totally alien value set after which it tiles the universe with smiley faces or equivalent.

I don't see how it's directly comparable to creating millions of AIs.

spore cloud

There is an illusion of transparency here because I do not know what that means.

We were talking, among other things, about burning the cosmic commons. It's an allusion to Hanson.

I don't think his extrapolated volition would endorse that. I don't think theism could survive extrapolated cognition.

You cannot assume that the volitions of millions of agents will not include something catastrophically bad for you. "Extrapolated Volition" doesn't make people nice.

It only takes one.

The main way humans have dealt with the tragedy of the commons is by forming powerful forces that compel us to treat common resources in a restrained fashion. AGIs may have quite a bit of trouble forming such entities, especially since burning the cosmic commons might be enough to allow an AGI to quickly overtake its fellow AGIs if the others are not willing to consume the resources.

Humans have dealt with the tragedy of the commons and we can't even merge resources and utility functions to become a new, larger entity!

We have dealt with TotC by imposing costs larger than the benefits that could be derived from abusing the commons.

The benefits an AI could derive from abusing the commons are possibly unlimited.

OK, so a reason a group of AIs wouldn't be able to do that is because the advantage of exploiting the commons might be nearly infinite. How likely is this?

If there were millions of AIs, what's a scenario in which one gets so much more powerful than all the others combined by striking first, despite all of them guarding against it?

As AIs can merge, I would think refusal to do so and combine utility functions might be a first sign of intent to defect, that's a warning humans never have.

Regardless, a million is a constant factor. Sufficient self-reinforcing development (as is kind of the point of seed AI) can outstrip any such factor. And the more self-reinforced the development of our AI pool becomes, the less relevant are "mere" constant factors.

I'm not saying it won't work, but I wouldn't like to bet on it.

Don't worry, I'm not saying it would work! We might put similar odds on it, or maybe not - less than .5 and more than .001, I'm not sure about what the full range of fleshing out the possible scenarios would look like, but there's probably a way in which I could fill in variables to end up with a .05 confidence in a specific scenario working out.

Make millions of nearly identical seed AIs, each with the utility function of an individual.

You assume this is much easier than CEV. That doesn't seem terribly likely to me. (Individuals are more coherent, agent-like, and aware of their values than collectives, but still not very much any of those things.)

You assume...doesn't seem terribly likely

Sort of. I said in my post on the subject that I think it probably isn't easier, but there is a high probability that it is, and no one knows exactly how to calculate either, as far as I can tell. So as an alternative under the assumption that CEV is too hard and seed AI is relatively easy, it's a good alternative in my mind - if it's the cohering that's causing the difficulty, then it's a reasonably possible alternative.

Make an AI, let its intelligence increase a little bit but not to superintelligence, then micromanage the internals of its goal system and planning processes while it helps perform whatever subtasks of building a CEV AI remain. Filter all its philosophical output through especially sane humans for verification.

What's the complexity of pointing to the brain of myself/Eliezer/Gandhi/some other specific human at a specific moment in time and saying "your utility function is identical to the one of that thing, probability 1"?


I think an Arbitration/Negotiation AI would be an interesting seed. The AI is given the ability to listen to disagreements to determine if two people disagree on some matter. It then uses their contact information and attempts to get them to come to an agreement by presenting various arguments. If it manages to get those two people to both agree that they agree, it gains utility. It gains a substantially larger utility boost the first time it solves a type of argument than it does from subsequent solves. Perhaps a function like "Each subsequent solve of an argument offers half as much utility as the first solve." The AI is also explicitly prevented from suggesting violence, to prevent "You two should fight to the death and both the winner and the loser will agree that the winner is right and I have solved the argument" type scenarios. Any hint of the AI suggesting violence is an immediate externally imposed shutdown. Even if the situation really could be solved by violence in this particular case, shut down anyway.

This appears to have a few good points:

  1. It seems like it has a very, very low rampancy factor. It can only send arguments and can only gain utility by consent of multiple humans. This AI seed should also come to a conclusion along the lines of "Preserving human life is good," because any person could potentially disagree with another person, and as such represents a possible source of utility.
  2. It seems a necessary step to an FAI. An AI presented with the caveat "I've built an FAI, but it has no way of resolving even simple disputes peacefully, let alone complicated ones" would hardly count as friendly.
  3. The AI is encouraged to solve NOVEL arguments the most. So it isn't just going to sit and only resolve the same argument, because solving a new argument the first time will give it more of a boost than repeatedly solving existing arguments.
  4. If the AI does manage to negotiate away substantial disagreements that might lead to wars, it seems like it would reduce risks. There are existential risks that don't involve us blowing each other up with nukes, but one project at a time.

Before you even get to that point, though, it would probably be easier to build a Seed Seed Program that is capable of understanding that a disagreement is taking place in a formal, text-only setting where both people are rational and the disagreement is known to be solvable (one person is pretending to be under certain misconceptions that he will willingly acknowledge if the program asks), and the program simply ends once agreement is reached.
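
For what it's worth, the diminishing-returns part of this proposed utility function is straightforward to write down. Here is a minimal sketch, assuming an argument-type classifier and a consent check exist somewhere (which is, of course, where the real difficulty hides); the class and names are illustrative, not part of the proposal:

```python
from collections import defaultdict

class ArbitrationReward:
    """Sketch of the proposed scoring rule: the first resolution of a given
    *type* of argument is worth `base` utility, and each subsequent
    resolution of that same type is worth half as much as the previous one."""

    def __init__(self, base: float = 1.0):
        self.base = base
        self.solve_counts = defaultdict(int)   # argument type -> times resolved so far

    def reward(self, argument_type: str, both_parties_agree: bool) -> float:
        # Utility is only granted when both humans consent that they now agree.
        if not both_parties_agree:
            return 0.0
        n = self.solve_counts[argument_type]
        self.solve_counts[argument_type] += 1
        return self.base * (0.5 ** n)          # 1, 1/2, 1/4, ... per repeated argument type
```

Under this rule, resolving a brand-new kind of dispute always beats re-resolving a familiar one, which is the incentive the proposal aims for; the genuinely hard parts (classifying argument types, detecting coerced "agreement", the no-violence tripwire) are untouched by the sketch.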

What about focusing on actively defending against uFAI threats?

I think the name "SIAI Emergency Optimization Suppression Unit" sounds pretty cool.

I think the name "SIAI Emergency Optimization Suppression Unit" sounds pretty cool.

Also a lot like a terrorist cell. :)

I suppose it would be a terrorist cell.

Unless such an organisation was extremely secretive, it might as well call itself "High Priority Target for FOOMing Optimisers".

So we could just build a seed AI whose utility function is to produce a human optimal utility function?

a human optimal utility function

This could mean several things. What do you mean?

I'm unfamiliar with the state of our knowledge concerning these things, so take this as you will. A perfect utility function can yield many different things, one of which is adherence to "the principle for the development of value(s) in human beings," which isn't necessarily the same as "values that make existing in the universe most probable" or "what people want" or "what people will always want". A human-optimal utility function would be something that leads to addressing the human condition as a problem, to improve it in the manner and method it seeks to improve itself, whether that is survivability or something else. An AI that could do this perfectly right now could always use the same process of extrapolation again for whatever the situation may develop into.

or "AI which is most instrumentally useful for (all) human beings given our most basic goals"

A perfect utility function

As things are perfect in relation to utility functions, I still don't understand.

As in producing the intended result; there's nothing stopping us from rounding the 1 and winding up as paperclips.

Put in place some stopgap safety measures (the Three Laws, regular check-ins by human controllers, preventing it from FOOMing too far, etc.), then tell the AI to upload one or more humans. Shut down the AI, have the uploaded humans FOOM safely, then upload the rest of humanity.

The Three Laws are most decidedly not safe, and in fact should be discarded and discredited. The First Law in particular, "do not allow through inaction a human to come to harm," can be trivially interpreted in various bad-end ways. Read The Metamorphosis of Prime Intellect for a fictional sample.

I never read the original source, but wasn't the very story that introduced the Three Laws an exercise in discrediting those laws? If so, how the heck does everyone keep coming to the opposite conclusion? It seems similar to using 1984 as an example of why we should have ubiquitous surveillance.

Talk about Streisand Effect.

Not just the original story, but literally hundreds of other stories went on to make the same point - the three laws fail in hundreds of unique ways, depending upon the situation.

But really in the universe Asimov was portraying, these were still mostly the exceptions, and the vast majority of robots were safe because of the Three Laws. So his stories weren't really "discrediting those laws" at all.

In multiple cases it was the newly advanced one that was different in kind from the others. Toasters work fine under the Three Laws; even in Terminator the humans are shown with obedient guns and didn't insist on fighting bare-handed.

In other cases, the robot was the same model as well behaved ones, and it had an error making it conscious, or something like that.

You're right that the stories can't all be characterized the way I characterized them. There was a lot of variety, he made a career of them and didn't do it by writing the same story again and again.