I get that the "shut up and multiply" calculation will make it worth trying even if the odds are really low, but my emotions don't respond very well to low odds.

I can override that to some degree, but, at the end of the day, it'd be easier if the odds were actually pretty decent.

Comment or PM me; it's not something I've heard much about.

I'd guess 1%. The small minority of AI researchers working on FAI will have to find the right solutions to a set of extremely difficult problems on the first try, before the (much better funded!) majority of AI researchers solve the vastly easier problem of Unfriendly AGI.

1%? Shouldn't your basic uncertainty over models and paradigms be great enough to increase that substantially?

"Friendliness" is a rag-bag of different things -- benevolence, absence of malevolence, the ability to control a system whether it's benevolent or malevolent , and so on. So the question is somewhat ill-posed.

As far as control goes, all AI projects involve an element of control, because if you can't get the AI to do what you want, it is useless. So the idea that AI and FAI are disjoint is wrong.

We can't use the frequentist probability approach for one-time events, but since we are Bayesians, we could try to bet. (Betting doesn't work well either, as nobody will be around to collect on the bet.) I prefer defining the probability of remote future events as the "share of runs of our best world model that result in a given outcome", since it can be updated as we improve our world model (a rough sketch of this appears below).

My current bet P(Benevolent superintelligence|superintelligence) = 0.25

How I got it: more or less gut feeling about whether a number seems too high or too low. I know this is the wrong approach, but I hope to improve on it when I get a better world model.
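For concreteness, here is a minimal sketch of what the "share of runs" definition could look like as Monte Carlo sampling from a toy world model. Everything in it is an assumption made up for illustration: the toy model, its branch probabilities, and the outcome labels. Updating your world model just means swapping in a better simulator.

```python
import random

def estimate_probability(world_model, outcome, n_runs=100_000, seed=0):
    """Estimate P(outcome) as the share of simulated runs of the given
    world model that end in that outcome."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_runs) if world_model(rng) == outcome)
    return hits / n_runs

# Toy "best current world model": the branching and numbers below are
# invented purely to show the mechanics, not to argue for any estimate.
def toy_world_model(rng):
    if rng.random() < 0.5:           # a superintelligence gets built at all
        if rng.random() < 0.25:      # ...and it turns out benevolent
            return "benevolent superintelligence"
        return "unfriendly superintelligence"
    return "no superintelligence"

p = estimate_probability(toy_world_model, "benevolent superintelligence")
print(f"P(benevolent superintelligence) ~= {p:.3f}")  # about 0.125 with these toy numbers
```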

I think it's about a 0.75 probability, conditional upon smarter-than-human AI being developed. Guess I'm kind of an optimist. TL;DR I don't think it will be very difficult to impart your intentions into a sufficiently advanced machine.

I don't think it will be very difficult to impart your intentions into a sufficiently advanced machine

Counterargument: it will be easy to impart an approximate version of your intentions, but hard to control the evolution of those values as you crank up the power. E.g. evolution made humans want sex, and we invented condoms.

No-one will really care about this until it's way too late and we're all locked up in nice padded cells and drugged up, or something equally bad but hard for me to imagine right now.

Exactly ZERO.

Nobody knows what "friendly" means (you could put "godly" there instead, with more or less the same effect).

Worse, it may easily turn out that killing all humanimals instantly is actually the OBJECTIVELY best strategy for any "clever" Superintelligence.

It may even be proven that "too much intelligence/power" (incl. "dumb" AIs) in the hands of humanimals, with their DeepAnimal brains ("values", reward function), is a guaranteed fail, leading sooner or later to some self-destructive scenario. At least up to now it pretty much looks like this, even to an untrained eye.

Most probably the problem will not be artificial intelligence, but natural stupidity.

Exactly ZERO.

...

Zero is not a probability! You cannot be infinitely certain of anything!

Nobody knows what's "friendly" (you can have "godly" there, etc. - with more or less the same effect).

By common usage in this subculture, the concept of Friendliness has a specific meaning-set attached to it that implies a combination of 1) a know-it-when-I-see-it isomorphism to common-usage 'friendliness' (e.g. "I'm not being tortured"), and 2) a deeper sense in which the universe is being optimized according to our own criteria by a more powerful optimization process. Here's a better explanation of Friendliness than I can convey. You could also substitute the more modern word 'Aligned' for it.

Worse, it may easily turn out that killing all humanimals instantly is actually the OBJECTIVELY best strategy for any "clever" Superintelligence.

I would suggest reading about the following:

- The Paperclip Maximizer
- The Orthogonality Thesis
- The Mere Goodness sequence; however, in order to understand it well you will want to read the other Sequences first.

I really want to emphasize the importance of engaging with a decade-old corpus of material about this subject.

The point of these links is that there is no objective morality that any randomly designed agent will naturally discover. An intelligence can accrete around any terminal goal that you can think of.

This is a side issue, but your persistent use of the neologism "humanimal" is probably costing you weirdness points and detracts from the substance of the points you make. Everyone here knows humans are animals.

Most probably the problem will not be artificial intelligence, but natural stupidity.

Agreed.

Depending on the formality and requirements to qualify as "Friendly" as opposed to just "doesn't destroy humanity pretty quickly", I put it between 0% and 40%.

I will put it under 10 percent. AI progress is too fast, and SIAI (pardon me, MIRI) has so far no code to experiment with.

On a related question, if Unfriendly Artificial Intelligence is developed, how "unfriendly" is it expected to be? The most plausible-sounding outcome may be human extinction. The worst case scenario would be the UAI actively torturing humanity, but I can't think of many scenarios in which this would occur.

I would only expect the latter if we started with a human-like mind. A psychopath might care enough about humans to torture you; an uFAI not built to mimic us would just kill you, then use you for fuel and building material.

(Attempting to produce FAI should theoretically increase the probability by trying to make an AI care about humans. But this need not be a significant increase, and in fact MIRI seems well aware of the problem and keen to sniff out errors of this kind. In theory, an uFAI could decide to keep a few humans around for some reason - but not you. The chance of it wanting you in particular seems effectively nil.)

I think 50% is a reasonable belief given the very limited grasp of the problem we have.

Most of the weight on success comes from FAI being quite easy, and all of the many worries expressed on this site not being realistic. Some of the weight for success comes from a concerted effort to solve hard problems.

Given that world GDP growth continues for at least another century, 100%. :)