If one believes the orthogonality thesis (and we only need a very weak version of it), just knowing that there is an AGI trying to improve the world is not enough to predict how exactly it would reason about the more quirky aspects of human character and values. It seems to me that something that could be called "AGI-humans" is quite possible, but a more alien-to-us "total hedonistic utility-maximizing AGI" also seems possible.
From how I understand Eliezer Yudkowsky's arguments here, the way we select AI models will favour models with consequentialist decision-making (we do select the models that give good results), which tends toward the latter.
Because of this, I would expect an AGI to be more on the far-reaching/utilitarian end of affecting our lives.
With regards to
[...] accept some of the human character flaws and limitations, or will it strip them all away at the risk of hurting the human until the singularity of what is considered acceptable is achieved
if we are talking about an AGI that is aiming for good in a sufficiently aligned sense, it is not obvious that a significant "risk of hurting the human" is necessary to reach a value-optimal state.
But of course a utilitarian-leaning AGI will be more willing to risk actively doing harm if it thinks that the total expected outcome is improved.
If one believes the orthogonality thesis
Yes, I do.
I would expect an AGI to be more on the far-reaching/utilitarian end of affecting our lives.
Me too.
But I'm adopting the term "AGI-humans" from today.
But of course a utilitarian-leaning AGI will be more willing to risk actively doing harm if it thinks that the total expected outcome is improved.
...
I don't think that this is how values or beneficence works. I think that, if you had an aligned superintelligence, one that was actually an aligned superintelligence, it would be able to give you a simple, obvious-in-retrospect explanation of why "helping" people in the manner you're worried about isn't even a coherent thing for an aligned superintelligence to do.
I think the fact that we ourselves are currently unable to come up with such an explanation is a big part of why alignment remains unsolved.
My question stems from personal experience: people see a certain solution as the best option and agree to apply it, only to fail to follow through later. This failure may lead to grave consequences, yet the same mistakes keep getting repeated.
The conclusion so far is that this is caused by some psychological limitations, usually emotional in nature.
An ASI may try to straighten this out for us, but it would have to take a supporting role to us. Is that likely if the ASI develops its own consciousness?
What is the question? It seems to have something to do with AGI intervening in personality disorders, but why? AGI aside, when considering the modification of humans to remove functionality that's undesirable to oneself, it's not at all clear where one would stop. Some would consider human existence (and propagation) to be undesirable functionality that the user is poorly equipped to recognize or confront. Meddling in personality disorders doesn't seem relevant at this stage.
My main concern is:
Humans can be irrational and illogical, which allows them to let things slide, for better or for worse. There are also psychological limitations and limits of reach that put a hard cap on them somewhere.
An AGI will most likely do everything it does rationally and logically, including emotions. And this may be detrimental to most humans.
it's not at all clear where one would stop
Yes
You know the saying: "I just want him to listen, not to solve my problem."
Would an all-reaching AGI accept this?
How would AGI respond to self-sabotaging tendencies?
If it's true that world problems stem from individual preferences, which are based on individual perspectives that originate from individual personality traits and experiences, where will an AGI that's hell-bent on improving the world stop, accept things as they are, and realize that any attempt to improve them may cause more damage to humans than good?
Take social media algorithms as an example: they keep each person in a relatively closed bubble they believe that person should be in, and guide each person's decisions by determining what information to show them.
And with the advent of the internet of things, AI will become more prevalent.
While there may be AI specifically designed to offer emotional support and AI designed to solve problems, will an AGI that may develop some sort of consciousness simply accept some of the human character flaws and limitations, or will it strip them all away at the risk of hurting the human until the singularity of what is considered acceptable is achieved?
Would an AGI always act from a place of pure rationality and maximum efficiency, disregarding some of the human values that prevent us from doing so most of the time?