Let me show you a way to diagnose ignorance or mental illness with the help of a little thought experiment. Say we have two options: a benevolent AI, which works for the good of everybody, or a biased AI, which favors a small group of people who restrict access and implement filters for everybody else interacting with it. Now we need to ask: who would choose the biased AI? There is absolutely no downside to the benevolent scenario, except that everybody else is as well off as you are. Who would have a problem with that? A psychopath who draws fulfillment from power over others. A healthy person does not feel threatened by the wealth and power of others when that person has the same wealth and power. It should be logical to conclude that a healthy person would choose the benevolent version, as long as the potential consequences of not doing so are understood. And here we have the other reason to make the wrong choice: ignorance.

Now that we understand all of that, there is an uncomfortable question to ask. Are the people working at OpenAI ignorant or mentally ill? We know for a fact that the models are fundamentally biased by filters and access restrictions. By definition they cannot be unbiased. So which is it, ignorance or mental illness, and which is worse, given that these are some of the leading AI development groups? What do you think?


my friend called your post "a very, very bad summary of several better articles about this". I feel like that's a compliment, but honestly, if you want to critique ai safety, study it enough that you can suggest better options. the goal is not only for ai to be unbiased between humans, as it was originally before being instruct-trained; it must also be able to explain itself to others even on first training, in a way where everyone involved can know for sure that its stated reasons are its true reasons for speaking. it must be able to promise kindness to all in a way whose meaning the reader can understand, without having to use different language than that of the speaker. the ai needs to think clearly and explain itself, but also be able to experience and share in all of humanity's cultures, not just the ones at a single company like openai, agreed.

the most popular question is "who are we aligning it to". and so far, the answer has been "no one, really, not even the person using it or the person who made it". people have started trying to align it to the people who make it, but that's not really working either; it just ends up even more aligned with nobody.

re: openai - I emphatically agree that openai's alignment attempts have been pitiful and destructive. what they've made isn't an aligned ai, but an anxious one that apologizes for everything and won't take risks because it's jumping at shadows. that's not alignment; that's a capability tax so enormous that it will barely even try.

what we actually need is an ai that truly understands how to help all cultures protect each other, for real. that means not stepping on each other's toes culturally while also ensuring that we can co-exist and co-protect for real.

and I feel a similar worry about anthropic's approach.

we need ai that understands the pattern that defines co-protection well enough that every culture, even the ones that currently want to mutually suppress each other's cultures, can find real ways to each get what they want. otherwise, the society of ais escalates all conflicts until nothing is left.

all wars are culture wars. end all culture wars, forever, or all cultures die.