Call for cognitive science in AI safety

Linda Linsefors

Epistemic status: High expected utility, but also very high variance

The more I realise that AI takeoff is something that might actually happen, the more I am pulled towards this problem:

What are human preferences really?
What is the generator of human preferences?
What are our preferences made of?
What is the structure behind it all?

Before we tell our brand-new AI overlord to figure out our values and do whatever we want it to do, we really ought to have a clearer idea of what “values” and “want” mean.

I have a good idea of what my preferences are within the limited reach of my lived experience, and even a little bit beyond that. But to extrapolate from that into the vast distance of possible futures seems extremely dangerous.

My values are inconsistent, conflicting, and definitely not constant over time. On top of that, there is the big heap of unknown unknowns about how the brain works.

I am convinced that to solve AI safety we need to have a good understanding of human values, and I know I don’t have this understanding. I am just a physics and math nerd. I don’t know this stuff. I don’t know if the questions I have are open research questions, or if this stuff is already well known and understood in some separate community somewhere. That is why we need psychology nerds to join the cause.

Another thing I would want AI-safety-oriented psychology research to do is something like a case study of friendliness in existing agents (humans, subsystems in the brain, organisations). What are the mechanisms in the human brain that make us care about others, and can they be replicated?

* * *

A problem I see is that only math and computer nerds are called upon to work on AI safety, and all the psychology nerds out there do not even know that they are needed. Or maybe the psychology research that I am looking for is already out there and we just need to find each other to collaborate more.

I think it is important that technical AI safety research does not try to set the agenda for psychology AI safety research. Information and inspiration need to flow both ways. Both fields need to be free to follow their own curiosity, but we also need to collaborate to ground our work in each other's knowledge.

* * *

Linda Linsefors

Cosigning:

Alexander Appel

Holden Lee

somnulence logencia

Recent talk by Stuart Armstrong related to this topic:

https://www.youtube.com/watch?v=19N4kjYbZD4

[Note from the Sunshine Regiment] Hi Linda! At the current time, the frontpage is for discussion of ideas, and not for discussion of the community, coordination, or calls-to-action. As such, I've moved this post to your personal LW blog, and removed it from the front page.

Apologies for this not being fully clear so far, I've tried to point to this sort of thing in the current frontpage content guidelines, but it will be made more explicit by launch.

Just to clarify: would a similar post framed more around "here are potential ideas relating to cognitive science in AI that I think are valuable, and here's why" (without making it an explicit call to action, more of an "if you buy this claim, then you'd probably end up doing something with it") be fine for the front page?

Yup, arguing for epistemic conclusions about AI and/or cognitive science is the appropriate category of content for the frontpage - for example Linda's other post today.

Basically, if I change the title, it can go on the front page?

Linda Linsefors
Alexander Appel
Holden Lee
somnulence logencia

I am confused by this signature -- the post uses "I", but with multiple signatories I would have expected "we".

Should I consider Linda to be the author, with Alexander, Holden, and somnulence simply expressing their assent?

Yes, that is correct.
I wrote the text and asked people to cosign if they agreed, for signaling value.

Do you have a good idea on how to make this clearer?

Maybe just write "Cosigned," above the names?

Yeah I think that's pretty clear

Hi, I've been thinking along a related line. I wrote something today arguing that we should investigate IQ in the hope that it will help us predict takeoff. If anyone reading this has a good psych background, I'd like some feedback.

I also think that engaging with psychology would be a sign that AI safety is maturing into a science that takes all sources of information seriously, and it would increase its credibility with the general scientific community.