When I was first introduced to AI Safety, coming from a background studying psychology, I kept getting frustrated with the way people defined and used the word "intelligence". They weren't able to address my questions about cultural intelligence, social evolution, and general intelligence in a way I found rigorous enough to be convincing. I felt like professionals couldn't answer what I considered to be basic and relevant questions about general intelligence, which meant that I took a lot longer to take AI Safety seriously than I otherwise would have. It feels possible to me that other people have run into AI Safety pitches and been turned off because of something similar -- a communication issue that arises because the two parties approach the conversation with very different background information. I'd love to try to minimize these occurrences, so if you've had anything similar happen, could you please share:
What is something that you feel AI Safety pitches usually don't seem to understand about your field/background? What's a common place where you feel you've become stuck in a conversation with AI Safety pitches? What question/information makes/made the conversation stop progressing and start circling?
(Cross-posted from the EA forum)
I think many people working in AI safety, and specifically those focused on alignment, basically don't understand what values are and, alarmingly to me, haven't invested much effort in figuring it out. I think this is motivated stopping, though, because figuring out what values are and how they work is hard, and it's a problem that doesn't lend itself easily to the methods AI researchers know well, so it gets ignored or pushed off as something AI will figure out for us. As a result, values are most often treated as some kind of black box at worst and as an abstract mathematical construct akin to preferences at best. The latter is a step beyond totally ignoring the fact that we don't know what values are, but it's not anywhere close to where I think we'll need to be to build aligned AI.
I've tried to address this in the past, but don't know how much impact I've really had. The recent work on the shard theory of values gives me hope the situation is changing.