Exactly, and thanks for writing this.
I would go further and say that AI safety is AI dev, and that this happened years ago. If we stopped it all now, we'd extend our timelines:
https://www.lesswrong.com/posts/vkzmbf4Mve4GNyJaF/the-case-for-stopping-ai-safety-research
Interesting read, would be great to see more done in this direction. However, it seems that mind-body dualism is still the prevalent (dare I say "dominant") mode of understanding human will and consciousness in CS and AI-safety. In my opinion, the best picture we have of human value creation comes from the social and psychological sciences, not metaphysics and mathematics, and it would be great to have more interactions with those fields.
For what it's worth I've written a bunch on agency-loss as an attractor in AI/AGI-human interactions.
Sorry, fixed broken link now.
The problem with "understanding the concept of intent" is that intent and goal formation are some of the most complex notions in the universe, involving genetics, development, psychology, culture and everything in between. We have been arguing about what intent, and correlates like "well-being", mean for the entire history of our civilization. It looks like we have a good set of no-nos (e.g. read the UN declaration on human rights), but in terms of positive descriptions of good long-term outcomes it gets fuzzy. There we have less guidance, though I guess trans- and post-humanism seem to be desirable goals to many.
Seth. I just spoke about this work at ICML yesterday. Some other similar works:
Eliezer's work from way back in 2004: https://intelligence.org/files/CEV.pdf. I haven't read it in full - but it's about AIs that interact with human volition - which is what I'm also worried about.
Christiano's: https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like. This is a lot about slow takeoffs and AIs that slowly become unstoppable or unchangeable because they become part of our economic world.
My paper on arxiv is a bit of a long read...
Thanks for the comment. I agree broadly of course, but the paper says more specific things. For example, agency needs to be prioritized, probably taken outside of standard optimization, otherwise decimating pressure is applied to other concepts including truth and other "human values". The other part is an empirical one, also related to your concern, namely that human values are quite flexible and biology doesn't create hard bounds / limits on depletion. If you couple that with ML/AI technologies that will predict what we will do next, then approaches that depend on human intent and values (broadly) are not as safe anymore.
Thanks so much for writing this, I think it's a much-needed, perhaps even a bit late, contribution connecting static views of GPT-based LLMs to dynamical systems and predictive processing. I do research on empirical agency and it still surprises me how little the AI-safety community touches on this central part of agency - namely that you can't have agents without this closed loop.
I've been speculating a bit (mostly to myself) about the possibility that "simulators" are already a type of organism - given that they appear to do active inference - ...
Thanks for the comment Erik (and taking the time to read the post).
I generally agree with you re: the inner/outer alignment comment I made. But the language I used and that others also use continues to be vague; the working def for inner-alignment on lesswrong.com is whether an "optimizer is the production of an outer aligned system, then whether that optimizer is itself aligned". I see little difference - but I could be persuaded otherwise.
My post was meant to show that it's pretty easy to find significant holes in some of the most central co...
Thank you so much for this effectiveness-focused post. I thought I would add another perspective, namely "against the lone wolf" approach, i.e. the view that AI-safety will come down to one person, or a few persons, or an elite group of engineers somewhere. I agree for now there are some individuals who are doing more conceptual AI-framing than others, but in my view I am "shocked that everyone's dropping the ball" by putting up walls and saying that the general public is not helpful. Yes, they might not be helpful now, but we need to work on this!... Maybe someo...
Hi Chin. Thanks for writing this review, it seems like a much-needed and well-timed article - at least from my perspective, as I was looking for something like this. In particular, I'm trying to frame my research interest relative to the AI-safety field, but as you point out this is still too early.
I am wondering if you have any more insights into how you came up with your diagram above? In particular, are there any more peer-reviewed articles, or arXiv papers like Amodei et al. (https://arxiv.org/abs/1606.06565) that you relied on? For example, I do...
Thanks for the reply Jonathan. Indeed I'm also a bit skeptical that our innate drives (whether the ones from SDT or others) are really non-utility maximizing. But in some cases they do appear so.
One possibility is that they were driven to evolve for utility maximization but have now broken off completely and serve some difficult-to-understand purpose. I think there are similar theories of how consciousness developed - i.e. that it evolved as a by-product/side effect of some inter-organism communication - and now plays many other roles.
Hi Roman.
First of all, thank you so much for reading and taking the time to respond.
I don't have the time - or knowledge - to respond to everything, but from your response, I worry that my article partially missed the target. I'm trying to argue that humans may not be just utility maximizers and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there's no real utility for some or perhaps the most important things that we value. Seeking out "surprising" resu...
Thanks Nathan. I understand that most people working on technical AI-safety research focus on this specific problem, namely aligning AI, and less on misuse. I don't expect a large AI-misuse audience here.
Your response - that "truly-aligned-AI" would not change human intent - was also suggested by other AI researchers. But this doesn't address the problem: human intent is created from (and dependent on) societal structures. Perhaps I failed to make this clear. But I was trying to suggest we lack an understanding of the genesis of human actions/intenti...
Great post Peter. I think a lot about whether it even makes sense to use the term "aligned AGI", as powerful AGIs may break human intention for a number of reasons (https://www.lesswrong.com/posts/3broJA5XpBwDbjsYb/agency-engineering-is-ai-alignment-to-human-intent-enough).
I see you didn't refer to AIs becoming self-driven (as in Omohundro: https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf). Is there a reason you don't view this as part of the college kid problem?
LOL. Your question opens a can of worms. It took more than a year from when I first committed to writing about simulators, but the reason it took so long wasn't because writing the actual words in this post took a long time, rather:
Hi Charlie. Thanks for the welcome!
Indeed, I think that's a great way to put it: "preserving human agency around powerful systems" (I added it to the article). Thanks for that! I am pessimistic that this is possible (or that the question makes sense as it stands). I guess what I tried to do above was make a soft argument that "intent-aligned AIs" might not make sense without further limits or boundaries on both human intent and what AIs can do.
I agree hardwiring is probably not the best solution. However, humans are probably hardwired with a bunch of to...
Thanks shminux. My apologies for the confusion. Part of my point was that we don't have consensus on whether we have free will (~60% of professional philosophers are compatibilists; but the sociologists have a different conception altogether; and the physicists etc.). I think this got lost because I was not trying to explain the philosophical position on free will. [I have added a very brief note in the main text to clarify what I think of as the "free will problem"].
The rest of the post was an attempt to argue that because human actio...
Great write up!
Why don't you do this in a mouse first? The whole cycle from birth to phenotype, including complex reasoning (e.g. Bayesian inference, causality), can take 6 months.