Really excited about this sequence, as I'm currently spending a lot of time clarifying and formalizing the underlying assumptions and disagreements of what you're calling the Philosophy view (I don't completely agree with the term, but I think you're definitely pointing at important aspects of it). Hence, having someone else post on the different strengths and weaknesses of this and the Engineering view sounds great!
I've just binge-read the entire sequence and thoroughly enjoyed it, thanks a lot for writing! I really like the framework of three anchors - thought experiments, humans, and empirical ML - and emphasising the strengths and limitations of each and the need for all three. Most discourse I've seen tends to strongly favour just one anchor, but 'all are worth paying attention to, none should be totalising' seems obviously true.
I really enjoyed this sequence; it provides useful guidance on how to combine different sources of knowledge and intuitions to reason about future AI systems. Great resource on how to think about alignment for an ML audience.
I'm reluctant to frame engineering and philosophy as adversarial disciplines in this conversation, as AI and ML research have long drawn on both. As an example, Minsky and Papert's work on the "Society of Mind" and Minsky's "Perceptrons" are hands that wash each other and then reach forward to underpin much of what is now accepted in neural network research.
Moreover, there aren't just two disciplines feeding this sport; learnings have been taken from computer science, philosophy, psychology and neuroscience over the fifty-odd years of AI work. The more successful ML shops have been using the higher-order language of psychology to describe and intervene on operational aspects (e.g., in-game AlphaGo) and neuroscience to create the models (Hassabis, 2009).
I will be surprised if biological models of neurotransmitters don't make an appearance as an nth anchor in the next decade or so. These may well take inspiration from Patricia Churchland's decades-long cross-disciplinary work in philosophy and neuroscience. They may also draw from the intersection of psychology and neuroscience that is informing mental health treatments, both chemical and experiential.
This is all without getting into those fjords of philosophy in which many spend their time prioritising happiness over truth: ethics and morality... which is what I think this blog post is really talking about when it says philosophy. Will connectionist modelling learn from and contribute to deontological, utilitarian, consequentialist and hedonistic ethics? I don't see how it cannot.
I'm reluctant to frame engineering and philosophy as adversarial disciplines in this conversation
I think it was talking about how people approach/view 'AI risk'.
I'm a newcomer here, but this issue has been bothering me for some time. You're right that these aren't necessarily adversarial disciplines for many thinkers. The adversity can perhaps be expressed culturally: first, the financial economics of technological development leaves little room for philosophy, and second, people who have an intellectual predilection, plus the resources of time and energy, for engaging in cross-disciplinary thought are probably not prevalent among those doing the everyday, boots-on-the-ground work in AI/ML.
Which is why I'm glad to have found the discussions here about AI/ML, and alas, this comment will probably not be posted. It's hard to be rational about culture, but it's the water we're swimming in.
Curated. This post cleanly gets at some core disagreements in the AI [Alignment] field, and I think does so from a more accessible frame/perspective than other posts on LessWrong and the Alignment Forum. I'm hopeful that this post and others in the sequence will enable better and more productive conversations between researchers, and for that matter, just better thoughts!
Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?
We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?
Here it is! https://www.lesswrong.com/s/4aARF2ZoBpFZAhbbe
You might want to edit the description and header image.
Thanks for writing this post.
I tend to believe that we need a combination of both. Purely pursuing these problems from a philosophical perspective risks becoming untethered from reality due to the lossiness and slipperiness of high-level abstractions. Purely pursuing these problems from an engineering mindset risks losing the forest for the trees, or failing to look outside the present situation even as it is rapidly changing.
I suspect that the suspicion of those in an engineering mindset towards philosophy may be a factor in why the bounty I recently posted on the potential circular dependency of counterfactuals received very little engagement with the issue I was trying to highlight, despite the large number of comments.
Are you aware of other concepts that have a similar circular dependency, or have seemed to have it?
If you start following the parents from any concept you'll ultimately end up going around in circles, but:
a) the circularity is further down the chain, and
b) counterfactuals or something very much like them (such as possible worlds) are somewhere in the loop.
I think this makes a difference because if the circularity is far down the chain, then you can effectively ignore it. Depending on other, quite different concepts also makes the circularity easier to ignore.
I would actually offer concrete evidence in favor of what I think you're calling "Philosophy," although of course there's not much dramatic evidence we'd expect to see even if the theory is wholly true.
Here, however, is a YouTube video that should really be called "Why AI May Already Be Killing You." It uses hard data to show that an algorithm accidentally created real-world money and power, that it did so by working just as intended in the narrowest sense, and that its creators opposed the logical result so much that they actively tried to stop it. (Of course, this last point had to do with their own immediate profits, not the long-term effects.)
I'd be mildly surprised, but not shocked, to find that this creation of real-world power has already unbalanced US politics, in a way which could still destroy our civilization.
Here's a link to the version on my blog: https://bounded-regret.ghost.io/appendix-more-is-different-in-related-fields/
Interesting. Reading the different paragraphs, I am somewhat confused about how you classify thought experiments: as part of engineering, part of philosophy, or a third thing by themselves?
I'd be curious to see you expand on the following question: if we treat thought experiments as not being a philosophical technique, what other techniques or insights does philosophy have to offer to alignment?
Another comment: you write
When thinking about safety risks from ML, there are two common approaches, which I'll call the Engineering approach and the Philosophy approach.
My recent critique here (and I expand on this in the full paper) is that the x-risk community is not in fact using a broad engineering approach to ML safety at all. What is commonly used instead is a much more narrow ML research approach, the approach which sees every ML safety problem as a potential ML research problem. On the engineering side, things need to get much more multi-disciplinary.
I really like this post, you explained your purpose in writing the sequence very clearly. Thanks also for writing about how your beliefs updated over the process of writing this.
Machine learning is touching increasingly many aspects of our society, and its effect will only continue to grow. Given this, I and many others care about risks from future ML systems and how to mitigate them.
When thinking about safety risks from ML, there are two common approaches, which I'll call the Engineering approach and the Philosophy approach:
I'll discuss these approaches mainly in the context of ML safety, but the same distinction applies in other areas. For instance, an Engineering approach to AI + Law might focus on how to regulate self-driving cars, while Philosophy might ask whether using AI in judicial decision-making could undermine liberal democracy.
While Engineering and Philosophy agree on some things, for the most part they make wildly different predictions, both about what the key safety risks from ML will be and about how we should address them:
In my experience, people who strongly subscribe to the Engineering worldview tend to think of Philosophy as fundamentally confused and ungrounded, while those who strongly subscribe to Philosophy think of most Engineering work as misguided and orthogonal (at best) to the long-term safety of ML. Given this sharp contrast and the importance of the problem, I've thought a lot about which—if either—is the "right" approach.
Coming in, I was mostly on the Engineering side, although I had more sympathy for Philosophy than the median ML researcher (who has ~0% sympathy for Philosophy). However, I now feel that:
On the other hand, I also feel that:
I've reached these conclusions through a combination of thinking, discussing with others, and observing empirical developments in ML since 2011 (when I entered the field). I've distilled my thoughts into a series of blog posts, where I'll argue that:
This post is the introduction to the series. I'll post the next part each Tuesday, and update this page with links once the post is up. In the meantime, leave comments with any thoughts you have, or contact me if you'd like to preview the upcoming posts and leave feedback.