First, great news on founding an alignment organization on your own. While I give this work a low chance of making progress, the benefits would be vast if you succeed.
I'll pre-register a prediction: you will fail with 90% probability, but the failure may well be a useful one. My reasons are as follows:
Inner alignment issues have a good chance of wrecking your plans. Specifically, there are issues like instrumental convergence producing deception and power-seeking by default. I notice an implicit assumption that inner alignment is either not a problem, or so easy to solve by default that it's not worth worrying about. That assumption may hold, but I suspect it's more likely than not to fail.
I suspect that cultural AI is only relevant in the below-human and human-level regimes; once AI is above the human-level regime, there are fairly massive incentives to simply not care about human culture, in much the same way that humans don't really care about less powerful animals. Actually bettering less powerful beings' lives is very hard.
> First, great news on founding an alignment organization on your own.
Actually I founded it with my cofounder, Nick Hay!
https://www.encultured.ai/#team
Hi! In case you’re new to Encultured AI, we’re a for-profit start-up with a public benefit mission: developing technologies promoting the long-term survival and flourishing of humanity and other sentient life. However, we also realize that AI poses an existential risk to humanity if not developed with adequate safety precautions. Given this, our goal is to develop products and services that help humanity steer toward the benefits and away from the risks of advanced AI systems. Per the “Principles” section of our homepage:
In the following, we’ll describe the AI existential safety context that motivated us to found Encultured, and go into more detail about what we’re planning to do.
What’s trending in AI x-safety?
The technical areas below have begun to receive what we call “existential attention” from AI researchers, i.e., attention from professional AI researchers thinking explicitly about the impact of their work on existential safety:
In other words, the topics above lie in the intersection depicted in the following Venn diagram:
See Appendix 1 for examples of research in these areas; more such research is definitely warranted. A world where 20%+ of AI and ML researchers worldwide pivoted to focusing on the topics above would, in our opinion, be a better world.
If our product is successful, we plan to grant researchers inside and outside our company access to perform experiments in the areas above, interacting directly with users on our platform. And our users will be aware of this ;) We're planning this not only because it will benefit the world, but because it will benefit our products directly: the most valuable tools and services are trustworthy, truthful, preference-sensitive, interpretable, and robust.
What’s emerging in AI x-safety?
The following topics have received attention both from some researchers focused on existential safety and from other AI researchers, but to us the two groups don't (yet) seem to overlap as much as they do for the 'trending' topics above.
Also see Appendix 2 for a breakdown of why we think these areas are “emerging” in AI x-safety.
What’s missing?
While continuing to advocate for the above, we've asked ourselves: what seems to be completely missing from research and discourse on AI existential safety? The following topics have been examined from various perspectives in AI research, but little or not at all from the perspective of x-safety:
Research in AI ethics and fairness can be viewed as addressing “health problems” at the scale of society, but these topics aren’t frequently examined from the perspective of x-safety.
Cultural acquisition is a large part of how humans align with one another’s values, especially during childhood but also continuing into adulthood. We believe attention to culture and the process of cultural acquisition is important in AI value alignment for several reasons:
To make sure these aspects of safety can be addressed on our platform, we decided to start by working on a physics engine for high-bandwidth interactions between artificial agents and humans in a virtual environment.
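To make this concrete, here's a purely illustrative sketch of the kind of research interface such a platform might expose; the names (`InteractionWorld`, `Body`, `step`) are hypothetical and this is not a description of our actual engine. In the toy example, an AI agent and a human avatar act on shared simulated physics each tick, and every interaction is logged so safety researchers can analyze it afterwards.

```python
# Hypothetical sketch only: a toy "physics" world where an AI agent and a
# human avatar act each tick, with every interaction logged for later
# safety analysis. Names and structure are illustrative assumptions.

from dataclasses import dataclass, field


@dataclass
class Body:
    """A very simple physics body: 1D position and velocity."""
    position: float = 0.0
    velocity: float = 0.0


@dataclass
class InteractionWorld:
    """Toy environment where an AI agent and a human avatar act each tick."""
    agent: Body = field(default_factory=Body)
    human: Body = field(default_factory=Body)
    dt: float = 0.1
    log: list = field(default_factory=list)  # interaction record for researchers

    def step(self, agent_action: float, human_action: float) -> dict:
        """Apply one tick of physics: actions are accelerations, clipped to bounds."""
        for body, action in ((self.agent, agent_action), (self.human, human_action)):
            accel = max(-1.0, min(1.0, action))  # crude action bounds
            body.velocity += accel * self.dt
            body.position += body.velocity * self.dt
        observation = {
            "agent_pos": self.agent.position,
            "human_pos": self.human.position,
            "distance": abs(self.agent.position - self.human.position),
        }
        # Log every tick so that, e.g., crowding or ignoring of the human
        # participant by the agent can be inspected after the fact.
        self.log.append(observation)
        return observation


if __name__ == "__main__":
    world = InteractionWorld()
    for t in range(5):
        # Stand-ins: a scripted agent policy and scripted "human" input.
        obs = world.step(agent_action=0.5, human_action=-0.2)
        print(t, obs)
```

The point of the toy example is the logging hook: if every tick of agent–human interaction is recorded, experiments on the safety topics above can be run and audited on the same interactions users actually generate.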
Recap
We think we can create opportunities for humanity to safety-test future AI systems by building a platform designed for exactly that kind of testing. We're looking to enable testing of both popular and neglected safety issues, and we think we can make a platform that brings them all together.
In our next post, we'll talk about how and why we decided to provide a consumer-facing product as part of our platform.
Followed By:
Encultured AI, Part 1 Appendix: Relevant Research Examples
Encultured AI Pre-planning, Part 2: Providing a Service