I just posted another LW post that is related to this here: https://www.lesswrong.com/posts/Rn4wn3oqfinAsqBSf/intent-alignment-should-not-be-the-goal-for-agi-x-risk
Thanks.
There seems to be pretty wide disagreement about how intent-aligned AGI could lead to a good outcome.
For example, even in the first couple of comments on this post:
Thanks for those links and this reply.
1.
for a sufficiently powerful AI trained in the current paradigm, there is no goal that it could faithfully pursue without collapsing into power seeking, reward hacking, and other instrumental goals leading to x-risk
I don't see how this is a counterargument to this post's main claim:
P(misalignment x-risk | intent-aligned AGI) >> P(misalignment x-risk | societally-aligned AGI).
That problem of a human-provided goal collapsing into AGI power-seeking seems to apply just as much to the problem of inte...
It's definitely not the case that:
all of our intents have some implied "...and do so without disrupting social order."
There are many human intents that involve disrupting the social order and, more generally, causing outcomes that are negative for other humans.
And that is one of the key issues with intent alignment.
Relatedly, Cullen O'Keefe has a very useful discussion of distinctions between intent alignment and law-following AI here: https://forum.effectivealtruism.org/s/3pyRzRQmcJNvHzf6J/p/9RZodyypnWEtErFRM
We can see that, on its face, intent alignment does not entail law-following. A key crux of this sequence, to be defended in subsequent posts, is that this gap between intent alignment and law-following is:
- Bad in expectation for the long-term future.
- Easier to bridge than the gap between intent alignment and deeper alignment with moral truth.
- Therefore worth addressing.
As a follow-up here, to expand on this a little more:
If we do not yet have sufficient AI safety solutions, advancing general AI capabilities may not be desirable, because it leads to further deployment of AI and brings AI closer to transformative levels. If new model architectures or training techniques would not have been developed by other research groups within a similar timeframe anyway, then developing them counterfactually increases AI capabilities. The specific capabilities developed for Law-Informed AGI purposes may be orthogonal to developments that contribute toward general A...
Is there no room for ethics outside of the law? It is not illegal to tell a lie or make a child cry, but AI should understand that those actions conflict with human preferences. Work on imbuing ethical understanding in AI systems therefore seems valuable.
There is definitely room for ethics outside of the law. As increasingly autonomous systems navigate the world, it is important for AI to attempt to understand (or at least try to predict) the moral judgments of the humans it encounters.
However, imbuing an understanding of an ethical framew...
law provides a relatively nuanced picture of the values we should give to AI. A simpler answer to the question of "what should the AI's values be?" would be "aligned with the person who's using it", known as intent alignment. Intent alignment is an important problem on its own, but does not entirely solve the problem. Law is particularly better than ideas like Coherent Extrapolated Volition, which attempt to reinvent morality in order to define the goals of an AI.
The law-informed AI framework sees intent alignment as (1.) something that private...
Thank you for this detailed feedback. I'll go through the rest of your comments/questions in additional comment replies. To start:
What kinds of work do you want to see? Common legal tasks include contract review, legal judgment prediction, and passing questions on the bar exam, but those aren't necessarily the most important tasks. Could you propose a benchmark for the field of Legal AI that would help align AGI?
Given that progress in AI capabilities research is driven, in large part, by shared benchmarks that thousands of researchers globally use to guide...
This is a great point.
Legal tech startups working on improving the legal-understanding capabilities of AI have two effects.
We should definitely invest effort in understanding the boundary between AI as a pure tool that just makes humans more efficient in their law-making work, and AI that does truly substantive work in making law. I will think more about how to start defining that and what research of this nature would look like. Would love suggestions as well!
Thanks for the reply.
Good idea. Will do!
Regarding "Any AGI is highly likely to understand democratic laws":
There is likely much additional work to be done to imbue AGI systems with a comprehensive understanding of law -- in particular, many of our legal standards (versus rules, which are easier and already legible to AI) and many nuanced processes that exist only in the minds of human legal experts right now. Making those things structured enough for computational encoding is not easy.
If we solve that, though, there is still the work to be done on (1.) verifying AGI legal understandings (and AI ...
I don't think anyone is claiming that law is "always humane" or "always just" or anything of that nature.
This post claims that law is imperfect, but that there is no better synthesized source of human values than democratic law. You note that law is not distinguished from "other forms of nonfiction or for that matter novels, poetry, etc" in this context, but the most likely second-best synthesized source of human values would not be something like poetry -- it would be ethics. And there are some critical distinguishing fa...
Any thoughts on whether (and how) the generalized financing mechanism might apply to any AI Safety sub-problems?
Thanks so much for sharing that paper. I will give that a read.