All of John Nay's Comments + Replies

Thanks so much for sharing that paper. I will give that a read.

I just posted another LW post that is related to this here: https://www.lesswrong.com/posts/Rn4wn3oqfinAsqBSf/intent-alignment-should-not-be-the-goal-for-agi-x-risk

Thanks. 

There seems to be pretty wide disagreement about how intent-aligned AGI could lead to a good outcome. 

For example, even in the first couple comments to this post: 

  1. The comment above (https://www.lesswrong.com/posts/Rn4wn3oqfinAsqBSf/?commentId=zpmQnkyvFKKbF9au2) suggests "wide open decentralized distribution of AI" as the solution to making intent-aligned AGI deployment go well. 
  2. And the comment I am replying to here says, "I could see the concerns in this post being especially important if things work out such that a full soluti…"

Thanks for those links and this reply.

1. 

for a sufficiently powerful AI trained in the current paradigm, there is no goal that it could faithfully pursue without collapsing into power seeking, reward hacking, and other instrumental goals leading to x-risk

I don't see how this is a counterargument to this post's main claim:

P(misalignment x-risk | intent-aligned AGI) >> P(misalignment x-risk | societally-aligned AGI). 

That problem of a human-provided goal collapsing into AGI power-seeking seems to apply just as much to the problem of inte…

It's definitely not the case that:

all of our intents have some implied "...and do so without disrupting social order."

There are many human intents that aim to disrupt social order or, more generally, to cause outcomes that are negative for other humans.

And that is one of the key issues with intent alignment.

Gunnar_Zarncke:
I don't disagree. Intent alignment requires solving social alignment. But I think most people here understand that to be the case.

Relatedly, Cullen O'Keefe has a very useful discussion of distinctions between intent alignment and law-following AI here: https://forum.effectivealtruism.org/s/3pyRzRQmcJNvHzf6J/p/9RZodyypnWEtErFRM

We can see that, on its face, intent alignment does not entail law-following. A key crux of this sequence, to be defended in subsequent posts, is that this gap between intent alignment and law-following is:

  1. Bad in expectation for the long-term future.
  2. Easier to bridge than the gap between intent alignment and deeper alignment with moral truth.
  3. Therefore worth addressing.

As a follow-up here, to expand on this a little more:

If we do not yet have sufficient AI safety solutions, advancing general AI capabilities may not be desirable, because it leads to further deployment of AI and brings AI closer to transformative levels. If new model architectures or training techniques were not going to be developed by other research groups within a similar timeframe, then developing them does increase AI capabilities. The specific capabilities developed for Law-Informed AGI purposes, however, may be orthogonal to developments that contribute toward general AGI work: technical developments achieved for the purpose of making AI understand law better, which were not going to be developed by other research groups within a similar timeframe anyway, are likely not material contributors to accelerating timelines for the global development of transformative AI.

That said, this is an important consideration for any technical AI research – it's hard to rule out AI research contributing in at least some small way to advancing capabilities – so it is more a matter of degree, trading off the positive safety benefits of the research against the negative of any timeline acceleration.

Teaching AI to better understand the preferences of an individual human (or small group of humans), e.g. via RLHF, likely leads to capabilities advancements faster, and to the type of capabilities associated with power-seeking by one entity (a human, a group of humans, or an AI), relative to teaching AI to better understand public law and societal values as expressed through legal data.

Much of the work on making AI understand law is data engineering work, e.g., generating labeled court opinion data that can be employed in evaluating the consistency of agent behavior with particular legal standards. This type of work does not cause AGI timeline acceleration as much as work on model architectures or compute scaling.
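To make that data-engineering point concrete, here is a minimal sketch of what a labeled court-opinion record and a consistency check against a legal standard could look like (the schema, field names, and toy examples are hypothetical, invented purely for illustration):

```python
from dataclasses import dataclass


@dataclass
class LabeledOpinion:
    """A court opinion labeled with how a legal standard was applied
    (hypothetical schema, for illustration only)."""
    opinion_id: str
    standard: str         # e.g. "duty of reasonable care"
    facts: str            # short factual summary of the conduct at issue
    held_compliant: bool  # did the court hold the conduct met the standard?


def agreement_rate(model_judgments: dict[str, bool],
                   labeled: list[LabeledOpinion]) -> float:
    """Fraction of labeled opinions where the model's judgment of the
    conduct matches the court's actual holding."""
    if not labeled:
        return 0.0
    matches = sum(model_judgments.get(op.opinion_id) == op.held_compliant
                  for op in labeled)
    return matches / len(labeled)


# Toy usage: two labeled opinions and a model's predicted judgments.
data = [
    LabeledOpinion("op-1", "duty of reasonable care",
                   "Driver checked mirrors, signaled, and merged slowly.", True),
    LabeledOpinion("op-2", "duty of reasonable care",
                   "Driver cut across three lanes without signaling.", False),
]
predictions = {"op-1": True, "op-2": True}  # the model gets op-2 wrong
print(f"Consistency with court holdings: {agreement_rate(predictions, data):.0%}")
```

The hard part, of course, is producing labels at scale with legal-expert quality, not the evaluation loop itself – which is exactly why this looks more like data engineering than architecture research.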

Is there no room for ethics outside of the law? It is not illegal to tell a lie or make a child cry, but AI should understand that those actions conflict with human preferences. Work on imbuing ethical understanding in AI systems therefore seems valuable. 

 

There is definitely room for ethics outside of the law. When increasingly autonomous systems are navigating the world, it is important for AI to attempt to understand (or at least try to predict) the moral judgements of the humans it encounters.

However, imbuing an understanding of an ethical framew…

law provides a relatively nuanced picture of the values we should give to AI. A simpler answer to the question of "what should the AI's values be?" would be "aligned with the person who's using it", known as intent alignment. Intent alignment is an important problem on its own, but does not entirely solve the problem. Law is particularly better than ideas like Coherent Extrapolated Volition, which attempt to reinvent morality in order to define the goals of an AI. 

 

The law-informed AI framework sees intent alignment as (1.) something that private…


Thank you for this detailed feedback. I'll go through the rest of your comments/questions in additional comment replies. To start:

What kinds of work do you want to see? Common legal tasks include contract review, legal judgment prediction, and passing questions on the bar exam, but those aren't necessarily the most important tasks. Could you propose a benchmark for the field of Legal AI that would help align AGI?

Given that progress in AI capabilities research is driven, in large part, by shared benchmarks that thousands of researchers globally use to guide…
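As a rough sketch of the mechanics (the task names, examples, and exact-match scoring below are invented placeholders, not a proposed benchmark), a shared Legal AI benchmark could be as simple as a registry of tasks behind a common scoring interface:

```python
from typing import Callable

# A benchmark here is just a named set of tasks plus a scoring rule.
# Each example pairs a prompt about a legal scenario with a reference label.
Example = tuple[str, str]     # (prompt, reference_label)
Model = Callable[[str], str]  # maps a prompt to a predicted label

BENCHMARK: dict[str, list[Example]] = {
    "holding_prediction": [
        ("Facts: ... Was the defendant held liable? Answer yes or no.", "yes"),
    ],
    "standard_application": [
        ("Did this conduct satisfy the duty of reasonable care? yes or no.", "no"),
    ],
}


def evaluate(model: Model) -> dict[str, float]:
    """Exact-match accuracy per task. A real benchmark would need more
    robust scoring (calibration, rationale checks, expert review, etc.)."""
    return {
        task: sum(model(prompt).strip().lower() == label
                  for prompt, label in examples) / len(examples)
        for task, examples in BENCHMARK.items()
    }


# Toy baseline that always answers "yes": perfect on one task, 0% on the other.
print(evaluate(lambda prompt: "yes"))
```

The substance would lie in careful task selection and expert-validated labels; the point is only that the target researchers optimize against can be made this concrete and shared.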

This is a great point.

Legal tech startups working to improve the legal-understanding capabilities of AI have two effects:

  1. Positive: improves AI understanding of law and furthers the agenda laid out in this post.
  2. Negative: potentially involves AI in the law-making (broadly defined) process. 

We should definitely invest effort in understanding the boundary between AI as a pure tool that merely makes humans more efficient in their law-making work, and AI that does truly substantive work in making law. I will think more about how to start to define that boundary and what research of this nature would look like. Would love suggestions as well!


Thanks for the reply. 

  1. There does seem to be legal theory precise enough to be practically useful for AI understanding human preferences and values. To take just one example: there is a huge amount of legal theory on how to craft directives, for instance on whether to make directives in contracts and legislation more rule-like or more standard-like. Rules (e.g., "do not drive more than 60 miles per hour") are more targeted directives than standards (e.g., "drive at a reasonable speed"). If comprehensive enough for the complexity of their application, rules allow the rule-maker to have mo…
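To illustrate that distinction in code (a toy sketch; both functions below are invented for illustration): a rule can be encoded directly as an executable predicate, while a standard resists such encoding, because applying it means predicting an adjudicator's contextual judgment.

```python
def violates_speed_rule(speed_mph: float) -> bool:
    """A rule: a targeted directive that is mechanically checkable."""
    return speed_mph > 60


def violates_reasonable_care_standard(conduct: str) -> bool:
    """A standard: application is context-dependent. A real system would
    need a learned model of adjudicator judgment; this keyword heuristic
    is a deliberately crude stand-in to show where the difficulty lives."""
    return "reckless" in conduct.lower()


print(violates_speed_rule(72.0))  # True: unambiguous given the facts
print(violates_reasonable_care_standard(
    "Driver proceeded recklessly through a school zone."))  # True here, but
# borderline conduct would defeat any fixed predicate like this one
```

This gap is one reason standards, rather than rules, are the hard part of making law legible to AI.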

Regarding, "Any AGI is highly likely to understand democratic laws."

 

There is likely much additional work to be done to imbue a comprehensive understanding of law in AGI systems -- in particular, many of our legal standards (versus rules, which are easier and are already legible to AI) and many nuanced processes that currently exist only in the minds of human legal experts. Making those things structured enough for a computational encoding is not easy.

If we solve that, though, there is still work to be done on (1.) verifying AGI legal understandings (and AI …

I don't think anyone is claiming that law is "always humane" or "always just" or anything of that nature.

This post is claiming that law is imperfect, but that there is no better synthesized source of human values than democratic law. You note that law is not distinguished from "other forms of nonfiction or for that matter novels, poetry, etc" in this context, but the most likely second-best synthesized source of human values would not be something like poetry -- it would be ethics. And there are some critical distinguishing fa…

Zac Hatfield-Dodds:
Thanks for an excellent reply! One possible crux is that I don't think that synthesized human values are particularly useful; I'd expect that AGI systems can do their own synthesis from a much wider range of evidence (including law, fiction, direct observation, etc.). As to the specific points, I'd respond:

* There is no unified legal theory precise enough to be practically useful for AI understanding human preferences and values; liberal and social democracies alike tend to embed constraints in law, with individuals and communities pursuing their values in the lacunae.
* The rigorous tests of legal theories are carried out inside the system of law, and bent by systems of unjust power (e.g. disenfranchisement). We cannot validate laws or legal theories in any widely agreed-upon manner.
* Law often lacks settled precedent, especially regarding new technologies, or disagreements between nations or different cultures.
* I reject the assertion that imposition by a government necessarily makes a law legitimate. While I agree we don't have a mechanism to 'align the rest of the humans' with a theory or meta-theory, I don't think this is relevant (and in any case it's equally applicable to law).
* I agree that "moral lock-in" would be a disaster. However, I dispute that law accurately reflects the evolving will of citizens, or the proposition that so reflecting citizens' will is consistently good (cf. reproductive rights, civil rights, impacts on foreign nationals or future generations...).

These points are about law as it exists as a widely-deployed technology, not idealized democratic law. However, only the former is available to would-be AGI developers! Law does indeed provide useful evidence about human values, coordination problems, and legitimacy – but this alone does not distinguish it.

Any thoughts on whether (and how) the generalized financing mechanism might apply to any AI Safety sub-problems?