When I check arXiv for new AI alignment research papers, I see mostly capabilities research papers, presumably because most researchers are working on capabilities. I wonder if there’s alignment-related value to be extracted from all that capabilities research, and how we might get at it. Is anyone working on this, or does anyone have any good ideas?
To be clear, I mean extracting insights from capabilities research that already exists, not changing the direction of new research. For example, specification gaming is on everyone's radar because it was observed in capabilities research (the authors of the linked post compiled this list of specification-gaming examples, some of which date back to the 1980s). I wonder how much more opportunity there might be to piggyback on existing capabilities research for alignment purposes, and maybe to systematize that going forward.