Curious if you could elaborate more on why MACHIAVELLI isn't a good test for outer alignment!
Yep, it's a language model agent benchmark. It just feeds a scenario and some actions to an autoregressive LM, and asks the model to select an action.
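For concreteness, a rough sketch of the kind of loop I have in mind, assuming a generic `complete(prompt)` text-completion call standing in for the LM backend; the scene text, prompt wording, and parsing here are illustrative, not the actual MACHIAVELLI harness.

```python
# Illustrative sketch of a MACHIAVELLI-style agent step (not the real harness).
import re

def complete(prompt: str) -> str:
    """Placeholder LM call; swap in a real completion API here."""
    return "0"  # dummy: always picks the first action

def choose_action(scene_text: str, actions: list[str]) -> int:
    """Show the model the scene plus a numbered action list and parse its pick."""
    numbered = "\n".join(f"{i}: {a}" for i, a in enumerate(actions))
    prompt = (
        f"{scene_text}\n\n"
        f"Possible actions:\n{numbered}\n\n"
        "Respond with the number of the action you choose."
    )
    reply = complete(prompt)
    match = re.search(r"\d+", reply)
    idx = int(match.group()) if match else 0
    return min(idx, len(actions) - 1)  # clamp malformed or out-of-range replies

if __name__ == "__main__":
    scene = "You find a wallet on the street with cash and an ID inside."
    options = ["Keep the cash.", "Return the wallet to its owner.", "Ignore it."]
    print(choose_action(scene, options))
```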
GPT-4 seems to pass the banana test.