User Comment Replies

Evolution can only optimize over our learning process and reward circuitry, not directly over our values or cognition. Moreover, robust alignment to IGF requires that you even have a concept of IGF in the first place. Ancestral humans never developed such a concept, so it was never useful for evolution to select for reward circuitry that would cause humans to form values around the IGF concept.

Another example may be lactose tolerance. First you need animal husbandry and dairy production; only then do you get selective pressure favoring those who can reliably process lactose. Without the "concept of husbandry" there's no way for the optimizer to select for it.

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

mr-ubik2y*20

A couple of questions wrt this:

Could LLMs develop the type of self-awareness you describe as part of their own training or RL-based fine-tuning? Many LLMs do seem to have "awareness" of their existence and function (incidentally, this could be evidenced by the model evals run by Anthropic). I assume a simple future setup could be auto-GPT-N with a prompt like "You are the CEO of Walmart, you want to make the company maximally profitable". In that scenario I would contend that the Agent could be easily aware of both its role and function and easily be attra

mr-ubik2y10


All of mr-ubik's Comments + Replies