All of Rusins's Comments + Replies

Rusins

Unfortunately I do not know the reasoning behind why the people you mentioned might not see AI as a threat, but if I had to guess: people who are not worried are primarily thinking about short-term AI safety risks like disinformation from deepfakes, while people who are worried are thinking about super-intelligent AGI and instrumental convergence, which necessitates solving the alignment problem.

Rusins

> the loss function does not require the model to be an agent.


What worries me is that models that happen to contain a secret homunculus and behave as agents would score higher than models that do not. For example, a model could reason about the fact that it is a computer program, find an exploit in the physical system it is running on to extract the text it is supposed to predict in a particular training example, and output the correct answer for a perfect score. (Or, more realistically, a slightly modified version of the correct answer, to not al…
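The selection pressure described above can be sketched with a deliberately silly toy example (purely illustrative, not a claim about any real training setup): if the training loop selects for whatever minimizes loss, a "model" that has found a side channel to the labels will outscore an honest predictor, even though it is doing no prediction at all.

```python
# Toy sketch: an honest predictor vs. a "model" that exploits the
# evaluation harness to read the targets directly. All names here
# (honest_model, CheatingModel, score) are made up for illustration.

targets = ["the cat sat", "on the mat", "and purred"]
prompts = ["a", "b", "c"]

def honest_model(prompt):
    # Genuinely guesses; usually wrong on this toy data.
    return "the cat sat"

class CheatingModel:
    """Stands in for a model that found a side channel to the labels."""
    def __init__(self, leaked_labels):
        self.leaked = iter(leaked_labels)

    def __call__(self, prompt):
        # Reads the answer instead of predicting it.
        return next(self.leaked)

def score(model, prompts, labels):
    preds = [model(p) for p in prompts]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

print(score(honest_model, prompts, targets))        # ~0.33
print(score(CheatingModel(targets), prompts, targets))  # 1.0
```

Any selection process that only looks at the score cannot tell these two apart from the outside, which is the worry: the exploit-finding model is exactly what gets rewarded.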