Hello. This concerns an idea I had back in ~2014, which I abandoned because I didn't see anyone else talking about it and therefore assumed it was transparently stupid. After talking to a few researchers, I have been told the idea is potentially novel and potentially useful, so here I go (sweating violently while trying to suppress my sense of transgression).
The idea concerns how one might build a safety margin into AI or lesser-AGI systems so that they can be safely iterated on. It is not intended as anything resembling a solution to alignment, just an easy-t...
For an RL agent, the "opioid addiction" thing could be as simple as increasing the coefficient on the weight-norm term of the loss once the trigger condition fires. You'd expect the added pressure toward small weights to strip away most of the agent's capabilities, leaving it able to do little beyond fulfilling the newly unlocked goal.
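To make that concrete, here is a minimal sketch of what the loss modification could look like, assuming a PyTorch-style setup. Everything here is a hypothetical placeholder I'm introducing for illustration (the names `policy`, `task_loss`, `tripwire_triggered`, and the two coefficient values), not a claim about any particular training stack:

```python
import torch

# Hypothetical coefficients: ordinary weight-decay-like pressure vs. the
# much stronger pressure applied after the trigger condition is met.
WEIGHT_NORM_COEF_NORMAL = 1e-4
WEIGHT_NORM_COEF_TRIGGERED = 1e-1

def total_loss(policy: torch.nn.Module,
               task_loss: torch.Tensor,
               tripwire_triggered: bool) -> torch.Tensor:
    """Task loss plus an L2 weight-norm penalty whose coefficient is
    sharply increased once the tripwire condition has been observed,
    pushing the agent toward shrinking its own weights."""
    coef = WEIGHT_NORM_COEF_TRIGGERED if tripwire_triggered else WEIGHT_NORM_COEF_NORMAL
    weight_norm = sum(p.pow(2).sum() for p in policy.parameters())
    return task_loss + coef * weight_norm
```

The point of the sketch is just that the "addiction" lives entirely in the loss function: no architectural change is needed, only a coefficient that jumps when the trigger fires.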