This is a special post for quick takes by PabloAMC. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
The core problem behind wireheading, manipulation, and similar failure modes seems to stem from a confusion between the goal in the world and its representation inside the agent. One way to address this may be to exploit the fact that the agent can be aware of being an embedded agent. In that case, it could recognize that its goal representation stands for an external fact about the world, and we could then penalize, during training, the divergence between the goal itself and its internal representation.
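A minimal sketch of what such a divergence penalty might look like in a toy linear setting. Everything here is hypothetical and illustrative: the "external" goal is a fixed linear reward on world states, the agent's internal representation of that goal is a second weight vector, and the penalty is simply the mean squared gap between the two rewards over sampled states. In a real system neither the external goal nor the divergence would be directly computable like this; the point is only to show the shape of the extra loss term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: states are feature vectors; the "external" goal is a
# fixed linear reward on the world state, while the agent keeps its own
# internal representation of that goal (another weight vector).
true_w = np.array([1.0, -2.0, 0.5])   # external goal (a fact about the world)
internal_w = np.zeros(3)              # agent's learned goal representation

def external_reward(states):
    return states @ true_w

def internal_reward(states, w):
    return states @ w

def divergence_penalty(states, w):
    # Mean squared gap between the internal representation of the goal
    # and the external goal it is supposed to track.
    return np.mean((internal_reward(states, w) - external_reward(states)) ** 2)

# Toy training loop: gradient descent on the divergence penalty alone,
# standing in for the extra loss term proposed above.
lr = 0.1
for _ in range(200):
    states = rng.normal(size=(32, 3))
    grad = 2 * states.T @ (internal_reward(states, internal_w)
                           - external_reward(states)) / len(states)
    internal_w -= lr * grad

final_gap = divergence_penalty(rng.normal(size=(256, 3)), internal_w)
print(final_gap)
```

After training, the penalty drives the internal representation toward the external goal, which is the intended effect: the agent is rewarded for keeping its picture of the goal faithful to the world, not just for scoring well on its own picture.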