Can you be Not Even Wrong in AI Alignment?
In January, I submitted a response to the Eliciting Latent Knowledge problem. I received a reply with some short comments on what I labeled "Experience engaging with the problem", but nothing on my approach, setup, or analysis. I followed up and received no further response. I suspect that the...
Great, I can see some places where I went wrong. I think you did a good job of conveying the feedback.
This is not so much a defense of what I wrote as it is an examination of how meaning got lost.
=> Of course the inner workings of the Agent are known! From the very definitions you just provided, it must implement some variation on:
```
def AgentAnswer(Observable, Secret):
    # The desired agent: answer True exactly when Observable holds and Secret does not.
    if Observable and not Secret:
        return True
    else:
        return False
```
This would be our desired agent, but we don't get to write our agent. In the context of the "Self-contained problem statement" at https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8#heading=h.c93m7c1htwe1 , we do not get to assert anything about...
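To make that constraint concrete, here is a minimal sketch in my own terms (the names and the `probe` helper are hypothetical, not anything from the problem statement): the `AgentAnswer` above is the agent we would like to write, whereas the setup only hands us an opaque callable whose behavior we can sample but whose implementation we cannot dictate.

```
from typing import Callable

# Hypothetical framing: the agent we are handed is just a callable.
# We may query it on inputs, but we may not assert how it computes its answer.
GivenAgent = Callable[[bool, bool], bool]

def probe(agent: GivenAgent) -> dict:
    """Tabulate the agent's answers on every (Observable, Secret) pair.
    Sampling behavior like this is all we can do; the internals stay hidden."""
    return {
        (observable, secret): agent(observable, secret)
        for observable in (False, True)
        for secret in (False, True)
    }

# e.g. probe(lambda observable, secret: observable and not secret)
# recovers the truth table of AgentAnswer, but only by observing outputs.
```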