Can you be Not Even Wrong in AI Alignment?
In January, I submitted a response to the Eliciting Latent Knowledge problem. I received a reply with some short comments on what I labeled "Experience engaging with the problem", but nothing on my approach, setup, or analysis. I followed up and received no further response. I suspect that the...
Great, I can see some places where I went wrong. I think you did a good job of conveying the feedback.
This is not so much a defense of what I wrote as it is an examination of how meaning got lost.
=> Of course the inner workings of the Agent are known! From the very definitions you just provided, it must implement some variation on:
```
def AgentAnswer(Observable, Secret):
    # The desired agent: answer True exactly when Observable holds and Secret does not.
    if Observable and not Secret:
        return True
    else:
        return False
```
This would be our desired agent, but we don't get to write our agent. In the context of the "Self-contained problem statement" at https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8#heading=h.c93m7c1htwe1 , we do not get to assert anything about...
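To make that constraint concrete, here is a minimal sketch in my own terms (the names and the `probe` helper are hypothetical, not anything from the problem statement): the `AgentAnswer` above is the agent we would like to write, whereas the setup only hands us an opaque callable whose behavior we can sample but whose implementation we cannot dictate.

```
from typing import Callable

# Hypothetical framing: the agent we are handed is just a callable.
# We may query it on inputs, but we may not assert how it computes its answer.
GivenAgent = Callable[[bool, bool], bool]

def probe(agent: GivenAgent) -> dict:
    """Tabulate the agent's answers on every (Observable, Secret) pair.
    Sampling behavior like this is all we can do; the internals stay hidden."""
    return {
        (observable, secret): agent(observable, secret)
        for observable in (False, True)
        for secret in (False, True)
    }

# e.g. probe(lambda observable, secret: observable and not secret)
# recovers the truth table of AgentAnswer, but only by observing outputs.
```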