eli_sennesh comments on A toy model of the treacherous turn - Less Wrong
Can we just build a Link to the Past minigame that actually models this with real, running code, and then post a bunch of YouTube videos of Link naively trying to kill Sahasrahla?
Besides the obvious benefit of being awesome, I think there could be a more serious benefit to this. One extreme failure mode when imagining the behavior of an AI is not merely failing to imagine it as superintelligent, but imagining it as less intelligent than yourself, as not doing things you could think of (à la "That Alien Message"). A game in which you, the player, need to come up with increasingly complicated ways to trick these 'shopkeeper' agents could illustrate this pretty neatly.
PS: Were you offering to do or partially do such a project?
I would totally contribute to such a project, although we should coordinate what sort of language and reasoning techniques we're using first. Reinforcement learning is actually a reasonably involved thing to code, after all.
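For a rough sense of what the reinforcement-learning part involves, here is a minimal tabular Q-learning sketch. It is not the Link/shopkeeper toy model from the post, whose details were still unsettled in this thread; the five-cell corridor environment, the names `step`, `greedy`, and `q_learning`, and all hyperparameter values are hypothetical choices made for illustration. Even this tiny version needs an environment model, an exploration rule, and a value update.

```python
import random

# Hypothetical environment (illustration only): a 1-D corridor of N cells.
# The agent starts in cell 0 and gets reward 1.0 on reaching the last cell,
# which ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Target: r + gamma * max_a' Q(s', a'), with no bootstrap at the terminal cell.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning()
    # Greedy action in each non-terminal cell.
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print(policy)
```

After training, the greedy policy should be "step right" (action 1) in every non-terminal cell. A real minigame would replace the corridor with a game state rich enough for the agent to find deceptive strategies, which is where most of the effort would go.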
Would you mind if I put you in contact with Jaan Tallinn on this issue?
PS: PM me your email if so.
I could only contribute, not write the whole thing, though, since I've basically always got stuff on my plate: a LaTeX fix for a conference paper, actually arranging travel to the conference, gym, social life, a structure-learning project, studying, etc.
Social lives are for the weak! ;-)
That's a sick statement.
Fun! Do it if you can, but the model needs to be further clarified first, I think.