SilentCal comments on Tackling the subagent problem: preliminary analysis - Less Wrong
I'm not sure where to post an idea for AI control research, so I'll do it here. It spun off from your post, the recent treacherous turn post, and LW Slack discussions.
Here is the idea: could we gamify AI safety research? The approach would be to create a setting where the players have to obey the AI safety rules and still achieve an objective in the in-game world. This could be a simulated virtual world in a computer game or a role-playing world. To give players sufficient motivation, the in-game world might consist of a population of beings who are evil (from a typical human player's perspective) and whom your goal is to make do what you want, as in many other computer games. Try to squeeze out as many resources as you can while still obeying the rules. The game would progress from simple AI control rules, like Asimov's robot laws, to more advanced ones, and test whether people can hack them. If people can, an AI probably can too.
That's essentially what these posts are to me, except instead of a video game it's pen-and-paper with Stuart Armstrong as DM :).
For the extra motivation, it might be worth writing up a framing in which evil AI designers apply the proposed controls. I'll consider doing this in future posts.
Awesome! Stuart Armstrong will be our Dungeon Master! :-) I haven't seen you write up your responses to our DM, though. I'd like to see them.
I've taken a few shots at it, e.g. at http://lesswrong.com/r/discussion/lw/mfq/presidents_asteroids_natural_categories_and/cjkr and http://lesswrong.com/lw/m25/high_impact_from_low_impact/cah1. There's no explicit role-playing, but I was very much in the mindset of trying to break the protection scheme.
I haven't been keeping up with these posts as well lately.