
taygetea comments on In what language should we define the utility function of a friendly AI? - Less Wrong Discussion

3 Post author: Val 05 April 2015 10:14PM



Comment author: taygetea 06 April 2015 01:07:10AM 8 points

Determining the language to use is a classic case of premature optimization. Whatever we choose, it will have to be provably free of ambiguities, which leaves us with programming languages. Besides, in terms of the math of FAI, we're still at the "is this even Turing complete?" stage of development, so the choice doesn't really matter yet. One consideration, though, is that the algorithm design is going to take far more time and effort than the programming, and the program has essentially no room for bugs (Corrigibility is an effort to make it easier to test an AI without it resisting). In that sense, it could be argued that the lower-level the language, the better.

Directly programming human values into an AI has always been the worst option, partly for the reason you give. Moreover, the religious concept you describe breaks down trivially when two different beings have conflicting utility functions: acting as if those functions were identical leads to a bad outcome for at least one of them. A better option is to construct a scheme in which the smarter the AI gets, the better it approximates human values, using its own intelligence to determine them, as in coherent extrapolated volition.
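The conflicting-utility-function point can be made concrete with a toy sketch (the agents, outcomes, and utility values below are all hypothetical, chosen only to illustrate the failure mode):

```python
# Two agents whose utility functions disagree over the same outcomes.
# All names and numbers here are illustrative, not from any real system.
outcomes = ["A", "B", "C"]

def utility_alice(o):
    # Alice prefers outcome A most and C least.
    return {"A": 10, "B": 5, "C": 0}[o]

def utility_bob(o):
    # Bob's preferences are the reverse of Alice's.
    return {"A": 0, "B": 5, "C": 10}[o]

# A system that acts as if everyone shares Alice's utility function
# simply maximizes it, ignoring Bob entirely.
chosen = max(outcomes, key=utility_alice)
print(chosen)               # "A"
print(utility_bob(chosen))  # 0 -- the worst possible outcome for Bob
```

Because the system treated two genuinely different utility functions as one, its "optimal" choice is the worst outcome for one of the beings involved, which is exactly the failure mode the comment describes.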