steven0461 comments on Three Approaches to "Friendliness" - Less Wrong

Post author: Wei_Dai, 17 July 2013 07:46AM

Comment author: steven0461, 18 July 2013 03:21:46AM

> Black-Box Metaphilosophical AI is also risky, because it's hard to test/debug something that you don't understand.

On the other hand, to the extent that our uncertainty about whether different BBMAI designs do philosophy correctly is independent, we can build several of them and check which outputs they agree on. (Alternatively, a single design could do this internally, with the same effect.)
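The agreement check in the paragraph above can be made concrete. What follows is a minimal Python sketch under stated assumptions, not anything proposed in the post: each "design" is modeled as a hypothetical function from a question to an answer, `agreed_output` (a hypothetical helper) keeps only an answer that a quorum of designs share, and `prob_all_wrong` shows why independence matters. If each of n designs errs independently with probability p, all of them err with probability p^n, and that bounds the chance that they unanimously agree on a wrong answer.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical stand-in: a "design" maps a question to an answer string.
Design = Callable[[str], str]


def agreed_output(designs: Sequence[Design], question: str,
                  quorum: float = 1.0) -> Optional[str]:
    """Return the most common answer if at least `quorum` of designs give it.

    quorum=1.0 demands unanimity; use a value above 0.5 for a strict majority.
    Returns None when agreement is insufficient ("don't act on this output").
    """
    if not designs:
        return None
    answers = Counter(design(question) for design in designs)
    answer, count = answers.most_common(1)[0]
    if count / len(designs) >= quorum:
        return answer
    return None


def prob_all_wrong(p_error: float, n: int) -> float:
    """Chance that all n designs err, ASSUMING independent errors.

    A unanimous wrong answer requires every design to err (and to err the
    same way), so this is an upper bound on that event's probability.
    """
    return p_error ** n
```

For example, with three designs answering "A", "A" and "B", unanimity fails and `agreed_output` returns `None`, while `quorum=0.6` accepts "A". With p = 0.1 and n = 3, `prob_all_wrong` gives about 0.001. The independence assumption does all the work here: designs whose errors are correlated (say, because they share a flawed premise) would make this bound far too optimistic.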

> it's unclear why such an AI won't cause disaster in the time period before it achieves philosophical competence.

This seems to be an argument for building a hybrid of what you call metaphilosophical and normative AIs, where the normative part "only" needs to be reliable enough to prevent initial disaster, and the metaphilosophical part can take over afterward.