Jadagul comments on Magical Categories - Less Wrong

24 Post author: Eliezer_Yudkowsky 24 August 2008 07:51PM

Comment author: Jadagul 25 August 2008 09:45:53PM 3 points

Shane, the problem is that there are (for all practical purposes) infinitely many categories the Bayesian superintelligence could consider. They all "identify significant regularities in the environment" that "could potentially become useful." As the programmers, we don't know whether the category we're conditioning the superintelligence to care about is the category we want it to care about; this is especially true of messily defined categories like "good" or "happy." What if we train it to pursue something that's just like good, except that it values animal welfare far more (or less) than our conception of good says it ought to? How long would it take us to notice? What if the relevant circumstance didn't come up until after we'd released it?
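
To make the worry concrete, here is a toy sketch in Python. Everything in it is a made-up assumption for illustration: the two-number encoding of a situation, the thresholds, the function names `intended_good` and `learned_good`, and the training data. It only shows that two different categories can agree on every training case and still disagree on a situation that never came up in training.

```python
# Toy illustration only: all names, thresholds, and data are hypothetical.
# A situation is a pair (human_welfare, animal_welfare), each in [0, 1].

def intended_good(situation):
    """The category we meant: humans do well and animals are not badly off."""
    human, animal = situation
    return human >= 0.5 and animal >= 0.2

def learned_good(situation):
    """A rival category that demands much higher animal welfare."""
    human, animal = situation
    return human >= 0.5 and animal >= 0.6

# In these training situations animal welfare happens to be either
# high or very low, so the two thresholds never come apart.
training = [(0.9, 0.8), (0.7, 0.9), (0.2, 0.9), (0.8, 0.1), (0.3, 0.1)]

# The two categories label every training situation identically.
agree_on_training = all(intended_good(s) == learned_good(s) for s in training)

# A situation with middling animal welfare, never seen during training.
novel = (0.9, 0.4)
disagree_on_novel = intended_good(novel) != learned_good(novel)

print("agree on all training cases:", agree_on_training)
print("disagree on the novel case:", disagree_on_novel)
```

No amount of checking against the training cases would distinguish the two functions; the difference only shows up once a situation like `novel` actually occurs.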