William_S comments on Superintelligence 24: Morality models and "do what I mean" - Less Wrong
Suppose we have a bunch of short natural language descriptions of what we would want the AI to value. Can we simply give the AI a list of these and tell it to maximize all of them under some kind of equal weighting? It seems to me that, much more than in other areas of superintelligence design, each description we come up with is likely to point at least roughly toward what we want, so aggregating many of them is more likely to lead to what we want than picking any single description. Does it seem like this would work? Is there any way this could go wrong?