loup-vaillant comments on Muehlhauser-Wang Dialogue - Less Wrong
First, thank you for publishing this illuminating exchange.
I must say that Pei Wang sounds way more convincing to an uninitiated but curious and mildly intelligent layperson (that would be me). That does not mean he is right, but he sure does make sense.
When Luke goes on to make a point, I often get lost in jargon ("manifest convergent instrumental goals") or have to look up a paper that Pei (or other AGI researchers) do not hold in high regard. When Pei Wang makes an argument, it is intuitively clear and does not require going through a complex chain of reasoning outlined in the works of one Eliezer Yudkowsky and not vetted by the AI community at large. This is, of course, no guarantee of its validity, but it sure is easier to follow.
Some of the statements are quite damning, actually: "The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it." If one were to replace AI with physics, I would tend to dismiss EY as a crank just based on this statement, assuming it is accurate.
What makes me trust Pei Wang more than Luke are common-sense statements such as "to make AGI safe, to control their experience will probably be the main approach (which is what “education” is all about), but even that cannot guarantee safety." and "unless you get a right idea about what AGI is and how it can be built, it is very unlikely for you to know how to make it safe". Similarly, the SIAI position of “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” rubs me the wrong way. While that does not necessarily mean it is wrong, the inability to convince outside experts that it is right is not a good sign.
This might be my confirmation bias, but I would be hard-pressed to disagree with "To develop a non-trivial education theory of AGI requires a good understanding about how the system works, so if we don’t know how to build an AGI, there is no chance for us to know how to make it safe. I don’t think a good education theory can be “proved” in advance, pure theoretically. Rather, we’ll learn most of it by interacting with baby AGIs, just like how many of us learn how to educate children."
As a side point, I cannot help but wonder whether the outcome of this discussion would have been different had EY, rather than LM, been involved in it.
I felt the main reason was anthropomorphism:
Note that I don't want to accuse Pei Wang of anthropomorphism. My point is that his choice of words appeals to our anthropomorphism, which is highly intuitive. Another example of a highly intuitive, but not very helpful, sentence:
Intuitive, because when we apply it to humans, we can easily see that we change plans according to experience: you might apply for a PhD, then drop out when you find out you don't enjoy it after all. You can abandon the goal of doing research and adopt a new goal of, say, practicing and teaching surfing.
Not very helpful, because the split between initial goals and later goals does not help you build an AI that will actually do something "good". Here, the split between instrumental goals (means to an end) and terminal goals (the AI's "ulterior motives") is more important. To give a human example, in the case above, doing research and surfing are both means to the same end (like being happy, or something more nuanced but so complicated that nobody yet knows how to specify it clearly). For an AI, as Pei Wang implies, the initial goals aren't necessarily supposed to constrain all future goals. But its terminal goals are indeed supposed to constrain the instrumental goals it will form later. (More precisely, the instrumental goals are supposed to follow from the terminal goals and the AI's current model of the world.)
Edit: it just occurred to me that terminal goals somehow have to be encoded into the AI before we set it loose. They are necessarily initial goals (if they aren't, the AI is by definition unfriendly, though that is not a problem if its goals miraculously converge towards something "good"). Thinking about it, it looks like Pei Wang doesn't believe it is possible to make an AI with stable terminal goals.
Excellent Freudian slip there.
Corrected, thanks.