How do you know that a person is really friendly? You use methods that have worked in the past, and you watch for the manipulative techniques that falsely friendly people use to make you think they are friendly. We know that someone is friendly by the same criteria we use to define what it means to be friendly: subjective benefits (emotional support, etc.) and goal assistance (helping you move when they could simply refuse to do so), without malicious motives that ultimately do you a disservice.
In the case of FAI we want more assurance, and we can presumably get this via simulation and proofs of correctness. I would assume that even after we had a proof of correctness for a meta-ethical system, we would want to run it through as many virtual scenarios as possible, since the human brain is simply not capable of following the chains of reasoning within the meta-ethics that the machine could. We would therefore want to introduce it to scenarios that are as complex as possible in order to determine that it fits our intuition of friendliness.
It seems to me that the bulk of the work is in identifying the most friendly meta-ethical architecture. The Lokhorst paper lukeprog posted a while ago clarified a few things for me, though I have no access to the cutting-edge work on FAI (save for what leaks out into the blog posts). Judging by what Will Newsome has said in the past (cannot find the post), they have compiled a relatively large list of possibly relevant sub-problems that I would be very interested to see, even if many of them are likely to be time drains.
Follow up to: Best career models for doing research?
First, I must apologize for the somewhat self-serving post, but as it is in the discussion section I hope that this can be forgiven. Also, I would not be surprised if there are at least a few college-age people lurking around with very similar problems, so I expect that this might prove useful to at least a couple of people here. If this works out, I hope to eventually turn it into a more general top-level post on career advice for those interested in a career in AGI.
Now, on to the issue:
It has come to my attention that research opportunities in AGI appear to be both somewhat limited and somewhat unstructured compared to more well-developed fields that I have looked into. It seems to me that it would be useful to have a discussion here, given the unusual population density of AGI enthusiasts and professionals, about the possible pathways that one might take after completing an undergraduate degree. In my case, I have a strong background in mathematics, computer science and philosophy, as well as a growing knowledge base in psychology. I've been studying Pearl's work, Timeless Decision Theory, cognitive science, evolutionary and cognitive psychology, Bishop's book on Pattern Recognition and Machine Learning, the link between category theory and cognitive science/AI (which appears to have some promise for building ontologies that can combine concepts and generalize), game theory, probability and statistics, and computational complexity, and I have been trying to get a few more programming languages under my belt.
My initial impulse was to go ahead and study for, and then take, all of the relevant GRE subject tests (Mathematics, Psychology and Computer Science, anyway) and apply to cognitive science and computer science programs with strong AGI groups. I've found that the latter option is more difficult than I had realized, which is somewhat disheartening, as my future planning model does not seem to work in such an underdeveloped field, and there is no easy-to-find, established source for finding out which schools or programs to look at. I also realized that the former option does not necessarily conform to my research interests as much as I would like it to, given that this is a fairly long-term commitment.
Perhaps I lack the knowledge to successfully evaluate AGI programs; perhaps in this particular area getting a PhD is not the best option; perhaps if I were more knowledgeable or wiser I would be better able to work out where to go next. As it is, I seem to be at a loss. So I come to you, fellow Less Wrongians, in search of guidance. Can any of you help to point me (and hopefully plenty of others) in the right (or at least less wrong) direction?