I'm not speaking for SIAI as this is more of a Visiting Fellows thing than an SIAI thing, but there are people working on Friendliness, and creating a Friendliness roadmap. We have lists of hundreds of problems, and lists of potentially relevant fields or concepts. Work is getting started on combining these lists into a real roadmap despite the uncertainty and difference of emphasis among researchers. Obviously we'd rather not release things for the public to see unless there were rather good reasons for doing so -- less output means less chance for screwing up public relations, which is important because SIAI Visiting Fellows output is easy to conflate with SIAI output in ways that might be misleading. I've started a blog where I'll put my own thoughts on something-like-Friendliness that I feel are not at all dangerous, and I might encourage other Friendliness researchers to do so as well. I'll link to my blog in a discussion post once I have a few more posts seeded. At some point you might see summaries of collaborative research somewhere. But until we have a better idea of who our audience is and what security precautions are sane, we'd like to work quietly. Again, I'm mostly speaking for myself, kind of speaking for a group of partially-SIAI-affiliated folk, and not at all for SIAI as an organization.
(There aren't that many people that can speak for SIAI, unfortunately. Like, two maybe. If you're an Oppenheimer (strong rationality and remarkable ability to get uber-nerds to work like a well-oiled machine), please consider applying for Visiting Fellowship. We're a bright group, but that has more to do with being bright than it has to do with being a group, and we'd like to change that.)
I'm not speaking for SIAI as this is more of a Visiting Fellows thing than an SIAI thing, but there are people working on Friendliness, and creating a Friendliness roadmap. We have lists of hundreds of problems, and lists of potentially relevant fields or concepts.
Meh. Now I'm a bit annoyed in that I did try to poke people into a direction where they'd do something like that when I was there as a Visiting Fellow, but mostly the reaction seemed to be "we should leave all thinking about Friendliness to Eliezer". But upon reflection, I realize th...
Eliezer has written a great deal about the concept of Friendly AI, for example in a document from 2001 titled Creating Friendly AI 1.0. The new SIAI overview states that:
The SIAI Research Program lists under its Research Areas:
Despite the enormous value that the construction of a Friendly AI would have, at present I'm not convinced that researching the Friendly AI concept is a cost-effective way of reducing existential risk. My main reason for doubt is that, as far as I can tell, the problem of building a Friendly AI has not been taskified to a sufficiently fine degree for it to be possible to make systematic progress toward obtaining a solution. I'm open-minded on this point and quite willing to change my position subject to incoming evidence.
The Need For Taskification
In The First Step is to Admit That You Have a Problem Alicorn wrote:
In Let them eat cake: Interpersonal Problems vs Tasks HughRistik wrote:
We know that the problems of making a peanut butter sandwich and of finding a romantic partner can (often) be taskified because many people have succeeded in solving them. It's less clear that a given problem that has never been solved can be taskified. Some problems are in principle unsolvable, whether because they are mathematically undecidable or because physical law provides an obstruction to their solution. Other currently unsolved problems have solutions in the abstract but lack solutions that are accessible to humans. That taskification is possible in principle is not a sufficient condition for solving a problem, but it is a necessary condition.
The Difficulty of Unsolved Problems
There's a long historical precedent of unsolved problems being solved. Humans have succeeded in building cars and skyscrapers, have succeeded in understanding the chemical composition of far away stars and of our own DNA, have determined the asymptotic distribution of the prime numbers and have given an algorithm to determine whether a given polynomial equation is solvable in radicals, have created nuclear bombs and have landed humans on the moon. All of these things seemed totally out of reach at one time.
Looking over the history of human achievement gives one a sense of optimism as to the feasibility of accomplishing a goal. And yet, there's a strong selection effect at play: successes are more interesting than failures and we correspondingly notice and remember successes more than failures. One need only page through a book like Richard Guy's Unsolved Problems in Number Theory to get a sense for how generic it is for a problem to be intractable. The question, inspired by the ancient Greeks, of whether there are infinitely many perfect numbers remains out of reach for the best mathematicians of today. The success of human research efforts has been as much a product of wisdom in choosing one's battles as it has been a product of ambition.
The Case of Friendly AI
My present understanding is that there are potential avenues for researching AGI. Richard Hollerith was kind enough to briefly describe Monte Carlo AIXI to me last month, and I could sort of see how it might be possible in principle to program a computer to do Bayesian induction according to an approximation to a universal prior and to equip the computer with a decision-making apparatus based on its epistemological state at a given time. Some people have suggested to me that the amount of computer power and memory needed to implement human-level Monte Carlo AIXI is prohibitively large, but (in my current, very ill-informed state; by analogy with things that I've seen in computational complexity theory) I could imagine ingenious tricks yielding an approximation to Monte Carlo AIXI which uses much less computing power and memory and which is a sufficiently close approximation to serve as a substitute for practical purposes. This would point to a potential taskification of the problem of building an AGI. I could also imagine that there are presently no practically feasible AGI research programs; I know too little about the state of strong artificial intelligence research to have anything but a very unstable opinion on this matter.
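To make the phrase "Bayesian induction with a simplicity-weighted prior plus a decision rule" slightly more concrete, here is a toy sketch in Python. It is emphatically not Monte Carlo AIXI and not anyone's actual research code: the hypothesis class (short repeating bit patterns), the prior 2^(-2n) for a pattern of length n, the noise parameter, and every function name are assumptions I've made up purely for illustration.

```python
from itertools import product


def hypotheses(max_len):
    """Yield (pattern, prior) for every repeating bit pattern up to max_len.

    Toy stand-in for a universal prior: each pattern of length n gets
    weight 2^(-2n), so shorter (simpler) patterns get more prior mass.
    """
    for n in range(1, max_len + 1):
        for pattern in product((0, 1), repeat=n):
            yield pattern, 2.0 ** (-2 * n)


def posterior(observed, max_len=4, noise=0.05):
    """Bayes-update the prior on the observed bits and normalize."""
    weights = {}
    for pattern, prior in hypotheses(max_len):
        likelihood = 1.0
        for i, bit in enumerate(observed):
            predicted = pattern[i % len(pattern)]
            # Each hypothesis is assumed to be right with probability 1 - noise.
            likelihood *= (1 - noise) if predicted == bit else noise
        weights[pattern] = prior * likelihood
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def prob_next_is_one(observed, max_len=4, noise=0.05):
    """Posterior-weighted probability that the next bit is 1."""
    t = len(observed)
    p_one = 0.0
    for pattern, weight in posterior(observed, max_len, noise).items():
        predicted = pattern[t % len(pattern)]
        p_one += weight * ((1 - noise) if predicted == 1 else noise)
    return p_one


def choose_guess(observed, max_len=4, noise=0.05):
    """Decision rule: guess the bit that maximizes expected reward,
    where a correct guess earns 1 and a wrong guess earns 0."""
    return 1 if prob_next_is_one(observed, max_len, noise) >= 0.5 else 0
```

Calling choose_guess([0, 1, 0, 1, 0, 1]) asks the toy agent to bet on the next bit of an alternating sequence. The gap between a sketch like this and anything resembling general intelligence is of course the entire difficulty; the point is only that the pieces (prior, update, decision) can each be written down as a task.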
As Eliezer has said, the problem of creating a Friendly AI is inherently more difficult than that of creating an AGI, and may be much more difficult. At present, the Friendliness aspect of a Friendly AI seems to me to strongly resist taskification. In his poetic Mirrors and Paintings Eliezer gives the most detailed description of what a Friendly AI should do that I've seen, but the gap between concept and implementation here seems so staggeringly huge that it doesn't suggest to me any fruitful lines of Friendly AI research. As far as I can tell, Eliezer's idea of a Friendly AI is at this point not significantly more fleshed out (relative to the magnitude of the task) than Freeman Dyson's idea of a Dyson sphere. In order to build a Friendly AI, beyond conceiving of what a Friendly AI should be in the abstract, one has to convert one's intuitive understanding of friendliness into computer code in a formal programming language.
I don't even see how one would start to research the problem of getting a hypothetical AGI to recognize humans as distinguished beings. Solving this problem would seem to require as a prerequisite an understanding of the makeup of the hypothetical AGI, something which people don't seem to have a clear grasp of at the moment. Even if one does have a model for a hypothetical AGI, writing code conducive to its recognizing humans as distinguished beings seems like an intractable task. And even with a relatively clear understanding of how one would implement a hypothetical AGI with the ability to recognize humans as distinguished beings, one is still left with the problem of making such a hypothetical AGI Friendly toward such beings.
In view of all this, working toward stable whole-brain emulation of a trusted and highly intelligent person concerned about human well-being seems to me like a more promising strategy for reducing existential risk at the present time than researching Friendly AI. Quoting a comment by Carl Shulman:
There are various things that could go wrong with whole-brain emulation, and it would be good to have a better option, but Friendly AI research doesn't seem to me to be one, in light of an apparent total absence of even the outlines of a viable Friendly AI research program.
But I feel like I may have missed something here. I'd welcome any clarifications of what people who are interested in Friendly AI research mean by Friendly AI research. In particular, is there a conjectural taskification of the problem?