eli_sennesh comments on On Terminal Goals and Virtue Ethics - Less Wrong
The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That's an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into a shared infrastructure which enhances them in lock-step, and has incentives in place to ensure collaboration over competition, and consensus over partisanship in decision making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work are fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe oracle AI and intelligence augmentation projects.
FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does it will be on an unpredictable time scale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I'm sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are nearly 100 years after the publication of the general theory of relativity, 85 years after most of the major discoveries of quantum mechanics, and yet in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.
It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That'd be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.
But that's not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we'd better start investing heavily in alternatives.
The AI winter is over. Already multiple very well funded groups are rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.
We don't have time to be dicking around doing basic research on whiteboards.
Ok, let me finally get around to answering this.
FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in "philosophy" or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.
In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.
Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem".
If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach.
Luckily, we don't need to dick around.
Name three. FAI contains a number of counterintuitive difficulties and it's unlikely for someone to do FAI work successfully by accident. On the other hand, someone with a fuzzier model believing that a paper they found sure sounds relevant, why isn't MIRI citing it, is far more probable from my perspective and prior.
I wouldn't say that there's someone out there directly solving FAI problems without having explicitly intended to do so. I would say there's a lot we can build on.
Keep in mind, I've seen enough of a sample of Eld Science being stupid to understand how you can have a very low prior on Eld Science figuring out anything relevant. But lacking more problem guides from you on the delta between plain AI problems and FAI problems, we go on what we can.
One paper on utility learning that relies on a supervised-learning methodology (pairwise comparison data) rather than a de-facto reinforcement learning methodology (which can and will go wrong in well-known ways when put into AGI). One paper on progress towards induction algorithms that operate at multiple levels of abstraction, which could be useful for naturalized induction if someone put more thought and expertise into it.
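To make the "supervised utility learning from pairwise comparison data" idea concrete, here is a minimal sketch. It is not the method of the paper mentioned above (which isn't named here); it assumes a standard Bradley-Terry model, where the probability that item i is preferred over item j is sigmoid(u[i] - u[j]). The function name `fit_utilities` and the toy comparison data are hypothetical.

```python
import math

def fit_utilities(n_items, comparisons, lr=0.1, epochs=500):
    """Fit one scalar utility per item from (preferred, rejected) index pairs.

    Assumes a Bradley-Terry model: P(i preferred over j) = sigmoid(u[i] - u[j]).
    Training is plain gradient ascent on the average log-likelihood.
    """
    u = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for preferred, rejected in comparisons:
            p = 1.0 / (1.0 + math.exp(-(u[preferred] - u[rejected])))
            # Gradient of log(p) with respect to each utility.
            grad[preferred] += 1.0 - p
            grad[rejected] -= 1.0 - p
        for i in range(n_items):
            u[i] += lr * grad[i] / len(comparisons)
    return u

# Hypothetical labelled data: a human judged item 0 better than 1, 1 better than 2, etc.
comparisons = [(0, 1), (1, 2), (0, 2), (0, 1), (1, 2)]
utilities = fit_utilities(3, comparisons)
ranking = sorted(range(3), key=lambda i: utilities[i], reverse=True)
```

The point of the sketch is only that the learner is fit to fixed human judgements, rather than being rewarded for whatever actions it takes, which is the contrast with reinforcement learning drawn above.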
That's only two, but I'm a comparative beginner at this stuff and Eld Science isn't very good at focusing on our problems, so I expect that there's actually more to discover and I'm just limited by lack of time and knowledge to do the literature searches.
By the way, I'm already trying to follow the semi-official MIRI curriculum, but if you could actually write out some material on the specific deltas where FAI work departs from the preexisting knowledge-base of academic science, that would be really helpful.
That's a large portion of the FAI problem right there.
EDIT: To clarify, by this I don't mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.
In-context, what was meant by "Oracle AI" is a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.
You have to give it a set of directed goals and a utility function which favors achieving those goals, in order for the oracle AI to be of any use.
Why? How are you structuring your Oracle AI? This sounds like philosophical speculation, not algorithmic knowledge.
Ok, but a system like you've described isn't likely to think about what you want it to think about or produce output that's actually useful to you either.
Well yes. That's sort of the problem with building one. Utility functions are certainly useful for specifying where logical uncertainty should be reduced.
Well, ok, but if you agree with this then I don't see how you can claim that such a system would be particularly useful for solving FAI problems.
Well, I don't know about the precise construction that would be used. Certainly I could see a human being deliberately focusing the system on some things rather than others.
All existing learning algorithms I know of, and I dare say all that exist, have at least a utility function, and also something that could be interpreted as a decision theory. Consider for example support vector machines, which explicitly try to maximize a margin (that would be the utility function), and any algorithm for computing SVMs can be interpreted as a decision theory. Similar considerations hold for neural networks, genetic algorithms, and even the minimax algorithm.
Thus, I strongly doubt that the notion of a learning algorithm with no utility function makes any sense.
Those are optimization criteria, but they are not decision algorithms in the sense that we usually talk about them in AI. A support vector machine is just finding the extrema of a cost function via its derivative, not planning a sequence of actions.
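For readers who want to see what is being argued about, here is a rough sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. This is a simple stand-in, not the "most popular algorithm" for SVMs, and the function names, toy data, and hyperparameters are all made up for illustration. Note that everything the loop touches is its own weights; its only output is a class label.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)) per sample.

    Uses per-sample subgradient steps on 2-D inputs with labels in {-1, +1}.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:
                # Hinge term is active: step toward classifying this point with margin.
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:
                # Only the regularizer contributes: shrink the weights slightly.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

def predict(w, b, point):
    """Return +1 ("in the class") or -1 ("not in the class")."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else -1

# Hypothetical, linearly separable toy data.
points = [(2, 2), (3, 3), (-2, -2), (-3, -1)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
```

Whether you call the weight updates "actions" or just "following a derivative" is exactly the terminological dispute in this subthread; the code is the same either way.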
The most popular algorithm for SVMs does plan a sequence of actions, complete with heuristics as to which action to take. True, the "actions" are internal: they are changes to some data structure within the computer's memory, rather than changes to the external world. But that is not so different from e.g. a chess AI, which assigns some heuristic score to chess positions and attempts to maximize it using a decision algorithm (to decide which move to make), even though the chessboard is just a data structure within the computer memory.
"Internal" to the "agent" is very different from having an external output to a computational system outside the "agent". "Actions" that come from an extremely limited, non-Turing-complete "vocabulary" (really: programming language or computational calculus (those two are identical)) are also categorically different from a Turing complete calculus of possible actions.
The same distinction applies to the hypothesis class that the learner can learn: if it's not Turing complete (or some approximation thereof, like a total calculus with coinductive types and corecursive programs), then it is categorically not general learning or general decision-making.
This is why we all employ primitive classifiers every day without danger, and you need something like Solomonoff's algorithmic probability in order to build AGI.
I agree, of course, that none of the examples I gave ("primitive classifiers") are dangerous. Indeed, the "plans" they are capable of considering are too simple to pose any threat (they are, as you say, not Turing complete).
But that doesn't seem to be relevant to the argument at all. You claimed
You claimed that a general learning algorithm without decision-theory or utility function is possible. I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions. What would "a learning algorithm without decision-theory or utility function, something that has no desire to do anything" even look like? Does the concept even make sense? Eliezer writes here
/facepalm
There is in fact such a thing as a null output. There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as "in the class" or "not in the class" does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself.
It does not narrow the future.
Now, what we've been proposing as an Oracle is even less capable. It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.
Perhaps I have misused terminology, but that is what I was referring to: inability to narrow the outer world's future.
1) Designing a program capable of arbitrary self-modification, yet maintaining guarantees of "correct" behavior according to a goal set that is by necessity included in the modifications as well.
2) Designing such a high level set of goals which ensure "friendliness".
Designing, not evolving?
That seems a circular argument. How do you use a self-modifying evolutionary search to find a program whose properties remain stable under self-modifying evolutionary search? Unless you started with the right answer, the search AI would quickly rewrite or reinterpret its own driving goals in a non-friendly way, and who knows what you'd end up with.
I don't see why the search algorithm would need to be self modifying.
I don't see why you would be searching for stability as opposed to friendliness. Human testers can judge friendliness directly.
It's how you draw your system box. Evolutionary search is equivalent to a self-modifying program, if you think of the whole search process as the program. The same issues apply.
I think the sequences do a good job at demolishing the idea that human testers can possibly judge friendliness directly, so long as the AI operates as a black box. If you have a debug view into the operation of the AI that is a different story, but then you don't need friendliness anyway.
If I draw a box around the selection algorithm and find there is nothing self modifying inside ...where's the circularity?
(1) is naturalized induction, logical uncertainty, and getting around the Loebian Obstacle.
(2) is the cognitive science of evaluative judgements.
Great, you've got names for answers you are looking for. That doesn't mean the answers are any easier to find. You've attached a label to the declarative statement which specifies the requirements a solution must meet, but that doesn't make the search for a solution suddenly have a fixed timeline. It's uncertain research: it might take 5 years, 10 years, or 50 years, and throwing more people at the problem won't necessarily make the project go any faster.
And how is trying to build a safe Oracle AI that can solve FAI problems for us not basic research? Or, to make a better statement: how is trying to build an Unfriendly superintelligent paperclip maximizer not basic research, at today's research frontier?
Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we're pretty sure, but it's turning out UFAI might need it, too.
"Basic research is performed without thought of practical ends."
"Applied research is systematic study to gain knowledge or understanding necessary to determine the means by which a recognized and specific need may be met."
-National Science Foundation.
We need to be doing applied research, not basic research. What MIRI should do is construct a complete roadmap to FAI, or better: a study exhaustively listing strategies for achieving a positive singularity, and tactics for achieving friendly or unfriendly AGI, and concluding with a small set of most-likely scenarios. MIRI should then identify risk factors which affect either the friendliness of the AGI in each scenario, or the capability of the UFAI to do damage (in boxing setups). These risk factors should be prioritized based on how much knowing more about each is expected to bias the outcome in a positive direction, and it is these problems that should be the topics of MIRI workshops.
Instead MIRI is performing basic research. It's basic research not because it is useless, but because we are not certain at this point in time what relative utility it will have. And if we don't have a grasp on expected utility, how can we prioritize? There's a hundred avenues of research which are important to varying degrees to the FAI project. I worked for a number of years at NASA-Ames Research Center, and in the same building as me was the Space Biosciences Division. Great people, don't get me wrong, and for decades they have funded really cool research on the effects of microgravity and radiation on living organisms, with the justification that such effects and counter-measures need to be known for long duration space voyages, e.g. a 2-year mission to Mars. Never mind that the microgravity issue is trivially solved with a few thousand dollar steel tether connecting the upper stage to the space craft as they spin to create artificial gravity, and the radiation exposure is mitigated by having a storm shelter in the craft and throwing a couple of Martian sandbags on the roof once you get there. It's spending millions of dollars to develop the pressurized-ink "Space Pen", when the humble pencil would have done just fine.
Sadly I think MIRI is doing the same thing, and it is represented in one part of your post I take huge issue with:
If we're only "pretty sure" it's needed for FAI, and if we can't quantify exactly what its contribution will be or how important that contribution is relative to other possible things to be working on... then we have some meta-level planning to do first. Unfortunately I don't see MIRI doing any planning like this (or if they are, it's not public).
Are you on the "Open Problems in Friendly AI" Facebook group? Because much of the planning is on there.
Logical uncertainty lets us put probabilities to sentences in logics. This, supposedly, can help get us around the Loebian Obstacle to proving self-referencing statements and thus generating stable self-improvement in an agent. Logical uncertainty also allows for making techniques like Updateless Decision Theory into real algorithms, and this too is an AI problem: turning planning into inference.
The cognitive stuff about human preferences is the Big Scary Hard Problem of FAI, but utility learning (as Stuart Armstrong has been posting about lately) is a way around that.
If you can create a stably self-improving agent that will learn its utility function from human data, equipped with a decision theory capable of handling both causative games and Timeless situations correctly... then congratulations, you've got a working plan for a Friendly AI and you can start considering the expected utility of actually building it (at least, to my limited knowledge).
Around here you should usually clarify whether your uncertainty is logical or indexical ;-).
Or... you could use a boxed oracle AI to develop singularity technologies for human augmentation, or other mechanisms to keep moral humans in the loop through the whole process, and sidestep the whole issue of FAI and value loading in the first place.
Which approach do you think can be completed earlier with similar probabilities of success? What data did you use to evaluate that, and how certain are you of its accuracy and completeness?
I actually really do think that de novo AI is easier than human intelligence augmentation. We have good cognitive theories for how an agent is supposed to work (including "ideal learner" models of human cognitive algorithms). We do not have very good theories of in-vitro neuroengineering.
You don't need to solve philosophy to solve FAI, but philosophy is relevant to figuring out, in broad terms, the relative likelihoods of various problems and solutions.