Eliezer_Yudkowsky comments on The Level Above Mine - Less Wrong
Comments (387)
I certainly intend to try that recruiting thing (Paul Christiano ain't half bad) but recruiting philosophy seems much less straightforward than recruiting mathematical talent. If I have to resolve it all myself, I wouldn't flinch from trying. It seems like that part should be less difficult in an absolute sense than the rest of the labor, though that might just be comparative advantage talking. The resolutions to philosophical confusions usually seem relatively straightforward once you have them, in my experience so far.
As I asked in the linked comment, if you're the only philosopher in the team, how will others catch your mistakes? I would not trust that when you stop feeling confused, the problem has been correctly solved, or that your feelings of confusion are a reliable indicator of problems existing in the first place.
Having Paul involved certainly makes me feel better, and if you do succeed in recruiting more philosophy talent, then the issue may be moot. But I'm still concerned about your readiness to go it alone, and what that implies about your views, not only of how hard the problems are, but also how much one needs to worry about making philosophical mistakes.
Do you have some reliable way of recruiting? What's the policy alternative? You do what you gotta do; if it ends up being just you, nonetheless, you do what you gotta do. Zero people won't make fewer mistakes than one person.
Quoting Carl Shulman from about a year ago:
I'm not sure if he had both math and philosophy in mind when he wrote that or just math, but in any case surely the same principle applies to the philosophy. If you don't reach a high confidence that the philosophy behind some FAI design is correct, then you shouldn't move forward with that design, and if there is only one philosopher on the team, you just can't reach high confidence in the philosophy.
This does not sound correct to me. Resolutions of simple confusions usually look pretty obvious in retrospect. Or do you mean something broader by "philosophy" than trying to figure out free will?
Did you read the rest of that thread where I talked about how in cryptography we often used formalizations of "security" that were discovered to be wrong years later, and that's despite having hundreds of people in the research community constantly trying to attack each other's ideas? I don't see how formalizing Friendliness could be not just easier and less error prone than formalizing security, but so much so that just one person is enough to solve all the problems with high confidence of correctness.
I mean questions like your R1 and R2, your "nonperson predicate", how to distinguish between moral progress and moral error / value drift, anthropic reasoning / "reality fluid". Generally, all the problems that need to be solved for building an FAI besides the math and the programming.
Yes, formalizing Friendliness is not the sort of thing you'd want one person doing. I agree. I don't consider that "philosophy", and it's the sort of thing other FAI team members would have to be able to check. We probably want at least one high-grade actual cryptographer.
Of the others, the nonperson predicate and the moral-progress parts are the main ones where it'd be unusually hard to solve and then tell that it had been solved correctly. I would expect both of those to be factorable-out, though - that all or most of the solution could just be published outright. (Albeit recent experience with trolls makes me think that no insight enabling conscious simulations should ever be published; people would write suffering conscious simulations and run them just to show off... how confident they were that the consciousness theory was wrong, or something. I have a newfound understanding of the utter... do-anything-ness of trolls. This potentially makes it hard to publicly check some parts of the reasoning behind a nonperson predicate.) Anthropic reasoning / "reality fluid" is the sort of thing I'd expect to be really obvious in retrospect once solved. R1 and R2 should be both obvious in retrospect, and publishable.
I have hopes that an upcoming post on the Löb Problem will offer a much more concrete picture of what some parts of the innards of FAI development and formalizing look like.
In principle, creating a formalization of Friendliness consists of two parts, conceptualizing Friendliness, and translating the concept into mathematical language. I'm using "philosophy" and "formalizing Friendliness" interchangeably to refer to both of these parts, whereas you seem to be using "philosophy" to refer to the former and "formalizing Friendliness" for the latter.
I guess this is because you think you can do the first part, then hand off the second part to others. But in reality, constraints on what kinds of concepts can be expressed in math and what proof techniques are available mean that you have to work from both ends at the same time, trying to jointly optimize for philosophical soundness and mathematical feasibility, so there is no clear boundary between "philosophy" and "formalizing".
(I'm inferring this based on what happens in cryptography. The people creating new security concepts, the people writing down the mathematical formalizations, and the people doing the proofs are usually all the same, I think for the above reason.)
My experience to date has been a bit different - the person asking the right question needs to be a high-grade philosopher, the people trying to answer it only need enough high-grade philosophy to understand-in-retrospect why that exact question is being asked. Answering can then potentially be done with either math talent or philosophy talent. The person asking the right question can be less good at doing clever advanced proofs but does need an extremely solid understanding of the math concepts they're using to state the kind-of-lemma they want. Basically, you need high math and high philosophy on both sides but there's room for S-class-math people who are A-class philosophers but not S-class-philosophers, being pointed in the right direction by S-class-philosophers who are A-class-math but not S-class-math. If you'll pardon the fuzzy terminology.
What happened (if you don't mind sharing)?
I get the impression that you have something different in mind as far as 'trolls' go than fools who create stereotypical conflicts on the internet. What kind of trolls are these?
The kind who persuade depressed people to commit suicide. The kind who post people's addresses on the internet. The kind that burn the Koran in public.
My psychological model says that all trolls are of that kind; some trolls just work harder than others. They all do damage in exchange for attention and the joy of seeing others upset, while exercising the limitless human ability to persuade themselves it's okay. If you make it possible for them to do damage on their home computers with no chance of being arrested and other people being visibly upset about it, a large number will opt to do so. The amount of suffering they create can be arbitrarily great, so long as they can talk themselves into believing it doesn't matter for <stupid reason> and other people are being visibly upset to give them the attention-reward.
4chan would have entire threads devoted to building worse hells. Yes. Seriously. They really would. And then they would instantiate those hells. So if you ever have an insight that constitutes incremental progress toward being able to run lots of small, stupid, suffering conscious agents on a home computer, shut up. And if somebody actually does it, don't be upset on the Internet.
At least for now, it'd take a pretty determined troll to build an em for the sole purpose of being a terrible person. Not saying some humanity-first movement mightn't pull it off, but by that point you could hopefully have legal recognition (assuming there's no risk of accidental fooming and they pass the Turing test).
I don't think we're talking ems, we're talking conscious algorithms which aren't necessarily humanlike or even particularly intelligent.
And as for the Turing Test, one oughtn't confuse consciousness with intelligence. A 6-year-old human child couldn't pass as an adult human, but we still believe the child to be conscious, and my own memories indicate that I indeed was at that age.
Well, I think consciousness, intelligence and personhood are sliding scales anyway, so I may be imagining the output of a Nonperson Predicate somewhat differently from the LW norm. OTOH, I guess it's not a priori impossible that a simple human-level AI could fit on something available to the public, and such an insight would be ... risky, yeah. Upvoted.
Re non-person predicates, do you even have a non-sharp (but non-trivial) lower bound for it? How do you know that the Sims from the namesake game aren't persons? How do we know that Watson is not suffering indescribably when losing a round of Jeopardy? And that imagining someone (whose behavior you can predict with high accuracy) suffering is not as bad as "actually" making someone suffer? If this bound has been definitively established, I'd appreciate a link.
It's unclear where our intuitions on the subject come from or how they work, and they are heavily .... distorted ... by various beliefs and biases. OTOH, it seems unlikely that rocks are conscious and we just haven't extrapolated far enough to realize. It's also unclear whether personhood is binary or there's some kind of sliding scale. Nevertheless, it seems clear that a fly is not worth killing people over.
Even a person who has never introspected about their moral beliefs can still know that murder is wrong. They're more likely to make mistakes, but still.
How are these related? One is epistemology and one is ontology.
Can you give some more examples of this, besides "free will"? (I don't understand where your intuition comes from that certain problems will turn out to have solutions that are obvious in retrospect, and that such feelings of obviousness are trustworthy. Maybe it would help me see your perspective if I got some more past examples.)
A tree falls in a forest with no-one to hear it. Does it make a sound?
I don't class that as a problem that is discussed by professional philosophers. It's more of a toy question that introduces the nature of phil. problems -- and the importance of asking "it depends on what you mean..." -- to laypeople.
I agree, but that's not what I was aiming for. It's an example of obviousness after the fact, not philosophers being wrong/indecisive.
It's not an example that lends much credence to the idea that all problems can be solved that way, even apart from the generalisation-from-one-example issue.
And the other example being generalised from isn't that good.
Do you have an example in mind where a certain philosophical question claimed to have been solved or dissolved by Eliezer turned out to be not solved after all, or the solution was wrong?
Also, instances where Eliezer didn't seem to realize that a problem existed until someone pointed it out to him:
Order-dependence and butterfly effects - knew about this and had it in mind when I wrote CEV, I think it should be in the text.
Counterfactual Mugging - check, I don't think I was calling TDT a complete solution before then but the Counterfactual Mugging was a class of possibilities I hadn't considered. (It does seem related to Parfit's Hitchhiker which I knew was a problem.)
Solomonoff Induction - again, I think you may be overestimating how much weight I put on that in the first place. It's not a workable AI answer for at least two obvious reasons I'm pretty sure I knew about from almost-day-one, (a) it's uncomputable and (b) it can't handle utility functions over the environment. However, your particular contributions about halting-oracles-shouldn't-be-unimaginable did indeed influence me toward my current notion of second-order logical natural induction over possible models of axioms in which you could be embedded. Albeit I stand by my old reply that Solomonoff Induction would encompass any computable predictions or learning you could do about halting oracles in the environment. (The problem of porting yourself onto any environmental object is something I already knew AIXI would fail at.)
Ok, I checked the CEV writeup and you did mention these briefly. But that makes me unsure why you claimed to have solved metaethics. What should you do if your FAI comes back and says that your EV shows no coherence due to order dependence and butterfly effects (assuming it's not some kind of implementation error)? If you're not sure the answer is "nothing", and you don't have another answer, doesn't that mean your solution (about the meaning of "should") is at least incomplete, and possibly wrong?
You said that TDT solves Parfit's Hitchhiker, so I don't know if you would have kept looking for more problems related to Parfit's Hitchhiker and eventually come upon Counterfactual Mugging.
Both of these can be solved without also solving halting-oracles-shouldn't-be-unimaginable. For (a), solve logical uncertainty. For (b), switch to UDT-with-world-programs.
Also, here is another problem that maybe you weren't already aware of.
Wouldn't that kind of make moral reasoning impossible?
Both.
You never did any engineering-level mathematical modeling of a real system, did you?
The main difficulty is not proving the theorems, it is finding the right axioms to describe the relevant aspects of the system and the properties of interest. And that's where errors often occur.
Now, typical engineering tasks pale in comparison to the task you are trying to undertake: creating a fully specified mathematical model of ethics.
Most likely it's just the Dunning–Kruger effect
Just like when you "resolved" the interpretation of quantum mechanics? Well, good thing that you are never going to make anything close to an AGI and that AGI risk is probably overrated, otherwise it wouldn't end well...