Wei_Dai comments on The Level Above Mine - Less Wrong
As I asked in the linked comment, if you're the only philosopher in the team, how will others catch your mistakes? I would not trust that when you stop feeling confused, the problem has been correctly solved, or that your feelings of confusion are a reliable indicator of problems existing in the first place.
Having Paul involved certainly makes me feel better, and if you do succeed in recruiting more philosophy talent, then the issue may be moot. But I'm still concerned about your readiness to go it alone, and what that implies about your views, not only of how hard the problems are, but also of how much one needs to worry about making philosophical mistakes.
Do you have some reliable way of recruiting? What's the policy alternative? You do what you gotta do; if it ends up being just you, you nonetheless do what you gotta do. Zero people won't make fewer mistakes than one person.
Quoting Carl Shulman from about a year ago:
I'm not sure if he had both math and philosophy in mind when he wrote that or just math, but in any case surely the same principle applies to the philosophy. If you don't reach a high confidence that the philosophy behind some FAI design is correct, then you shouldn't move forward with that design, and if there is only one philosopher on the team, you just can't reach high confidence in the philosophy.
This does not sound correct to me. Resolutions of simple confusions usually look pretty obvious in retrospect. Or do you mean something broader by "philosophy" than trying to figure out free will?
Did you read the rest of that thread where I talked about how in cryptography we often used formalizations of "security" that were discovered to be wrong years later, and that's despite having hundreds of people in the research community constantly trying to attack each other's ideas? I don't see how formalizing Friendliness could be not just easier and less error prone than formalizing security, but so much so that just one person is enough to solve all the problems with high confidence of correctness.
I mean questions like your R1 and R2, your "nonperson predicate", how to distinguish between moral progress and moral error / value drift, anthropic reasoning / "reality fluid". Generally, all the problems that need to be solved for building an FAI besides the math and the programming.
Yes, formalizing Friendliness is not the sort of thing you'd want one person doing. I agree. I don't consider that "philosophy", and it's the sort of thing other FAI team members would have to be able to check. We probably want at least one high-grade actual cryptographer.
Of the others, the nonperson predicate and the moral-progress parts are the main ones where it'd be unusually hard to solve and then tell that it had been solved correctly. I would expect both of those to be factorable-out, though - that all or most of the solution could just be published outright. (Albeit recent experience with trolls makes me think that no insight enabling conscious simulations should ever be published; people would write suffering conscious simulations and run them just to show off... how confident they were that the consciousness theory was wrong, or something. I have a newfound understanding of the utter... do-anything-ness of trolls. This potentially makes it hard to publicly check some parts of the reasoning behind a nonperson predicate.) Anthropic reasoning / "reality fluid" is the sort of thing I'd expect to be really obvious in retrospect once solved. R1 and R2 should be both obvious in retrospect, and publishable.
I have hopes that an upcoming post on the Löb Problem will offer a much more concrete picture of what some parts of the innards of FAI development and formalizing look like.
In principle, creating a formalization of Friendliness consists of two parts: conceptualizing Friendliness, and translating the concept into mathematical language. I'm using "philosophy" and "formalizing Friendliness" interchangeably to refer to both of these parts, whereas you seem to be using "philosophy" to refer to the former and "formalizing Friendliness" for the latter.
I guess this is because you think you can do the first part, then hand off the second part to others. But in reality, constraints on what kinds of concepts can be expressed in math and what proof techniques are available mean that you have to work from both ends at the same time, trying to jointly optimize for philosophical soundness and mathematical feasibility, so there is no clear boundary between "philosophy" and "formalizing".
(I'm inferring this based on what happens in cryptography. The people creating new security concepts, the people writing down the mathematical formalizations, and the people doing the proofs are usually all the same, I think for the above reason.)
My experience to date has been a bit different - the person asking the right question needs to be a high-grade philosopher, while the people trying to answer it only need enough high-grade philosophy to understand-in-retrospect why that exact question is being asked. Answering can then potentially be done with either math talent or philosophy talent. The person asking the right question can be less good at doing clever advanced proofs but does need an extremely solid understanding of the math concepts they're using to state the kind-of-lemma they want. Basically, you need high math and high philosophy on both sides, but there's room for S-class-math people who are A-class philosophers but not S-class philosophers, being pointed in the right direction by S-class philosophers who are A-class-math but not S-class-math. If you'll pardon the fuzzy terminology.
What happened (if you don't mind sharing)?
I get the impression that you have something different in mind as far as 'trolls' go than fools who create stereotypical conflicts on the internet. What kind of trolls are these?
The kind who persuade depressed people to commit suicide. The kind who post people's addresses on the internet. The kind who burn the Koran in public.
My psychological model says that all trolls are of that kind; some trolls just work harder than others. They all do damage in exchange for attention and the joy of seeing others upset, while exercising the limitless human ability to persuade themselves it's okay. If you make it possible for them to do damage on their home computers with no chance of being arrested and other people being visibly upset about it, a large number will opt to do so. The amount of suffering they create can be arbitrarily great, so long as they can talk themselves into believing it doesn't matter for <stupid reason> and other people are being visibly upset to give them the attention-reward.
4chan would have entire threads devoted to building worse hells. Yes. Seriously. They really would. And then they would instantiate those hells. So if you ever have an insight that constitutes incremental progress toward being able to run lots of small, stupid, suffering conscious agents on a home computer, shut up. And if somebody actually does it, don't be upset on the Internet.
In case anyone doubts this, as a long-time observer of the 4chan memeplex, I concur.
They really would at that. It seems you are concerned here about malicious actual trolls specifically. I suppose if the technology and knowledge were disseminated to that degree (before something actually foomed), then that would be the most important threat. My first thoughts had gone towards researchers with the capabilities and interest to research this kind of technology themselves, who are merely callous and indifferent to the suffering of their simulated conscious 'guinea pigs' for the aforementioned <stupid reasons>.
At what level of formalization does this kind of 'incremental progress' start to count? I ask because your philosophical essays on reductionism, consciousness, and zombies seem like incremental progress towards that end (though I certainly wouldn't consider them a mistake to publish or a net risk).
"The Sims" is often heralded as the best-selling videogame of all time, and it attracts players of all ages, races and genders from all across the world and from all walks of life.[citation needed]
Now imagine if the toons in the game could actually feel what was happening to them and react believably to their environment, situation, and events.
I'm sure I don't need to quote the Rules of Acquisition; everyone here should know where this leads if word of such a technique gets out.
Most any incremental progress towards AGI, or even "just" EMs, would be dual use (if not centuple use) and could be (ab)used for helping achieve such enterta ... vile and nefarious purposes.
In fact, it is hard to imagine realistic technological progress that can solely be used to run lots of small, stupid, suffering conscious agents but not as a stepping stone towards more noble pursuits (... such as automated poker playing agents).
You know, I want to say you're completely and utterly wrong. I want to say that it's safe to at least release The Actual Explanation of Consciousness if and when you should solve such a thing.
But, sadly, I know you're absolutely right about the existence of trolls who would make a point of using that to create suffering. Not just to get a reaction - some would do it specifically to have a world in which they could torment beings.
My model is not that all those trolls are identical: I've seen some who will explicitly and unambiguously draw the line and recognize that egging on suicidal people is something that One Does Not Do, but I've also seen that all too many gleefully do just that.
I wish I could disagree with you, and, suspiciously, I find myself believing that there would be enough vigilante justice to discourage hellmaking - after all, the trolls are doing it for the attention, and if that attention comes in the form of people posting your details and other people breaking into your house to steal your computer and/or murder you (for the greater good), then I doubt there will be many takers.
I just wish I could trust that doubt.*
*(Not expressing a wish for trust pills.)
EDIT: Animal experimentation and factory farming are still popular, but they have financial incentive ... and I vaguely recall that some trolls kicked a dog across a football field or something and were punished by Anonymous. That's where the analogy comes from, anyway, so I'd be interested if someone knows more.
I sometimes wonder if this does not already exist, except for the suffering and consciousness being merely simulated. That is, computer games in which the entire purpose is to inflict unspeakable acts on powerless NPCs, acts whose depiction in prose or pictures would be grossly illegal almost everywhere. But I've never heard of such a thing actually existing.
That stupid reason is, at core, nihilistic solipsism - and it's not as stupid as you'd think. I'm not saying it's right, but it does happen to be the one inescapable meme-trap of philosophy.
To quote your own fic, their reason is "why not?" - and their consciousness was not grown such that your impassioned defense of compassion and consideration have any intrinsic factor in their utility function.
At least for now, it'd take a pretty determined troll to build an em for the sole purpose of being a terrible person. Not saying some humanity-first movement mightn't pull it off, but by that point you could hopefully have legal recognition (assuming there's no risk of accidental fooming and they pass the Turing test).
I don't think we're talking ems, we're talking conscious algorithms which aren't necessarily humanlike or even particularly intelligent.
And as for the Turing Test, one oughtn't confuse consciousness with intelligence. A six-year-old human child couldn't pass as an adult human, but we still believe the child to be conscious, and my own memories indicate that I indeed was at that age.
Well, I think consciousness, intelligence and personhood are sliding scales anyway, so I may be imagining the output of a Nonperson Predicate somewhat differently to the LW norm. OTOH, I guess it's not a priori impossible that a simple human-level AI could fit on something available to the public, and such an insight would be ... risky, yeah. Upvoted.
First of all, I also believe that consciousness is most probably a sliding scale.
Secondly, again you just used "human-level" without specifying human-level at what, at intelligence or at consciousness; as such, I'm not sure whether I adequately communicated my point that we're not discussing intelligence here, just consciousness.
Re nonperson predicates, do you even have a non-sharp (but non-trivial) lower bound for one? How do you know that the Sims from the namesake game aren't persons? How do we know that Watson is not suffering indescribably when losing a round of Jeopardy? And that imagining someone (whose behavior you can predict with high accuracy) suffering is not as bad as "actually" making someone suffer? If this bound has been definitively established, I'd appreciate a link.
It's unclear where our intuitions on the subject come from or how they work, and they are heavily ... distorted ... by various beliefs and biases. OTOH, it seems unlikely that rocks are conscious and we just haven't extrapolated far enough to realize it. It's also unclear whether personhood is binary or there's some kind of sliding scale. Nevertheless, it seems clear that a fly is not worth killing people over.
Even a person who has never introspected about their moral beliefs can still know that murder is wrong. They're more likely to make mistakes, but still.
How are these related? One is epistemology and one is ontology.
Can you give some more examples of this, besides "free will"? (I don't understand where your intuitions come from that certain problems will turn out to have solutions that are obvious in retrospect, and that such feelings of obviousness are trustworthy. Maybe it would help me see your perspective if I got some more past examples.)
A tree falls in a forest with no-one to hear it. Does it make a sound?
I don't class that as a problem that is discussed by professional philosophers. It's more of a toy question that introduces the nature of philosophical problems - and the importance of asking "it depends on what you mean..." - to laypeople.
I agree, but that's not what I was aiming for. It's an example of obviousness after the fact, not philosophers being wrong/indecisive.
It's not an example that lends much credence to the idea that all problems can be solved that way, even apart from the generalisation-from-one-example issue.
I'm not claiming it proves anything, and I'm not taking sides in this discussion. Someone asked for an example of something - something which varies from person to person depending on whether they've dissolved the relevant confusions - and I provided what I thought was the best example. It is not intended to prove anyone's point; arguments are not soldiers.
And the other example being generalised from isn't that good.
Do you have an example in mind where a certain philosophical question claimed to have been solved or dissolved by Eliezer turned out to be not solved after all, or the solution was wrong?
Also, instances where Eliezer didn't seem to realize that a problem existed until someone pointed it out to him:
Order-dependence and butterfly effects - knew about this and had it in mind when I wrote CEV, I think it should be in the text.
Counterfactual Mugging - check, I don't think I was calling TDT a complete solution before then but the Counterfactual Mugging was a class of possibilities I hadn't considered. (It does seem related to Parfit's Hitchhiker which I knew was a problem.)
Solomonoff Induction - again, I think you may be overestimating how much weight I put on that in the first place. It's not a workable AI answer for at least two obvious reasons I'm pretty sure I knew about from almost day one: (a) it's uncomputable, and (b) it can't handle utility functions over the environment. However, your particular contributions about halting-oracles-shouldn't-be-unimaginable did indeed influence me toward my current notion of second-order logical natural induction over possible models of axioms in which you could be embedded. Albeit I stand by my old reply that Solomonoff Induction would encompass any computable predictions or learning you could do about halting oracles in the environment. (The problem of porting yourself onto any environmental object is something I already knew AIXI would fail at.)
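For reference, the standard statement of the Solomonoff prior at issue, with U a universal prefix machine and \ell(p) the length in bits of program p:

```latex
% Solomonoff's universal prior: the weight of a finite string x is the
% summed, length-penalized weight of all programs whose output begins with x.
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
```

Deciding which programs contribute requires knowing whether each one halts with an output extending x, which is why M is only approximable from below - point (a) above - and it is a prior over percept strings rather than over states of an external world, which is roughly point (b).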
Ok, I checked the CEV writeup and you did mention these briefly. But that makes me unsure why you claimed to have solved metaethics. What should you do if your FAI comes back and says that your EV shows no coherence, due to order dependence and butterfly effects (assuming it's not some kind of implementation error)? If you're not sure the answer is "nothing", and you don't have another answer, doesn't that mean your solution (about the meaning of "should") is at least incomplete, and possibly wrong?
You said that TDT solves Parfit's Hitchhiker, so I don't know if you would have kept looking for more problems related to Parfit's Hitchhiker and eventually come upon Counterfactual Mugging.
Both of these can be solved without also solving halting-oracles-shouldn't-be-unimaginable. For (a), solve logical uncertainty. For (b), switch to UDT-with-world-programs.
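As a toy sketch of what "UDT-with-world-programs" amounts to - an illustration under assumed names and representations, not anyone's worked-out proposal - the agent picks its whole input-to-output mapping by scoring each candidate policy against a prior-weighted set of world programs, so that utility attaches to worlds rather than to sensory strings:

```python
# Toy sketch of UDT over explicit world programs (the function name and the
# representation of worlds are invented for this illustration).
from itertools import product

def udt_policy(inputs, outputs, worlds):
    """Pick the input->output mapping that maximizes prior-weighted utility.

    `worlds` is a list of (prior, world_program) pairs; each world_program
    takes the agent's whole policy (a dict from input to output) and returns
    the utility of the world that results.
    """
    best_policy, best_value = None, float("-inf")
    for choice in product(outputs, repeat=len(inputs)):
        policy = dict(zip(inputs, choice))
        value = sum(prior * world(policy) for prior, world in worlds)
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy

# Counterfactual Mugging as two equally weighted branches: paying 100 on
# tails buys 10000 on the heads branch.
worlds = [
    (0.5, lambda policy: 10000 if policy["tails"] == "pay" else 0),  # heads
    (0.5, lambda policy: -100 if policy["tails"] == "pay" else 0),   # tails
]
print(udt_policy(["tails"], ["pay", "refuse"], worlds))  # {'tails': 'pay'}
```

On these toy worlds the paying policy wins (0.5·10000 − 0.5·100 > 0), matching the answer to the Counterfactual Mugging discussed above; since utility is computed from the world programs themselves, nothing requires the utility function to be defined over the agent's percepts.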
Also, here is another problem that maybe you weren't already aware of.
Wouldn't that kind of make moral reasoning impossible?
Both.