Wei_Dai comments on The Level Above Mine - Less Wrong
Did you read the rest of that thread where I talked about how in cryptography we often used formalizations of "security" that were discovered to be wrong years later, and that's despite having hundreds of people in the research community constantly trying to attack each other's ideas? I don't see how formalizing Friendliness could be not just easier and less error prone than formalizing security, but so much so that just one person is enough to solve all the problems with high confidence of correctness.
I mean questions like your R1 and R2, your "nonperson predicate", how to distinguish between moral progress and moral error / value drift, anthropic reasoning / "reality fluid". Generally, all the problems that need to be solved for building an FAI besides the math and the programming.
Yes, formalizing Friendliness is not the sort of thing you'd want one person doing. I agree. I don't consider that "philosophy", and it's the sort of thing other FAI team members would have to be able to check. We probably want at least one high-grade actual cryptographer.
Of the others, the nonperson predicate and the moral-progress parts are the main ones where it'd be unusually hard to solve and then tell that it had been solved correctly. I would expect both of those to be factorable-out, though - that all or most of the solution could just be published outright. (Albeit recent experience with trolls makes me think that no insight enabling conscious simulations should ever be published; people would write suffering conscious simulations and run them just to show off... how confident they were that the consciousness theory was wrong, or something. I have a newfound understanding of the utter... do-anything-ness of trolls. This potentially makes it hard to publicly check some parts of the reasoning behind a nonperson predicate.) Anthropic reasoning / "reality fluid" is the sort of thing I'd expect to be really obvious in retrospect once solved. R1 and R2 should be both obvious in retrospect, and publishable.
I have hopes that an upcoming post on the Löb Problem will offer a much more concrete picture of what some parts of the innards of FAI development and formalization look like.
In principle, creating a formalization of Friendliness consists of two parts, conceptualizing Friendliness, and translating the concept into mathematical language. I'm using "philosophy" and "formalizing Friendliness" interchangeably to refer to both of these parts, whereas you seem to be using "philosophy" to refer to the former and "formalizing Friendliness" for the latter.
I guess this is because you think you can do the first part, then hand off the second part to others. But in reality, constraints on what kinds of concepts can be expressed in math and what proof techniques are available mean that you have to work from both ends at the same time, trying to jointly optimize for philosophical soundness and mathematical feasibility, so there is no clear boundary between "philosophy" and "formalizing".
(I'm inferring this based on what happens in cryptography. The people creating new security concepts, the people writing down the mathematical formalizations, and the people doing the proofs are usually all the same, I think for the above reason.)
My experience to date has been a bit different - the person asking the right question needs to be a high-grade philosopher; the people trying to answer it only need enough high-grade philosophy to understand in retrospect why that exact question is being asked. Answering can then potentially be done with either math talent or philosophy talent. The person asking the right question can be less good at doing clever advanced proofs, but does need an extremely solid understanding of the math concepts they're using to state the kind-of-lemma they want. Basically, you need high math and high philosophy on both sides, but there's room for S-class-math people who are A-class philosophers but not S-class philosophers, being pointed in the right direction by S-class philosophers who are A-class math but not S-class math. If you'll pardon the fuzzy terminology.
What happened (if you don't mind sharing)?
I get the impression that you have something different in mind as far as 'trolls' go than fools who create stereotypical conflicts on the internet. What kind of trolls are these?
The kind who persuade depressed people to commit suicide. The kind who post people's address on the internet. The kind that burn the Koran in public.
My psychological model says that all trolls are of that kind; some trolls just work harder than others. They all do damage in exchange for attention and the joy of seeing others upset, while exercising the limitless human ability to persuade themselves it's okay. If you make it possible for them to do damage on their home computers with no chance of being arrested and other people being visibly upset about it, a large number will opt to do so. The amount of suffering they create can be arbitrarily great, so long as they can talk themselves into believing it doesn't matter for <stupid reason> and other people are being visibly upset to give them the attention-reward.
4chan would have entire threads devoted to building worse hells. Yes. Seriously. They really would. And then they would instantiate those hells. So if you ever have an insight that constitutes incremental progress toward being able to run lots of small, stupid, suffering conscious agents on a home computer, shut up. And if somebody actually does it, don't be upset on the Internet.
In case anyone doubts this, as a long-time observer of the 4chan memeplex, I concur.
Related:
How often does 4chan torture animals? That's pretty easy to pull off. Are they doing it all the time and I haven't noticed, or is there some additional force preventing it (e.g. Anonymous would hunt them down and post their details online, or 4channers all just like animals)?
I remember that once, a Facebook page was hacked into (I guess) and started posting pictures and stories about tortured animals. Everybody went WTF and the page was shut down a few days later.
I've never been there, but plenty of people on the internet do. Facebook pages against vivisection etc. seem to get way more likes than those in favour of it, the meme that humanity had better become extinct because wildlife would be better off is quite widespread, and some people even rejoice when a hunter dies (though this is a minority stance).
Not often. Hurting animals is generally considered Not OK on 4chan, to the extent that anything is Not OK on 4chan.
There are a few pictures and stories that get passed around (some kids kicking a cat against a wall like a football, shoveldog, etc), but many fewer than the human gore pictures. 4channers mostly aggregate this stuff from all over and post it to be edgy and drive people who aren't edgy enough away from 4chan.
And yeah, to the extent that people do torture animals in current events (as opposed to past stories), vast hordes of moralfags and raiders from 4chan tend to hunt them down and ruin their lives.
I wonder if this might happen to people running hells too? I lack the domain expertise to judge if this is ludicrous or impossible to predict or what.
Really depends on whether the beings in the hell are cute and empathetic.
Humans don't like to hurt things that are cute and empathetic, and don't like them getting hurt. Otherwise we don't care.
They really would at that. It seems you are concerned here about malicious actual trolls specifically. I suppose if the technology and knowledge were disseminated to that degree (before something actually foomed), then that would be the most important threat. My first thoughts had gone to researchers with the capability and interest to develop this kind of technology themselves - people who are merely callous, and indifferent to the suffering of their simulated conscious 'guinea pigs' for the aforementioned <stupid reasons>.
At what level of formalization does this kind of 'incremental progress' start to count? I ask because your philosophical essays on reductionism, consciousness and zombies seem like incremental progress towards that end (though I certainly wouldn't consider them a mistake to publish or a net risk).
Related.
(I'm not a huge fan of SCP in general, but I like a few stories with the "infohazard" tag, and I'm amused by how LW-ish those can get.)
Eliezer could argue that the incremental progress towards stopping the risk outweighs the danger, same as with the general FAI/uFAI secrecy debate.
I can't find the quote on that page. Is it from somewhere else (or an earlier version) or am I missing something?
White text. (Apparently there's a few more hidden features in the entry, but I only found this one.)
Ah, thanks.
I, um, still can't find it. This white text is on the page you linked to, yes? About the videos that are probably soultraps?
EDIT: Nevermind, got it.
"The Sims" is often heralded as the best-selling videogame of all time, and it attracts players of all ages, races and genders from all across the world and from all walks of life.[citation needed]
Now imagine if the toons in the game could actually feel what was happening to them and react believably to their environment, situation, and events.
I'm sure I don't need to quote the Rules of Acquisition; everyone here should know where this leads if word of such a technique gets out.
There have always been those who would pull the wings off flies, stomp on mice, or torture kittens. Setting roosters, fish, or dogs to fight each other to death remains a well-known spectacle in many rural parts of the world. In Shakespeare's day, Londoners enjoyed watching dogs slowly kill bulls or bears, or be killed by them; in France they set bushels of cats on fire to watch them burn. Public executions and tortures, gladiatorial combat among slaves, and other nonconsensual "blood sports" have been common in human history.
What's the difference?
Scale.
The average individual could not hold private gladiatorial contests, on a whim, at negligible cost. Killing a few innocents by torture as public spectacle is a far smaller evil than repeatedly torturing large groups as private entertainment, for as little as the average individual would have paid for their ticket to the cockfight.
Also, some people reckon the suffering of animals doesn't matter. They're wrong, but they wouldn't care about most of your examples (or at least they would claim it's because they increase the risk you'll do the same to humans, which is a whole different kettle of fish.)
Not to mention the sizeable fraction of car drivers who will swerve in order to hit turtles. What the hell is wrong with my species?
Link is broken.
... seriously? Poor turtles >:-(
It was mentioned recently on Yvain's blog and a few months ago on LW (can't find it right now).
Previous discussion of this on LW
How do you know that they don't?
Why do you always have to ask subtly hard questions? I can just see your smug face, smiling that smug smile of yours with that slight tilt of the head as we squirm trying to rationalize something up quick.
Here's my crack at it: They don't have what we currently think is the requisite code structure to "feel" in a meaningful way, but of course we are too confused to articulate the reasons much further.
Thank you, I'm flattered. I have asked Eliezer the same question; not sure if anyone will reply. I hoped that there was a simple answer to this, related to the complexity of information processing in the substrate, like the brain or a computer, but I cannot seem to find any discussions online. Probably I'm using the wrong keywords.
Information integration theory seems relevant.
Not directly related. I think it has a lot to do with being roughly isomorphic to how a human thinks, which requires large complexity, but a particular complexity.
When I evaluate such questions IRL, like in the case of helping out an injured bird, or feeding my cat, I notice that my decisions seem to depend on whether I feel empathy for the thing. That is, do my algorithms recognize it as a being, or as a thing.
But then empathy can be hacked or faulty (see for example pictures of african children, cats and small animals, ugly disfigured people, far away people, etc), so I think of a sort of "abstract empathy" that is doing the job of recognizing morally valuable beings without all the bugs of my particular implementation of it.
In other words, I think it's a matter of moral philosophy, not metaphysics.
Well, I can't speak for the latest games, but I've personally read (some of) the core AI code for the toons in the first game of the series, and there was nothing in there that made a model of said code or attempted any form of what I'd even call "reasoning" throughout. No consciousness or meta-awareness.
By being simulated by the code simulating the game in which they "are", they could to some extent be said to be "aware" of certain values like their hunger level, if you really want to stretch wide the concept of "awareness". However, there seems to be no consciousness anywhere to be 'aware' (in the anthropomorphized sense) of this.
Since my priors are such that I consider it extremely unlikely that consciousness can exist without self-modeling and even more unlikely that consciousness is nonphysical, I conclude that there is a very low chance that they can be considered a "mind" with a consciousness that is aware of the pain and stimuli they receive.
The overall system is also extremely simple, in relative terms, considering the kind of AI code that's normally discussed around these parts.
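The distinction drawn above - a variable the simulation reads versus any kind of self-model or reasoning - can be sketched in a toy example. This is a hypothetical illustration, not the actual Sims code; the class and attribute names are made up:

```python
# A minimal sketch of the kind of agent described above: it reacts to an
# internal "hunger" value without ever modeling itself or reasoning about
# its own state. "Awareness" here is just a branch on a stored number.

class Toon:
    """A reactive game agent with needs but no self-model."""

    def __init__(self):
        self.hunger = 0  # 0 = sated, 100 = starving

    def tick(self):
        """One simulation step: the need decays, then a scripted reaction fires."""
        self.hunger = min(100, self.hunger + 5)
        return self.act()

    def act(self):
        # Pure stimulus-response: there is no representation of "I am
        # hungry" anywhere, just a threshold on a simulation variable.
        if self.hunger > 60:
            self.hunger = 0
            return "eat"
        return "idle"

toon = Toon()
actions = [toon.tick() for _ in range(20)]
```

Nothing in this loop models the agent's own code or state-tracking, which is the sense in which such a system seems safely below any plausible bar for consciousness.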
Why would their feeling it help them "react believably to their environment and situation and events"? If they're dumb enough that you can "run lots of small, stupid, suffering conscious agents on a home computer", I mean.
Of course, give Moore time and this objection will stop applying.
We're already pretty close to making game characters have believable reactions, but only through clever scripting and a human deciding that situation X warrants reaction Y, and then applying mathematically-complicated patterns of light and prerecorded sounds onto the output devices of a computer.
If we can successfully implement a system that has that-function-we-refer-to-when-we-say-"consciousness" and that-f-w-r-t-w-w-s-"really feel pain", then it seems an easy additional step to implement the kind of events triggering the latter function and the kind of outputs from the former function that would be believable and convincing to human players. I may be having faulty algorithmic intuitions here though.
Well, if they were as smart as humans, sure. Even as smart as dogs, maybe. But if they're running lots of 'em on a home PC, then I must have been mistaken about how smart you have to be for consciousness.
I used to torture my own characters to death a lot, back in the day.
EDIT: Not to mention what I did when playing Roller Coaster Tycoon.
The favourite Sim household of my housemate was based on "Buffy the Vampire Slayer". Complete with a graveyard constructed in the backyard. Through the judicious application of "remove ladder" from the swimming pool.
And this is all without any particular malice!
Most any incremental progress towards AGI, or even "just" EMs, would be dual use (if not centuple use) and could be (ab)used for helping achieve such enterta ... vile and nefarious purposes.
In fact, it is hard to imagine realistic technological progress that can solely be used to run lots of small, stupid, suffering conscious agents but not as a stepping stone towards more noble pursuits (... such as automated poker playing agents).
You know, I want to say you're completely and utterly wrong. I want to say that it's safe to at least release The Actual Explanation of Consciousness if and when you should solve such a thing.
But, sadly, I know you're absolutely right re the existence of trolls which would make a point of using that to create suffering. Not just to get a reaction, but some would do it specifically to have a world they could torment beings.
My model is not that all those trolls are identical (in that I've seen some who will explicitly and unambiguously draw the line and recognize that egging on suicidal people is something that One Does Not Do), but I have also seen that all too many gleefully do do that.
It's worth noting that private torture chambers seem different to trolling, but a troll can still set up a torture chamber - they just care about people's reaction to it, not the torture itself.
Wishing I could disagree with you, and, suspiciously, I find myself believing that there would be enough vigilante justice to discourage hellmaking - after all, the trolls are doing it for the attention, and if that attention comes in the form of people posting your details and other people breaking into your house to steal your computer and/or murder you (for the greater good), then I doubt there will be many takers.
I just wish I could trust that doubt.*
*(Not expressing a wish for trust pills.)
EDIT: Animal experimentation and factory farming are still popular, but they have financial incentive ... and I vaguely recall that some trolls kicked a dog across a football field or something and were punished by Anonymous. That's where the analogy comes from, anyway, so I'd be interested if someone knows more.
I sometimes wonder if this does not already exist, except for the suffering and consciousness being merely simulated. That is, computer games in which the entire purpose is to inflict unspeakable acts on powerless NPCs, acts whose depiction in prose or pictures would be grossly illegal almost everywhere. But I've never heard of such a thing actually existing.
What sort of acts are we talking here? Because I'm genuinely having trouble thinking of any "acts whose depiction in prose or pictures would be grossly illegal almost everywhere" except maybe pedophilia. Censorship and all that.
And there are some fairly screwed-up games out there, although probably not as bad as they could be if designed with that in mind (as opposed to, y'know, the enjoyment of the player.)
Well would you, if it was grossly illegal to describe the contents?
I didn't want to be explicit, but you thought of the obvious example.
For the sort of 4chan people Eliezer mentioned, these would be completely congruent.
It is well known that illegal pornography exists on non-interactive media. For interactive media, all I've ever heard of is 18-rated sex scenes.
I can't think of any other examples, though.
... maaaybe. Again, I'm not sure exactly what you have in mind.
Good point. Indeed, it's well known that child porn exists on some level.
In fact ... I do vaguely recall something about a Japanese game about rape causing a moral panic of some kind, so ...
EDIT: In fact, it featured kids too! RapeLay. It's ... fairly horrible, although I think someone with the goal of pure horribleness would do ... better? Worse? Whatever.
This link seems not to answer the comment. Is this mistaken, or did EY actually use that fallacy?
That stupid reason is, at core, nihilistic solipsism - and it's not as stupid as you'd think. I'm not saying it's right, but it does happen to be the one inescapable meme-trap of philosophy.
To quote your own fic, their reason is "why not?" - and their consciousness was not grown such that your impassioned defense of compassion and consideration have any intrinsic factor in their utility function.
At least for now, it'd take a pretty determined troll to build an em for the sole purpose of being a terrible person. Not saying some humanity-first movement mightn't pull it off, but by that point you could hopefully have legal recognition (assuming there's no risk of accidental fooming and they pass the Turing test).
I don't think we're talking ems, we're talking conscious algorithms which aren't necessarily humanlike or even particularly intelligent.
And as for the Turing Test, one oughtn't confuse consciousness with intelligence. A 6-year old human child couldn't pass off as an adult human, but we still believe the child to be conscious, and my own memories indicate that I indeed was at that age.
Well, I think consciousness, intelligence and personhood are sliding scales anyway, so I may be imagining the output of a Nonperson Predicate somewhat differently to LW norm. OTOH, I guess it's not a priori impossible that a simple human-level AI could fit on something available to the public, and such an insight would be ... risky, yeah. Upvoted.
First of all, I also believe that consciousness is most probably a sliding scale.
Secondly, you again used "human-level" without specifying human-level at what - intelligence or consciousness; as such I'm not sure whether I adequately communicated my point that we're not discussing intelligence here, just consciousness.
Well, they do seem to be correlated in any case. However, I was referring to consciousness (whatever that is.)
Re non-person predicates, do you even have a non-sharp (but non-trivial) lower bound for it? How do you know that the Sims from the namesake game aren't persons? How do we know that Watson is not suffering indescribably when losing a round of Jeopardy? And that imagining someone (whose behavior you can predict with high accuracy) suffering is not as bad as "actually" making someone suffer? If this bound has been definitively established, I'd appreciate a link.
It's unclear where our intuitions on the subject come from or how they work, and they are heavily .... distorted ... by various beliefs and biases. OTOH, it seems unlikely that rocks are conscious and we just haven't extrapolated far enough to realize. It's also unclear whether personhood is binary or there's some kind of sliding scale. Nevertheless, it seems clear that a fly is not worth killing people over.
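No such bound has been established, as far as I know, but the *shape* of the thing being asked for can be illustrated. The sketch below is purely hypothetical - the attributes and threshold are invented for illustration - and shows only the asymmetry a nonperson predicate needs: it may answer "definitely not a person" or "unknown", but never risks a false "nonperson" verdict on something that might be a person:

```python
# A toy illustration of a *conservative* nonperson predicate. A False
# return means "unknown, treat as possibly a person", not "person"; the
# predicate is allowed to miss safe systems but must never clear a risky
# one. The attributes and the threshold are made-up placeholders.

from dataclasses import dataclass

@dataclass
class SystemDescription:
    has_self_model: bool  # does the system model its own cognition?
    state_bits: int       # rough upper bound on internal state

def definitely_nonperson(sys: SystemDescription) -> bool:
    """True only when we are confident it is safe to simulate the system."""
    return (not sys.has_self_model) and sys.state_bits < 10_000

thermostat = SystemDescription(has_self_model=False, state_bits=256)
em = SystemDescription(has_self_model=True, state_bits=10**12)
```

Under this toy rule the thermostat-like system is cleared and the em is not - but note the rule says nothing about whether the em *is* a person, only that it cannot be cleared.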
Even a person who has never introspected about their moral beliefs can still know that murder is wrong. They're more likely to make mistakes, but still.
How are these related? One is epistemology and one is ontology.