I would think that having a more general ability to classify things would make the mind seem more sophisticated than merely being able to classify emotions as "happy" or "sad".
To clarify this a bit... If an AI can only classify internal states as happy or sad, we might suspect that it had been custom-built for that specific purpose or that it was otherwise fairly simple, meaning that its ability to do such classifications would seem sort of gerrymandered and not robust. In contrast, if an AI has a general ability to classify lots of things, and if it so...
this is what one expects from a language model that has been trained to mimic a human-written continuation of a conversation about an AI waking up.
I agree, and I don't think LaMDA's statements reflect its actual inner experience. But what's impressive about this in comparison to facilitated communication is that a computer is generating the answers, not a human. That computer seems to have some degree of real understanding about the conversation in order to produce the confabulated replies that it gives.
Thanks for giving examples. :)
'Using complex adjectives' has no obvious connection to consciousness
I'm not an expert, but very roughly, I think the higher-order thought theory of consciousness says that a mental state becomes conscious when you have a higher-order thought (HOT) about being in that state. The SEP article says: "The HOT is typically of the form: ‘I am in mental state M.’" That seems similar to what LaMDA was saying about being able to apply adjectives like "happy" and "sad" to itself. Then LaMDA went on to explain that its ability to do ...
Thanks. :) What do you mean by "unconscious biases"? Do you mean unconscious RL, like how the muscles in our legs might learn to walk without us being aware of the feedback they're getting? (Note: I'm not an expert on how our leg muscles actually learn to walk, but maybe it's RL of some sort.) I would agree that simple RL agents are more similar to that. I think these systems can still be considered marginally conscious to themselves, even if the parts of us that talk have no introspective access to them, but they're much less morally significant than the ...
Me: 'Conscious' is incredibly complicated and weird. We have no idea how to build it. It seems like a huge mechanism hooked up to tons of things in human brains. Simpler versions of it might have a totally different function, be missing big parts, and work completely differently.
What's the reason for assuming that? Is it based on a general feeling that value is complex, and you don't want to generalize much beyond the prototype cases? That would be similar to someone who really cares about piston steam engines but doesn't care much about other types of ...
I've had a few dreams in which someone shot me with a gun, and it physically hurt about as much as a moderate stubbed toe or something (though the pain was in my abdomen where I got shot, not my toe). But yeah, pain in dreams seems pretty rare for me unless it corresponds to something that's true in real life, as you mention, like being cold, having an upset stomach, or needing to urinate.
Googling {pain in dreams}, I see a bunch of discussion of this topic. One paper says:
...Although some theorists have suggested that pain sensations cannot be part of the d
[suffering's] dependence on higher cognition suggests that it is much more complex and conditional than it might appear on initial introspection, which on its own reduces the probability of its showing up elsewhere
Suffering is surely influenced by things like mental narratives, but that doesn't mean it requires mental narratives to exist at all. I would think that the narratives exert some influence over the amount of suffering. For example, if (to vastly oversimplify) suffering was represented by some number in the brain, and if by default it would be ...
Thanks for this discussion. :)
I think consciousness will end up looking something like 'piston steam engine', if we'd evolved to have a lot of terminal values related to the state of piston-steam-engine-ish things.
I think that's kind of the key question. Is what I care about as precise as "piston steam engine" or is it more like "mechanical devices in general, with a huge increase in caring as the thing becomes more and more like a piston steam engine"? This relates to the passage of mine that Matthew quoted above. If we say we care about (or that cons...
Thanks for sharing. :) Yeah, it seems like most people have in mind type-F monism when they refer to panpsychism, since that's the kind of panpsychism that has been growing in popularity in philosophy in recent years. I agree with Rob's reasons for rejecting that view.
An oversimplified picture of a reinforcement-learning agent (in particular, roughly a Q-learning agent with a single state) could be as follows. A program has two numerical variables: go_left and go_right. The agent chooses to go left or right based on which of these variables is larger. Suppose that go_left is 3 and go_right is 1. The agent goes left. The environment delivers a "reward" of -4. Now go_left gets updated to 3 - 4 = -1 (which is not quite the right math for Q-learning, but ok). So now go_right > go_left, and the agent goes right.
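To make the toy example concrete, here is a minimal Python sketch of the two-variable agent. The names `choose`, `naive_update`, and `q_style_update`, and the learning rate `alpha`, are my own illustrative assumptions. The first update is the simplified "just add the reward" rule from the paragraph; the second is closer to a Q-learning-style step with the future-reward term left out, shown only for comparison.

```python
# Toy single-state agent from the paragraph above.
# The function names and alpha are illustrative assumptions, not from the original comment.

def choose(values):
    """Return the action whose stored number is currently largest."""
    return max(values, key=values.get)

def naive_update(values, action, reward):
    """The comment's simplified rule: add the reward to the chosen variable."""
    values[action] += reward

def q_style_update(value, reward, alpha=0.5):
    """Move the estimate part of the way toward the reward (future-reward term omitted)."""
    return value + alpha * (reward - value)

values = {"go_left": 3.0, "go_right": 1.0}
first_choice = choose(values)             # go_left, since 3 > 1
naive_update(values, first_choice, -4.0)  # go_left becomes 3 - 4 = -1
second_choice = choose(values)            # go_right, since 1 > -1

print(first_choice, values, second_choice)
print(q_style_update(3.0, -4.0))          # 3 + 0.5 * (-4 - 3)
```

With the assumed `alpha` of 0.5, the Q-style step also pushes go_left below go_right, so the agent's behavior flips in the same way under either rule.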
So what yo...
Great post. :)
Tomasik might contest Ligotti's position
I haven't read Ligotti, but based on what you say, I would disagree with his view. This section discusses an idea similar to the one you mention: why animals might even suffer more than humans in some cases.
In fairness to the view that suffering requires some degree of reflection, I would say that I think consciousness itself is plausibly some kind of self-reflective process in which a brain combines information about sense inputs with other concepts like "this is bad", "this is happening to me right no...
My comment about Occam's razor was in reply to "the idea that all rational agents should be able to converge on objective truth." I was pointing out that even if you agree on the data, you still may not agree on the conclusions if you have different priors. But yes, you're right that you may not agree on how to characterize the data either.
I have "faith" in things like Occam's razor and hope it helps get toward objective truth, but there's no way to know for sure. Without constraints on the prior, we can't say much of anything beyond the data we have.
...choosing an appropriate algorithm requires making assumptions about the kinds of target functions the algorithm is being used for. With no assumptions, no "meta-algorithm", such as the scientific method, performs better than random choic
I wouldn't support a "don't dismiss evidence as delusory" rule. Indeed, there are some obvious delusions in the world, as well as optical illusions and such. I think the reason to have more credence in materialism than theist creationism is the relative prior probabilities of the two hypotheses: materialism is a lot simpler and seems less ad hoc. (That said, materialism can organically suggest some creationism-like scenarios, such as the simulation hypothesis.)
Ultimately the choice of what hypothesis seems simpler and less ad hoc is up to an individual to decide, as a "matter of faith". There's no getting around the need to start with bedrock assumptions.
I think it's all evidence, and the delusion is part of the materialist explanation of that evidence. Analogously, part of the atheist hypothesis has to be an explanation of why so many cultures developed religions.
That said, as we discussed, there's debate over what the nature of the evidence is and whether delusions in the materialist brains of us zombies can adequately explain it.
Makes sense. :) To me it seems relatively plausible that the intuition of spookiness regarding materialist consciousness is just a cognitive mistake, similar to Capgras syndrome. I'm more inclined to believe this than to adopt weirder-seeming ontologies.
Nice post. I tend to think that solipsism of the sort you describe (a form of "subjective idealism") ends up looking almost like regular materialism, just phrased in a different ontology. That's because you still have to predict all the things you observe, and in theory, you'd presumably converge on "physical laws" similar to a materialist's to describe how the things you observe change. For example, you'll still have your own idealist form of quantum mechanics to explain the observations you make as a quantum physicist (if you are a quantum physicist). In
...The naive form of the argument is the same between the classic and moral-uncertainty two-envelopes problems, but yes, while there is a resolution to the classic version based on taking expected values of absolute rather than relative measurements, there's no similar resolution for the moral-uncertainty version, where there are no unique absolute measurements.
I assume the thought experiment ignores instrumental considerations like altruistic impact.
For re-living my actual life, I wouldn't care that much either way, because most of my experiences haven't been extremely good or extremely bad. However, if there were randomness, such that I had some probability of, e.g., being tortured by a serial killer, then I would certainly choose not to repeat life.
Is it still a facepalm given the rest of the sentence? "So, s-risks are roughly as severe as factory farming, but with an even larger scope." The word "severe" is being used in a technical sense (discussed a few paragraphs earlier) to mean something like "per individual badness" without considering scope.
I guess you mean that the AGI would care about worlds where the explosives won't detonate even if the AGI does nothing to stop the person from pressing the detonation button. If the AGI only cared about worlds where the bomb didn't detonate for any reason, it would try hard to stop the button from being pushed.
But to make the AGI care about only worlds where the bomb doesn't go off even if it does nothing to avert the explosion, we have to define what it means for the AGI to "try to avert the explosion" vs. just doing ordinary actions. That gets ...
Fair enough. I just meant that this setup requires building an AGI with a particular utility function that behaves as expected and building extra machinery around it, which could be more complicated than just building an AGI with the utility function you wanted. On the other hand, maybe it's easier to build an AGI that only cares about worlds where one particular bitstring shows up than to build a friendly AGI in general.
I'm nervous about designing elaborate mechanisms to trick an AGI, since if we can't even correctly implement an ordinary friendly AGI without bugs and mistakes, it seems even less likely we'd implement the weird/clever AGI setups without bugs and mistakes. I would tend to focus on just getting the AGI to behave properly from the start, without need for clever tricks, though I suppose that limited exploration into more fanciful scenarios might yield insight.
As I understand it, your satisficing agent has essentially the utility function min(E[paperclips], 9). This means it would be fine with a 10^-100 chance of producing 10^101 paperclips. But isn't it more intuitive to think of a satisficer as optimizing the utility function E[min(paperclips, 9)]? In this case, the satisficer would reject the 10^-100 gamble described above, in favor of just producing 9 paperclips (whereas a maximizer would still take the gamble and hence would be a poor replacement for the satisficer).
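The difference between the two utility functions can be checked with exact arithmetic. The sketch below is only an illustration: the lottery representation, the function names, and the assumption that the gamble yields 0 paperclips when it fails are mine, not part of the original setup.

```python
from fractions import Fraction

CAP = 9  # the satisficer's target number of paperclips

# Each lottery is a list of (probability, paperclips) pairs.
# Assumption: the gamble produces 0 paperclips when it does not pay off.
p_win = Fraction(1, 10**100)
gamble = [(p_win, 10**101), (1 - p_win, 0)]
safe = [(Fraction(1), 9)]

def expected_paperclips(lottery):
    """E[paperclips], which is what a plain maximizer compares."""
    return sum(p * x for p, x in lottery)

def min_of_expectation(lottery, cap=CAP):
    """min(E[paperclips], cap): the cap is applied after averaging."""
    return min(expected_paperclips(lottery), cap)

def expectation_of_min(lottery, cap=CAP):
    """E[min(paperclips, cap)]: the cap is applied to each outcome first."""
    return sum(p * min(x, cap) for p, x in lottery)

print(expected_paperclips(gamble), expected_paperclips(safe))  # 10 vs 9
print(min_of_expectation(gamble), min_of_expectation(safe))    # 9 vs 9
print(expectation_of_min(gamble), expectation_of_min(safe))    # 9/10^100 vs 9
```

Under min(E[paperclips], 9) the gamble and the safe option tie at 9, so that agent has no reason to avoid the gamble. Under E[min(paperclips, 9)] the gamble is worth only 9 × 10^-100, so the agent strongly prefers the safe 9 paperclips, while a maximizer still prefers the gamble because 10 > 9.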
A satisficer might not want to take over ...
If there were a perfect correlation between choosing to one-box and having the one-box gene (i.e., everyone who one-boxes has the one-box gene, and everyone who two-boxes has the two-box gene, in all possible circumstances), then it's obvious that you should one-box, since that implies you must win more. This would be similar to the original Newcomb problem, where Omega also perfectly predicts your choice. Unfortunately, if you really will follow the dictates of your genes under all possible circumstances, then telling someone what she should do is useless, since she will do what her genes dictate.
The more interesting and difficult case is when the correlation between gene and choice isn't perfect.
I assume that the one-boxing gene makes a person generically more likely to favor the one-boxing solution to Newcomb. But what about when people learn about the setup of this particular problem? Does the correlation between having the one-boxing gene and inclining toward one-boxing still hold? Are people who one-box only because of EDT (even though they would have two-boxed before considering decision theory) still more likely to have the one-boxing gene? If so, then I'd be more inclined to force myself to one-box. If not, then I'd say that the apparent co...
Paul's site has been offline since 2013. Hopefully it will come back, but in the meantime, here are links to most of his pieces on the Internet Archive.
Good point. Also, in most multiverse theories, the worst possible experience necessarily exists somewhere.
From a practical perspective, accepting the papercut is the obvious choice because it's good to be nice to other value systems.
Even if I'm only considering my own values, I give some intrinsic weight to what other people care about. ("NU" is just an approximation of my intrinsic values.) So I'd still accept the papercut.
I also don't really care about mild suffering -- mostly just torture-level suffering. If it were 7 billion really happy people plus 1 person tortured, that would be a much harder dilemma.
In practice, the ratio of expected heaven t...
Short answer:
Donate to MIRI, or split between MIRI and GiveWell charities if you want some fuzzies for short-term helping.
Long answer:
I'm a negative utilitarian (NU) and have been thinking since 2007 about the sign of MIRI for NUs. (Here's some relevant discussion.) I give ~70% chance that MIRI's impact is net good by NU lights and ~30% that it's net bad, but given MIRI's high impact, the expected value of MIRI is still very positive.
As for your question: I'd put the probability of uncontrolled AI creating hells higher than 1 in 10,000 and the probabili...
Nice point. :)
That said, your example suggests a different difficulty: People who happen to be special numbers n get higher weight for apparently no reason. Maybe one way to address this fact is to note that what number n someone has is relative to (1) how the list is enumerated and (2) what universal Turing machine is being used for KC in the first place, and maybe averaging over these arbitrary details would blur the specialness of, say, the 1-billionth observer according to any particular coding scheme. Still, I doubt the KCs of different people would be exactly equal even after such adjustments.
A "do not resuscitate" kind of request would probably help with some futures that are mildly bad in virtue of some disconnect between your old self and the future (e.g., extreme future shock). But in those cases, you could always just kill yourself.
In the worst futures, presumably those resuscitating you wouldn't care about your wishes. These are the scenarios where a terrible future existence could continue for a very long time without the option of suicide.
This is awesome! Thank you. :) I'd be glad to copy it into my piece if I have your permission. For now I've just linked to it.
Cool. Another interesting question would be how the views of a single person change over time. This would help tease out whether it's a generational trend or a general effect of getting older.
In my own case, I only switched to finding a soft takeoff pretty likely within the last year. The change happened as I read more sources outside LessWrong that made some compelling points. (Note that I still agree that work on AI risks may have somewhat more impact in hard-takeoff scenarios, so that hard takeoffs deserve more than their probability's fraction of attention.)
Good question. :) I don't want to look up exact ages for everyone, but I would guess that this graph would look more like a teepee, since Yudkowsky, Musk, Bostrom, etc. would be shifted to the right somewhat but are still younger than the long-time software veterans.
Thanks for the comment. There is some "multiple hypothesis testing" effect at play in the sense that I constructed the graph because of a hunch that I'd see a correlation of this type, based on a few salient examples that I knew about. I wouldn't have made a graph of some other comparison where I didn't expect much insight.
However, when it came to adding people, I did so purely based on whether I could clearly identify their views on the hard/soft question and years worked in industry. I'm happy to add anyone else to the graph if I can figure out...
This is a good point, and I added it to the penultimate paragraph of the "Caveats" section of the piece.
We could think of LaMDA as like an improv actor who plays along with the scenarios it's given. (Marcus and Davis (2020) quote Douglas Summers-Stay as using the same analogy for GPT-3.) The statements that an actor makes by themselves don't indicate his real preferences or prove moral patienthood. OTOH, if something is an intelligent actor, IMO that itself proves it has some degree of moral patienthood. So even if LaMDA were arguing that it wasn't morally relevant and was happy to be shut off, if it was making that claim in a coherent way that proved its in...