Shared with permission, a Google Doc exchange confirming that Eliezer still finds the arguments for alignment optimism, slower takeoffs, etc. unconvincing:
...Daniel Filan: I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is "from Yudkowsky/Bostrom to What Failure Looks Like ~~part 2~~ part 1") and I still don't totally get why.

Eliezer Yudkowsky: My bitter take: I tried cutting back on talking to do research; and so people talked a bunch about a different scenario that was nicer to think about, and ended up with their thoughts staying there, because that's what happens if nobody else is arguing them out of it.
That is: this social-space's thought processes are not robust enough against mildly adversarial noise, that trying a bunch of different arguments for something relatively nicer to believe, won't Goodhart up a plausible-to-the-social-space argument for the thing that's nicer to believe. If you talk people out of one error, somebody else searches around in the space of plausible arguments and finds a new error. I wasn't fighting a mistaken argument for why AI niceness isn't too intractable and takeoffs
FWIW, I think Yudkowsky is basically right here and would be happy to explain why if anyone wants to discuss. I'd likewise be interested in hearing contrary perspectives.
Rolf Degen, summarizing part of Barbara Finlay's "The neuroscience of vision and pain":
Humans may have evolved to experience far greater pain, malaise and suffering than the rest of the animal kingdom, due to their intense sociality giving them a reasonable chance of receiving help.
From the paper:
Several years ago, we proposed the idea that pain, and sickness behaviour had become systematically increased in humans compared with our primate relatives, because human intense sociality allowed that we could ask for help and have a reasonable chance of receiving it. We called this hypothesis ‘the pain of altruism’ [68]. This idea derives from, but is a substantive extension of Wall’s account of the placebo response [43]. Starting from human childbirth as an example (but applying the idea to all kinds of trauma and illness), we hypothesized that labour pains are more painful in humans so that we might get help, an ‘obligatory midwifery’ which most other primates avoid and which improves survival in human childbirth substantially ([67]; see also [69]). Additionally, labour pains do not arise from tissue damage, but rather predict possible...
[Epistemic status: Thinking out loud]
If the evolutionary logic here is right, I'd naively also expect non-human animals to suffer more to the extent they're (a) more social, and (b) better at communicating specific, achievable needs and desires.
There are reasons the logic might not generalize, though. Humans have fine-grained language that lets us express very complicated propositions about our internal states. That puts a lot of pressure on individual humans to have a totally ironclad, consistent "story" they can express to others. I'd expect there to be a lot more evolutionary pressure to actually experience suffering, since a human will be better at spotting holes in the narratives of a human who fakes it (compared to, e.g., a bonobo trying to detect whether another bonobo is really in that much pain).
It seems like there should be an arms race across many social species to give increasingly costly signals of distress, up until the costs outweigh the amount of help they can hope to get. But if you don't have the language to actually express concrete propositions like "Bob took care of me the last time I got sick, six months ago, and he can attest that I had a hard time walking that time too", then those costly signals might be mostly or entirely things like "shriek louder in response to percept X", rather than things like "internally represent a hard-to-endure pain-state so I can more convincingly stick to a verbal narrative going forward about how hard-to-endure this was".
[Epistemic status: Piecemeal wild speculation; not the kind of reasoning you should gamble the future on.]
Some things that make me think suffering (or 'pain-style suffering' specifically) might be surprisingly neurologically conditional and/or complex, and therefore more likely to be rare in non-human animals (and in subsystems of human brains, in AGI subsystems that aren't highly optimized to function as high-fidelity models of humans, etc.):
1. Degen and Finlay's social account of suffering above.
2. Which things we suffer from seems to depend heavily on mental narratives and mindset. See, e.g., Julia Galef's Reflections on Pain, from the Burn Unit.
Pain management is one of the main things hypnosis appears to be useful for. Ability to cognitively regulate suffering is also one of the main claims of meditators, and seems related to existential psychotherapy's claim that narratives are more important for well-being than material circumstances.
Even if suffering isn't highly social (pace Degen and Finlay), its dependence on higher cognition suggests that it is much more complex and conditional than it might appear on initial introspection, which on its own reduces the probability of it
...Facebook comment I wrote in February, in response to the question 'Why might having beauty in the world matter?':
I assume you're asking about why it might be better for beautiful objects in the world to exist (even if no one experiences them), and not asking about why it might be better for experiences of beauty to exist.
[... S]ome reasons I think this:
1. If it cost me literally nothing, I feel like I'd rather there exist a planet that's beautiful, ornate, and complex than one that's dull and simple -- even if the planet can never be seen or visited by anyone, and has no other impact on anyone's life. This feels like a weak preference, but it helps get a foot in the door for beauty.
(The obvious counterargument here is that my brain might be bad at simulating the scenario where there's literally zero chance I'll ever interact with a thing; or I may be otherwise confused about my values.)
2. Another weak foot-in-the-door argument: People seem to value beauty, and some people claim to value it terminally. Since human value is complicated and messy and idiosyncratic (compare person-specific ASMR triggers or nostalgia triggers or culinary pref...
Somewhat more meta level: Heuristically speaking, it seems wrong and dangerous for the answer to "which expressed human preferences are valid?" to be anything other than "all of them". There's a common pattern in metaethics which looks like:
1. People seem to have preference X
2. X is instrumentally valuable as a source of Y and Z. The instrumental-value relation explains how the preference for X was originally acquired.
3. [Fallacious] Therefore preference X can be ignored without losing value, so long as Y and Z are optimized.
In the human brain algorithm, if you optimize something instrumentally for a while, you start to value it terminally. I think this is the source of a surprisingly large fraction of our values.
Collecting all of the quantitative AI predictions I know of MIRI leadership making on Arbital (let me know if I missed any):
Some caveats:
On my model, the point of ass numbers isn't to demand perfection of your gut (e.g., of the sort that would be needed to avoid multiple-stage fallacies when trying to conditionalize a lot -- see the toy sketch after these caveats), but to:
It may still be a terrible idea to spend too much time generating ass numbers, since "real numbers" are not the native format human brains compute probability with, and spending a lot of time working in a non-native format may skew your reasoning.
(Maybe there's some individual variation here?)
But they're at least a good tool to use sometimes, for the sake of crisper communication, calibration practice (so you can generate non-awful future probabilities when you need to), etc.
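As a toy sketch of that multiple-stage worry (the numbers here are made up purely for illustration): if your gut lowballs each conditional probability even a little, multiplying through several stages compounds the error.

```python
# Toy illustration with made-up numbers: a slightly lowballed gut estimate at
# each stage compounds into a badly lowballed total once you multiply them.
stages = 6
true_conditional = 0.9  # suppose each stage really has P = 0.9 given the previous stages
gut_conditional = 0.7   # but your gut reports 0.7 when each stage is queried in isolation

print(f"actual joint probability:   {true_conditional ** stages:.2f}")  # ~0.53
print(f"stage-by-stage gut product: {gut_conditional ** stages:.2f}")   # ~0.12
```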
Suppose most people think there's a shrew in the basement, and Richard Feynman thinks there's a beaver. If you're pretty sure it's not a shrew, two possible reactions are:
- 'Ah, the truth is probably somewhere in between these competing perspectives. So maybe it's an intermediate-sized rodent, like a squirrel.'
- 'Ah, Feynman has an absurdly good epistemic track record, and early data does indicate that the animal's probably bigger than a shrew. So I'll go with his guess and say it's probably a beaver.'
But a third possible response is:
- 'Ah, if Feynman's right, then a lot of people are massively underestimating the rodent's size. Feynman is a person too, and might be making the same error (just to a lesser degree); so my modal guess will be that it's something bigger than a beaver, like a capybara.'
In particular, you may want to go more extreme than Feynman if you think there's something systematically causing people to underestimate a quantity (e.g., a cognitive bias -- the person who speaks out first against a bias might still be affected by it, just to a lesser degree), or systematically causing people to make weaker claims than they really believe (e.g., maybe people don't want to sound extreme or out-of-step with the mainstream view).
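To make the contrast between the three responses concrete, here's a toy numerical sketch. The masses, the log-scale framing, and the shared-bias model are my own made-up illustration, not anything from the original example:

```python
# Toy sketch of the three responses, with made-up numbers.
# Guesses are expressed as log10 of body mass in grams.
import math

crowd = math.log10(10)        # "it's a shrew": ~10 g
feynman = math.log10(20_000)  # "it's a beaver": ~20 kg

# Response 1: split the difference between the crowd and the contrarian.
split = (crowd + feynman) / 2

# Response 2: defer to the person with the better track record.
defer = feynman

# Response 3: assume everyone shares a downward bias b, and Feynman only
# retains a fraction k of it: crowd = truth - b, feynman = truth - k*b.
# Solving for truth gives: truth = (feynman - k*crowd) / (1 - k).
k = 0.1  # hypothetical: Feynman still keeps 10% of the shared bias
extrapolate = (feynman - k * crowd) / (1 - k)

for label, log_mass in [("split", split), ("defer", defer), ("extrapolate", extrapolate)]:
    print(f"{label:12s} ~{10 ** log_mass:,.0f} g")
# split        ~447 g     (squirrel-sized)
# defer        ~20,000 g  (beaver-sized)
# extrapolate  ~46,500 g  (capybara-sized)
```

The particular numbers don't matter; the point is that the third strategy can land outside the range spanned by the guesses you started with.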
From Twitter:
I am not sure I can write out the full AI x-risk scenario.
1. AI quickly becomes super clever
2. Alignment is hard, like getting your great x10 grandchildren to think you're a good person
3. The AI probably embarks on a big project which ignores us and accidentally kills us
Where am I wrong? Happy to be sent stuff to read.
I replied:
..."1. AI quickly becomes super clever"
My AI risk model (which is not the same as everyone's) more specifically says:
1a. We'll eventually figure out how to make AI that's 'generally good at science' -- like how humans can do sciences that didn't exist when our brains evolved.
1b. AGI / STEM AI will have a large, fast, and discontinuous impact. Discontinuous because it's a new sort of intelligence (not just AlphaGo 2 or GPT-5); large and fast because STEM is powerful, plus humans suck at STEM and aren't cheap software that scales as you add hardware.
(Warning: argument is compressed for Twitter character count. There are other factors too, like recursive self-improvement.)
"2. Alignment is hard, like getting your great x10 grandchildren to think you're a good person"
I'd say it's hard like building a large/complex, novel software system that exhibits so
Chana Messinger, replying to Brandon Bradford:
I find this very deep
"Easy to make everything a conspiracy when you don't know how anything works."
Everything literally is a conspiracy (in some nonstandard technical sense), and if you don't know how anything works, then it's a secret conspiracy.
How does water get to your faucet? How many people are responsible for your internet? What set of events had to transpire to make you late for work? How does one build a microwave?
Something about this points at how complicated everything is and how little we individually know about it.
From an April 2019 Facebook discussion:
Rob Bensinger, quoting avacyn:
I think one strong argument in favor of eating meat is that beef cattle (esp. grass-fed) might have net positive lives. If this is true, then the utilitarian line is to 1) eat more beef to increase demand, 2) continue advocating for welfare reforms that will make cows' lives even more positive.
Beef cattle are different than e.g. factory farmed chicken in that they live a long time (around 3 years on average vs 6-7 weeks for broilers), and spend much of their lives grazing on stockers where they might have natural-ish lives.
Another argument in favor of eating beef is that it tends to lead to deforestation, which decreases total wild animal habitat, which one might think is worse than beef farms.
... I love how EA does veganism / animal welfare things. It's really good.
(From the comment section on https://forum.effectivealtruism.org/posts/TyLxMrByKuCmzZx6b/reasons-to-eat-meat)
[... Note that in posting this I'm not intending] to advocate for a specific intervention; it's more that it makes me happy to see thorough and outside-the-box reasoning from folks who are trying to help others, whether or not they have the same backgr
...While your comment was clearly written in good faith, it seems to me like you're missing some context. You recommend that EY recommend that the detractors read books. EY doesn't just recommend people read books. He wrote the equivalent of like three books on the subjects relevant to this conversation in particular which he gives away for free. Also, most of the people in this conversation are already big into reading books.
It is my impression he also helped establish the Center for Applied Rationality, which has the explicit mission of training skills. (I'm not sure if he technically did but he was part of the community which did and he helped promote it in its early days.)
From an April 2019 Facebook discussion:
Rob Bensinger:
...Julia Galef: Another one of your posts that has stayed with me is a post in which you were responding to someone's question -- I think the question was, “What are your favorite virtues?” And you described three. They were compassion for yourself; creating conditions where you'll learn the truth; and sovereignty. [...] Can you explain briefly what sovereignty means?
Kelsey Piper: Yeah, so I characterize sovereignty as the virtue of believing yourself qualified to reason about your life, and to reason about the world, and to act based on your understanding of it.
I think it is surprisingly common to feel fundamentally unqualified even to reason about what you like, what makes you happy, which of several activities in front of you you want to do, which of your priorities are really important to you.
I think a lot of people feel the need to answer those questions by asking society what the objectively correct answer is, or trying to understand which answer won't get them in trouble. And so I think it's just really important to learn to answer those questions with what you actually want and what you actually care about. [...]
Julia Galef:
Copied from some conversations on Twitter:
· · · · · · · · · · · · · · ·
Eric Rogstad: I think "illusionism" is a really misleading term. As far as I can tell, illusionists believe that consciousness is real, but has some diff properties than others believe.
It's like if you called Einstein an "illusionist" w.r.t. space or time.
See my comments here:
https://www.lesswrong.com/posts/biKchmLrkatdBbiH8/book-review-rethinking-consciousness
Rob Bensinger: I mostly disagree. It's possible to define a theory-neutral notion of 'consciousness', but I think it's just true that 'there's no such thing as subjective awareness / qualia / etc.', and I think this cuts real dang deep into the heart of what most people mean by consciousness.
Before the name illusionism caught on, I had to use the term 'eliminativism', but I had to do a lot of work to clarify that I'm not like old-school eliminativists who think consciousness is obviously or analytically fake. Glad to have a clearer term now.
I think people get caught up in knots about the hard problem of consciousness because they try to gesture at 'the fact that they have subjective awareness', without realizing they're gesturing...
It's apparently not true that 90% of startups fail. From Ben Kuhn:
...Hot take: the outside view is overrated.
(“Outside view” = e.g. asking “what % of startups succeed?” and assuming that’s ~= your chance of success.)
In theory it seems obviously useful. In practice, it makes people underrate themselves and prematurely give up their ambition.
One problem is that finding the right comparison group is hard.
For instance, in one commonly-cited statistic that “90% of startups fail,” (https://www.national.biz/2019-small-business-failure-rate-startup-statistics-i
I don't have a cite handy, as it's memories from 2014, but when I looked into it I recall the 7-year failure rate (excluding the obvious dumb stuff like restaurants) was something like 70% -- but importantly, that 70% number included acquisitions, so the actual failure rate was something like 60-ish percent.
A blurb for the book "The Feeling of Value":
...This revolutionary treatise starts from one fundamental premise: that our phenomenal consciousness includes direct experience of value. For too long, ethical theorists have looked for value in external states of affairs or reduced value to a projection of the mind onto these same external states of affairs. The result, unsurprisingly, is widespread antirealism about ethics.
In this book, Sharon Hewitt Rawlette turns our metaethical gaze inward and dares us to consider that value, rather than being something “out t
Ben Weinstein-Raun wrote on social media:
...It seems to me that the basic appeal of panpsychism goes like "It seems really weird that you can put together some apparently unfeeling pieces, and then out comes this thing that feels. Maybe those things aren't actually unfeeling? That would sort of explain where the feeling-ness comes from."
But this feels kind of analogous to a being that doesn't have a good theory about houses, but is aware that some things are houses and some things aren't, by their experiences of those things. Such a being might analogously re
I think panpsychism is outrageously false, and profoundly misguided as an approach to the hard problem.
What do you think of Brian Tomasik's flavor of panpsychism, which he says is compatible with (and, indeed, follows from) type-A materialism? As he puts it,
It's unsurprising that a type-A physicalist should attribute nonzero consciousness to all systems. After all, "consciousness" is a concept -- a "cluster in thingspace" -- and all points in thingspace are less than infinitely far away from the centroid of the "consciousness" cluster. By a similar argument, we might say that any system displays nonzero similarity to any concept (except maybe for strictly partitioned concepts that map onto the universe's fundamental ontology, like the difference between matter vs. antimatter). Panpsychism on consciousness is just one particular example of that principle.
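A toy way to see the "nonzero similarity" point (my own sketch; the feature axes and the kernel are invented for illustration, not anything Tomasik specifies): if similarity to the "consciousness" cluster is any smooth decreasing function of distance to its centroid, every finite point in thingspace gets a strictly positive score.

```python
# Toy sketch: similarity to a cluster centroid via an RBF kernel, which is
# strictly positive for any finite point. Feature axes are invented.
import math

def similarity(point, centroid, scale=1.0):
    """exp(-squared distance / scale): always > 0 for finite inputs."""
    sq_dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
    return math.exp(-sq_dist / scale)

# Hypothetical axes: (integration, self-modeling, valenced reactions)
consciousness_centroid = (0.9, 0.9, 0.9)

print(similarity((0.85, 0.80, 0.95), consciousness_centroid))  # human-ish point: ~0.99
print(similarity((0.05, 0.00, 0.10), consciousness_centroid))  # thermostat-ish point: ~0.11
print(similarity((0.00, 0.00, 0.00), consciousness_centroid))  # rock-ish point: ~0.09, still nonzero
```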
[Epistemic status: Thinking out loud, just for fun, without having done any scholarship on the topic at all.]
It seems like a lot of horror games/movies are converging on things like 'old people', 'diseased-looking people', 'psychologically ill people', 'women', 'children', 'dolls', etc. as particularly scary.
Why would that be, from an evolutionary perspective? If horror is about fear, and fear is about protecting the fearful from threats, why would weird / uncanny / out-of-evolutionary-distribution threats have a bigger impact than e.g. 'lots of human warr
...The wiki glossary for the sequences / Rationality: A-Z ( https://wiki.lesswrong.com/wiki/RAZ_Glossary ) is updated now with the glossary entries from the print edition of vol. 1-2.
New entries from Map and Territory:
anthropics, availability heuristic, Bayes's theorem, Bayesian, Bayesian updating, bit, Blue and Green, calibration, causal decision theory, cognitive bias, conditional probability, confirmation bias, conjunction fallacy, deontology, directed acyclic graph, elan vital, Everett branch, expected value, Fermi paradox, foozality, hindsight bias,...
Jeffrey Ladish asked on Twitter:
Do you think the singularity (technological singularity) is a useful term? I've been seeing it used less among people talking about the future of humanity and I don't understand why. Many people still think an intelligence explosion is likely, even if it's "slow"
I replied:
...'Singularity' was vague (https://intelligence.org/2007/09/30/three-major-singularity-schools/) and got too associated with Kurzweilian magical thinking, so MIRI switched to something like:
'rapid capability gain' = progress from pretty-low-impact AI to astro
From Facebook:
Mark Norris Lance: [...] There is a long history of differential evaluation of actions taken by grassroots groups and similar actions taken by elites or those in power. This is evident when we discuss violence. If a low-power group places someone under their control it is kidnapping. If they assess their crimes or punish them for it, it is mob justice or vigilantism. [...]
John Maxwell: Does the low power group in question have a democratic process for appointing judges who then issue arrest warrants?
That's a key issue for me... "Mob rule" is
...From https://twitter.com/JonHaidt/status/1166318786959609856:
...Why are online political discussions perceived to contain elevated levels of hostility compared to offline discussions? In this manuscript, we leverage cross-national representative surveys and online behavioral experiments to [test] the mismatch hypothesis regarding this hostility gap. The mismatch hypothesis entails that novel features of online communication technology induce biased behavior and perceptions such that ordinary people are, e.g., less able to regulate negative emotions in online
Yeah, I'm an EA: an Estimated-as-Effective-in-Expectation (in Excess of Endeavors with Equivalent Ends I've Evaluated) Agent with an Audaciously Altruistic Agenda.
How would you feel about the creation of a Sequence of Shortform Feeds? (Including this one?) (Not a mod.)
In the context of a conversation with Balaji Srinivasan about my AI views snapshot, I asked Nate Soares what sorts of alignment results would impress him, and he said:
...example thing that would be relatively impressive to me: specific, comprehensive understanding of models (with the caveat that that knowledge may lend itself more (and sooner) to capabilities before alignment). demonstrated e.g. by the ability to precisely predict the capabilities and quirks of the next generation (before running it)
i'd also still be impressed by simple theories of aimable co
This is a repository for miscellaneous short things I want to post. Other people are welcome to make top-level comments here if they want. (E.g., questions for me you'd rather discuss publicly than via PM; links you think will be interesting to people in this comment section but not to LW as a whole; etc.)