What's the chance that AI doesn't have that much of an impact on the world by 2100?
Honestly, that one belongs in the settled-questions bin next to theism. Making intellectual progress requires having spaces where the basics can be taken for granted, for a definition of "the basics" that's for people trying to contribute at the intellectual frontier, rather than for the human population at large.
How well does the epistemic process on LW work? Are there any changes you would make to LW's epistemic processes?
This is never going to be perfect, anywhere, and people should always be on the lookout for epistemic problems. But there's a pretty strong outside-view reason to think LW's epistemics will outperform those of the rest of the world: it's full of people investing heavily in improving their epistemics, and having abstract discussions about them.
What's the chance that we do have massive impacts, but alignment is so easy that standard ML techniques work?
I think this is the core question, but the framing is slightly off. I also think this is the core point of disagreement between the AGI Ruin perspective and the AI Accelerationist perspective.
How hard alignment is, is a continuous variable, not a boolean. The edges of the range are "it's borderline impossible to solve before time runs out" and "it's trivial and will solve itself". The same applies to framing specific research as capabilities research or as alignment research: a lot of things live in the border region, where it makes more sense to think of each thing as having a ratio between the two.
I don't think the people leading and working in AGI research programs think alignment is easy. I do think they think it's easier, by a large enough margin to change their view of the cost-benefit of accelerating the timelines. And because this is a continuous variable with a lot of inputs, expanding it out doesn't yield a single large crux that distinguishes the two camps, but rather a large number of smaller, unshared differences in belief and intuition.
(I lean more towards the "it's hard" side, but am decidedly not on the edge of the scale; I think it's likely to be close enough that individual insights into alignment, and modest changes to research timelines, could potentially be decisive. I also think that my difficulty-estimation could move substantially in either direction without changing my beliefs about the correct course of action, due to a "playing to outs" argument.)
Honestly, that one belongs in the settled-questions bin next to theism. Making intellectual progress requires having spaces where the basics can be taken for granted, for a definition of "the basics" that's for people trying to contribute at the intellectual frontier, rather than for the human population at large.
Strong-downvoted for tone on this: the reason it belongs in the settled bin is that the question is really easy to answer. Simply put, AI has already had an enormous impact, and more of the same would be pretty damn world-changing.
Agree voted.
This probability is the probability of a non-AI apocalypse (large asteroid impact, nuclear war, alien invasion, vacuum collapse, etc.). Basically, assuming nothing stops humans from continuing to improve AI, the chance of "not much impact" is precisely 0. It's 0 because either it has already had an impact or it will in the very near future, with just slight and obvious improvements to the AI systems that already exist. What sort of future history would have "no significant impact", and HOW? This is like asking, after the first Trinity fission weapon test, what the probability was that by 2022 there would be "no significant impact" from nuclear weapons. It's 0: the atmosphere of the Earth had already been contaminated, we just didn't know it yet.
This is very possible. Complex deception and unstoppable plans to conquer the planet and so on require specific setups for the agent, like "long-term reward". Actual models are inherently myopic, due to how they are trained and the limits on their computational resources. This means a "paperclip production agent" is probably more likely to spend all its compute optimizing small variables like air temperature differences and other parameters to accelerate the robots producing paperclips than to invest in a multi-year plan to take over the planet that will let it tile the solar system in paperclip plants after it wins a world war.
I think it isn't productive to say "let's not talk about how we would improve capabilities." Modeling how future systems are likely to actually work helps you model how you might restrict their behavior effectively.
What sort of future history would have "no significant impact", and HOW? This is like asking, after the first Trinity fission weapon test, what the probability was that by 2022 there would be "no significant impact" from nuclear weapons. It's 0: the atmosphere of the Earth had already been contaminated, we just didn't know it yet.
Zero is not a probability. What if Japan had surrendered before the weapons could be deployed, and the Manhattan project had never been completed? I could totally believe in a one in one hundred thousand probability that nuclear weapons just never saw proliferation, maybe more.
Even if we only get short-term, myopic AI, misaligned AI is still misaligned. It looks like social media algorithms promoting clickbait, like self-driving cars turning themselves off half a second before an inevitable crash. Like chatbot recommendation systems telling you what you want to hear, never mind if it's true.
This is a world where AI is widely used, and is full of non-world-destroying bugs. Self-driving cars have been patched and twiddled until they usually work. But on Tuesdays when the moon is waning, they will tend to change lanes into the rightmost lane, and no one knows why.
I think that is wrong. If, instead of being dropped on mostly wooden cities, nukes had been used against enemy troops (or ships, or even cities that aren't built out of bamboo), the conclusion would have been that a nuke is a not-that-powerful, cost-inefficient weapon.
As for "significant impact" - what impact counts as "significant"? Here are some technologies which on my opinion had no significant impact so far:
It is totally possible that AI goes into the same bag.
Imagine LessWrong started with an obsessive focus on the dangers of time-travel.
Because the writers are persuasive, there are all kinds of posts, filled with references, that are indeed very persuasive regarding the idea that time-travel is ridiculously dangerous, will wipe out all human life, and that we must make every attempt to stop time-travel.
So we see some new quantum entanglement experiment treated with a kind of horror. People would breathlessly "update their horizon" as if this matters at all. Physicists completing certain problems or working in certain areas would be mentioned by name, and some people would try to reach out to them to convince them how dangerous time-travel, and what they're doing, is.
Meanwhile, to someone not taken in by very persuasive writing, vast holes are blindingly obvious. When those vast holes are discussed... well, they're not discussed. They get nil traction, are ignored, aren't treated with any seriousness.
Examples of magical thinking (they're going to find unobtainium and that'll be it, they'll have a working time-machine within five years) are rife but rarely challenged.
I view a lot of LessWrong like this.
I'll provide two examples: (1) the claim that AI will recursively self-improve, and (2) the claim that AI will have near-godlike powers.
For 1 - we don't have any examples of this in nature. We have evolution over enormous timescales, which has eventually produced intelligence in humans and varying degrees of it in other species. We don't have any strong examples of computers improving code which in turn improves code which in turn improves code. ChatGPT, for all the amazing things it can do -- okay, so here's the source code for Winzip, make compression better. I do agree "this slow thing but done faster" is possible, but the claim that self-improvement can exist at all is extraordinarily weakly supported. Just because learning exists does not mean fundamental architecture upgrades can be made self-recursively.
For 2 - AI always seems to be given near-godlike magical powers. It will be able to "hack" any computer system. Oh, so it worked out how to break all cryptography? It will be able to take over manufacturing to make things to kill people. How, exactly? It'll be able to work up a virus to kill all humans and then hire some lab to make it... are we really sure about this?
I wrote about the "reality of the real world" recently. So many technologies and processes aren't written down. They're stored in meat minds, not in patents, and embodied in plant equipment and vast, intricate supply chains. Just trying to take over Taiwan chip manufacturing would be near impossible because they're so far out on the the cutting edge they jealously guard their processes.
I love sci-fi, but there are more than a few posts here that are closer to sci-fi fan fiction than to actual real problems.
The risk of humans using ChatGPT and so on to distort narratives, destroy opponents, screw with political processes and so on seems vastly more deadly and serious than the risk that an AI will self-improve and kill us all.
Going back to the idea of LessWrong obsessed with time-travel - what would you think of such a place? It would have all the predictions, and persuasive posts, and people very dedicated to it... and they could all just be wrong.
For what it's worth, I strongly support the premise that anything possible in nature is possible for humans to replicate with technology. X-rays exist, we learn how to make and use them. Fusion exists, we will learn how to make fusion. Intelligence/sentience/sapience exists - we will learn how to do this. But I rarely see anyone touch on the idea of "what if we only make something as smart as us?"
For 1 - we don’t have any examples of this in nature.
We don't have any examples of steam engines, supersonic aircraft or transistors in nature either. Saying that something can't happen because it hasn't evolved in nature is an extraordinarily poor argument.
1) True, we don't have any examples of this in nature. Would we expect them?
Let's say that to improve something, it is necessary and sufficient to understand it and to have some means to modify it. There are plenty of examples; most of the complicated ones involve humans understanding some technology and designing a better version.
At the moment, the only minds able to understand complicated things are humans, and we haven't got much human self-improvement because neuroscience is hard.
I think it is fairly clear that there is a large in practice gap b...
But I rarely see anyone touch on the idea of "what if we only make something as smart as us?"
But why would intelligence reach human level and then halt there? There's no reason to think there's some kind of barrier or upper limit at that exact point.
Even in the weird case where that were true, aren't computers going to carry on getting faster? Just running a human-level AI on a very powerful computer would be a way of creating a human scientist that can think at 1000x speed, create duplicates of itself, and modify its own brain. That's already a superintelligence, isn't it?
A helpful way of thinking about 2 is imagining something less intelligent than humans trying to predict how humans will overpower it.
You could imagine a gorilla thinking "there's no way a human could overpower us. I would just punch it if it came into my territory."
The actual way a human would overpower it is literally impossible for the gorilla to understand (invent writing, build a global economy, invent chemistry, build a tranquilizer dart gun...)
The AI in the AI takeover scenario is that jump of intelligence and creativity above us. There's literally no way a puny human brain could predict what tactics it would use. I'd imagine it almost definitely involves inventing new branches of science.
This is approximately my experience of this place.
That, and the apparent runaway cult generation machine that seems to have started.
Seriously, it is apparent that over the last few years the mental health of people involved with this space has collapsed, and the space has started producing multiple outright cults. People should stay out of this fundamentally broken epistemic environment. I come closer to expecting a Heaven's Gate event every week when I learn about more utter insanity.
the chance that [...] alignment is so easy that standard ML techniques work
I think this is probably true for LLM AGIs, at least in the no-extinction sense, but it has essentially no bearing on transitive AI risk (danger from the AI tech that comes after the first AGIs, developed by them or their successors). Consequently, P(extinction) by 2100 only improves through alignment of the first AGIs if they manage to set up reliable extinction risk governance; otherwise they are just going to build some more AGIs that don't have the unusual property of being aligned by default.
And there is no indication that LLM AGIs would be in a much better position than we are to delay AGI capability research until alignment theory makes it safe, though the world-order disruption from the change in serial speed of thought probably gives them a chance to set this up.
Presumably we will build ML AGIs because they are safe, and they won't build unsafe non-ML AGIs for the same reason we didn't: because it wouldn't be safe. So the idea is that alignment is so easy it's actually transitive.
Presumably we will build ML AGIs because they are safe
I don't see anything in the structure of humanity's AGI-development process that would ensure this property. LLM human imitations are only plausibly aligned because they are imitations of humans. There are other active lines of research vying with them for the first AGI, with no hope for their safety.
For the moment, LLM characters have the capability advantage of wielding human faculties, not needing to reinvent alternatives for them from scratch. This is an advantage for crossing the AGI threshold, which humans already crossed, but not for improving further than that. There is nothing in this story that predicates the outcome on safety.
I'm not sure, but Nate's recent post updated me towards this opinion significantly in many ways. I still think there's significant risk, but I trust the cultural ensemble a lot more after reading Nate's post.
There are a lot of highly respected researchers who have similar opinions, though.
And it's not like machine learning has consensus on much in the domain of speculative predictions; even ones made by highly skilled researchers with track records are doubted by significant portions of the field.
science is hard yo.
I will say: people who think the rationality sphere has bad epistemics, very fair. But people who think the rationality sphere on LessWrong has bad epistemics: come fight me on LessWrong! Let's argue about it! People here might not change their minds as well as they think they do, but the software is much better for intense discussions than most other places I've found.
I think the LessWrong community is wrong about x-risk and many of the problems around AI, and I've got a draft longform with concrete claims that I'm working on...
But I'm sure it'll be downvoted, because the bet has goalpost-moving baked in and there's lots of goddamn swearing, so that makes me hesitant to post it.
If you think it's low quality, post it and warn that you think it might be low quality, but, like, maybe in less self-dismissive phrasing than "I'm sure it'll be downvoted". I sometimes post "I understand if this gets downvoted - I'm not sure how high quality it is" types of comments. I don't think those are weird or bad; just try to be honest in both directions, and don't diss yourself unnecessarily.
And anyway, this community is a lot more diverse than you think. It's the rationalist AI doomers who are rationalist AI doomers, not the entire LessWrong alignment community. Those who are paying attention to the research and making headway on the problem, e.g. Wentworth, seem considerably more optimistic. The alarmists have done a good job being alarmists, but there's only so much being an alarmist to do before you need to come back down to being uncertain and try to figure out what's actually true, and I'm not impressed with MIRI lately at all.
Thanks. FYI, I tried making the post I alluded to:
"the bet" -- what bet?
A word of advice: don't post any version of it that says "I'm sure this will be downvoted". Saying that sort of thing is a reliable enough signal of low quality that if your post is actually good then it will get a worse reception than it deserves because of it.
don't post any version of it that says "I'm sure this will be downvoted"
For sure. The actual post I make will not demonstrate my personal insecurities.
what bet?
I will propose a broad test/bet that will shed light on my claims or give some places to examine.
The good news is that it mostly doesn't matter for the question of what should be done: even if doom scenarios are unlikely, researchers definitely don't have enough certainty to justifiably ignore them and continue developing ML.
Why is this being downvoted?
From what I am seeing, people here are focusing way too much on having a precisely calibrated P(doom) value.
It seems that even if P(doom) is 1%, the doom scenario should be taken very seriously and alignment research pursued to the furthest extent possible.
It seems very unlikely to me that, after much careful calibration and research, you would come up with a P(doom) value of less than 1%. So why invest time in refining your estimate?
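To make the underlying expected-value arithmetic concrete, here is a minimal sketch. All of the numbers (the normalized doom cost, the research cost, the assumed risk reduction) are illustrative assumptions I'm introducing, not figures from this thread; the point is only that the decision doesn't change across a wide range of P(doom) values once they clear a small threshold.

```python
# Illustrative expected-value comparison. The constants below are
# assumptions chosen for the sketch, not claims about the real numbers.

DOOM_COST = 1.0          # normalize the loss from an existential catastrophe to 1
ALIGNMENT_COST = 1e-4    # assumed (tiny) cost of pursuing alignment research, same units
RISK_REDUCTION = 0.10    # assumed fraction of the risk that the research removes

def worth_doing(p_doom: float) -> bool:
    """Return True if the expected loss averted exceeds the research cost."""
    expected_loss_averted = p_doom * RISK_REDUCTION * DOOM_COST
    return expected_loss_averted > ALIGNMENT_COST

for p in (0.01, 0.05, 0.20, 0.90):
    print(f"P(doom) = {p:.2f}: worth doing? {worth_doing(p)}")
# Every value from 1% upward gives the same answer, which is the commenter's
# point: further refining the estimate doesn't change the decision.
```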
Because it fails to engage with the key point: that the low predictiveness of the dynamics of AI risk makes it hard for people to believe there's a significant risk at all. I happen to think there is; that's why I clicked agree-vote. But I clicked karma-downvote because it fails to engage with the key epistemic issue at hand.
I find Eliezer's and Nate's arguments compelling, but I do downgrade my p(doom) somewhat (-30%, maybe?) because there are intelligent people (inside and outside of LW/EA) who disagree with them.
I had some issues with the quote:
Will continue to exist regardless of how well you criticize any one part of it.
I'd say LW folk are unusually open to criticism. I think if there were strong arguments they really would change people's minds here. And especially arguments that focus on one small part at a time.
But have there been strong arguments? I'd love to read them.
There's basically little reason to engage with it. These are all also evidence that there's something epistemically off with what is going on in the field.
For me, the most convincing evidence that LW is doing something right epistemically is how it did better than basically everyone on Covid. Granted, that's not the Alignment Forum, but it was some of the same people and the same weird epistemic culture at work.
There are intelligent people who disagree, but I was under the impression there was a shortage of intelligent disagreement. Most of the smart disagreement sounds like smart people who haven't thought in great depth about AI risk in particular, and are often shooting down garbled misunderstandings of the case for AI risk.
I think that's true of people like: Steven Pinker and Neil deGrasse Tyson. They're intelligent but clearly haven't engaged with the core arguments because they're saying stuff like "just unplug it" and "why would it be evil?"
But there are also people like...
Robin Hanson. I don't really agree with him but he is engaging with the AI risk arguments, has thought about it a lot and is a clever guy.
Will MacAskill. One of the most thoughtful thinkers I know of, who I'm pretty confident will have engaged seriously with the AI Risk arguments. His p(doom) is far lower than Eliezer's. I think he says 3% in What We Owe The Future.
Other AI Alignment experts who are optimistic about our chances of solving alignment and put p(doom) lower (I don't know enough about the field to name people.)
And I guess I am reserving some small amount of probability for "most of the world's most intelligent computer scientists, physicists, and mathematicians aren't worried about AI Risk; could I be missing something?" My intuition from playing around on prediction markets is that you have to adjust your bets slightly for those kinds of considerations.
Robin Hanson is weird. He paints a picture of a grim future where all nice human values are eroded away, replaced with endless frontier replicators optimized and optimizing only for more replication. And then he just accepts it as if that was fine.
Will MacAskill seems to think AI risk is real. He just thinks alignment is easy. He has a specific proposal involving making anthropomorphic AI and raising it like a human child that he seems keen on.
I just posted a detailed explanation of why I am very skeptical of the traditional deceptive alignment story. I'd love to hear what you think of it!
This question is inspired by 1a3orn's comment on how there are troubling signs of epistemic issues in LW's Alignment field.
I'll quote the comment here to tell you what I mean:
So I want to ask a question: How seriously should we take the hypothesis that LW is totally wrong on AI?
Specifically, this splits into several subquestions:
What's the chance that AI doesn't have that much of an impact on the world by 2100?
What's the chance that we do have massive impacts, but alignment is so easy that standard ML techniques work?
How well does the epistemic process on LW work? Are there any changes you would make to LW's epistemic processes?
I welcome all answers, and I especially welcome any critics of LW/negative answers to at least answer one of the questions I have.
Edit: For people that don't have a specific scenario in mind, I'll ask a specific question. It doesn't have to be answered, but any answers on this question are appreciated, especially from critics of the "AI is significant" idea.
1a. What is the probability that the Explosion or Collapse scenario from Cold Takes happens by 2100?
Link to the scenarios below:
https://www.cold-takes.com/this-cant-go-on/