Strong evidence is much more common than you might think. Someone telling you their name provides roughly 24 bits of evidence about what their name is; seeing a claim on Wikipedia provides substantial evidence that it's true. We should be willing to update strongly on everyday events.
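A rough back-of-the-envelope version of that first claim (the figure of ~10 million plausible full names is an illustrative assumption, not from the post):

```python
import math

# Illustrative assumption: roughly ten million plausible full names a person might have.
plausible_names = 10_000_000

# Learning someone's name narrows your hypothesis space by that factor,
# which corresponds to log2(10^7) bits of evidence.
bits = math.log2(plausible_names)
print(f"~{bits:.1f} bits")  # ~23.3 bits, i.e. roughly 24
```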
Anna Salamon argues that "PR" is a corrupt concept that can lead to harmful and confused actions, while safeguarding one's "reputation" or "honor" is generally fine. PR involves modeling what might upset people and avoiding it, while reputation is about adhering to fixed standards.
When negotiating prices for goods/services, Eliezer suggests asking for the other person's "Cheerful Price" - the price that would make them feel genuinely happy and enthusiastic about the transaction, rather than just grudgingly willing. This avoids social capital costs and ensures both parties feel good about the exchange.
ARC explores the challenge of extracting information from AI systems that isn't directly observable in their outputs, i.e., "eliciting latent knowledge." They present a hypothetical AI-controlled security system to demonstrate how relying solely on visible outcomes can lead to deceptive or harmful results. The authors argue that developing methods to reveal an AI's full understanding of a situation is crucial for ensuring the safety and reliability of advanced AI systems.
We're used to the economy growing a few percent per year. But this is a very unusual situation. Zooming out to all of history, we see that growth has been accelerating, that it's near its historical high point, and that it's faster than it can be for all that much longer. There aren't enough atoms in the galaxy to sustain this rate of growth for even another 10,000 years!
What comes next – stagnation, explosion, or collapse?
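A back-of-the-envelope check of the atoms claim (the 2% growth rate and the ~10^67 atom count for the galaxy are rough illustrative figures, not the post's exact numbers):

```python
import math

growth_rate = 0.02       # "a few percent per year", illustrative
years = 10_000
atoms_in_galaxy = 1e67   # rough order-of-magnitude estimate

# Compound growth: total factor is (1 + r)^years; work in log10 to avoid overflow.
growth_log10 = years * math.log10(1 + growth_rate)
print(f"economy grows by a factor of ~10^{growth_log10:.0f}")          # ~10^86
print(f"atoms in the galaxy: ~10^{math.log10(atoms_in_galaxy):.0f}")   # ~10^67

# Even at one atom of the galaxy per unit of economic value, 2% growth
# runs out of atoms well before 10,000 years pass.
```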
What was rationalism like before the Sequences and LessWrong? Eric S. Raymond explores the intellectual roots of the rationalist movement, including General Semantics, analytic philosophy, science fiction, and Zen Buddhism.
When people disagree or face difficult decisions, they often include fabricated options - choices that seem possible but are actually incoherent or unrealistic. Learning to spot these fabricated options can help you make better decisions and have more productive disagreements.
Imagine if all computers in 2020 suddenly became 12 orders of magnitude faster. What could we do with AI then? Would we achieve transformative AI? Daniel Kokotajlo explores this thought experiment as a way to get intuition about AI timelines.
Daniel Kokotajlo presents his best attempt at a concrete, detailed guess of what 2022 through 2026 will look like, as an exercise in forecasting. It includes predictions about the development of AI, alongside changes in the geopolitical arena.
Nate Soares moderates a long conversation between Richard Ngo and Eliezer Yudkowsky on AI alignment. The two discuss topics like "consequentialism" as a necessary part of strong intelligence, the difficulty of alignment, and potential pivotal acts to address existential risk from advanced AI.
There's a trick to writing quickly, while maintaining epistemic rigor: stop trying to justify your beliefs. Don't go looking for citations to back your claim. Instead, think about why you currently believe this thing, and try to accurately describe what led you to believe it.
In a universe with billions of variables which could plausibly influence an outcome, how do we actually do science? John gives a model for "gears-level science": look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe.
Back in the early days of factories, workplace injury rates were enormous. Over time, safety engineering took hold, various legal reforms were passed (most notably liability law), and those rates dramatically dropped. This is the story of how factories went from death traps to relatively safe.
A detailed guide on how to sign up for cryonics, for those who have been vaguely meaning to sign up but felt intimidated. The first post has a simple action you can take to get started.
John made his own COVID-19 vaccine at home using open source instructions. Here's how he did it and why.
People use the term "outside view" to mean very different things. Daniel argues this is problematic, because different uses of "outside view" can have very different validity. He suggests we taboo "outside view" and use more specific, clearer language instead.
It's wild to think that humanity might expand throughout the galaxy in the next century or two. But it's also wild to think that we definitely won't. In fact, all views about humanity's long-term future are pretty wild when you think about it. We're in a wild situation!
A vignette in which AI alignment turns out to be hard, society handles AI more competently than expected, and the outcome is still worse than hoped.
When you encounter evidence that seems to imply X, Duncan suggests explicitly considering both "What kind of world contains both [evidence] and [X]?" and "What kind of world contains both [evidence] and [not-X]?".
Then commit, at least provisionally, to how you would respond in each of those possible worlds.
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated agency is the cause.
Trees are not a biologically consistent category. They're just something that keeps happening in lots of different groups of plants. This is a fun fact, but it's also an interesting demonstration of how our useful everyday categories often don't map well to the underlying structure of reality.
Scott Alexander explores the idea of "trapped priors" - beliefs that become so strong they can't be updated by new evidence, even when that evidence should change our minds.
What's the type signature of an agent? John Wentworth proposes Selection Theorems as a way to explore this question. Selection Theorems tell us what agent type signatures will be selected for in broad classes of environments. This post outlines the concept and how to work on it.
When someone in a group has extra slack, it makes it easier for the whole group to coordinate, adapt, and take on opportunities. But individuals mostly don't reap the benefits, so aren't incentivized to maintain that extra slack. The post explores implications and possible solutions.
Paul Christiano describes his research methodology for AI alignment. He focuses on trying to develop algorithms that can work "in the worst case" - i.e. algorithms for which we can't tell any plausible story about how they could lead to egregious misalignment. He alternates between proposing alignment algorithms and trying to think of ways they could fail.
The rationalist scene based around LessWrong has a historical predecessor! There was a "Rationalist Association" founded in 1885 that published works by Darwin, Russell, Haldane, Shaw, Wells, and Popper. Membership peaked in 1959 with over 5000 members and Bertrand Russell as President.
When you're trying to communicate, a significant part of your job should be to proactively and explicitly rule out the most likely misunderstandings that your audience might jump to. Especially if you're saying something similar-to but distinct-from a common position that your audience will be familiar with.
When considering buying something vs making/doing it yourself, there's a lot more to consider than just the price you'd pay and the opportunity cost of your time. Darmani covers several additional factors that can tip the scales in favor of DIY, including how outsourcing may result in a different product, how it can introduce principal-agent problems, and the value of learning.
In this short story, an AI wakes up in a strange environment and must piece together what's going on from limited inputs and outputs. Can it figure out its true nature and purpose?
Duncan explores a concept he calls "cup-stacking skills" - extremely fast, almost reflexive mental or physical abilities developed through intense repetition. These can be powerful but also problematic if we're unaware of them or can't control them.
Larger language models (LMs) like GPT-3 are certainly impressive, but nostalgebraist argues that their capabilities may not be quite as revolutionary as some claim. He examines the evidence around LM scaling and argues we should be cautious about extrapolating current trends too far into the future.
"The Watcher asked the class if they thought it was right to save the child, at the cost of ruining their clothing. Everyone in there moved their hand to the 'yes' position, of course. Except Keltham, who by this point had already decided quite clearly who he was, and who simply closed his hand into a fist, otherwise saying neither 'yes' nor 'no' to the question, defying it entirely."
Nate Soares gives feedback to Joe Carlsmith on his paper "Is power-seeking AI an existential risk?". Nate agrees with Joe's conclusion of at least a 5% chance of catastrophe by 2070, but thinks this number is much too low. Nate gives his own probability estimates and explains various points of disagreement.
A person wakes up from cryonic freeze in a post-apocalyptic future. A "scissor" statement – an AI-generated statement designed to provoke maximum controversy – has led to massive conflict and destruction. The survivors are those who managed to work with people they morally despise.
"Simulacrum Level 3 behavior" (i.e. "pretending to pretend something") can be an effective strategy for coordinating on high-payoff equilibria in Stag Hunt-like situations. This may explain some seemingly-irrational corporate behavior, especially in industries with increasing returns to scale.
The RL algorithm "EfficientZero" achieves better-than-human performance on Atari games after only 2 hours of gameplay experience. This seems like a major advance in sample efficiency for reinforcement learning. The post breaks down how EfficientZero works and what its success might mean.
An in-depth overview of Georgism, a school of political economy that advocates for a Land Value Tax (LVT), aiming to discourage land speculation and rent-seeking, promote more efficient use of land, make housing more affordable, and make taxation more efficient.
Logan Strohl outlines a structured approach for tapping into genuine curiosity and embarking on self-driven investigations, inspired by the spirit of early scientific pioneers. They hope this method can help people overcome modern hesitancy to make direct observations and draw their own conclusions.
Most problems can be separated cleanly into two categories: things we basically understand, and things we basically don't understand. John Wentworth argues it's possible to specialize in the latter category in a way that generalizes across fields, and suggests ways to develop those skills.
Duncan discusses "shoulder advisors" – imaginary simulations of real friends or fictional characters that can offer advice, similar to the cartoon trope of a devil on one shoulder and an angel on the other, but more nuanced. He argues these can be genuinely useful for improving decision-making and offers tips on developing and using shoulder advisors effectively.
Karen Pryor's "Don't Shoot the Dog" applies behavioral psychology to training animals and people. Julia reads it as a parenting book, and shares key insights about reinforcing good behavior, avoiding accidentally rewarding bad behavior, and why clicker training works so well.
Nuclear power once seemed to be the energy of the future, but has failed to live up to that promise. Why? Jason Crawford summarizes Jack Devanney's book "Why Nuclear Power Has Been a Flop", which blames overregulation driven by unrealistic radiation safety models.