The original draft of Ajeya Cotra's report on biological anchors for AI timelines. The report includes quantitative models and forecasts, though the specific numbers were still in flux at the time. Ajeya cautions against wide sharing of specific conclusions, as they don't yet reflect Open Philanthropy's official stance.
A collection of 11 different proposals for building safe advanced AI under the current machine learning paradigm. There's plenty of literature laying out various approaches, but much of it focuses primarily on outer alignment at the expense of inner alignment, and doesn't directly compare the approaches to one another.
As resources become abundant, the bottleneck shifts elsewhere. Past a certain point, power and money are no longer the limiting factors; knowledge becomes the bottleneck. Knowledge can't be reliably bought, and acquiring it is difficult. Therefore, investments in knowledge (e.g. understanding systems at a gears-level) become the most valuable investments.
How much COVID risk do you take when you go to the grocery store? When you see a friend outdoors? This calculator helps you estimate your risk from common activities in microcovids - units of 1-in-a-million chance of getting COVID.
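A minimal sketch of the kind of arithmetic involved, using made-up activity estimates rather than the calculator's actual numbers:

```python
# Hypothetical weekly activities with made-up microcovid estimates
# (1 microcovid = a 1-in-a-million chance of getting COVID).
activities = {
    "grocery store run": 60,
    "outdoor walk with a friend": 10,
    "indoor dinner party": 900,
}

# For small risks, summing microcovids is a good approximation
# of the combined probability of infection.
total_microcovids = sum(activities.values())
probability = total_microcovids / 1_000_000

print(f"~{total_microcovids} microcovids this week "
      f"(roughly a {probability:.3%} chance of infection)")
```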
What if we don't need to solve AI alignment? What if AI systems will just naturally learn human values as they get more capable? John Wentworth explores this possibility, giving it about a 10% chance of working. The key idea is that human values may be a "natural abstraction" that powerful AI systems learn by default.
The Solomonoff prior is a mathematical formalization of Occam's razor. It's intended to provide a way to assign probabilities to observations based on their simplicity. However, the simplest programs that predict observations well might be universes containing intelligent agents trying to influence the predictions. This makes the Solomonoff prior "malign" - its predictions are influenced by the preferences of simulated beings.
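For reference, one standard formulation (textbook material, not specific to this post): given a universal prefix Turing machine $U$, the Solomonoff prior assigns to a finite string $x$ the weight

$$M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|},$$

summing over all programs $p$ whose output begins with $x$. Shorter programs contribute exponentially more weight, which is how the prior formalizes Occam's razor.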
In early 2020, COVID-19 was spreading rapidly, but many people seemed hesitant to take precautions or prepare. Jacob Falkovich explores why people often wait for social permission before reacting to potential threats, even when the evidence is clear. He argues we should be willing to act on our own judgment rather than waiting for others.
Pain is often treated as a measure of effort. "No pain, no gain". But this attitude can be toxic and counterproductive. alkjash argues that if something hurts, you're probably doing it wrong, and that you're not trying your best if you're not happy.
An optimizing system is a physically closed system containing both that which is being optimized and that which is doing the optimizing, and defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system.
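As a toy illustration of that definition (my sketch, not from the post), here's a system whose state is repeatedly perturbed at random yet still converges to a narrow target set from a wide range of starting points:

```python
import random

def step(x, target=0.0, rate=0.2, noise=0.05):
    """One update: drift toward the target, then apply a random perturbation."""
    return x + rate * (target - x) + random.uniform(-noise, noise)

# Start anywhere in a broad basin of attraction...
x = random.uniform(-100.0, 100.0)
for _ in range(200):
    x = step(x)

# ...and the system still ends up near its small set of target configurations.
print(f"final state: {x:.3f}")
```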
Zvi explores the four "simulacra levels" of communication and action, using the COVID-19 pandemic as an example: (1) literal truth, (2) trying to influence behavior, (3) signaling group membership, and (4) pure power games. He examines how these levels interact and the different strategies people use across them.
Money can buy a lot of things, but it can't buy expertise. In fields where performance is hard to judge, simply throwing money at the problem won't guarantee good results – it's too easy to be fooled. Even kings and governments can't necessarily buy their way to the best solutions.
Richard Ngo lays out the core argument for why AGI could be an existential threat: we might build AIs that are much smarter than humans, that act autonomously to pursue large-scale goals, and whose goals conflict with ours, leading them to take control of humanity's future. He aims to defend this argument in detail from first principles.
Human values are functions of latent variables in our minds. But those variables may not correspond to anything in the real world. How can an AI optimize for our values if it doesn't know what our mental variables are "pointing to" in reality? This is the Pointers Problem - a key conceptual barrier to AI alignment.
Many of the most profitable jobs and companies are primarily about solving coordination problems. This suggests "coordination problems" are an unusually tight bottleneck for productive economic activity. John explores implications of looking at the world through this lens.
AI researcher Paul Christiano discusses the problem of "inaccessible information" - information that AI systems might know but that we can't easily access or verify. He argues this could be a key obstacle in AI alignment, as AIs may be able to use inaccessible knowledge to pursue goals that conflict with human interests.
In the span of a few years, some minor European explorers (later known as the conquistadors) encountered, conquered, and enslaved several huge regions of the world. Daniel Kokotajlo argues this shows the plausibility of a small AI system rapidly taking over the world, even without overwhelming technological superiority.
Steve Byrnes lays out his 7 guiding principles for understanding how the brain works computationally. He argues the neocortex uses a single general learning algorithm that starts as a blank slate, while the subcortex contains hard-coded instincts and steers the neocortex toward biologically adaptive behaviors.
Inner alignment refers to the problem of aligning a machine learning model's internal goals (mesa-objective) with the intended goals we are optimizing for externally (base objective). Even if we specify the right base objective, the model may develop its own misaligned mesa-objective through the training process. This poses challenges for AI safety.
GDP isn't a great metric for AI timelines or takeoff speed because the relevant events (like AI alignment failure or progress towards self-improving AI) could happen before GDP growth accelerates visibly. Instead, we should focus on things like warning shots, heterogeneity of AI systems, risk awareness, multipolarity, and overall "craziness" of the world.
Aging, which kills 100,000 people per day, may be solvable. Here's a summary of the most promising anti-aging research, including parabiosis, metabolic manipulation, senolytics, and cellular reprogramming.
The structure of things-humans-want does not always match the structure of the real world, or the structure of how-other-humans-see-the-world. When structures don't match, someone or something needs to serve as an interface, translating between the two. Interfaces between complex systems and human desires are often a scarce resource.
Most Prisoner's Dilemmas are actually Stag Hunts in the iterated game, and most Stag Hunts are actually "Schelling games." You have to coordinate on a good equilibrium, but there are many good equilibria to choose from, which benefit different people to different degrees. This complicates the problem of cooperating.
Abram argues against assuming that rational agents have utility functions over worlds (which he calls the "reductive utility" view). Instead, he points out that you can have a perfectly valid decision theory where agents just have preferences over events, without having to assume there's some underlying utility function over worlds.
Success is supposed to open doors and broaden horizons. But often it can do the opposite - trapping people in narrow specialties or roles they've outgrown. This post explores how success can sometimes be the enemy of personal freedom and growth, and how to maintain flexibility as you become more successful.
Vanessa Kosoy and Diffractor introduce a new approach to epistemology / decision theory / reinforcement learning theory called Infra-Bayesianism, which aims to solve issues with prior misspecification and non-realizability that plague traditional Bayesianism.
Dogmatic probabilism is the theory that all rational belief updates should be Bayesian updates. Radical probabilism is a more flexible theory which allows agents to radically change their beliefs, while still obeying some constraints. Abram examines how radical probabilism differs from dogmatic probabilism, and what implications the theory has for rational agents.
Crawford looks back on past celebrations of achievements like the US transcontinental railroad, the Brooklyn Bridge, electric lighting, the polio vaccine, and the Moon landing. He then asks: Why haven't we celebrated any major achievements lately? He explores some hypotheses for this change.
Andrew Critch lists several research areas that seem important to AI existential safety, and evaluates them for direct helpfulness, educational value, and neglect. Along the way, he argues that the main way he sees present-day technical research helping is by anticipating, legitimizing, and fulfilling governance demands for AI technology that will arise later.
How is it that we solve engineering problems? What is the nature of the design process that humans follow when building an air conditioner or computer program? How does this differ from the search processes present in machine learning and evolution? This essay studies search and design as distinct approaches to engineering, arguing that establishing trust in an artifact is tied to understanding how that artifact works, and that a central difference between search and design is the comprehensibility of the artifacts produced.
People often ask "Can you keep this confidential?" without really checking if the person has the skills to do so. Raemon argues we need to be more careful about how we handle confidential information, and have explicit conversations about privacy practices.
AI Impacts investigated dozens of technological trends, looking for examples of discontinuous progress (where more than a century of progress happened at once). They found ten robust cases, such as the first nuclear weapons and the Great Eastern steamship, and hope the data can inform expectations about discontinuities in AI development.
The path to explicit reason is fraught with challenges. People often don't want to use explicit reason, and when they try to use it, they fail. Even if they succeed, they're punished socially. The post explores various obstacles on this path, including social pressure, strange memeplexes, and the "valley of bad rationality".
The neocortex has been hypothesized to be uniformly composed of general-purpose data-processing modules. What does the currently available evidence suggest about this hypothesis? Alex Zhu explores various pieces of evidence, including deep learning neural networks and predictive coding theories of brain function.
You've probably heard the advice "to be a good listener, reflect back what people tell you." Ben Kuhn argues this is cargo cult advice that misses the point. The real key to good listening is intense curiosity about the details of the other person's situation.
A counterintuitive concept: sometimes people choose the worse option specifically to signal their loyalty or values in situations where that loyalty might be in question. Zvi explores this idea of "motive ambiguity" and how it can lead to perverse incentives.
The felt sense is a concept coined by psychologist Eugene Gendlin to describe a kind of pre-linguistic, physical sensation that represents some mental content. Kaj gives examples of felt senses, explains why they're useful to pay attention to, and gives tips on how to notice and work with them.
If you know nothing about a thing, the first example or sample gives you a disproportionate amount of information, often more than any subsequent sample. It lets you locate the idea in conceptspace, get a sense of what domain/scale/magnitude you're dealing with, and provides an anchor for further thinking.
You've probably heard that a nuclear war between major powers would cause human extinction. This post argues that while nuclear war would be incredibly destructive, it's unlikely to actually cause human extinction. The main risks come from potential climate effects, but even in severe scenarios some human populations would likely survive.
All sorts of everyday practices in the legal system, medicine, software, and other areas of life involve stating things that aren't true. But calling these practices "lies" or "fraud" seems to be perceived as an attack rather than a straightforward description. This makes it difficult to discuss and analyze these practices without provoking emotional defensiveness.
The Swiss political system is known for its extensive use of direct democracy. This post dives deep into how that system works, exploring the different types of referenda, their history, impacts, and quirks. It's a detailed look at a unique political system that has managed to largely avoid polarization.
Under conditions of perfectly intense competition, evolution works like water flowing down a hill – it can never go up even the tiniest elevation. But if there is slack in the selection process, it's possible for evolution to escape local minima. "How much slack is optimal?" is an interesting question, which Scott explores in various contexts.
John examines the problem of "how to transport things?" through the lens of "what's the taut constraint on the system?" He asks questions across history, from "how could Alexander the Great's army cross 150 miles of desert?", to how modern supply chains work, to what would happen in a future world with teleportation.
The date of AI takeover is not the day the AI takes over. Instead, it's the point of no return—the day we AI risk reducers lose the ability to significantly reduce AI risk. This might happen years before classic milestones like "World GWP doubles in four years" and "Superhuman AGI is deployed."
Eliezer Yudkowsky recently criticized the OpenPhil draft report on AI timelines. Holden Karnofsky thinks Eliezer misunderstood the report in important ways, and defends the report's usefulness as a tool for informing (not determining) AI timelines.
The practice of extrapolating AI timelines based on biological analogies has a long history of not working. Eliezer argues that this is because the resource gets consumed differently, so base-rate arguments from resource consumption end up quite unhelpful in real life.
Timelines are inherently very difficult to predict accurately until we are much closer to AGI.