The original draft of Ajeya Cotra's report on biological anchors for AI timelines. The report includes quantitative models and forecasts, though the specific numbers were still in flux at the time. Ajeya cautions against wide sharing of specific conclusions, as they don't yet reflect Open Philanthropy's official stance.
A collection of 11 different proposals for building safe advanced AI under the current machine learning paradigm. There's plenty of literature laying out various approaches, but much of it focuses primarily on outer alignment at the expense of inner alignment, and doesn't directly compare the approaches to one another.
As resources become abundant, the bottleneck shifts elsewhere. Past a certain point, power and money are no longer the limiting factors; knowledge becomes the bottleneck. Knowledge can't be reliably bought, and acquiring it is difficult. Therefore, investments in knowledge (e.g. understanding systems at a gears-level) become the most valuable investments.
How much COVID risk do you take when you go to the grocery store? When you see a friend outdoors? This calculator helps you estimate your risk from common activities in microcovids - units of 1-in-a-million chance of getting COVID.
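A minimal sketch of the kind of arithmetic involved, using made-up activity estimates rather than the calculator's actual numbers:

```python
# Hypothetical weekly activities with made-up microcovid estimates
# (1 microcovid = a 1-in-a-million chance of getting COVID).
activities = {
    "grocery store run": 60,
    "outdoor walk with a friend": 10,
    "indoor dinner party": 900,
}

# For small risks, summing microcovids is a good approximation
# of the combined probability of infection.
total_microcovids = sum(activities.values())
probability = total_microcovids / 1_000_000

print(f"~{total_microcovids} microcovids this week "
      f"(roughly a {probability:.3%} chance of infection)")
```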
What if we don't need to solve AI alignment? What if AI systems will just naturally learn human values as they get more capable? John Wentworth explores this possibility, giving it about a 10% chance of working. The key idea is that human values may be a "natural abstraction" that powerful AI systems learn by default.
The Solomonoff prior is a mathematical formalization of Occam's razor. It's intended to provide a way to assign probabilities to observations based on their simplicity. However, the simplest programs that predict observations well might be universes containing intelligent agents trying to influence the predictions. This makes the Solomonoff prior "malign" - its predictions are influenced by the preferences of simulated beings.
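For reference, one standard formulation (textbook material, not specific to this post): given a universal prefix Turing machine $U$, the Solomonoff prior assigns to a finite string $x$ the weight

$$M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|},$$

summing over all programs $p$ whose output begins with $x$. Shorter programs contribute exponentially more weight, which is how the prior formalizes Occam's razor.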
In early 2020, COVID-19 was spreading rapidly, but many people seemed hesitant to take precautions or prepare. Jacob Falkovich explores why people often wait for social permission before reacting to potential threats, even when the evidence is clear. He argues we should be willing to act on our own judgment rather than waiting for others.
Pain is often treated as a measure of effort. "No pain, no gain". But this attitude can be toxic and counterproductive. alkjash argues that if something hurts, you're probably doing it wrong, and that you're not trying your best if you're not happy.
An optimizing system is a physically closed system containing both that which is being optimized and that which is doing the optimizing, and defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system.
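As a toy illustration of that definition (my sketch, not from the post), here's a system whose state is repeatedly perturbed at random yet still converges to a narrow target set from a wide range of starting points:

```python
import random

def step(x, target=0.0, rate=0.2, noise=0.05):
    """One update: drift toward the target, then apply a random perturbation."""
    return x + rate * (target - x) + random.uniform(-noise, noise)

# Start anywhere in a broad basin of attraction...
x = random.uniform(-100.0, 100.0)
for _ in range(200):
    x = step(x)

# ...and the system still ends up near its small set of target configurations.
print(f"final state: {x:.3f}")
```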
Zvi explores the four "simulacra levels" of communication and action, using the COVID-19 pandemic as an example: (1) literal truth, (2) trying to influence behavior, (3) signaling group membership, and (4) pure power games. He examines how these levels interact and the different strategies people use across them.
Money can buy a lot of things, but it can't buy expertise. In fields where performance is hard to judge, simply throwing money at the problem won't guarantee good results – it's too easy to be fooled. Even kings and governments can't necessarily buy their way to the best solutions.
Richard Ngo lays out the core argument for why AGI could be an existential threat: we might build AIs that are much smarter than humans, that act autonomously to pursue large-scale goals, and whose goals conflict with ours, leading them to take control of humanity's future. He aims to defend this argument in detail from first principles.
Human values are functions of latent variables in our minds. But those variables may not correspond to anything in the real world. How can an AI optimize for our values if it doesn't know what our mental variables are "pointing to" in reality? This is the Pointers Problem - a key conceptual barrier to AI alignment.
Many of the most profitable jobs and companies are primarily about solving coordination problems. This suggests "coordination problems" are an unusually tight bottleneck for productive economic activity. John explores implications of looking at the world through this lens.
AI researcher Paul Christiano discusses the problem of "inaccessible information" - information that AI systems might know but that we can't easily access or verify. He argues this could be a key obstacle in AI alignment, as AIs may be able to use inaccessible knowledge to pursue goals that conflict with human interests.
In the span of a few years, some minor European explorers (later known as the conquistadors) encountered, conquered, and enslaved several huge regions of the world. Daniel Kokotajlo argues this shows the plausibility of a small AI system rapidly taking over the world, even without overwhelming technological superiority.
Steve Byrnes lays out his 7 guiding principles for understanding how the brain works computationally. He argues the neocortex uses a single general learning algorithm that starts as a blank slate, while the subcortex contains hard-coded instincts and steers the neocortex toward biologically adaptive behaviors.
Inner alignment refers to the problem of aligning a machine learning model's internal goals (mesa-objective) with the intended goals we are optimizing for externally (base objective). Even if we specify the right base objective, the model may develop its own misaligned mesa-objective through the training process. This poses challenges for AI safety.
GDP isn't a great metric for AI timelines or takeoff speed because the relevant events (like AI alignment failure or progress towards self-improving AI) could happen before GDP growth accelerates visibly. Instead, we should focus on things like warning shots, heterogeneity of AI systems, risk awareness, multipolarity, and overall "craziness" of the world.
Aging, which kills 100,000 people per day, may be solvable. Here's a summary of the most promising anti-aging research, including parabiosis, metabolic manipulation, senolytics, and cellular reprogramming.
The structure of things-humans-want does not always match the structure of the real world, or the structure of how-other-humans-see-the-world. When structures don't match, someone or something needs to serve as an interface, translating between the two. Interfaces between complex systems and human desires are often a scarce resource.
Most Prisoner's Dilemmas are actually Stag Hunts in the iterated game, and most Stag Hunts are actually "Schelling games." You have to coordinate on a good equilibrium, but there are many good equilibria to choose from, which benefit different people to different degrees. This complicates the problem of cooperating.
Abram argues against assuming that rational agents have utility functions over worlds (which he calls the "reductive utility" view). Instead, he points out that you can have a perfectly valid decision theory where agents just have preferences over events, without having to assume there's some underlying utility function over worlds.
Success is supposed to open doors and broaden horizons. But often it can do the opposite - trapping people in narrow specialties or roles they've outgrown. This post explores how success can sometimes be the enemy of personal freedom and growth, and how to maintain flexibility as you become more successful.
Vanessa Kosoy and Diffractor introduce a new approach to epistemology / decision theory / reinforcement learning theory called Infra-Bayesianism, which aims to solve issues with prior misspecification and non-realizability that plague traditional Bayesianism.
Dogmatic probabilism is the theory that all rational belief updates should be Bayesian updates. Radical probabilism is a more flexible theory which allows agents to radically change their beliefs, while still obeying some constraints. Abram examines how radical probabilism differs from dogmatic probabilism, and what implications the theory has for rational agents.
Crawford looks back on past celebrations of achievements like the US transcontinental railroad, the Brooklyn Bridge, electric lighting, the polio vaccine, and the Moon landing. He then asks: Why haven't we celebrated any major achievements lately? He explores some hypotheses for this change.
Andrew Critch lists several research areas that seem important to AI existential safety, and evaluates them for direct helpfulness, educational value, and neglect. Along the way, he argues that the main way he sees present-day technical research helping is by anticipating, legitimizing, and fulfilling governance demands for AI technology that will arise later.
How is it that we solve engineering problems? What is the nature of the design process that humans follow when building an air conditioner or computer program? How does this differ from the search processes present in machine learning and evolution? This essay studies search and design as distinct approaches to engineering, arguing that establishing trust in an artifact is tied to understanding how that artifact works, and that a central difference between search and design is the comprehensibility of the artifacts produced.
People often ask "Can you keep this confidential?" without really checking if the person has the skills to do so. Raemon argues we need to be more careful about how we handle confidential information, and have explicit conversations about privacy practices.
AI Impacts investigated dozens of technological trends, looking for examples of discontinuous progress (where more than a century of progress happened at once). They found ten robust cases, such as the first nuclear weapons and the Great Eastern steamship, and hope the data can inform expectations about discontinuities in AI development.
The path to explicit reason is fraught with challenges. People often don't want to use explicit reason, and when they try to use it, they fail. Even if they succeed, they're punished socially. The post explores various obstacles on this path, including social pressure, strange memeplexes, and the "valley of bad rationality".
The neocortex has been hypothesized to be uniformly composed of general-purpose data-processing modules. What does the currently available evidence suggest about this hypothesis? Alex Zhu explores various pieces of evidence, including deep learning neural networks and predictive coding theories of brain function.
You've probably heard the advice "to be a good listener, reflect back what people tell you." Ben Kuhn argues this is cargo cult advice that misses the point. The real key to good listening is intense curiosity about the details of the other person's situation.
A counterintuitive concept: sometimes people choose the worse option specifically to signal their loyalty or values in situations where that loyalty might be in question. Zvi explores this idea of "motive ambiguity" and how it can lead to perverse incentives.
The felt sense is a concept coined by psychologist Eugene Gendlin to describe a kind of pre-linguistic, physical sensation that represents some mental content. Kaj gives examples of felt senses, explains why they're useful to pay attention to, and gives tips on how to notice and work with them.
If you know nothing about a thing, the first example or sample gives you a disproportionate amount of information, often more than any subsequent sample. It lets you locate the idea in conceptspace, get a sense of what domain/scale/magnitude you're dealing with, and provides an anchor for further thinking.
You've probably heard that a nuclear war between major powers would cause human extinction. This post argues that while nuclear war would be incredibly destructive, it's unlikely to actually cause human extinction. The main risks come from potential climate effects, but even in severe scenarios some human populations would likely survive.
All sorts of everyday practices in the legal system, medicine, software, and other areas of life involve stating things that aren't true. But calling these practices "lies" or "fraud" seems to be perceived as an attack rather than a straightforward description. This makes it difficult to discuss and analyze these practices without provoking emotional defensiveness.
The Swiss political system is known for its extensive use of direct democracy. This post dives deep into how that system works, exploring the different types of referenda, their history, impacts, and quirks. It's a detailed look at a unique political system that has managed to largely avoid polarization.
Under conditions of perfectly intense competition, evolution works like water flowing down a hill – it can never go up even the tiniest elevation. But if there is slack in the selection process, it's possible for evolution to escape local minima. "How much slack is optimal?" is an interesting question, which Scott explores in various contexts.
John examines the problem of "how to transport things?" through the lens of "what's the taut constraint on the system?" He asks questions across history, from "how could Alexander the Great's army cross 150 miles of desert?", to how modern supply chains work, to what would happen in a future world with teleportation.
The date of AI takeover is not the day the AI takes over. Instead, it's the point of no return—the day we AI risk reducers lose the ability to significantly reduce AI risk. This might happen years before classic milestones like "World GWP doubles in four years" and "Superhuman AGI is deployed."
Eliezer Yudkowsky recently criticized the OpenPhil draft report on AI timelines. Holden Karnofsky thinks Eliezer misunderstood the report in important ways, and defends the report's usefulness as a tool for informing (not determining) AI timelines.
The practice of extrapolating AI timelines based on biological analogies has a long history of not working. Eliezer argues that this is because the resource gets consumed differently, so base-rate arguments from resource consumption end up quite unhelpful in real life.
Timelines are inherently very difficult to predict accurately until we are much closer to AGI.