All of Dawn Drescher's Comments + Replies

Does this include all donors in the calculation or are there hidden donors?

Donors have a switch in their profiles where they can determine whether they want to be listed or not. The top three in the private, complete listing are Jaan Tallinn, Open Phil, and the late Future Fund, whose public grants I've imported. The total ranking lists 92 users. 

But I don't think that's core to understanding the step down. I went through the projects around the threshold before posting my last comment, and I think it's really the 90% cutoff that causes it. Not a ... (read more)

TL;DR: Great question! I think it mostly means that we don't have enough data to say much about these projects. So donors who've made early donations to them can register those donations and boost those projects' scores.

  1. The donor score relies on the size of the donations and their earliness in the history of the project (plus the retroactive evaluation). So the top donors in particular have made many early, big, and sometimes public grants to projects that panned out well – which is why they are top donors. (A toy sketch of such a score follows below.)
  2. What influences the support score is not the donor score itself bu
... (read more)
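
A toy illustration of point 1 (the function and weights below are made up for this sketch, not GiveWiki's actual formula):

```python
# Toy sketch only, not GiveWiki's actual scoring. It just illustrates how a
# score could reward donation size, earliness in a project's funding history,
# and a retroactive evaluation of how the project panned out.

def donation_credit(amount, donation_index, total_donations, evaluation=1.0):
    """Hypothetical credit for one registered donation.

    donation_index: 0 for the first donation the project ever received.
    evaluation: hypothetical retroactive multiplier (>1 if the project did well).
    """
    earliness = 1.0 - donation_index / total_donations  # 1.0 = earliest
    return amount * (0.5 + earliness) * evaluation

# Same amount, same project (retroactively evaluated at 1.5x), different timing:
print(donation_credit(1_000, donation_index=0, total_donations=10, evaluation=1.5))  # 2250.0
print(donation_credit(1_000, donation_index=5, total_donations=10, evaluation=1.5))  # 1500.0
```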
1William the Kiwi
Ok so the support score is influenced non-linearly by donor score. Is there a particular donor who has donated to the 22 highest-ranked projects but did not donate to the projects ranked 23 or lower? I have graphed donor score vs rank for the top GiveWiki donors. Does this include all donors in the calculation or are there hidden donors?

"GiveWiki" as the authority for the picker, to me, implied that this was from a broader universe of giving, and this was the AI Safety subset.

Could be… That's not so wrong either. We rather artificially limited it to AI safety for the moment to have a smaller, more sharply defined target audience. It also had the advantage that we could recruit our evaluators from our own networks. But ideally I'd like to find owners for other cause areas too and then widen the focus of GiveWiki accordingly. The other cause area where I have a relevant network is animal ri... (read more)

It says “AI Safety” later in the title. Do you think I should mention it earlier, like “The AI Safety GiveWiki's Top Picks for the Giving Season of 2023”?

2Dagon
Unsure.  It's probably reasonable to assume around here that it's all AI safety all the time.  "GiveWiki" as the authority for the picker, to me, implied that this was from a broader universe of giving, and this was the AI Safety subset.  No biggie, but I'm sad there isn't more discussion about donations to AI safety research vs more prosaic suffering-reduction in the short term.

Thanks so much for the summary! I'm wondering how this system could be bootstrapped in the industry using less powerful but current-levels-of-general AIs. Building a proof of concept using a Super Mario world is one thing, but what I would find more interesting is a version of the system that can make probabilistic safety guarantees for something like AutoGPT so that it is immediately useful and thus more likely to catch on. 

What I'm thinking of here seems to me a lot like ARC Evals with probably somewhat different processes. Humans doing tasks that s... (read more)

Hiii! You can toggle the “Show all” switch on the projects list to see all publicly listed projects. We try to only rank, and thereby effectively recommend, projects that are currently fundraising, so projects that have any sort of donation page or widget that they direct potential donors to. In some cases this is just a page that says “If you would like to support us with a donation, please get in touch.” When the project owner adds a link to such a page in the “payment URL” field, the project switches from “Not currently accepting donations” to “Acceptin... (read more)

Oh, haha! I'll try to be more concise!

Possible crux: I think I put a stronger emphasis on attribution of impact in my previous comment than you do because to me that seems like both a bit of a problem and solvable in most cases. When it comes to impact measurement, I'm actually (I think) much more pessimistic than you seem to be. There's a risk that EV is just completely undefined even in principle, and even if that should turn out to be false or we can use something like stochastic dominance instead to make decisions, that still leaves us with a near-impo... (read more)

Awww, thanks for the input!

I actually have two responses to this, one from the perspective of the current situation – our system in phase 1, very few donors, very little money going around, most donors don't know where to donate – and one from the perspective of the final ecosystem that we want to see if phase 3 comes to fruition one day – lots of pretty reliable governmental and CSR funding, highly involved for-profit investors, etc.


The second is more interesting but also more speculative. The diagram here shows both the verifier/auditor/evaluator and the standardization firms. I s... (read more)

1Joe Collman
Thanks for the lengthy response. Pre-emptive apologies for my too-lengthy response; I tried to condense it a little, but gave up! Some thoughts:

First, since it may help suggest where I'm coming from: Certainly to some extent, but much less than you're imagining - I'm an initially self-funded AIS researcher who got a couple of LTFF research grants and has since been working with MATS. Most of those coming through MATS have short runways and uncertain future support for their research (quite a few are on PhD programs, but rarely AIS PhDs).

Second, I get the impression that you think I'm saying [please don't do this] rather than [please do this well]. My main point throughout is that the lack of reliable feedback makes things fundamentally different, and that we shouldn't expect [great mechanism in a context with good feedback] to look the same as [great mechanism in a context without good feedback]. To be clear, when I say "lack of reliable feedback", I mean relative to what would be necessary - not relative to the best anyone can currently do. Paul Christiano's carefully analyzing each project proposal (or outcome) for two weeks wouldn't be "reliable feedback" in the sense I mean.

I should clarify that I'm talking only about technical AIS research when it comes to inadequacy of feedback. For e.g. projects to increase the chance of an AI pause/moratorium, I'm much less pessimistic: I'd characterize these as [very messy, but within a context that's fairly well understood]. I'd expect the right market mechanisms to do decently well at creating incentives here, and for our evaluations to be reasonably calibrated in their inaccuracy (or at least to correct in that direction over time).

Third, my concerns become much more significant as things scale - but such scenarios are where you'll get almost all of your expected impact (whether positive or negative). As long as things stay small, you're only risking missing the opportunity to do better, rather than e.g. subst

It would be the producer of the public good (e.g. for my project I put up the collateral).

Oh, got it! Thanks!

Possibly? I'm not sure why you'd do that?

I thought you’d be fundraising to offer refund compensation to others to make their fundraisers more likely to succeed. But if the project developer themself puts up the compensation, it’s probably also an important signal or selection effect in the game-theoretic setup.
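
To make the incentive side of that concrete, here is a minimal payoff enumeration (invented numbers, not moyamo's actual contract terms) for one contributor facing a threshold campaign with and without a refund bonus:

```python
# Minimal sketch with invented numbers: payoffs for one contributor in a
# threshold ("assurance contract") campaign, with and without a refund bonus.

PLEDGE = 100        # paid only if the campaign reaches its threshold
VALUE = 150         # what the funded public good is worth to this contributor
REFUND_BONUS = 10   # extra compensation if the campaign fails (the "dominant" part)

def payoff(contributes: bool, campaign_succeeds: bool, bonus: float) -> float:
    if not contributes:
        return VALUE if campaign_succeeds else 0.0   # free-riding on a public good
    if campaign_succeeds:
        return VALUE - PLEDGE                        # good produced, pledge collected
    return bonus                                     # pledge refunded, plus the bonus

for bonus in (0.0, REFUND_BONUS):
    print("refund bonus =", bonus)
    for contributes in (True, False):
        for succeeds in (True, False):
            print(f"  contributes={contributes}, succeeds={succeeds}: "
                  f"{payoff(contributes, succeeds, bonus):+.0f}")
```

The bonus only changes the contribute-and-fail cell (from 0 to +10 here), which is exactly why it matters, as a signal and a selection effect, who puts that collateral at risk.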

I disagree that a Refund Bonus is a security.

Yeah, courts decide that in the end. Howey Test: money: yes; common enterprise: yes; expectation of ... (read more)

1moyamo
Oh cool, that's a good idea. Then you can piggy back off existing crowdfunding platforms instead of making your own one. Do you have a link? It sounds cool. I want to check it out. I think I see your point. I agree that DACs don't solve this type of free-riding.

Wonderful that you’re working on this! I’m with AI Safety Impact Markets, and I suspect that we will need a system like this eventually. We haven’t received a lot of feedback to that effect yet, so I haven’t prioritized it, but there are at least two applications for it (for investors and (one day, speculatively) for impact buyers/retrofunders). We’re currently addressing it with a bonding curve auction of sorts, which incentivizes donors to come in early, so that they’re also not so incentivized to wait each other out. The incentive structures are differen... (read more)
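
For readers who haven't seen a bonding curve before, here is a generic sketch (an assumed price curve, not necessarily the one Impact Markets actually uses) of why it rewards coming in early:

```python
import math

# Generic bonding-curve sketch (assumed shape, not necessarily the curve
# AI Safety Impact Markets uses): the price per unit of credit rises with the
# total already contributed, so earlier donors get more credit per dollar.

def credit_bought(total_so_far: float, contribution: float, slope: float = 0.01) -> float:
    """Integrate 1/price over the contribution, with price(t) = 1 + slope * t."""
    return (math.log(1 + slope * (total_so_far + contribution))
            - math.log(1 + slope * total_so_far)) / slope

print(credit_bought(total_so_far=0, contribution=1_000))       # ~240 units, early donor
print(credit_bought(total_so_far=50_000, contribution=1_000))  # ~2 units, late donor
```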

1moyamo
Thanks for your comment, it's very helpful.

It would be the producer of the public good (e.g. for my project I put up the collateral).

Possibly? I'm not sure why you'd do that?

I disagree that a Refund Bonus is a security. It's a refund. To me it's when you buy something, but it comes broken, so the store gives you a voucher to make up for your troubles.

I'm in South Africa, but from what I can tell, if you work with US dollars and do something illegal, the FBI will come after you, so I wouldn't be confident that only South African law applies.

This is actually a cool idea. I don't know how I'd manage to get people's details for giving refunds without co-operating with the fundraising platform, and my impression is that most platforms are hesitant to do things like this. If you know of a platform that would be keen on trying this, please tell me!

I don't quite understand this point. You could work on AI Safety and donate to animal charities if you don't want to free-ride.

You're right. I didn't want the title to just be "Dominant Assurance Contracts" because I assumed that most people have never heard of them and tried to come up with something more interesting, but maybe enough people on LessWrong have heard of them, so I should probably be more straightforward.

Amazing work! So glad it’s finally out in the open!

2Johannes Treutlein
Thank you! :)

My perhaps a bit naive take (acausal stuff, other grabby aliens, etc.) is that a conflict needs at least two, and humans are too weak and uncoordinated to be much of an adversary. Hence I’m not so worried about monopolar takeoffs. Not sure, though. Maybe I should be more worried about those too.

I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use

I’m not a fan of idealizing superintelligences. 10+ years ago that was the only way to infer any hard information about worst-case scenarios. Assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sort of training regimes through which they might learn bar... (read more)

Sorry for glossing over some of these. E.g., I’m not sure if you consider ems to be “scientifically implausible technologies.” I don’t, but I bet there are people who could make smart arguments for why they are far off.

Reason 5 is actually a reason to prioritize some s-risk interventions. I explain why in the “tractability” footnote.

Woah, thanks! I hadn’t seen it!

No, just a value-neutral financial instrument such as escrow. If two people can either fight or trade, but they can't trade because they don't trust each other, they'll fight. That forfeits the gains from trade, and one of them ends up dead. But once you invent escrow, there's suddenly, in many cases, an option to do the trade after all, and both can live!
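
The same point in code, with invented payoffs (a stylized trust game, not a model of any particular conflict):

```python
# Stylized numbers: why a value-neutral instrument like escrow can unlock gains
# from trade between parties who would otherwise fight because neither dares to
# hand anything over first.

TRADE = 10     # each side's surplus if the exchange completes
CHEAT = 25     # payoff from taking the other side's goods without delivering
FIGHT = -50    # the disagreement outcome; one side may not survive it

def second_mover_delivers(escrow: bool) -> bool:
    """In a sequential exchange, the second mover delivers only if bound to."""
    if escrow:
        return True            # the escrow agent releases both deliveries at once
    return CHEAT <= TRADE      # otherwise they deliver only if cheating doesn't pay

def first_mover_payoff(escrow: bool) -> int:
    if second_mover_delivers(escrow):
        return TRADE
    # Anticipating being cheated, the first mover never hands anything over,
    # and the parties fall back on fighting.
    return FIGHT

print("without escrow:", first_mover_payoff(escrow=False))  # -50 (fight)
print("with escrow:   ", first_mover_payoff(escrow=True))   #  10 (trade)
```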

I’ve thought a bunch about acausal stuff in the context of evidential cooperation in large worlds, but while I think that that’s super important in and of itself (e.g., it could solve ethics), I’d be hard pressed to think of ways in which it could influence thinking about s-risks. I rather prefer to think of the perfectly straightforward causal conflict stuff that has played out a thousand times throughout history and is not speculative at all – except applied to AI conflict.

But more importantly it sounds like you’re contradicting my “tractability“ footnot... (read more)

I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?

I was just agreeing. :-3 In mainstream ML circles there is probably a taboo around talking about AI maybe doing harm or AI maybe ending up uncontrollable etc. Breaking that taboo was, imo, a good thing because it allowed us to become aware of the dangers AI could pose. Similarly, breaking ... (read more)

2TekhneMakre
Sounds likely enough from your description. Most things are mostly not about self-fulfilling prophecies; life can just be sad / hard :( I think that the feedback loop thing is a thing that happens; usually in a weakish form. I mean, I think it's the cause of part of some depressions. Separately, even if it doesn't happen much or very strongly, it could also be a thing that people are afraid of in themselves and in others, continuous with things like "trying to cheer someone up". That's my guess, to some extent, but IDK. I think we'd live in a different, more hopeful world if you're not (incorrectly) typical-minding here.

I’d prefer to keep these things separate, i.e. (1) your moral preference that “a single human death is worse than trillions of years of the worst possible suffering by trillions of people” and (2) that there is a policy-level incentive problem that implies that we shouldn’t talk about s-risks because that might cause a powerful idiot to take unilateral action to increase x-risk.

I take it that statement 1 is a very rare preference. I, for one, would hate for it to be applied to me. I would gladly trade any health state that has a DALY disability weight >... (read more)

Huh, thanks! 

The example I was thinking of is this one. (There’s a similar thread here.) So in this case it’s the first option – they don’t think they’ll prefer death. But my “forever” was an extrapolation. It’s been almost three years since I read the comment.

I’m the ECL type of intersubjective moral antirealist. So in my mind, whether they really want what they want is none of my business, but what that says about what is desirable as a general policy for people we can’t ask is a largely empirical question that hasn’t been answered yet. :-3

That sounds promising actually… It has become acceptable over the past decade to suggest that some things ought not to be open-sourced. Maybe it can become acceptable to argue for DRM for certain things too. Since we don’t yet have brain scanning technology, I’d also be interested in an inverse cryonics organization that has all the expertise to really really really make sure that your brain and maybe a lot of your social media activity and whatnot really get destroyed after your death. (Perhaps even some sort of mechanism by which suicide and complete s... (read more)

1RedMan
For a suicide switch, a purpose built shaped charge mounted to the back of your skull (a properly engineered detonation wave would definitely pulp your brain, might even be able to do it without much danger to people nearby), raspberry pi with preinstalled 'delete it all and detonate' script on belt, secondary script that executes automatically if it loses contact with you for a set period of time. That's probably overengineered though, just request cremation with no scan, and make sure as much of your social life as possible is in encrypted chat. When you die, the passwords are gone. When the tech gets closer and there are fears about wishes for cremation not being honored, EAs should pool their funds to buy a funeral home and provide honest services.

Yeah, that’s a known problem. I don’t quite remember what the go-to solutions were that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be something that is almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work or it would still run into the problem with conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about AI creating an infinite series of surrogate goals to protect the previous surrogate goal. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.

In the tractability footnote above I make the case that it should be at least vastly easier than influencing the utility functions of all AIs to make alignment succeed.

3Garrett Baker
Yeah, I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use. You are trying to do better than a superintelligence at a task it is highly incentivized to be good at, so you are not going to beat the superintelligence.

Secondly, you need to assume that the pessimization of the superintelligence’s values would be bad, but in fact I expect it to be just as neutral as the optimization. I don’t care about wars between unaligned AIs, even if they do often have them. Their values will be completely orthogonal to my own, so their inverses will be, too. Even in wars between aligned and unaligned (Hitler, for example) humans, suffering which I would trade the world to stop does not happen. Also, wars end; it’d be very weird if you got two AIs warring with each other for eternity. If both knew this was the outcome (or placed some amount of probability on it), why would either of them start the war?

People worried about s-risks should be worried about some kinds of partial alignment solutions, where you get the AI aligned enough to care about keeping humans (or other things that are morally relevant) around, but not aligned enough to care if they’re happy (or satisfying any other of a number of values), so you get a bunch of things that can feel pain in moderate pain for eternity.

Interesting take! 

Friend circles of mine – which, I should note, don’t, to my knowledge, overlap with the s-risks-from-AI researchers I know – do treat suicide as a perfectly legitimate thing you can do after deliberation, like abortion or gender-affirming surgery. So there’s no particular taboo there. Hence, maybe, why I also don’t recoil from considering that the future might be vastly worse than the present.

But it seems to be a rationalist virtue not to categorically recoil from certain considerations.

Could you explain the self-fulfilling prophe... (read more)

2TekhneMakre
My impression was that the Freudian death wish is aggression in general, (mis)directed at the self. I'm not talking about that.

I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?

It's basically like this: my experience is bad. If my experience is this bad, I'd rather not live. Can I make my experience good enough to be worth living? That depends on whether I work really hard or not. I observe that I am not working hard. Therefore I expect that my experience won't get sufficiently better to be worth it. Therefore, locally speaking, it's not worth it to try hard today to make my life better; I won't keep that work up, and will just slide back. So my prediction that I won't work to make my life better is correct and self-fulfilling. If I thought I would spend many days working to make my life better, then it would become worth it, locally speaking, to work hard today, because that would actually move the needle on the chances of making life worth it.

Surely you can see that this isn't common, and the normal response is to just be broken until you die.

Thx! Yep, your edit basically captures most of what I would reply. If alignment turns out so hard that we can’t get any semblance of human values encoded at all, then I’d also guess that hell is quite unlikely. But there are caveats, e.g., if there is a nonobvious inner alignment failure, we could get a system that technically doesn’t care about any semblance of human values but doesn’t make that apparent because ostensibly optimizing for human values appears useful for it at the time. That could still cause hell, even with a higher-than-normal probability.

Thanks for linking that interesting post! (Haven’t finished it yet though.) Your claim is a weak one though, right? Only that you don’t expect the entire lightcone of the future to be filled with worst-case hell, or less than 95% of it? There are a bunch of different definitions of s-risk, but what I’m worried about definitely starts at a much smaller-scale level. Going by the definitions in that paper (p. 3 or 391), maybe the “astronomical suffering outcome” or the “net suffering outcome.”

3MinusGix
I primarily mentioned it because I think people base their 'what is the S-risk outcome' on basically antialigned AGI. The post has 'AI hell' in the title, uses comparisons between extreme suffering and extreme bliss, and calls s-risks more important than alignment (which I think makes sense to a reasonable degree if antialigned s-risk is likely or a sizable portion of weaker dystopias are likely, but I don't think makes sense since I consider antialignment very unlikely and weak dystopias also overall not likely).

The extrema argument is why I don't think that weak dystopias are likely: unless we succeed at alignment to a notable degree, the extremes of whatever values shake out are not something that keeps humans around for very long. So I don't expect weaker dystopias to occur either. I expect that most AIs aren't going to value making a notable deliberate AI hell, whether out of the lightcone or 5% of it or 0.01% of it.

If we make an aligned AGI and then some other AGI says 'I will simulate a bunch of humans in torment unless you give me a planet', then I expect that our aligned AGI uses a decision theory that doesn't give in to dt-Threats and doesn't give in (and thus isn't threatened, because the other AGI gains nothing from actually simulating humans in that).

So, while I do expect that weak dystopias have a noticeable chance of occurring, I think it is significantly unlikely? It grows more likely we'll end up in a weak dystopia as alignment progresses. Like, if we manage to get enough of a 'caring about humans specifically' (though I expect a lot of attempts like that to fall apart and have weird extremes when they're optimized over!), then that raises the chances of a weak dystopia. However, I also believe that alignment is roughly the way to solve these. To get notable progress on making AGIs avoid specific areas, I believe that requires more alignment progress than we have currently.

Interesting take! Obviously that’s different for me and many others, but you’re not alone with that. I even know someone who would be ready to cook in a lava lake forever if it implies continuing to exist. I think that’s also in line with the DALY disability weights, but only because they artificially scale them to the 0–1 interval.

So I imagine you’d never make such a deal as shortening your life by three hours in exchange for not experiencing one hour of the worst pain or other suffering you’ve experienced?

3Charlie Steiner
It's plausible you could catch me on days where I would take the deal, but basically yeah, 3:1 seems like plenty of incentive to choose life, whereas at 1:1 (the lava lake thing), life isn't worth it (though maybe you could catch me on days etc etc).
4tslarm
Sorry for pursuing this tangent (which I'm assuming you'll feel free to ignore), but have they ever indicated how likely they think it is that they would continue to hold that preference while in the lava lake?  (I was aware some people voiced preferences like this, but I haven't directly discussed it with any of them. I've often wondered whether they think they would, in the (eternally repeated) moment, prefer the suffering to death, or whether they are willing to condemn themselves to infinite suffering even though they expect to intensely regret it. In both cases I think they are horribly mistaken, but in quite different ways.)

but some wonkier approaches could be pretty scary.

Yeah, very much agreed. :-/

in particular, an aligned AI sells more of its lightcone to get baby-eating aliens to eat their babies less, and in general a properly aligned AI will try its hardest to ensure what we care about (including reducing suffering) is satisfied, so alignment is convergent to both.

Those are some good properties, I think… Not quite sure in the end.

But your alignment procedure is indirect, so we don’t quite know today what the result will be, right? Then the question whether we’ll end up ... (read more)

3Tamsin Leake
yes, the eventual outcome is hard to predict. but my plan looks like the kind of plan that would fail in Xrisky rather than Srisky ways, when it fails. i don't use the Thing-line nomenclature very much anymore and i only use U/X/S. i am concerned about the other paths as well but i'm hopeful we can figure them out within the QACI counterfactuals.

Some promising interventions against s-risks that I’m aware of are:

  1. Figure out what’s going on with bargaining solutions. Nash, Kalai, or Kalai-Smorodinsky? Is there one that is privileged in some impartial way? (A toy comparison of two of these follows after this list.)
  2. Is there some sort of “leader election” algorithm over bargaining solutions?
  3. Do surrogate goals work, are they cooperative enough?
  4. Will neural-net based AIs be comprehensible to each other, if so, what does the open source game theory say about how conflicts will play out?
  5. And of course CLR’s research agenda.
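
As a toy comparison for point 1 (my own illustration, not taken from any of the agendas above): the Nash and Kalai-Smorodinsky solutions already disagree on a very simple frontier, which is part of what makes the question non-trivial.

```python
import numpy as np

# Feasible agreements: player 1 gets u1 = x, player 2 gets u2 = sqrt(1 - x),
# with disagreement point (0, 0) and ideal point (1, 1).
x = np.linspace(0.0, 1.0, 100_001)
u1, u2 = x, np.sqrt(1.0 - x)

# Nash: maximize the product of gains over the disagreement point.
nash = x[np.argmax(u1 * u2)]

# Kalai-Smorodinsky: the frontier point where each player's gain is proportional
# to their ideal (maximal attainable) gain; here that means u1 = u2.
ks = x[np.argmin(np.abs(u1 / u1.max() - u2 / u2.max()))]

print(f"Nash solution:              u1 = {nash:.3f}")  # ~0.667
print(f"Kalai-Smorodinsky solution: u1 = {ks:.3f}")    # ~0.618
```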

Interpretability research is probably i... (read more)

9Seth Herd
These suggestions are all completely opaque to me. I don't see how a single one of them would work to reduce s-risk, or indeed understand what the first three are or why the last one matters. That's after becoming conversant with the majority of thinking and terminology around alignment approaches. So maybe that's one reason you don't see people discussing s-risk much - the few people doing it are not communicating their ideas in a compelling or understandable way. That doesn't answer the main question, but cause-building strategy is one factor in any question of why things are or aren't attended to.

I don’t see how any of these actually help reduce s-risk. Like, if we know some bargaining solutions lead to everyone being terrible and others lead to everyone being super happy, so what? It’s not like we can tremendously influence the bargaining solution our AI & those it meets settle on after reflection.

I also know plenty of cheerful ones. :-3

Interesting. Do I give off that vibe – here or in other writings?

Thx! I’ll probably drop the “more heavily” for stylistic reasons, but otherwise that sounds good to me!

I suppose my shooting range metaphor falls short here. Maybe alignment is like teaching a kid to be an ace race car driver, and s-risks are accidents on normal roads. There it also depends on the details whether the ace race car driver will drive safely on normal roads.

Oh, true! Digital sentience is also an important point! A bit of an intuition pump is that if you consider a certain animal to be sentient (at least with some probability), then an em of that animal’s brain may be sentient with a similar probability. If an AI is powerful enough to run such ems, the question is no longer whether digital sentience is possible but why an AI would run such an em.

The Maslow hierarchy is reversed for me, i.e. I’d rather be dead/disempowered than tortured, but that’s just a personal thing. In the end it’s more important what the acausal moral compromise says, I think.

4Raemon
Yeah to be clear my mainline prediction is that an unfriendly AI goes through some period of simulating lots of humans (less likely to simulate animals IMO) as part of its strategizing process, kills humanity, and then goes on to do mostly non-sentient things. There might be a second phase where it does some kind of weird acausal thing, not sure. I don't know that in my mainline prediction the simulation process results in much more negative utility than the extinction part. I think the AI probably has to do much of its strategizing without enough compute to simulate vast numbers of humans, and I weakly bet against those simulations ending up suffering in a way that ends up outweighing human extinction. There are other moderately likely worlds IMO and yeah I think s-risk is a pretty real concern.

Good point. I can still change it. What title would you vote for? I spent a lot of time vacillating between titles and don’t have a strong opinion. These were the options that I considered:

  1. Why not s-risks? A poll.
  2. Why are we so complacent about AI hell?
  3. Why aren’t we taking s-risks from AI more seriously?
  4. Why do so few people care about s-risks from AI?
  5. Why are we ignoring the risks of AI hell?
  6. What’s holding us back from addressing s-risks from AI?
  7. Why aren’t we doing more to prevent s-risks from AI?
  8. What will it take to get people to care about s-risks from AI?
4Raemon
"Why aren't more people prioritizing work on S-risks more heavily" seems better to me and seems like the question you probably actually care about. Question-titles that are making (in many cases inaccurate) claims about people's motivations seem more fraught and unhelpfully opinionated.

I agree with what Lukas linked. But there are also various versions of the Waluigi Effect, so that alignment, if done wrong, may increase s-risk. Well, and I say in various answers and in the post proper that I’m vastly more optimistic about reducing s-risk than having to resort to anything that would increase x-risk.

Yeah… When it comes to the skill overlap, having alignment research aided by future pre-takeoff AIs seems dangerous. Having s-risk research aided that way seems less problematic to me. That might make it accessible (now or in a year) for people who have struggled with alignment research. I also wonder whether there is maybe still more time for game-theoretic research on s-risks than there is in alignment. The s-risk-related problems might be easier, so they can perhaps still be solved in time. (NNTR, just thinking out loud.)

Oooh, good point! I’ve certainly observed that in myself in other areas.

Like, “No one is talking about something obvious? Then it must be forbidden to talk about and I should shut up too!” Well, no one is freaking out in that example, but if someone were, it would enhance the effect.

Answer by Dawn Drescher32

Too unknown. Finally there’s the obvious reason that people just don’t know enough about s-risks. That seems quite likely to me.

1Dawn Drescher
Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).
Answer by Dawn Drescher05

Too unpopular. Maybe people are motivated by what topics are in vogue in their friend circles, and s-risks are not?

Answer by Dawn Drescher11

Personal fit. Surely, some people have tried working on s-risks in different roles for some substantial period of time but haven’t found an angle from which they can contribute given their particular skills.

3Lukas_Gloor
Related to the "personal fit" explanation: I'd argue that the skills required to best reduce s-risks have much overlap with the skills to make alignment progress (see here).   At least, I think this goes for directly AI-related s-risks, which I consider most concerning, but I put significantly lower probabilities on them than you do. For s-risks conditioned on humans staying in control over the future, we maybe wouldn't gain much from explicitly modelling AI takeoff and engaging in all the typical longtermist thought. Therefore, some things that reduce future disvalue don't have to look like longtermism? For instance, common sense ways to improve society's rationality, coordination abilities, and values. (Maybe there's a bit of leverage to gain from thinking explicitly about how AI will change things.) The main drawback to those types of interventions is (1) disvalue at stake might be smaller than the disvalue for directly AI-related s-risks conditional on the scenarios playing out, and (2) it only matters how society thinks and what we value if humans actually stay in control over the future, which arguably seems pretty unlikely.

There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of what the risk of a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid that the risk is quite macroscopic.

The second version ignores the expected value. I acknowledge that expected value calculus has its limitations, but if we use it at all, and we clearly do, a lot, then there’s no reason to ignore its implications ... (read more)

2lc
I don't understand why you think a multipolar takeoff would run S-risks.
Answer by Dawn Drescher1427

Too unlikely. I’ve heard three versions of this concern. One is that s-risks are unlikely. I simply don’t think they are, as explained above in the post proper. The second version is that it’s 1/10th of extinction, hence less likely, hence not a priority. The third version of this take is that it’s just psychologically hard to be motivated for something that is not the mode of the probability distribution of how the future will turn out (given such clusters as s-risks, extinction, and business as usual). So even if s-risks are much worse and only slightly less likely than extinction, they’re still hard for people to work on.

Answer by Dawn Drescher10-5

NNTs. Some might argue that “naive negative utilitarians that take ideas seriously” (NNTs) want to destroy the world, so that any admissions that s-risks are morally important in expectation should happen only behind closed doors and only among trusted parties.

5Dawn Drescher
That sounds to me like, "Don't talk about gun violence in public or you'll enable people who want to overthrow the whole US constitution." Directionally correct but entirely disproportionate. Just consider that non-negative utilitarians might hypothetically try to kill everyone to replace them with beings with greater capacity for happiness, but we're not self-censoring any talk of happiness as a result. I find this concern to be greatly exaggerated.

In fact, moral cooperativeness is at the core of why I think work on s-risks is a much stronger option than alignment, as explained in the tractability section above. So concern for s-risks could even be a concomitant of moral cooperativeness and can thus even counter any undemocratic, unilateralist actions by one moral system.

Note also that there is a huge chasm between axiology and morality. I have pretty strong axiological intuitions but what morality follows from that (even just assuming the axiology axiomatically – no pun intended) is an unsolved research question that would take decades and whole think tanks to figure out. So even if someone values empty space over earth today, they're probably still not omnicidal.

The suffering-focused EAs I know are deeply concerned about the causal and acausal moral cooperativeness of their actions. (Who wants to miss out on moral gains from trade after all!) And chances are this volume of space will be filled by some grabby aliens eventually, so assured permanent nonexistence is not even on the table.

Answer by Dawn Drescher01

Too sad. Some people think that maybe working on s-risks is unpopular because suffering is too emotionally draining to think about, so people prefer to ignore it.

Another version of this concern is that sad topics are not in vogue with the rich tech founders who bankroll our think tanks; that they’re selected to be the sort of people who are excited about incredible moonshots rather than prudent risk management. If these people hear about averting suffering, reducing risks, etc. too often from EA circles, they’ll become uninterested in EA-aligned thinking and think tanks.

3Dawn Drescher
I want to argue with the Litany of Gendlin here, but what work on s-risks really looks like in the end is writing open source game theory simulations and writing papers. All dry academic stuff that makes it easy to block out thoughts of suffering itself. Just give it a try! (E.g., at a CLR fellowship.)

I don’t know if that’s the case, but s-risks can be reframed:

  1. We want to unlock positive-sum trades for the flourishing of our descendants (biological or not).
  2. We want to distribute the progress and welfare gains from AI equitably (i.e. not have some sizable fractions of future beings suffer extremely).
  3. Our economy only works thanks to trust in institutions and jurisprudence. The flourishing of the AI economy will require that new frameworks be developed that live up to the challenges of the new era!

These reframings should of course be followed up with a detailed explanation so as not to be dishonest. Their purpose is just to show that one can pivot one’s thinking about s-risks such that the suffering is not so front and center. This would, if anything, reduce my motivation to work on them, but that’s just me.