All of Dawn Drescher's Comments + Replies

Does this include all donors in the calculation or are there hidden donors?

Donors have a switch in their profiles where they can determine whether they want to be listed or not. The top three in the private, complete listing are Jaan Tallinn, Open Phil, and the late Future Fund, whose public grants I've imported. The total ranking lists 92 users. 

But I don't think that's core to understanding the step down. I went through the projects around the threshold before posting my last comment, and I think it's really the 90% cutoff that causes it. Not a ... (read more)

TL;DR: Great question! I think it mostly means that we don't have enough data to say much about these projects. So donors who've made early donations to them can register those donations and boost those projects' scores.

  1. The donor score relies on the size of the donations and their earliness in the history of the project (plus the retroactive evaluation). So the top donors in particular have made many early, big, and sometimes public grants to projects that panned out well – which is why they are top donors. (A toy sketch of such a score follows below.)
  2. What influences the support score is not the donor score itself bu
... (read more)
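
A toy illustration of point 1 (the function and weights below are made up for this sketch, not GiveWiki's actual formula):

```python
# Toy sketch only, not GiveWiki's actual scoring. It just illustrates how a
# score could reward donation size, earliness in a project's funding history,
# and a retroactive evaluation of how the project panned out.

def donation_credit(amount, donation_index, total_donations, evaluation=1.0):
    """Hypothetical credit for one registered donation.

    donation_index: 0 for the first donation the project ever received.
    evaluation: hypothetical retroactive multiplier (>1 if the project did well).
    """
    earliness = 1.0 - donation_index / total_donations  # 1.0 = earliest
    return amount * (0.5 + earliness) * evaluation

# Same amount, same project (retroactively evaluated at 1.5x), different timing:
print(donation_credit(1_000, donation_index=0, total_donations=10, evaluation=1.5))  # 2250.0
print(donation_credit(1_000, donation_index=5, total_donations=10, evaluation=1.5))  # 1500.0
```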
1William the Kiwi
Ok so the support score is influenced non-linearly by donor score. Is there a particular donor who has donated to the 22 highest-ranked projects but did not donate to the projects ranked 23 or lower? I have graphed donor score vs rank for the top GiveWiki donors. Does this include all donors in the calculation or are there hidden donors?

"GiveWiki" as the authority for the picker, to me, implied that this was from a broader universe of giving, and this was the AI Safety subset.

Could be… That's not so wrong either. We rather artificially limited it to AI safety for the moment to have a smaller, more sharply defined target audience. It also had the advantage that we could recruit our evaluators from our own networks. But ideally I'd like to find owners for other cause areas too and then widen the focus of GiveWiki accordingly. The other cause area where I have a relevant network is animal ri... (read more)

It says “AI Safety” later in the title. Do you think I should mention it earlier, like “The AI Safety GiveWiki's Top Picks for the Giving Season of 2023”?

2Dagon
Unsure.  It's probably reasonable to assume around here that it's all AI safety all the time.  "GiveWiki" as the authority for the picker, to me, implied that this was from a broader universe of giving, and this was the AI Safety subset.  No biggie, but I'm sad there isn't more discussion about donations to AI safety research vs more prosaic suffering-reduction in the short term.

Thanks so much for the summary! I'm wondering how this system could be bootstrapped in the industry using less powerful but current-levels-of-general AIs. Building a proof of concept using a Super Mario world is one thing, but what I would find more interesting is a version of the system that can make probabilistic safety guarantees for something like AutoGPT so that it is immediately useful and thus more likely to catch on. 

What I'm thinking of here seems to me a lot like ARC Evals with probably somewhat different processes. Humans doing tasks that s... (read more)

Hiii! You can toggle the “Show all” switch on the projects list to see all publicly listed projects. We try to only rank, and thereby effectively recommend, projects that are currently fundraising, so projects that have any sort of donation page or widget that they direct potential donors to. In some cases this is just a page that says “If you would like to support us with a donation, please get in touch.” When the project owner adds a link to such a page in the “payment URL” field, the project switches from “Not currently accepting donations” to “Acceptin... (read more)

Oh, haha! I'll try to be more concise!

Possible crux: I think I put a stronger emphasis on attribution of impact in my previous comment than you do because to me that seems like both a bit of a problem and solvable in most cases. When it comes to impact measurement, I'm actually (I think) much more pessimistic than you seem to be. There's a risk that EV is just completely undefined even in principle, and even if that should turn out to be false or we can use something like stochastic dominance instead to make decisions, that still leaves us with a near-impo... (read more)

Awww, thanks for the input!

I actually have two responses to this, one from the perspective of the current situation – our system in phase 1, very few donors, very little money going around, most donors don't know where to donate – and one from the perspective of the final ecosystem that we want to see if phase 3 comes to fruition one day – lots of pretty reliable governmental and CSR funding, highly involved for-profit investors, etc.


The second is more interesting but also more speculative. The diagram here shows both the verifier/auditor/evaluator and the standardization firms. I s... (read more)

1Joe Collman
Thanks for the lengthy response. Pre-emptive apologies for my too-lengthy response; I tried to condense it a little, but gave up! Some thoughts:

First, since it may help suggest where I'm coming from: Certainly to some extent, but much less than you're imagining - I'm an initially self-funded AIS researcher who got a couple of LTFF research grants and has since been working with MATS. Most of those coming through MATS have short runways and uncertain future support for their research (quite a few are on PhD programs, but rarely AIS PhDs).

Second, I get the impression that you think I'm saying [please don't do this] rather than [please do this well]. My main point throughout is that the lack of reliable feedback makes things fundamentally different, and that we shouldn't expect [great mechanism in a context with good feedback] to look the same as [great mechanism in a context without good feedback]. To be clear, when I say "lack of reliable feedback", I mean relative to what would be necessary - not relative to the best anyone can currently do. Paul Christiano's carefully analyzing each project proposal (or outcome) for two weeks wouldn't be "reliable feedback" in the sense I mean.

I should clarify that I'm talking only about technical AIS research when it comes to inadequacy of feedback. For e.g. projects to increase the chance of an AI pause/moratorium, I'm much less pessimistic: I'd characterize these as [very messy, but within a context that's fairly well understood]. I'd expect the right market mechanisms to do decently well at creating incentives here, and for our evaluations to be reasonably calibrated in their inaccuracy (or at least to correct in that direction over time).

Third, my concerns become much more significant as things scale - but such scenarios are where you'll get almost all of your expected impact (whether positive or negative). As long as things stay small, you're only risking missing the opportunity to do better, rather than e.g. subst

It would be the producer of the public good (e.g. for my project I put up the collateral).

Oh, got it! Thanks!

Possibly? I'm not sure why you'd do that?

I thought you’d be fundraising to offer refund compensation to others to make their fundraisers more likely to succeed. But if the project developer themself puts up the compensation, it’s probably also an important signal or selection effect in the game-theoretic setup.
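
To make the incentive side of that concrete, here is a minimal payoff enumeration (invented numbers, not moyamo's actual contract terms) for one contributor facing a threshold campaign with and without a refund bonus:

```python
# Minimal sketch with invented numbers: payoffs for one contributor in a
# threshold ("assurance contract") campaign, with and without a refund bonus.

PLEDGE = 100        # paid only if the campaign reaches its threshold
VALUE = 150         # what the funded public good is worth to this contributor
REFUND_BONUS = 10   # extra compensation if the campaign fails (the "dominant" part)

def payoff(contributes: bool, campaign_succeeds: bool, bonus: float) -> float:
    if not contributes:
        return VALUE if campaign_succeeds else 0.0   # free-riding on a public good
    if campaign_succeeds:
        return VALUE - PLEDGE                        # good produced, pledge collected
    return bonus                                     # pledge refunded, plus the bonus

for bonus in (0.0, REFUND_BONUS):
    print("refund bonus =", bonus)
    for contributes in (True, False):
        for succeeds in (True, False):
            print(f"  contributes={contributes}, succeeds={succeeds}: "
                  f"{payoff(contributes, succeeds, bonus):+.0f}")
```

The bonus only changes the contribute-and-fail cell (from 0 to +10 here), which is exactly why it matters, as a signal and a selection effect, who puts that collateral at risk.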

I disagree that a Refund Bonus is a security.

Yeah, courts decide that in the end. Howey Test: money: yes; common enterprise: yes; expectation of ... (read more)

1moyamo
Oh cool, that's a good idea. Then you can piggy back off existing crowdfunding platforms instead of making your own one. Do you have a link? It sounds cool. I want to check it out. I think I see your point. I agree that DACs don't solve this type of free-riding.

Wonderful that you’re working on this! I’m with AI Safety Impact Markets, and I suspect that we will need a system like this eventually. We haven’t received a lot of feedback to that effect yet, so I haven’t prioritized it, but there are at least two applications for it (for investors and (one day, speculatively) for impact buyers/retrofunders). We’re currently addressing it with a bonding curve auction of sorts, which incentivizes donors to come in early, so that they’re also not so incentivized to wait each other out. The incentive structures are differen... (read more)
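
For readers who haven't seen a bonding curve before, here is a generic sketch (an assumed price curve, not necessarily the one Impact Markets actually uses) of why it rewards coming in early:

```python
import math

# Generic bonding-curve sketch (assumed shape, not necessarily the curve
# AI Safety Impact Markets uses): the price per unit of credit rises with the
# total already contributed, so earlier donors get more credit per dollar.

def credit_bought(total_so_far: float, contribution: float, slope: float = 0.01) -> float:
    """Integrate 1/price over the contribution, with price(t) = 1 + slope * t."""
    return (math.log(1 + slope * (total_so_far + contribution))
            - math.log(1 + slope * total_so_far)) / slope

print(credit_bought(total_so_far=0, contribution=1_000))       # ~240 units, early donor
print(credit_bought(total_so_far=50_000, contribution=1_000))  # ~2 units, late donor
```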

1moyamo
Thanks for your comment, it's very helpful.

It would be the producer of the public good (e.g. for my project I put up the collateral).

Possibly? I'm not sure why you'd do that?

I disagree that a Refund Bonus is a security. It's a refund. To me it's when you buy something, but it comes broken, so the store gives you a voucher to make up for your troubles.

I'm in South Africa, but from what I can tell, if you work with US dollars and do something illegal, the FBI will come after you, so I wouldn't be confident that only South African law applies.

This is actually a cool idea. I don't know how I'd manage to get people's details for giving refunds without co-operating with the fundraising platform, and my impression is that most platforms are hesitant to do things like this. If you know of a platform that would be keen on trying this, please tell me!

I don't quite understand this point. You could work on AI Safety and donate to animal charities if you don't want to free-ride.

You're right. I didn't want the title to just be "Dominant Assurance Contracts" because I assumed that most people have never heard of them and tried to come up with something more interesting, but maybe enough people on LessWrong have heard of them, so I should probably be more straightforward.

Amazing work! So glad it’s finally out in the open!

2Johannes Treutlein
Thank you! :)

My perhaps a bit naive take (acausal stuff, other grabby aliens, etc.) is that a conflict needs at least two, and humans are too weak and uncoordinated to be much of an adversary. Hence I’m not so worried about monopolar takeoffs. Not sure, though. Maybe I should be more worried about those too.

I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use

I’m not a fan of idealizing superintelligences. 10+ years ago that was the only way to infer any hard information about worst-case scenarios. Assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sort of training regimes through which they might learn bar... (read more)

Sorry for glossing over some of these. E.g., I’m not sure if you consider ems to be “scientifically implausible technologies.” I don’t, but I bet there are people who could make smart arguments for why they are far off.

Reason 5 is actually a reason to prioritize some s-risk interventions. I explain why in the “tractability” footnote.

Woah, thanks! I hadn’t seen it!

No, just a value-neutral financial instrument such as escrow. If two people can either fight or trade, but they can't trade because they don't trust each other, they'll fight. That forfeits the gains from trade, and one of them ends up dead. But once you invent escrow, there's suddenly, in many cases, an option to do the trade after all, and both can live!
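
The same point in code, with invented payoffs (a stylized trust game, not a model of any particular conflict):

```python
# Stylized numbers: why a value-neutral instrument like escrow can unlock gains
# from trade between parties who would otherwise fight because neither dares to
# hand anything over first.

TRADE = 10     # each side's surplus if the exchange completes
CHEAT = 25     # payoff from taking the other side's goods without delivering
FIGHT = -50    # the disagreement outcome; one side may not survive it

def second_mover_delivers(escrow: bool) -> bool:
    """In a sequential exchange, the second mover delivers only if bound to."""
    if escrow:
        return True            # the escrow agent releases both deliveries at once
    return CHEAT <= TRADE      # otherwise they deliver only if cheating doesn't pay

def first_mover_payoff(escrow: bool) -> int:
    if second_mover_delivers(escrow):
        return TRADE
    # Anticipating being cheated, the first mover never hands anything over,
    # and the parties fall back on fighting.
    return FIGHT

print("without escrow:", first_mover_payoff(escrow=False))  # -50 (fight)
print("with escrow:   ", first_mover_payoff(escrow=True))   #  10 (trade)
```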

I’ve thought a bunch about acausal stuff in the context of evidential cooperation in large worlds, but while I think that that’s super important in and of itself (e.g., it could solve ethics), I’d be hard pressed to think of ways in which it could influence thinking about s-risks. I rather prefer to think of the perfectly straightforward causal conflict stuff that has played out a thousand times throughout history and is not speculative at all – except applied to AI conflict.

But more importantly it sounds like you’re contradicting my “tractability“ footnot... (read more)

I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?

I was just agreeing. :-3 In mainstream ML circles there is probably a taboo around talking about AI maybe doing harm or AI maybe ending up uncontrollable etc. Breaking that taboo was, imo, a good thing because it allowed us to become aware of the dangers AI could pose. Similarly, breaking ... (read more)

2TekhneMakre
Sounds likely enough from your description. Most things are mostly not about self-fulfilling prophecies; life can just be sad / hard :( I think that the feedback loop thing is a thing that happens; usually in a weakish form. I mean, I think it's the cause of part of some depressions. Separately, even if it doesn't happen much or very strongly, it could also be a thing that people are afraid of in themselves and in others, continuous with things like "trying to cheer someone up". That's my guess, to some extent, but IDK. I think we'd live in a different, more hopeful world if you're not (incorrectly) typical-minding here.

I’d prefer to keep these things separate, i.e. (1) your moral preference that “a single human death is worse than trillions of years of the worst possible suffering by trillions of people” and (2) that there is a policy-level incentive problem that implies that we shouldn’t talk about s-risks because that might cause a powerful idiot to take unilateral action to increase x-risk.

I take it that statement 1 is a very rare preference. I, for one, would hate for it to be applied to me. I would gladly trade any health state that has a DALY disability weight >... (read more)

Huh, thanks! 

The example I was thinking of is this one. (There’s a similar thread here.) So in this case it’s the first option – they don’t think they’ll prefer death. But my “forever” was an extrapolation. It’s been almost three years since I read the comment.

I’m the ECL type of intersubjective moral antirealist. So in my mind, whether they really want what they want is none of my business, but what that says about what is desirable as a general policy for people we can’t ask is a largely empirical question that hasn’t been answered yet. :-3

That sounds promising actually… It has become acceptable over the past decade to suggest that some things ought not to be open-sourced. Maybe it can become acceptable to argue for DRM for certain things too. Since we don’t yet have brain scanning technology, I’d also be interested in an inverse cryonics organization that has all the expertise to really really really make sure that your brain and maybe a lot of your social media activity and whatnot really get destroyed after your death. (Perhaps even some sort of mechanism by which suicide and complete s... (read more)

1RedMan
For a suicide switch, a purpose built shaped charge mounted to the back of your skull (a properly engineered detonation wave would definitely pulp your brain, might even be able to do it without much danger to people nearby), raspberry pi with preinstalled 'delete it all and detonate' script on belt, secondary script that executes automatically if it loses contact with you for a set period of time. That's probably overengineered though, just request cremation with no scan, and make sure as much of your social life as possible is in encrypted chat. When you die, the passwords are gone. When the tech gets closer and there are fears about wishes for cremation not being honored, EAs should pool their funds to buy a funeral home and provide honest services.

Yeah, that’s a known problem. I don’t quite remember what the go-to solutions were that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be something that is almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work or it would still run into the problem with conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about AI creating an infinite series of surrogate goals to protect the previous surrogate goal. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.

In the tractability footnote above I make the case that it should be at least vastly easier than influencing the utility functions of all AIs to make alignment succeed.

3Garrett Baker
Yeah, I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use. You are trying to do better than a superintelligence at a task it is highly incentivized to be good at, so you are not going to beat the superintelligence.

Secondly, you need to assume that the pessimization of the superintelligence’s values would be bad, but in fact I expect it to be just as neutral as the optimization. I don’t care about wars between unaligned AIs, even if they do often have them. Their values will be completely orthogonal to my own, so their inverses will be, too. Even in wars between aligned and unaligned (Hitler, for example) humans, suffering which I would trade the world to stop does not happen. Also, wars end; it’d be very weird if you got two AIs warring with each other for eternity. If both knew this was the outcome (or placed some amount of probability on it), why would either of them start the war?

People worried about s-risks should be worried about some kinds of partial alignment solutions, where you get the AI aligned enough to care about keeping humans (or other things that are morally relevant) around, but not aligned enough to care if they’re happy (or satisfying any other of a number of values), so you get a bunch of things that can feel pain in moderate pain for eternity.

Interesting take! 

Friend circles of mine – which, I should note, don’t, to my knowledge, overlap with the s-risks-from-AI researchers I know – do treat suicide as a perfectly legitimate thing you can do after deliberation, like abortion or gender-affirming surgery. So there’s no particular taboo there. Hence, maybe, why I also don’t recoil from considering that the future might be vastly worse than the present.

But it seems to be a rationalist virtue not to categorically recoil from certain considerations.

Could you explain the self-fulfilling prophe... (read more)

2TekhneMakre
My impression was that the Freudian death wish is aggression in general, (mis)directed at the self. I'm not talking about that.

I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?

It's basically like this: my experience is bad. If my experience is this bad, I'd rather not live. Can I make my experience good enough to be worth living? That depends on whether I work really hard or not. I observe that I am not working hard. Therefore I expect that my experience won't get sufficiently better to be worth it. Therefore, locally speaking, it's not worth it to try hard today to make my life better; I won't keep that work up, and will just slide back. So my prediction that I won't work to make my life better is correct and self-fulfilling. If I thought I would spend many days working to make my life better, then it would become worth it, locally speaking, to work hard today, because that would actually move the needle on the chances of making life worth it.

Surely you can see that this isn't common, and the normal response is to just be broken until you die.

Thx! Yep, your edit basically captures most of what I would reply. If alignment turns out so hard that we can’t get any semblance of human values encoded at all, then I’d also guess that hell is quite unlikely. But there are caveats, e.g., if there is a nonobvious inner alignment failure, we could get a system that technically doesn’t care about any semblance of human values but doesn’t make that apparent because ostensibly optimizing for human values appears useful for it at the time. That could still cause hell, even with a higher-than-normal probability.

Thanks for linking that interesting post! (Haven’t finished it yet though.) Your claim is a weak one though, right? Only that you don’t expect the entire lightcone of the future to be filled with worst-case hell, or less than 95% of it? There are a bunch of different definitions of s-risk, but what I’m worried about definitely starts at a much smaller-scale level. Going by the definitions in that paper (p. 3 or 391), maybe the “astronomical suffering outcome” or the “net suffering outcome.”

3MinusGix
I primarily mentioned it because I think people base their 'what is the S-risk outcome' on basically antialigned AGI. The post has 'AI hell' in the title, uses comparisons between extreme suffering and extreme bliss, and calls s-risks more important than alignment (which I think makes sense to a reasonable degree if antialigned s-risk is likely or a sizable portion of weaker dystopias are likely, but I don't think makes sense since I consider antialignment very unlikely and weak dystopias also overall not likely).

The extrema argument is why I don't think that weak dystopias are likely: unless we succeed at alignment to a notable degree, the extremes of whatever values shake out are not something that keeps humans around for very long. So I don't expect weaker dystopias to occur either. I expect that most AIs aren't going to value making a notable deliberate AI hell, whether out of the lightcone or 5% of it or 0.01% of it.

If we make an aligned AGI and then some other AGI says 'I will simulate a bunch of humans in torment unless you give me a planet', then I expect that our aligned AGI uses a decision theory that doesn't give in to dt-Threats and doesn't give in (and thus isn't threatened, because the other AGI gains nothing from actually simulating humans in that).

So, while I do expect that weak dystopias have a noticeable chance of occurring, I think it is significantly unlikely? It grows more likely we'll end up in a weak dystopia as alignment progresses. Like, if we manage to get enough of a 'caring about humans specifically' (though I expect a lot of attempts like that to fall apart and have weird extremes when they're optimized over!), then that raises the chances of a weak dystopia. However, I also believe that alignment is roughly the way to solve these. To get notable progress on making AGIs avoid specific areas, I believe that requires more alignment progress than we have currently.

Interesting take! Obviously that’s different for me and many others, but you’re not alone with that. I even know someone who would be ready to cook in a lava lake forever if it implies continuing to exist. I think that’s also in line with the DALY disability weights, but only because they artificially scale them to the 0–1 interval.

So I imagine you’d never make such a deal as shortening your life by three hours in exchange for not experiencing one hour of the worst pain or other suffering you’ve experienced?

3Charlie Steiner
It's plausible you could catch me on days where I would take the deal, but basically yeah, 3:1 seems like plenty of incentive to choose life, whereas at 1:1 (the lava lake thing), life isn't worth it (though maybe you could catch me on days etc etc).
4tslarm
Sorry for pursuing this tangent (which I'm assuming you'll feel free to ignore), but have they ever indicated how likely they think it is that they would continue to hold that preference while in the lava lake?  (I was aware some people voiced preferences like this, but I haven't directly discussed it with any of them. I've often wondered whether they think they would, in the (eternally repeated) moment, prefer the suffering to death, or whether they are willing to condemn themselves to infinite suffering even though they expect to intensely regret it. In both cases I think they are horribly mistaken, but in quite different ways.)

but some wonkier approaches could be pretty scary.

Yeah, very much agreed. :-/

in particular, an aligned AI sells more of its lightcone to get baby-eating aliens to eat their babies less, and in general a properly aligned AI will try its hardest to ensure what we care about (including reducing suffering) is satisfied, so alignment is convergent to both.

Those are some good properties, I think… Not quite sure in the end.

But your alignment procedure is indirect, so we don’t quite know today what the result will be, right? Then the question whether we’ll end up ... (read more)

3Tamsin Leake
yes, the eventual outcome is hard to predict. but my plan looks like the kind of plan that would fail in Xrisky rather than Srisky ways, when it fails. i don't use the Thing-line nomenclature very much anymore and i only use U/X/S. i am concerned about the other paths as well but i'm hopeful we can figure them out within the QACI counterfactuals.

Some promising interventions against s-risks that I’m aware of are:

  1. Figure out what’s going on with bargaining solutions. Nash, Kalai, or Kalai-Smorodinsky? Is there one that is privileged in some impartial way? (A toy comparison of two of these follows after this list.)
  2. Is there some sort of “leader election” algorithm over bargaining solutions?
  3. Do surrogate goals work, are they cooperative enough?
  4. Will neural-net based AIs be comprehensible to each other, if so, what does the open source game theory say about how conflicts will play out?
  5. And of course CLR’s research agenda.
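
As a toy comparison for point 1 (my own illustration, not taken from any of the agendas above): the Nash and Kalai-Smorodinsky solutions already disagree on a very simple frontier, which is part of what makes the question non-trivial.

```python
import numpy as np

# Feasible agreements: player 1 gets u1 = x, player 2 gets u2 = sqrt(1 - x),
# with disagreement point (0, 0) and ideal point (1, 1).
x = np.linspace(0.0, 1.0, 100_001)
u1, u2 = x, np.sqrt(1.0 - x)

# Nash: maximize the product of gains over the disagreement point.
nash = x[np.argmax(u1 * u2)]

# Kalai-Smorodinsky: the frontier point where each player's gain is proportional
# to their ideal (maximal attainable) gain; here that means u1 = u2.
ks = x[np.argmin(np.abs(u1 / u1.max() - u2 / u2.max()))]

print(f"Nash solution:              u1 = {nash:.3f}")  # ~0.667
print(f"Kalai-Smorodinsky solution: u1 = {ks:.3f}")    # ~0.618
```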

Interpretability research is probably i... (read more)

9Seth Herd
These suggestions are all completely opaque to me. I don't see how a single one of them would work to reduce s-risk, or indeed understand what the first three are or why the last one matters. That's after becoming conversant with the majority of thinking and terminology around alignment approaches. So maybe that's one reason you don't see people discussing s-risk much - the few people doing it are not communicating their ideas in a compelling or understandable way. That doesn't answer the main question, but cause-building strategy is one factor in any question of why things are or aren't attended to.

I don’t see how any of these actually help reduce s-risk. Like, if we know some bargaining solutions lead to everyone being terrible and others lead to everyone being super happy, so what? It’s not like we can tremendously influence the bargaining solution our AI & those it meets settle on after reflection.

I also know plenty of cheerful ones. :-3

Interesting. Do I give off that vibe – here or in other writings?

Thx! I’ll probably drop the “more heavily” for stylistic reasons, but otherwise that sounds good to me!

I suppose my shooting range metaphor falls short here. Maybe alignment is like teaching a kid to be an ace race car driver, and s-risks are accidents on normal roads. There it also depends on the details whether the ace race car driver will drive safely on normal roads.

Oh, true! Digital sentience is also an important point! A bit of an intuition pump is that if you consider a certain animal to be sentient (at least with some probability), then an em of that animal’s brain may be sentient with a similar probability. If an AI is powerful enough to run such ems, the question is no longer whether digital sentience is possible but why an AI would run such an em.

The Maslow hierarchy is reversed for me, i.e. I’d rather be dead/disempowered than tortured, but that’s just a personal thing. In the end it’s more important what the acausal moral compromise says, I think.

4Raemon
Yeah to be clear my mainline prediction is that an unfriendly AI goes through some period of simulating lots of humans (less likely to simulate animals IMO) as part of its strategizing process, kills humanity, and then goes on to do mostly non-sentient things. There might be a second phase where it does some kind of weird acausal thing, not sure. I don't know that in my mainline prediction the simulation process results in much more negative utility than the extinction part. I think the AI probably has to do much of its strategizing without enough compute to simulate vast numbers of humans, and I weakly bet against those simulations ending up suffering in a way that ends up outweighing human extinction. There are other moderately likely worlds IMO and yeah I think s-risk is a pretty real concern.

Good point. I can still change it. What title would you vote for? I spent a lot of time vacillating between titles and don’t have a strong opinion. These were the options that I considered:

  1. Why not s-risks? A poll.
  2. Why are we so complacent about AI hell?
  3. Why aren’t we taking s-risks from AI more seriously?
  4. Why do so few people care about s-risks from AI?
  5. Why are we ignoring the risks of AI hell?
  6. What’s holding us back from addressing s-risks from AI?
  7. Why aren’t we doing more to prevent s-risks from AI?
  8. What will it take to get people to care about s-risks from AI?
4Raemon
"Why aren't more people prioritizing work on S-risks more heavily" seems better to me and seems like the question you probably actually care about. Question-titles that are making (in many cases inaccurate) claims about people's motivations seem more fraught and unhelpfully opinionated.

I agree with what Lukas linked. But there are also various versions of the Waluigi Effect, so that alignment, if done wrong, may increase s-risk. Well, and I say in various answers and in the post proper that I’m vastly more optimistic about reducing s-risk than having to resort to anything that would increase x-risk.

Yeah… When it comes to the skill overlap, having alignment research aided by future pre-takeoff AIs seems dangerous. Having s-risk research aided that way seems less problematic to me. That might make it accessible (now or in a year) for people who have struggled with alignment research. I also wonder whether there is maybe still more time for game-theoretic research on s-risks than there is in alignment. The s-risk-related problems might be easier, so they can perhaps still be solved in time. (NNTR, just thinking out loud.)

Oooh, good point! I’ve certainly observed that in myself in other areas.

Like, “No one is talking about something obvious? Then it must be forbidden to talk about and I should shut up too!” Well, no one is freaking out in that example, but if someone were, it would enhance the effect.

Answer by Dawn Drescher32

Too unknown. Finally there’s the obvious reason that people just don’t know enough about s-risks. That seems quite likely to me.

1Dawn Drescher
Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).
Answer by Dawn Drescher05

Too unpopular. Maybe people are motivated by what topics are in vogue in their friend circles, and s-risks are not?

Answer by Dawn Drescher11

Personal fit. Surely, some people have tried working on s-risks in different roles for some substantial period of time but haven’t found an angle from which they can contribute given their particular skills.

3Lukas_Gloor
Related to the "personal fit" explanation: I'd argue that the skills required to best reduce s-risks have much overlap with the skills to make alignment progress (see here).   At least, I think this goes for directly AI-related s-risks, which I consider most concerning, but I put significantly lower probabilities on them than you do. For s-risks conditioned on humans staying in control over the future, we maybe wouldn't gain much from explicitly modelling AI takeoff and engaging in all the typical longtermist thought. Therefore, some things that reduce future disvalue don't have to look like longtermism? For instance, common sense ways to improve society's rationality, coordination abilities, and values. (Maybe there's a bit of leverage to gain from thinking explicitly about how AI will change things.) The main drawback to those types of interventions is (1) disvalue at stake might be smaller than the disvalue for directly AI-related s-risks conditional on the scenarios playing out, and (2) it only matters how society thinks and what we value if humans actually stay in control over the future, which arguably seems pretty unlikely.

There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of what the risk of a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid that the risk is quite macroscopic.

The second version ignores the expected value. I acknowledge that expected value calculus has its limitations, but if we use it at all, and we clearly do, a lot, then there’s no reason to ignore its implications ... (read more)

2lc
I don't understand why you think a multipolar takeoff would run S-risks.
Answer by Dawn Drescher1427

Too unlikely. I’ve heard three versions of this concern. One is that s-risks are unlikely. I simply don’t think they are, as explained above in the post proper. The second version is that it’s 1/10th of extinction, hence less likely, hence not a priority. The third version of this take is that it’s just psychologically hard to be motivated for something that is not the mode of the probability distribution of how the future will turn out (given such clusters as s-risks, extinction, and business as usual). So even if s-risks are much worse and only slightly less likely than extinction, they’re still hard for people to work on.

Answer by Dawn Drescher10-5

NNTs. Some might argue that “naive negative utilitarians that take ideas seriously” (NNTs) want to destroy the world, so that any admissions that s-risks are morally important in expectation should happen only behind closed doors and only among trusted parties.

5Dawn Drescher
That sounds to me like, "Don't talk about gun violence in public or you'll enable people who want to overthrow the whole US constitution." Directionally correct but entirely disproportionate. Just consider that non-negative utilitarians might hypothetically try to kill everyone to replace them with beings with greater capacity for happiness, but we're not self-censoring any talk of happiness as a result. I find this concern to be greatly exaggerated.

In fact, moral cooperativeness is at the core of why I think work on s-risks is a much stronger option than alignment, as explained in the tractability section above. So concern for s-risks could even be a concomitant of moral cooperativeness and can thus even counter any undemocratic, unilateralist actions by one moral system.

Note also that there is a huge chasm between axiology and morality. I have pretty strong axiological intuitions but what morality follows from that (even just assuming the axiology axiomatically – no pun intended) is an unsolved research question that would take decades and whole think tanks to figure out. So even if someone values empty space over earth today, they're probably still not omnicidal.

The suffering-focused EAs I know are deeply concerned about the causal and acausal moral cooperativeness of their actions. (Who wants to miss out on moral gains from trade after all!) And chances are this volume of space will be filled by some grabby aliens eventually, so assured permanent nonexistence is not even on the table.

Answer by Dawn Drescher01

Too sad. Some people think that maybe working on s-risks is unpopular because suffering is too emotionally draining to think about, so people prefer to ignore it.

Another version of this concern is that sad topics are not in vogue with the rich tech founders who bankroll our think tanks; that they’re selected to be the sort of people who are excited about incredible moonshots rather than prudent risk management. If these people hear about averting suffering, reducing risks, etc. too often from EA circles, they’ll become uninterested in EA-aligned thinking and think tanks.

3Dawn Drescher
I want to argue with the Litany of Gendlin here, but what work on s-risks really looks like in the end is writing open source game theory simulations and writing papers. All dry academic stuff that makes it easy to block out thoughts of suffering itself. Just give it a try! (E.g., at a CLR fellowship.)

I don’t know if that’s the case, but s-risks can be reframed:

  1. We want to unlock positive-sum trades for the flourishing of our descendants (biological or not).
  2. We want to distribute the progress and welfare gains from AI equitably (i.e. not have some sizable fractions of future beings suffer extremely).
  3. Our economy only works thanks to trust in institutions and jurisprudence. The flourishing of the AI economy will require that new frameworks be developed that live up to the challenges of the new era!

These reframings should of course be followed up with a detailed explanation so as not to be dishonest. Their purpose is just to show that one can pivot one’s thinking about s-risks such that the suffering is not so front and center. This would, if anything, reduce my motivation to work on them, but that’s just me.