It displays well for me!
TL;DR: Great question! I think it mostly means that we don't have enough data to say much about these projects. So donors who've made early donations to them can register those donations and boost their project score.
"GiveWiki" as the authority for the picker, to me, implied that this was from a broader universe of giving, and this was the AI Safety subset.
Could be… That's not so wrong either. We rather artificially limited it to AI safety for the moment to have a smaller, more sharply defined target audience. It also had the advantage that we could recruit our evaluators from our own networks. But ideally I'd like to find owners for other cause areas too and then widen the focus of GiveWiki accordingly. The other cause area where I have a relevant network is animal ri...
It says “AI Safety” later in the title. Do you think I should mention it earlier, like “The AI Safety GiveWiki's Top Picks for the Giving Season of 2023”?
Thanks so much for the summary! I'm wondering how this system could be bootstrapped in the industry using less powerful but current-levels-of-general AIs. Building a proof of concept using a Super Mario world is one thing, but what I would find more interesting is a version of the system that can make probabilistic safety guarantees for something like AutoGPT so that it is immediately useful and thus more likely to catch on.
What I'm thinking of here seems to me a lot like ARC Evals with probably somewhat different processes. Humans doing tasks that s...
Hiii! You can toggle the “Show all” switch on the projects list to see all publicly listed projects. We try to only rank, and thereby effectively recommend, projects that are currently fundraising, i.e., projects that have any sort of donation page or widget that they direct potential donors to. In some cases this is just a page that says “If you would like to support us with a donation, please get in touch.” When the project owner adds a link to such a page in the “payment URL” field, the project switches from “Not currently accepting donations” to “Acceptin...
Oh, haha! I'll try to be more concise!
Possible crux: I think I put a stronger emphasis on attribution of impact in my previous comment than you do because to me that seems like both a bit of a problem and solvable in most cases. When it comes to impact measurement, I'm actually (I think) much more pessimistic than you seem to be. There's a risk that EV is just completely undefined even in principle, and even if that should turn out to be false or we can use something like stochastic dominance instead to make decisions, that still leaves us with a near-impo...
Awww, thanks for the input!
I actually have two responses to this, one from the perspective of the current situation – our system in phase 1, very few donors, very little money going around, most donors don't know where to donate – and one from the perspective of the final ecosystem that we want to see if phase 3 comes to fruition one day – lots of pretty reliable governmental and CSR funding, highly involved for-profit investors, etc.
The second is more interesting but also more speculative. The diagram here shows both the verifier/auditor/evaluator and the standardization firms. I s...
It would be the producer of the public good (e.g. for my project I put up the collateral).
Oh, got it! Thanks!
Possibly? I'm not sure why you'd do that?
I thought you’d be fundraising to offer refund compensation to others to make their fundraisers more likely to succeed. But if the project developer themself put up the compensation, it’s probably also an important signal or selection effect in the game theoretic setup.
I disagree that a Refund Bonus is a security.
Yeah, courts decide that in the end. Howey Test: money: yes; common enterprise: yes; expectation of ...
Wonderful that you’re working on this! I’m with AI Safety Impact Markets, and I suspect that we will need a system like this eventually. We haven’t received a lot of feedback to that effect yet, so I haven’t prioritized it, but there are at least two applications for it (for investors and (one day, speculatively) for impact buyers/retrofunders). We’re currently addressing it with a bonding curve auction of sorts, which incentivizes donors to come in early, so that they’re also not so incentivized to wait each other out. The incentive structures are differen...
Amazing work! So glad it’s finally out in the open!
My perhaps a bit naive take (acausal stuff, other grabby aliens, etc.) is that a conflict needs at least two, and humans are too weak and uncoordinated to be much of an adversary. Hence I’m not so worried about monopolar takeoffs. Not sure, though. Maybe I should be more worried about those too.
I expect that if you make a superintelligence it won’t need humans to tell it the best bargaining math it can use
I’m not a fan of idealizing superintelligences. 10+ years ago that was the only way to infer any hard information about worst-case scenarios. Assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sort of training regimes through which they might learn bar...
Sorry for glossing over some of these. E.g., I’m not sure if you consider ems to be “scientifically implausible technologies.” I don’t, but I bet there are people who could make smart arguments for why they are far off.
Reason 5 is actually a reason to prioritize some s-risk interventions. I explain why in the “tractability” footnote.
Woah, thanks! I hadn’t seen it!
No, just a value-neutral financial instrument such as escrow. If two people can fight or trade, but they can’t trade because they don’t trust each other, they’ll fight. That loses out on gains from trade, and one of them ends up dead. But once you invent escrow, there’s suddenly, in many cases, an option to do the trade after all, and both can live!
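To make the escrow point concrete, here’s a toy sketch (purely illustrative, nothing we’ve actually built, and all the names are made up): a neutral intermediary holds both sides of the trade and only completes the swap once both parties have delivered, so neither has to trust the other directly.

```python
# Toy escrow illustration: a value-neutral intermediary that removes the
# need for the two traders to trust each other. All names are hypothetical.

class Escrow:
    def __init__(self) -> None:
        self.deposits: dict[str, str] = {}

    def deposit(self, party: str, item: str) -> None:
        """Each party hands its side of the trade to the escrow agent."""
        self.deposits[party] = item

    def settle(self, party_a: str, party_b: str) -> dict[str, str]:
        """Swap the items only if both parties delivered; otherwise refund."""
        if party_a in self.deposits and party_b in self.deposits:
            return {party_a: self.deposits[party_b],
                    party_b: self.deposits[party_a]}
        return dict(self.deposits)  # one side defected: refund, no one is cheated


escrow = Escrow()
escrow.deposit("Alice", "grain")
escrow.deposit("Bob", "tools")
print(escrow.settle("Alice", "Bob"))  # {'Alice': 'tools', 'Bob': 'grain'}
```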
I’ve thought a bunch about acausal stuff in the context of evidential cooperation in large worlds, but while I think that that’s super important in and of itself (e.g., it could solve ethics), I’d be hard pressed to think of ways in which it could influence thinking about s-risks. I rather prefer to think of the perfectly straightforward causal conflict stuff that has played out a thousand times throughout history and is not speculative at all – except applied to AI conflict.
But more importantly it sounds like you’re contradicting my “tractability” footnot...
I'm confused what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?
I was just agreeing. :-3 In mainstream ML circles there is probably a taboo around talking about AI maybe doing harm or AI maybe ending up uncontrollable etc. Breaking that taboo was, imo, a good thing because it allowed us to become aware of the dangers AI could pose. Similarly, breaking ...
I’d prefer to keep these things separate, i.e. (1) your moral preference that “a single human death is worse than trillions of years of the worst possible suffering by trillions of people” and (2) that there is a policy-level incentive problem that implies that we shouldn’t talk about s-risks because that might cause a powerful idiot to take unilateral action to increase x-risk.
I take it that statement 1 is a very rare preference. I, for one, would hate for it to be applied to me. I would gladly trade any health state that has a DALY disability weight >...
Huh, thanks!
The example I was thinking of is this one. (There’s a similar thread here.) So in this case it’s the first option – they don’t think they’ll prefer death. But my “forever” was an extrapolation. It’s been almost three years since I read the comment.
I’m the ECL type of intersubjective moral antirealist. So in my mind, whether they really want what they want is none of my business, but what that says about what is desirable as a general policy for people we can’t ask is a largely empirical question that hasn’t been answered yet. :-3
That sounds promising actually… It has become acceptable over the past decade to suggest that some things ought not to be open-sourced. Maybe it can become acceptable to argue for DRM for certain things too. Since we don’t yet have brain scanning technology, I’d also be interested in an inverse cryonics organization that has all the expertise to really really really make sure that your brain and maybe a lot of your social media activity and whatnot really gets destroyed after your death. (Perhaps even some sort of mechanism by which suicide and complete s...
Yeah, that’s a known problem. I don’t quite remember what the go-to solutions were that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be something that is almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work or it would still run into the problem with conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about an AI creating an infinite series of surrogate goals to protect the previous surrogate goal. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.
In the tractability footnote above I make the case that it should be at least vastly easier than influencing the utility functions of all AIs to make alignment succeed.
Interesting take!
Friend circles of mine – which, I should note, don’t to my knowledge overlap with the s-risks from AI researchers I know – do treat suicide as a perfectly legitimate thing you can do after deliberation, like abortion or gender-affirming surgery. So there’s no particular taboo there. Hence, maybe, why I also don’t recoil from considering that the future might be vastly worse than the present.
But it seems to me like a rationalist virtue not to categorically recoil from certain considerations.
Could you explain the self-fulfilling prophe...
Thx! Yep, your edit basically captures most of what I would reply. If alignment turns out so hard that we can’t get any semblance of human values encoded at all, then I’d also guess that hell is quite unlikely. But there are caveats, e.g., if there is a nonobvious inner alignment failure, we could get a system that technically doesn’t care about any semblance of human values but doesn’t make that apparent because ostensibly optimizing for human values appears useful for it at the time. That could still cause hell, even with a higher-than-normal probability.
Thanks for linking that interesting post! (Haven’t finished it yet.) Your claim is a weak one though, right? Only that you don’t expect the entire lightcone of the future to be filled with worst-case hell, or less than 95% of it? There are a bunch of different definitions of s-risk, but what I’m worried about definitely starts at a much smaller-scale level. Going by the definitions in that paper (p. 3 or 391), maybe the “astronomical suffering outcome” or the “net suffering outcome.”
Interesting take! Obviously that’s different for me and many others, but you’re not alone with that. I even know someone who would be ready to cook in a lava lake forever if it implies continuing to exist. I think that’s also in line with the DALY disability weights, but only because they artificially scale them to the 0–1 interval.
So I imagine you’d never make such a deal as shortening your life by three hours in exchange for not experiencing one hour of the worst pain or other suffering you’ve experienced?
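(To spell out the arithmetic I have in mind behind that hypothetical trade, in rough DALY terms and using the standard convention that a weight of 1 is as bad as being dead: an hour of suffering at disability weight $d$ counts like $d$ hours of healthy life lost, so accepting the deal would imply

$$1\,\text{h} \cdot d \;>\; 3\,\text{h} \cdot 1 \quad\Longrightarrow\quad d > 3,$$

a weight that no official DALY table can express, since they’re all capped at 1. That’s what I mean by the weights being artificially scaled to the 0–1 interval.)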
Yeah, very much agreed. :-/
in particular, an aligned AI sells more of its lightcone to get baby-eating aliens to eat their babies less, and in general a properly aligned AI will try its hardest to ensure what we care about (including reducing suffering) is satisfied, so alignment is convergent to both.
Those are some good properties, I think… Not quite sure in the end.
But your alignment procedure is indirect, so we don’t quite know today what the result will be, right? Then the question whether we’ll end up ...
Some promising interventions against s-risks that I’m aware of are:
Interpretability research is probably i...
I don’t see how any of these actually help reduce s-risk. Like, if we know some bargaining solutions lead to everyone being terrible and others lead to everyone being super happy, so what? It’s not like we can tremendously influence the bargaining solution our AI & those it meets settle on after reflection.
I also know plenty of cheerful ones. :-3
Interesting. Do I give off that vibe – here or in other writings?
Thx! I’ll probably drop the “more heavily” for stylistic reasons, but otherwise that sounds good to me!
I suppose my shooting range metaphor falls short here. Maybe alignment is like teaching a kid to be an ace race car driver, and s-risks are accidents on normal roads. There it also depends on the details whether the ace race car driver will drive safely on normal roads.
Oh, true! Digital sentience is also an important point! A bit of an intuition pump is that if you consider a certain animal to be sentient (at least with some probability), then an em of that animal’s brain may be sentient with a similar probability. If an AI is powerful enough to run such ems, the question is no longer whether digital sentience is possible but why an AI would run such an em.
The Maslow hierarchy is reversed for me, i.e. I’d rather be dead/disempowered than tortured, but that’s just a personal thing. In the end it’s more important what the acausal moral compromise says, I think.
Good point. I can still change it. What title would you vote for? I spent a lot of time vacillating between titles and don’t have a strong opinion. These were the options that I considered:
I agree with what Lukas linked. But there are also various versions of the Waluigi Effect, so that alignment, if done wrong, may increase s-risk. Well, and I say in various answers and in the post proper that I’m vastly more optimistic about reducing s-risk than having to resort to anything that would increase x-risk.
Yeah… When it comes to the skill overlap, having alignment research aided by future pre-takeoff AIs seems dangerous. Having s-risk research aided that way seems less problematic to me. That might make it accessible (now or in a year) for people who have struggled with alignment research. I also wonder whether there is maybe still more time for game-theoretic research in s-risks than there is in alignment. The s-risk-related problems might be easier, so they can perhaps still be solved in time. (NNTR, just thinking out loud.)
Oooh, good point! I’ve certainly observed that in myself in other areas.
Like, “No one is talking about something obvious? Then it must be forbidden to talk about and I should shut up too!” Well, no one is freaking out in that example, but if someone were, it would enhance the effect.
Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).
Too unknown. Finally there’s the obvious reason that people just don’t know enough about s-risks. That seems quite likely to me.
Too unpopular. Maybe people are motivated by what topics are in vogue in their friend circles, and s-risks are not?
Personal fit. Surely, some people have tried working on s-risks in different roles for some substantial period of time but haven’t found an angle from which they can contribute given their particular skills.
Too unlikely. I’ve heard three versions of this concern. One is that s-risks are unlikely. I simply don’t think they are, as explained above in the post proper. The second version is that it’s 1/10th of extinction, hence less likely, hence not a priority. The third version of this take is that it’s just psychologically hard to be motivated for something that is not the mode of the probability distribution of how the future will turn out (given such clusters as s-risks, extinction, and business as usual). So even if s-risks are much worse and only slightly less likely than extinction, they’re still hard for people to work on.
There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of what the risk of a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid that the risk is quite macroscopic.
The second version ignores the expected value. I acknowledge that expected value calculus has its limitations, but if we use it at all, and we clearly do, a lot, then there’s no reason to ignore its implications ...
That sounds to me like, “Don’t talk about gun violence in public or you’ll enable people who want to overthrow the whole US constitution.” Directionally correct but entirely disproportionate. Just consider that non-negative utilitarians might hypothetically try to kill everyone to replace them with beings with greater capacity for happiness, but we’re not self-censoring any talk of happiness as a result. I find this concern to be greatly exaggerated.
In fact, moral cooperativeness is at the core of why I think work on s-risks is a much stronger option than ...
NNTs. Some might argue that “naive negative utilitarians that take ideas seriously” (NNTs) want to destroy the world, so that any admissions that s-risks are morally important in expectation should happen only behind closed doors and only among trusted parties.
I want to argue with the Litany of Gendlin here, but what work on s-risks really looks like in the end is writing open-source game theory simulations and writing papers. All dry academic stuff that makes it easy to block out thoughts of suffering itself. Just give it a try! (E.g., at a CLR fellowship.)
I don’t know if that’s the case, but s-risks can be reframed:
Too sad. Some people think that maybe working on s-risks is unpopular because suffering is too emotionally draining to think about, so people prefer to ignore it.
Another version of this concern is that sad topics are not in vogue with the rich tech founders who bankroll our think tanks; that they’re selected to be the sort of people who are excited about incredible moonshots rather than prudent risk management. If these people hear about averting suffering, reducing risks, etc. too often from EA circles, they’ll become uninterested in EA-aligned thinking and think tanks.
Donors have a switch in their profiles where they can determine whether they want to be listed or not. The top three in the private, complete listing are Jaan Tallinn, Open Phil, and the late Future Fund, whose public grants I've imported. The total ranking lists 92 users.
But I don't think that's core to understanding the step down. I've gone through the projects around the threshold before I posted my last comment, and I think it's really the 90% cutoff that causes it. Not a ...