Actually, on 1) I think that these consequentialist reasons are properly just covered by the later sections. That section is about reasons it's maybe bad to make the One Ring, ~regardless of the later consequences. So it makes sense to emphasise the non-consequentialist reasons.
I think there could still be some consequentialist analogue of those reasons, but they would be more esoteric, maybe something like decision-theoretic, or appealing to how we might want to be treated by future AI systems that gain ascendancy.
Ha, thanks!
(It was part of the reason. Normally I'd have made the effort to import, but here I felt a bit like maybe it was just slightly funny to post the one-sided thing, which nudged against linking rather than posting; and also I thought I'd take the opportunity to see experimentally whether it seemed to lead to less engagement. But those reasons were not overwhelming, and now that you've put the full text here I don't find myself very tempted to remove it. :) )
This kind of checks out to me. At least, I agree that it's evidence against treating quantum computers as primitive that humans, despite living in a quantum world, find classical computers more natural.
I guess I feel more like I'm in a position of ignorance, though, and wouldn't be shocked to find some argument that quantum has, in some other a priori sense, a deep naturalness which other niche physics theories lack.
It's not obvious that open source leads to faster progress. Having high-quality open source products reduces the incentives for private investment. I'm not sure in which regimes that plays out to be overall accelerationist, but I sort of guess that it will be decelerationist during an intense AI race (where the investments needed to push the frontier out are enormous and significantly profit-motivated).
I like the framework.
Conceptual nit: why do you include inhibitions as a type of incentive? It seems to me more natural to group them with internal motivations than external incentives. (I understand that they sit in the same position in the argument as external incentives, but I guess I'm worried that lumping them together may somehow obscure things.)
I actually agree with quite a bit of this. (I nearly included a line about pursuing excellence in terms of time allocation, but — it seemed possibly-redundant with some of the other stuff on not making the perfect the enemy of the good, and I couldn't quickly see how to fit it cleanly into the flow of the post, so I left it and moved on ...)
I think it's important to draw the distinction between perfection and excellence. Broadly speaking, I think people often put too much emphasis on perfection, and often not enough on excellence.
Maybe I shouldn't have led...
Can you say more about why you believe this? At first glance, it seems to me like "fundamental instability" is much more tied to how AI development goes, so I would've expected it to be more tractable [among LW users].
Maybe "simpler" was the wrong choice of word. I didn't really mean "more tractable". I just meant "it's kind of obvious what needs to happen (even if it's very hard to get it to happen)". Whereas with fundamental instability it's more like it's unclear if it's actually a very overdetermined fundamental instability, or what exactly could nudge...
Just a prompt to say that if you've been kicking around an idea of possible relevance to the essay competition on the automation of wisdom and philosophy, now might be the moment to consider writing it up -- entries are due in three weeks.
My take is that in most cases it's probably good to discuss publicly (but I wouldn't be shocked to become convinced otherwise).
The main plausible reason I see for it potentially being bad is if it were drawing attention to a destabilizing technology that otherwise might not be discovered. But I imagine most thoughts are kind of going to be chasing through the implications of obvious ideas. And I think that in general having the basic strategic situation be closer to common knowledge is likely to reduce the risk of war.
(You might think the discussion ...
The way I understand it could work is that democratic leaders with "democracy-aligned AI" would get more effective influence on nondemocratic figures (by fine-tuned persuasion, some kind of AI-designed political zugzwang, etc.), thus reducing totalitarian risks. Is my understanding correct?
Not what I'd meant -- rather, that democracies could demand better oversight of their leaders, and so reduce the risk of democracies slipping into various traps (corruption, authoritarianism).
My mainline guess is that information about bad behaviour by Sam was disclosed to them by various individuals, and they owe a duty of confidence to those individuals (where revealing the information might identify the individuals, who might thereby become subject to some form of retaliation).
("Legal reasons" also gets some of my probability mass.)
OK hmm I think I understand what you mean.
I would have thought about it like this:
... but as you say anthropics is confusing, so I might be getting this wrong.
I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I'd be happy to debate if you'd find it productive (although I'm not sure whether I'm disagreeable enough to be a good choice).
I think point 2 is plausible but doesn't super support the idea that it would eliminate the biosphere; if it cared a little, it would be fairly cheap for it to take some actions to preserve at least a version of the biosphere (including humans), even if starlifting the sun.
Point 1 is the argument which I most see as supporting the thesis that misaligned AI would eliminate humanity and the biosphere. And then I'm not sure how robust it is (it seems premised partly on translating our evolved intuitions about discount rates over to imagining the scenario from the perspective of the AI system).
Wait, how does the grabby aliens argument support this? I understand that it points to "the universe will be carved up between expansive spacefaring civilizations" (without reference to whether those are biological or not), and also to "the universe will cease to be a place where new biological civilizations can emerge" (without reference to what will happen to existing civilizations). But am I missing an inferential step?
I think that you're right that people's jobs are a significant thing driving the difference here (thanks), but I'd guess that the bigger impact of jobs is via jobs --> culture than via jobs --> individual decisions. This impression is based on a sense of "when visiting Constellation, I feel less pull to engage in the open-ended idea exploration vs at FHI", as well as "at FHI, I think people whose main job was something else would still not-infrequently spend some time engaging with the big open questions of the day".
I might be wrong about that ¯\_(ツ)_/¯
I feel awkward about trying to offer examples because (1) I'm often bad at that when on the spot, and (2) I don't want people to over-index on particular ones I give. I'd be happy to offer thoughts on putative examples, if you wanted (while being clear that the judges will all ultimately assess things as seem best to them).
Will probably respond to emails on entries (which might be to decline to comment on aspects of them).
(Caveat: it's been a while since I've visited Constellation, so if things have changed recently I may be out of touch.)
I'm not sure that Constellation should be doing anything differently. I think there's a spectrum of how much your culture is like blue-skies thinking vs highly prioritized on the most important things. I think that FHI was more towards the first end of this spectrum, and Constellation is more towards the latter. I think that there are a lot of good things that come with being further in that direction, but I do think it means you're less l...
(I work out of Constellation and am closely connected to the org in a bunch of ways)
I think you're right that most people at Constellation aren't going to seriously and carefully engage with the aliens-building-AGI question, but I think describing it as a difference in culture is missing the biggest factor leading to the difference: most of the people who work at Constellation are employed to do something other than the classic FHI activity of "self-directed research on any topic", so obviously aren't as inclined to engage deeply with it.
I think there also is a cultural difference, but my guess is that it's smaller than the effect from difference in typical jobs.
I completely agree that Oliver is a great fit for leading on research infrastructure (and the default thing I was imagining was that he would run the institute; although it's possible it would be even better if he could arrange to be number two with a strong professional lead, giving him more freedom to focus attention on new initiatives within the institute, that isn't where I'd start). But I was specifically talking about the "research lead" role. By default I'd guess people in this role would report to the head of the institute, but also have a lot of i...
I agree in the abstract with the idea of looking for niches, and I think that several of these ideas have something to them. Nevertheless when I read the list of suggestions my overall feeling is that it's going in a slightly wrong direction, or missing the point, or something. I thought I'd have a go at articulating why, although I don't think I've got this to the point where I'd firmly stand behind it:
It seems to me like some of the central FHI virtues were:
I think FHI was an extremely special place and I was privileged to get to spend time there.
I applaud attempts to continue its legacy. However, I'd feel gut-level more optimistic about plans that feel more grounded in thinking about how circumstances are different now, and then attempting to create the thing that is live and good given that, relative to attempting to copy FHI as closely as possible.
You mention not getting to lean on Bostrom's research taste as one driver of differences, and I think this is correct but ...
Generally agree with most things in this comment. To be clear, I have been thinking about doing something in the space for many years, internally referring to it as creating an "FHI of the West", and while I do think the need for this is increased by FHI disappearing, I was never thinking about this as a clone of FHI, but was always expecting very substantial differences (due to differences in culture, skills, and broader circumstances in the world, some of which you characterize above).
I wrote this post mostly because with the death of FHI it seemed to me t...
Multiple entries are very welcome!
[With some kind of anti-munchkin caveat. Submitting your analyses of several different disjoint questions seems great; submitting two versions of largely the same basic content in different styles not so great. I'm not sure exactly how we'd handle it if someone did the latter, but we'd aim for something sensible that didn't incentivise people to have been silly about it.]
I think that for most of what I'm saying, the meaning wouldn't change too much if you replaced the word "wholesome" with "virtuous" (though the section contrasting it with virtue ethics would become more confusing to read).
As practical guidance, however, I'm deliberately piggybacking off what people already know about the words. I think the advice to make sure that you pay attention to ways in which things feel unwholesome is importantly different from (and, I hypothesize, more useful than) advice to make sure you pay attention to ways in which thing...
If you personally believe it to be wrong, it's unwholesome. But generically no. See the section on revolutionary action in the third essay.
The most straightforward criterion would probably be "things they themselves feel to be mistakes a year or two later". That risks people just failing to own their mistakes so would only work with people I felt enough trust in to be honest with themselves. Alternatively you could have an impartial judge. (I'd rather defer to "someone reasonable making judgements" than try to define exactly what a mistake is, because the latter would cover a lot of ground and I don't think I'd do a good job of it; also my claims don't feel super sensitive to how mistakes are defined.)
I would certainly update in the direction of "this is wrong" if I heard a bunch of people had tried to apply this style of thinking over an extended period, I got to audit it a bit by chatting to them and it seemed like they were doing a fair job, and the outcome was they made just as many/serious mistakes as before (or worse!).
(That's not super practically testable, but it's something. In fact I'll probably end up updating some from smaller anecdata than that.)
I definitely agree that this fails as a complete formula for assessing what's good or bad. My feeling is that it offers an orientation that can be helpful for people aggregating stuff they think into all-things-considered judgements (and e.g. I would in retrospect have preferred to have had more of this orientation in the past).
If someone were using this framework to stop thinking about things that I thought they ought to consider, I couldn't be confident that they weren't making a good faith effort to act wholesomely, but I at least would think that their actions weren't wholesome by my lights.
Good question, my answer on this is nuanced (and I'm kind of thinking it through in response to your question).
I think that what feels to you to be wholesome will depend on your values. And I'm generally in favour of people acting according to their own feeling of what is wholesome.
On the other hand I also think there would be some choices of values that I would describe as "not wholesome". These are the ones which ignore something of what's important about some dimension (perhaps justifying ignoring it by saying "I just don't value this"), at least as fel...
I agree that "paying attention to the whole system" isn't literally a thing that can be done, and I should have been clearer about what I actually meant. It's more like "making an earnest attempt to pay attention to the whole system (while truncating attention at a reasonable point)". It's not that you literally get to attend to everything, it's that you haven't excluded some important domain from things you care about. I think habryka (quoting and expanding on Ben Pace's thoughts) has a reasonable description of this in a comment.
I definitely don't ...
I think that there is some important unwholesomeness in these things, but that isn't supposed to mean that they're never permitted. (Sorry, I see how it could give that impression; but in the cases you're discussing there would often be greater unwholesomeness in not doing something.)
I discuss how I think my notion of wholesomeness intersects with these kinds of examples in the section on visionary thought and revolutionary action in the third essay.
I think that there's something interesting here. One of the people I talked about this with asked me why children seem exceptionally wholesome (it's certainly not because they're unusually good at tracking the whole of things), and I thought the answer was about them being a part of the world where it may be especially important to avoid doing accidental harm, so our feelings of harms-to-children have an increased sense of unwholesomeness. But I'm now thinking that something like "robustly not evil" may be an important part of it.
Now we can trace out some ...
FWIW I quite like your way of pointing at things here, though maybe I'm more inclined towards letting things hang out for a while in the (conflationary?) alliance space to see which seem to be the deepest angles of what's going on in this vicinity, and doing more of the conceptual analysis a little later.
That said, if someone wanted to suggest a rewrite I'd seriously consider adopting it (or using it as a jumping-off point); I just don't think that I'm yet at the place where a rewrite will flow naturally for me.
I largely think that the section of the second essay on "wholesomeness vs expedience" is also applicable here.
Basically I agree that you sometimes have to not look at things, and I like your framing of the hard question of wholesomeness. I think that the full art of deciding when it's appropriate to not think about something would be better discussed via a bunch of examples, rather than by trying to describe it in generalities. But the individual decisions are ones that you can make wholesomely or not, and I think that's my current best guess approach for how to ha...
I'd be tempted to make it a question, and ask something like "what do you think the impacts of this on [me/person] are?".
It might be that question would already do work by getting them to think about the thing they haven't been thinking about. But it could also elicit a defence like "it doesn't matter because the mission is more important" in which case I'd follow up with an argument that it's likely worth at least understanding the impacts because it might help to find actions which are better on those grounds while being comparably good -- or even better -- for the mission. Or it might elicit a mistaken model of the impacts, in which case I'd follow up by saying that I thought it was mistaken and explaining how.
In this comment (cross-posted from the EA forum) I’ll share a few examples of things I mean as failures of wholesomeness. I don’t really mean to over-index on these examples. I actually feel like a decent majority of what I wish that EA had been doing differently relates to this wholesomeness stuff. However, I’m choosing examples that are particularly easy to talk about — around FTX and around mistakes I've made — because I have good visibility of them, and in order not to put other people on the spot. Alth...
It's been a long time since I read those books, but if I'm remembering roughly right: Asimov seems to describe a world where choice is in a finely balanced equilibrium with other forces (I'm inclined to think: implausibly so -- if it could manage this level of control at great distances in time, one would think that it could manage to exert more effective control over things at somewhat less distance).