All of owencb's Comments + Replies

It's been a long time since I read those books, but if I'm remembering roughly right: Asimov seems to describe a world where choice is in a finely balanced equilibrium with other forces (I'm inclined to think: implausibly so -- if it could manage this level of control at great distances in time, one would think that it could manage to exert more effective control over things at somewhat less distance).

Actually, on 1) I think that these consequentialist reasons are properly just covered by the later sections. That section is about reasons it's maybe bad to make the One Ring, ~regardless of the later consequences. So it makes sense to emphasise the non-consequentialist reasons.

I think there could still be some consequentialist analogue of those reasons, but they would be more esoteric, maybe something like decision-theoretic, or appealing to how we might want to be treated by future AI systems that gain ascendancy.

  1. Yeah. As well as another consequentialist argument, which is just that it will be bad for other people to be dominated. Somehow the arguments feel less natively consequentialist, and so it seems somehow easier to hold them in these other frames, and then translate them into consequentialist ontology if that's relevant; but also it would be very reasonable to mention them in the footnote.
  2. My first reaction was that I do mention the downsides. But I realise that that was a bit buried in the text, and I can see that that could be misleading about my overall view. I've now edited the second paragraph of the post to be more explicit about this. I appreciate the pushback.

Ha, thanks!

(It was part of the reason. Normally I'd have made the effort to import, but here I felt a bit like maybe it was just slightly funny to post the one-sided thing, which nudged towards linking rather than posting; and also I thought I'd take the opportunity to see experimentally whether it seemed to lead to less engagement. But those reasons were not overwhelming, and now that you've put the full text here I don't find myself very tempted to remove it. :) )

1habryka
Oops, sorry for ruining your experiment :P

The judging process should be complete in the next few days. I expect we'll write to winners at the end of next week, although it's possible that will be delayed. A public announcement of the winners is likely to be a few more weeks away.

I don't see why (1) says you should be very early. Isn't the decrease in measure for each individual observer precisely outweighed by their increasing multitudes?

This kind of checks out to me. At least, I agree that it's evidence against treating quantum computers as primitive that humans, despite living in a quantum world, find classical computers more natural.

I guess I feel more like I'm in a position of ignorance, though, and wouldn't be shocked to find some argument that quantum computation has, in some other a priori sense, a deep naturalness which other niche physics theories lack.

You say that quantum computers are more complex to specify, but is this a function of using a classical computer in the speed prior? I'm wondering if it could somehow be quantum all the way down.

6jessicata
This gets into philosophy about reference machines in general. You don't want to make a relativist argument that is too general, because then you could say "my niche physics theory is very simple relative to a reference machine for it; it just looks complicated to you because you are using a different reference machine". With priors I'm looking for a thing that could be figured out without looking at the empirical world. Humans figured out lots of math, including classical computation, before figuring out the math of quantum computation. This is despite living in a quantum world. Quantum mechanics has a reputation for being unintuitive; even though we live in a quantum universe, it is descriptively true that human natural prior-like complexity measures encoded in the brain don't find quantum mechanics or quantum computation simple.

It's not obvious that open source leads to faster progress. Having high quality open source products reduces the incentives for private investment. I'm not sure in which regimes it will play out as overall accelerationist, but I sort of guess that it will be decelerationist during an intense AI race (where the investments needed to push the frontier out are enormous and significantly profit-motivated).

2mako yass
Okay yeah, I meant quicker progress in expectation; I don't believe that people today are capable of the level of coordination under which privatizing science could lead to faster progress in science. But if we're talking about mixed regimes, that's a different question. Are we? Some do complain of a tilt towards a regime where frontier models will only be had by the private sphere, but it seems unlikely to happen.

I like the framework.

Conceptual nit: why do you include inhibitions as a type of incentive? It seems to me more natural to group them with internal motivations than external incentives. (I understand that they sit in the same position in the argument as external incentives, but I guess I'm worried that lumping them together may somehow obscure things.)

I actually agree with quite a bit of this. (I nearly included a line about pursuing excellence in terms of time allocation, but — it seemed possibly-redundant with some of the other stuff on not making the perfect the enemy of the good, and I couldn't quickly see how to fit it cleanly into the flow of the post, so I left it and moved on ...)

I think it's important to draw the distinction between perfection and excellence. Broadly speaking, I think people often put too much emphasis on perfection, and often not enough on excellence.

Maybe I shouldn't have led... (read more)

No, multi-author submissions are welcome! (There's space to disclose this on the entry form.)

Can you say more about why you believe this? At first glance, it seems to me like "fundamental instability" is much more tied to how AI development goes, so I would've expected it to be more tractable [among LW users].

Maybe "simpler" was the wrong choice of word. I didn't really mean "more tractable". I just meant "it's kind of obvious what needs to happen (even if it's very hard to get it to happen)". Whereas with fundamental instability it's more like it's unclear if it's actually a very overdetermined fundamental instability, or what exactly could nudge... (read more)

Just a prompt to say that if you've been kicking around an idea of possible relevance to the essay competition on the automation of wisdom and philosophy, now might be the moment to consider writing it up -- entries are due in three weeks.

My take is that in most cases it's probably good to discuss publicly (but I wouldn't be shocked to become convinced otherwise).

The main plausible reason I see for it potentially being bad is if it were drawing attention to a destabilizing technology that otherwise might not be discovered. But I imagine most thoughts are kind of going to be chasing through the implications of obvious ideas. And I think that in general having the basic strategic situation be closer to common knowledge is likely to reduce the risk of war.

(You might think the discussion ... (read more)

2Bird Concept
If AI ends up intelligent enough and with enough manufacturing capability to threaten nuclear deterrence, I'd expect it to also deduce any conclusions I would. So it seems mostly a question of what the world would do with those conclusions earlier, rather than not at all. A key exception is if later AGI would be blocked on certain kinds of manufacturing to create its destabilizing tech, and if drawing attention to that earlier starts serially blocking work earlier.
1O O
All our discussions will be repeated ad nauseam in DoD boardrooms with people whose job it is to talk about info hazards. And I also doubt discussion here will move the needle much if Trump and Jake Paul have already digested these ideas.

The way I understand it could work is that democratic leaders with "democracy-aligned AI" would get more effective influence on nondemocratic figures (by fine-tuned persuasion or some kind of AI-designed political zugzwang or etc), thus reducing totalitarian risks. Is my understanding correct? 

Not what I'd meant -- rather, that democracies could demand better oversight of their leaders, and so reduce the risk of democracies slipping into various traps (corruption, authoritarianism).

8Inosen Infinity
Thanks! The idea sounds nice, but in practice it may also turn out to be a double-edged sword. If there is an AI that could significantly help in oversight of decision-makers, then there is almost surely an AI that could help the decision-makers drive public opinion in their desired direction. And since leaders usually have more resources (network, money) than the public, I'd assume that this scenario has larger probability than the successful oversight scenario. Intuitively, way larger. I wonder how we could achieve oversight without getting controlled back in the process. Seems like a tough problem.

My mainline guess is that information about bad behaviour by Sam was disclosed to them by various individuals, and they owe a duty of confidence to those individuals (where revealing the information might identify the individuals, who might thereby become subject to some form of retaliation).

("Legal reasons" also gets some of my probability mass.)

2jacquesthibs
I think this sounds reasonable, but if this is true, why wouldn’t they just say this?

OK hmm I think I understand what you mean.

I would have thought about it like this:

  • "our reference class" includes roughly the observations we make before observing that we're very early in the universe
    • This includes stuff like being a pre-singularity civilization
  • The anthropics here suggest there won't be lots of civs later arising and being in our reference class and then finding that they're much later in universe histories
  • It doesn't speak to the existence or otherwise of future human-observer moments in a post-singularity civilization

... but as you say anthropics is confusing, so I might be getting this wrong.

I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I'd be happy to debate if you'd find it productive (although I'm not sure whether I'm disagreeable enough to be a good choice).

I think point 2 is plausible but doesn't super support the idea that it would eliminate the biosphere; if it cared a little, it could be fairly cheap to take some actions to preserve at least a version of it (including humans), even if starlifting the sun.

Point 1 is the argument which I most see as supporting the thesis that misaligned AI would eliminate humanity and the biosphere. And then I'm not sure how robust it is (it seems premised partly on translating our evolved intuitions about discount rates over to imagining the scenario from the perspective of the AI system).

Wait, how does the grabby aliens argument support this? I understand that it points to "the universe will be carved up between expansive spacefaring civilizations" (without reference to whether those are biological or not), and also to "the universe will cease to be a place where new biological civilizations can emerge" (without reference to what will happen to existing civilizations). But am I missing an inferential step?

3jaan
i might be confused about this but “witnessing a super-early universe” seems to support “a typical universe moment is not generating observer moments for your reference class”. but, yeah, anthropics is very confusing, so i’m not confident in this.

I think that you're right that people's jobs are a significant thing driving the difference here (thanks), but I'd guess that the bigger impact of jobs is via jobs --> culture than via jobs --> individual decisions. This impression is based on a sense of "when visiting Constellation, I feel less pull to engage in the open-ended idea exploration vs at FHI", as well as "at FHI, I think people whose main job was something else would still not-infrequently spend some time engaging with the big open questions of the day".

I might be wrong about that ¯\_(ツ)_/¯

I feel awkward about trying to offer examples because (1) I'm often bad at that when on the spot, and (2) I don't want people to over-index on particular ones I give. I'd be happy to offer thoughts on putative examples, if you wanted (while being clear that the judges will all ultimately assess things as seem best to them). 

Will probably respond to emails on entries (which might be to decline to comment on aspects of them).

I don't really disagree with anything you're saying here, and am left with confusion about what your confusion is about (like it seemed like you were offering it as examples of disagreement?).

(Caveat: it's been a while since I've visited Constellation, so if things have changed recently I may be out of touch.)

I'm not sure that Constellation should be doing anything differently. I think there's a spectrum of how much your culture is like blue-skies thinking vs highly prioritized on the most important things. I think that FHI was more towards the first end of this spectrum, and Constellation is more towards the latter. I think that there are a lot of good things that come with being further in that direction, but I do think it means you're less l... (read more)

(I work out of Constellation and am closely connected to the org in a bunch of ways)

I think you're right that most people at Constellation aren't going to seriously and carefully engage with the aliens-building-AGI question, but I think describing it as a difference in culture is missing the biggest factor leading to the difference: most of the people who work at Constellation are employed to do something other than the classic FHI activity of "self-directed research on any topic", so obviously aren't as inclined to engage deeply with it.

I think there also is a cultural difference, but my guess is that it's smaller than the effect from difference in typical jobs.

I completely agree that Oliver is a great fit for leading on research infrastructure (and the default thing I was imagining was that he would run the institute; although it's possible it would be even better if he could arrange to be number two with a strong professional lead, giving him more freedom to focus attention on new initiatives within the institute, that isn't where I'd start). But I was specifically talking about the "research lead" role. By default I'd guess people in this role would report to the head of the institute, but also have a lot of i... (read more)

3aysja
Huh, I feel confused. I suppose we just have different impressions. Like, I would say that Oliver is exceedingly good at cutting through the bullshit. E.g., I consider his reasoning around shutting down the Lightcone offices to be of this type, in that it felt like a very straightforward document of important considerations, some of which I imagine were socially and/or politically costly to make. One way to say that is that I think Oliver is very high integrity, and I think this helps with bullshit detection: it's easier to see how things don't cut to the core unless you deeply care about the core yourself. In any case, I think this skill carries over to object-level research, e.g., he often seems, to me, to ask cutting-to-the core type questions there, too. I also think he's great at argument: legible reasoning, identifying the important cruxes in conversations, etc., all of which makes it easier to tell the bullshit from the not.  I do not think of Oliver as being afraid to be disagreeable, and ime he gets to the heart of things quite quickly, so much so that I found him quite startling to interact with when we first met. And although I have some disagreements over Oliver's past walled-garden taste, from my perspective it's getting better, and I am increasingly excited about him being at the helm of a project such as this. Not sure what to say about his beacon-ness, but I do think that many people respect Oliver, Lightcone, and rationality culture more generally; I wouldn't be that surprised if there were an initial group of independent researcher types who were down and excited for this project as is. 

Makes sense! My inference was based on the discussion at this stage being a high-level one about ways to set things up, but it does seem good to have space to discuss object-level projects that people might get into.

I agree in the abstract with the idea of looking for niches, and I think that several of these ideas have something to them. Nevertheless when I read the list of suggestions my overall feeling is that it's going in a slightly wrong direction, or missing the point, or something. I thought I'd have a go at articulating why, although I don't think I've got this to the point where I'd firmly stand behind it:

It seems to me like some of the central FHI virtues were:

  • Offering a space to top thinkers where the offer was pretty much "please come here and think about
... (read more)
2Chris_Leong
Just thought I'd add a second follow-up comment. You'd have a much better idea of what made FHI successful than I would. At the same time, I would bet that in order to make this new project successful - and be its own thing - it'd likely have to break at least one assumption behind what made old FHI work well.
2Chris_Leong
  I think my list appears more this way than I intended because I gave some examples of projects I would be excited by if they happened. I wasn't intending to stake out a strong position as to whether these projects should be projects chosen by the institute vs. some examples of projects that it might be reasonable for a researcher to choose within that particular area.

I think FHI was an extremely special place and I was privileged to get to spend time there. 

I applaud attempts to continue its legacy. However, I'd feel gut-level more optimistic about plans that feel more grounded in thinking about how circumstances are different now, and then attempting to create the thing that is live and good given that, relative to attempting to copy FHI as closely as possible. 

Differences in circumstance

You mention not getting to lean on Bostrom's research taste as one driver of differences, and I think this is correct but ... (read more)

8Zach Stein-Perlman
What is Constellation missing or what should it do? (Especially if you haven't already told the Constellation team this.)
8Adam Scholl
For what it’s worth, my guess is that your pessimism is misplaced. Oliver certainly isn’t as famous as Bostrom, so I doubt he’d be a similar “beacon.” But I’m not sure a beacon is needed—historically, plenty of successful research institutions (e.g. Bell Labs, IAS, the Royal Society in most eras) weren’t led by their star researchers, and the track record of those that were strikes me as pretty mixed. Oliver spends most of his time building infrastructure for researchers, and I think he’s become quite good at it. For example, you are reading this comment on (what strikes me as) rather obviously the best-designed forum on the internet; I think the review books LessWrong made are probably the second-best designed books I’ve seen, after those from Stripe Press; and the Lighthaven campus is an exceedingly nice place to work. Personally, I think Oliver would probably be my literal top choice to head an institution like this.

Generally agree with most things in this comment. To be clear, I have been thinking about doing something in the space for many years, internally referring to it as creating an "FHI of the West", and while I do think the need for this is increased by FHI disappearing, I was never thinking about this as a clone of FHI, but was always expecting very substantial differences (due to differences in culture, skills, and broader circumstances in the world, some of which you characterize above).

I wrote this post mostly because with the death of FHI it seemed to me t... (read more)

It's a fair point that wisdom might not be straightforwardly safety-increasing. If someone wanted to explore e.g. assumptions/circumstances under which it is vs isn't, that would certainly be within scope for the competition.

Multiple entries are very welcome!

[With some kind of anti-munchkin caveat. Submitting your analyses of several different disjoint questions seems great; submitting two versions of largely the same basic content in different styles not so great. I'm not sure exactly how we'd handle it if someone did the latter, but we'd aim for something sensible that didn't incentivise people to have been silly about it.]

Thanks, yes, I think that you're looking at things essentially the same way that I am. I particularly like your exploration of what the inner motions feel like; I think "unfixation" is a really good word.

I think that for most of what I'm saying, the meaning wouldn't change too much if you replaced the word "wholesome" with "virtuous" (though the section contrasting it with virtue ethics would become more confusing to read). 

As practical guidance, however, I'm deliberately piggybacking off what people already know about the words. I think the advice to make sure that you pay attention to ways in which things feel unwholesome is importantly different from (and, I hypothesize, more useful than) advice to make sure you pay attention to ways in which thing... (read more)

If you personally believe it to be wrong, it's unwholesome. But generically no. See the section on revolutionary action in the third essay.

I think this is essentially correct. The essays (especially the later ones) do contain some claims about ways in which it might or might not be useful; of course I'm very interested to hear counter-arguments or further considerations.

The most straightforward criterion would probably be "things they themselves feel to be mistakes a year or two later". That risks people just failing to own their mistakes, so it would only work with people I trusted to be honest with themselves. Alternatively you could have an impartial judge. (I'd rather defer to "someone reasonable making judgements" than try to define exactly what a mistake is, because the latter would cover a lot of ground and I don't think I'd do a good job of it; also my claims don't feel super sensitive to how mistakes are defined.)

I would certainly update in the direction of "this is wrong" if I heard a bunch of people had tried to apply this style of thinking over an extended period, I got to audit it a bit by chatting to them and it seemed like they were doing a fair job, and the outcome was they made just as many/serious mistakes as before (or worse!).

(That's not super practically testable, but it's something. In fact I'll probably end up updating some from smaller anecdata than that.)

1PeterL
Wow, thanks for your willingness to test/falsify your statements, and I apologize for my rash judgment. Your idea just sounded to me to be too good to be true, so I wanted to be cautious. And I would be glad to say I am completely satisfied with your answer. However, that is not the case yet, maybe just because the "mistakes" of the people trying to apply wholesomeness might still need a definition - a criterion according to which something is or is not a mistake.  However, if you provided such a definition, I might be another tester of this style of thinking.

I definitely agree that this fails as a complete formula for assessing what's good or bad. My feeling is that it offers an orientation that can be helpful for people aggregating stuff they think into all-things-considered judgements (and e.g. I would in retrospect have preferred to have had more of this orientation in the past).

If someone were using this framework to stop thinking about things that I thought they ought to consider, I couldn't be confident that they weren't making a good faith effort to act wholesomely, but I at least would think that their actions weren't wholesome by my lights.

Good question, my answer on this is nuanced (and I'm kind of thinking it through in response to your question).

I think that what feels to you to be wholesome will depend on your values. And I'm generally in favour of people acting according to their own feeling of what is wholesome.

On the other hand I also think there would be some choices of values that I would describe as "not wholesome". These are the ones which ignore something of what's important about some dimension (perhaps justifying ignoring it by saying "I just don't value this"), at least as fel... (read more)

1Sam FM
So I've been trying to get a clearer picture of what you mean by wholesomeness. So far I have:

  • Make an attempt to pay attention to the whole system, but stop at whatever point feels reasonable.
  • Don't exclude any important domains from things you care about.
  • Make these judgements based on your own values, and also the values that are felt-to-be-important by a good number of other people.
  • Wholesomeness is subjective to individual interpretation, so there aren't definitive right answers.
  • Certain tradeoffs of values are objectively unwholesome. There are definitive wrong answers.

I don't think this is a useful model. The devil of all of this is in the interpretations of "reasonable" and "important" and "good." You say it's unwholesome when someone ignores what you think is important by saying "I don't value this". But this is exactly what your model is encouraging: consider everything, but stop whenever you personally feel like you've considered everything you value. The only safeguard against this is just biasing towards the status quo by labeling things unwholesome if enough people disagree.

I doubt this is very helpful for our carefully-considered ethical notions of what's good.

I think it may be helpful as a heuristic for helping people to more consistently track what's good, and avoid making what they'd later regard as mistakes.

I agree that "paying attention to the whole system" isn't literally a thing that can be done, and I should have been clearer about what I actually meant. It's more like "making an earnest attempt to pay attention to the whole system (while truncating attention at a reasonable point)". It's not that you literally get to attend to everything, it's that you haven't excluded some important domain from things you care about. I think habryka (quoting and expanding on Ben Pace's thoughts) has a reasonable description of this in a comment.

I definitely don't ... (read more)

1Sam FM
I interpreted this concept of wholesomeness to be at least somewhat objective, but perhaps that's not the intention. Could you clarify how much wholesomeness is a subjective property relative to one's values, vs being a more objective property that would hold constant under different values? For example, say I lay out a business plan to use X natural resources to build Y buildings that will be used for Z purpose. Would you expect to be able to rate my wholesomeness without knowing how much I value things like nature, humanity, industrial progress, rural/urban lifestyle, etc? (assuming this business plan only covers some of these things, because considering all things isn't possible)

I think that there is some important unwholesomeness in these things, but that isn't supposed to mean that they're never permitted. (Sorry, I see how it could give that impression; but in the cases you're discussing there would often be greater unwholesomeness in not doing something.)

I discuss how I think my notion of wholesomeness intersects with these kind of examples in the section on visionary thought and revolutionary action in the third essay.

2markhank
I don’t agree with “wholesomeness” as a moral guide but I did at least understand it if you were defining it as conformity with the existing system. If I’ve understood you correctly now the maxim is “act wholesomely (conforming with prevailing rules and expectations) unless that wouldn’t in fact be wholesome (which in this context is defined differently, as meaning ‘having consideration for what is good for the whole’).” (Or to use your architectural analogy, build your building in line with the others unless there’s a good reason not to) That’s fine, as far as it goes, although we are asking “wholesome” to do a lot of work there with two meanings and ultimately it still ends up as being a synonym for “good” (or perhaps “good for the whole”). Ultimately if what you’re saying is that acting in line with established expectations is a good rule of thumb unless there’s a good reason not to, and that we should have ethical consideration for the whole (all entities deserving of moral consideration) then that’s hard to argue with. But it doesn’t move us much further on ethics because “good” is still undefined and the scope of those deserving ethical consideration is still undefined.

I think that there's something interesting here. One of the people I talked about this with asked me why children seem exceptionally wholesome (it's certainly not because they're unusually good at tracking the whole of things), and I thought the answer was about them being a part of the world where it may be especially important to avoid doing accidental harm, so our feelings of harms-to-children have an increased sense of unwholesomeness. But I'm now thinking that something like "robustly not evil" may be an important part of it.

Now we can trace out some ... (read more)

FWIW I quite like your way of pointing at things here, though maybe I'm more inclined towards letting things hang out for a while in the (conflationary?) alliance space to see which seem to be the deepest angles of what's going on in this vicinity, and doing more of the conceptual analysis a little later.

That said, if someone wanted to suggest a rewrite I'd seriously consider adopting it (or using it as a jumping-off point); I just don't think that I'm yet at the place where a rewrite will flow naturally for me.

I largely think that the section of the second essay on "wholesomeness vs expedience" is also applicable here.

Basically I agree that you sometimes have to not look at things, and I like your framing of the hard question of wholesomeness. I think that the full art of deciding when it's appropriate to not think about something would be better discussed via a bunch of examples, rather than trying to describe it in generalities. But the individual decisions are ones that you can make wholesomely or not, and I think that's my current best guess approach for how to ha... (read more)

DALL·E. I often told it in abstract terms the themes I wanted to include, used prompts including "stylized and slightly abstract", and regenerated a few times till I got something I was happy with.

(There are also a few that I drew, but that's probably obvious.)

I'd be tempted to make it a question, and ask something like "what do you think the impacts of this on [me/person] are?".

It might be that question would already do work by getting them to think about the thing they haven't been thinking about. But it could also elicit a defence like "it doesn't matter because the mission is more important" in which case I'd follow up with an argument that it's likely worth at least understanding the impacts because it might help to find actions which are better on those grounds while being comparably good -- or even better -- for the mission. Or it might elicit a mistaken model of the impacts, in which case I'd follow up by saying that I thought it was mistaken and explaining how.

Maybe consider asking the authors if they'd want to volunteer a ?50? word summary for this purpose, and include summaries for those who do?

2Ben Pace
It's a nice idea to have an optional field on posts for the author to submit a summary with a max-length.
2habryka
My worry was that having summaries only inconsistently adds a lot of mental complexity to track on the page. Now only sometimes when you hover over something do you see some kind of preview, and if you have ~60 items on a single page, adding any kind of indicator for that quickly makes things very cluttered. And you would have to redesign the page quite a bit to have a good place for summaries without adding a huge amount of clutter or flashing or movement on the page.

Examples of EA errors as failures of wholesomeness

In this comment (cross-posted from the EA forum) I’ll share a few examples of things I mean as failures of wholesomeness. I don’t really mean to over-index on these examples. I actually feel like a decent majority of what I wish that EA had been doing differently relates to this wholesomeness stuff. However, I’m choosing examples that are particularly easy to talk about — around FTX and around mistakes I've made — because I have good visibility of them, and in order not to put other people on the spot. Alth... (read more)
