All of Wei Dai's Comments + Replies

If you only care about the real world and you're sure there's only one real world, then the fact that you at time 0 would sometimes want to bind yourself at time 1 (e.g., physically commit to some action, or self-modify to perform some action at time 1) seems very puzzling, or indicates that something must be wrong. At time 1 you're in a strictly better epistemic position, having found out more information about which world is real, so what sense does it make that your decision theory has you-at-time-0 override you-at-time-1's decision?

(I... (read more)

Anthony DiGiovanni
Right, but 1-me has different incentives by virtue of this epistemic position. Conditional on being at the ATM, 1-me would be better off not paying the driver. (Yet 0-me is better off if the driver predicts that 1-me will pay, hence the incentive to commit.) I'm not sure if this is an instance of what you call "having different values" — if so I'd call that a confusing use of the phrase, and it doesn't seem counterintuitive to me at all.
Wei Dai

Better control solutions make AI more economically useful, which speeds up the AI race and makes it even harder to do an AI pause.

When we have controlled unaligned AIs doing economically useful work, they probably won't be very useful for solving alignment. Alignment will still be philosophically confusing, and it will be hard to trust the alignment work done by such AIs. Such AIs can help solve some parts of alignment problems, parts that are easy to verify, but alignment as a whole will still be bottle-necked on philosophically confusing, hard to verify ... (read more)

Knight Lee
I think if the first powerful unaligned AI remained in control instead of escaping, it might make a good difference, because we can engineer and test alignment ideas on it, rather than develop alignment ideas on an unknown future AI. This assumes at least some instances of it do not hide their misalignment very well.
Mark Xu
My vague plan along these lines is to attempt as hard as possible to defer all philosophically confusing questions to the "long reflection", and to use AI control as a tool to help produce AIs that can help preserve long-term option value (including philosophical option value) as best as possible. I separately have hope we can solve "the entire problem" at some point, e.g. through ARC's agenda (which I spend most of my time trying to derisk and advance).
ryan_greenblatt
People interested in a discussion about control with someone who is maybe closer to Wei Dai's perspective might be interested in my dialogue with habyrka.

Better control solutions make AI more economically useful, which speeds up the AI race and makes it even harder to do an AI pause.

[...]

Such AIs will probably be used to solve control problems for more powerful AIs, so the basic situation will continue and just become more fragile, with humans trying to control increasingly intelligent unaligned AIs.

It currently seems unlikely to me that marginal AI control research I'm excited about is very economically useful. I agree that some control or control-adjacent research will end up being at least somewhat ec... (read more)

Noosphere89
I think a key difference is that I believe the technical alignment/control problem, as defined, essentially requires no philosophical progress or solving of philosophical problems like the hard problem of consciousness. My reason for this comes down to a general point and a specific point.

In general, one of the reasons I believe philosophy tends not to be a productive area compared to other branches of science is that philosophers usually either try to solve problems that have essentially been proven intractable, or try to solve a problem in far too much generality without doing any experiments, and that's when they aren't solving outright fictional problems (I believe a whole lot of possible-world philosophizing is in that category). This is generally because philosophers do far too much back-chaining compared to front-chaining on a lot of problems.

On the specific point about alignment/control agendas: the problem of AI alignment isn't a problem of what goals you should assign the AI, but rather of whether you can put goals into the AI system such that the AI will reliably follow your goals at all.

And I agree with Bryan Caplan's recent take that friendships are often a bigger conflict of interest than money, so Open Phil higher-ups being friends with Anthropic higher-ups is troubling.

No kidding. From https://www.openphilanthropy.org/grants/openai-general-support/:

OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.

Wish OpenPhil and EAs in general were more willing to reflect/talk publicly about their mistake... (read more)

To be clear, by "indexical values" in that context I assume you mean indexing on whether a given world is "real" vs "counterfactual," not just indexical in the sense of being egoistic? (Because I think there are compelling reasons to reject UDT without being egoistic.)

I think being indexical in this sense (while being altruistic) can also lead you to reject UDT, but it doesn't seem "compelling" that one should be altruistic this way. Want to expand on that?

Anthony DiGiovanni
(I might not reply further because of how historically I've found people seem to simply have different bedrock intuitions about this, but who knows!) I intrinsically only care about the real world (I find the Tegmark IV arguments against this pretty unconvincing). As far as I can tell, the standard justification for acting as if one cares about nonexistent worlds is diachronic norms of rationality. But I don't see an independent motivation for diachronic norms, as I explain here. Given this, I think it would be a mistake to pretend my preferences are something other than what they actually are.

Maybe breaking up certain biofilms held together by Ca?

Yeah there's a toothpaste on the market called Livfree that claims to work like this.

IIRC, high EDTA concentration was found to cause significant amounts of erosion.

Ok, that sounds bad. Thanks.

ETA: Found an article that explains how Livfree works in more detail:

Tooth surfaces are negatively charged, and so are bacteria; therefore, they should repel each other. However, salivary calcium coats the negative charges on the tooth surface and bacteria, allowing them to get very close (within 10 nm).

... (read more)
bhauth
If you just want to make the tooth surface more negatively charged...a salt of poly(acrylic acid) seems better for that. And I think some toothpastes have that.

I actually no longer fully endorse UDT. It still seems a better decision theory approach than any other specific approach that I know, but it has a bunch of open problems and I'm not very confident that someone won't eventually find a better approach that replaces it.

To your question, I think if my future self decides to follow (something like) UDT, it won't be because I made a "commitment" to do it, but because my future self wants to follow it, because he thinks it's the right thing to do, according to his best understanding of philosophy and normativity... (read more)

Anthony DiGiovanni
Thanks for clarifying!  To be clear, by "indexical values" in that context I assume you mean indexing on whether a given world is "real" vs "counterfactual," not just indexical in the sense of being egoistic? (Because I think there are compelling reasons to reject UDT without being egoistic.)

Any thoughts on edathamil/EDTA or nano-hydroxyapatite toothpastes?

bhauth
EDTA in toothpaste? It chelates iron and calcium. Binding iron can prevent degradation during storage, so a little bit is often added. Are you talking about adding a lot more? For what purpose? In situations where you can chelate iron to prevent bacterial growth, you can also just kill bacteria with surfactants. Maybe breaking up certain biofilms held together by Ca? EDTA doesn't seem very effective for that for teeth, but also, chelating agents that could strip Ca from biofilms would also strip Ca from teeth. IIRC, high EDTA concentration was found to cause significant amounts of erosion. I wouldn't want to eat a lot of EDTA, anyway. Iminodisuccinate seems less likely to have problematic metabolites.

This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.

Are you imagining that the alignment problem is still unsolved in the future, such that all of these AIs are independent agents unaligned with each other (like humans currently are)? I guess in my imagined world, ASIs will have solved the alignment (or maybe control) problem at least for less intelligent agents, so you'd get large groups of AIs aligned wi... (read more)

It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

So assuming that AIs get rich peacefully within the system we have already established, we'll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?

At this point, it becomes ... (read more)

Matthew Barnett
There are a few key pieces of my model of the future that make me think humans can probably retain significant amounts of property, rather than having it suddenly stolen from them as the result of other agents in the world solving a specific coordination problem. These pieces include:

1. Not all AIs in the future will be superintelligent. More intelligent models appear to require more computation to run. This is both because smarter models are larger (in parameter count) and use more inference time (such as OpenAI's o1). To save computational costs, future AIs will likely be aggressively optimized to only be as intelligent as they need to be, and no more. This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.
2. As a result of the previous point, your statement that "ASIs produce all value in the economy" will likely not turn out correct. This is all highly uncertain, but I find it plausible that ASIs might not even be responsible for producing the majority of GDP in the future, given the possibility of a vastly more numerous population of less intelligent AIs that automate simpler tasks than the ones ASIs are best suited to do.
3. The coordination problem you described appears to rely on a natural boundary between the "humans that produce ~nothing" and "the AIs that produce everything". Without this natural boundary, there is no guarantee that AIs will solve the specific coordination problem you identified, rather than another coordination problem that hits a different group. Non-uploaded humans will differ from AIs by being biological and by being older, but they will not necessarily differ from AIs by being less intelligent.
4. Therefore, even if future agents decide to solve a specific coordination problem that allows them to steal wealth from unproductive agents, it is not clear that this will take the form of
  1. Why? Perhaps we'd do it out of moral uncertainty, thinking maybe we owe something to our former selves, but future people probably won't think this.
  2. Currently our utility is roughly log in money, partly because we spend money on instrumental goals and there's diminishing returns due to limited opportunities being used up. This won't be true of future utilitarians spending resources on their terminal values. So "one in hundred million fraction" of resources is a much bigger deal to them than to us.
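A toy calculation (with made-up numbers) illustrates the diminishing-returns point: under log utility, gaining a tiny fraction of one's resources is worth almost nothing regardless of scale, while under roughly linear terminal values the same fraction scales with the total resource pool.

```python
import math

def log_utility_gain(wealth, fraction):
    # Gain in log utility, u(w) = ln(w), from acquiring an extra
    # `fraction` of current wealth.
    return math.log(wealth * (1 + fraction)) - math.log(wealth)

def linear_utility_gain(wealth, fraction):
    # Gain under linear utility (resources valued terminally).
    return wealth * fraction

W = 1e12   # hypothetical total resources, arbitrary units
f = 1e-8   # a one-in-a-hundred-million fraction

print(log_utility_gain(W, f))     # ~1e-8: negligible, at any wealth level
print(linear_utility_gain(W, f))  # 1e4: scales with total resources
```

So to an agent with diminishing returns, the one-in-a-hundred-million slice is a rounding error; to an agent spending resources directly on terminal values, its worth grows in proportion to everything at stake.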
nc
This is a very strong assertion. Aren't most people on this forum, when making present claims about what they would like to happen in the future, trying to form this contract? (This comes back to the value lock-in debate.)

I have a slightly different take, which is that we can't commit to doing this scheme even if we want to, because I don't see what we can do today that would warrant the term "commitment", i.e., would be binding on our post-singularity selves.

In either case (we can't or don't commit), the argument in the OP loses a lot of its force, because we don't know whether post-singularity humans will decide to do this kind of scheme or not.

avturchin
A young unaligned AI will also not know whether post-singularity humans will follow through on the commitment, so it will estimate its chances as 0.5, and in this case the young AI will still want to follow the deal.

So the commitment I want to make is just my current self yelling at my future self, that "no, you should still bail us out even if 'you' don't have skin in the game anymore". I expect my future self to keep my word and probably honor a commitment like that, even if trading away 10 planets for 1 no longer seems like that good of an idea.

This doesn't make much sense to me. Why would your future self "honor a commitment like that", if the "commitment" is essentially just one agent yelling at another agent to do something the second agent doesn't want to... (read more)

Anthony DiGiovanni
I strongly agree with this, but I'm confused that this is your view given that you endorse UDT. Why do you think your future self will honor the commitment of following UDT, even in situations where your future self wouldn't want to honor it (because following UDT is not ex interim optimal from his perspective)?
David Matolcsi
I agree you can't make actually binding commitments. But I think the kid-adult example is actually a good illustration of what I want to do: if a kid makes a solemn commitment to spend one in hundred million fraction of his money on action figures when he becomes a rich adult, I think that would usually work. And that's what we are asking from our future selves.
Ben Pace
I have known non-zero adults to make such commitments to themselves. (But I agree it is not the typical outcome, and I wouldn't believe most people if they told me they would follow-through.)

Over time I have seen many people assert that “Aligned Superintelligence” may not even be possible in principle. I think that is incorrect and I will give a proof - without explicit construction - that it is possible.

The meta problem here is that you gave a "proof" (in quotes because I haven't verified it myself as correct) using your own definitions of "aligned" and "superintelligence", but if people asserting that it's not possible in principle have different definitions in mind, then you haven't actually shown them to be incorrect.

Roko
I don't see how anyone could possibly argue with my definitions.

Apparently the current funding round hasn't closed yet and might be in some trouble, and it seems much better for the world if the round was to fail or be done at a significantly lower valuation (in part to send a message to other CEOs not to imitate SamA's recent behavior). Zvi saying that $150B greatly undervalues OpenAI at this time seems like a big unforced error, which I wonder if he could still correct in some way.

What hunches do you currently have surrounding orthogonality, its truth or not, or things near it?

I'm very uncertain about it. Have you read Six Plausible Meta-Ethical Alternatives?

as far as I can tell humans should by default see themselves as having the same kind of alignment problem as AIs do, where amplification can potentially change what's happening in a way that corrupts thoughts which previously implemented values.

Yeah, agreed that how to safely amplify oneself and reflect for long periods of time may be hard problems that should be solved (... (read more)

As a tangent to my question, I wonder how many AI companies are already using RLAIF and not even aware of it. From a recent WSJ story:

Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook.

When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase "as an AI language model…" It turns out the contractors had used ChatGPT to write up their responses—a complete violation of Scale's raison d'être.

So ... (read more)

abramdemski
yyyep

but we only need one person or group who we’d be somewhat confident would do alright in CEV. Plausibly there are at least a few eg MIRIers who would satisfy this.

Why do you think this, and how would you convince skeptics? And there are two separate issues here. One is how to know their CEV won't be corrupted relative to what their values really are or should be, and the other is how to know that their real/normative values are actually highly altruistic. It seems hard to know both of these, and perhaps even harder to persuade others who may be very dist... (read more)

the gears to ascension
some fragments:

What hunches do you currently have surrounding orthogonality, its truth or not, or things near it?

re: hard to know - it seems to me that we can't get a certifiably-going-to-be-good result from a CEV-based AI solution unless we can make it certifiable that altruism is present. I think figuring out how to write down some form of what altruism is, especially altruism in contrast to being-a-pushover, is necessary to avoid issues - because even if any person considers themselves for CEV, how would they know they can trust their own behavior? as far as I can tell humans should by default see themselves as having the same kind of alignment problem as AIs do, where amplification can potentially change what's happening in a way that corrupts thoughts which previously implemented values.

can we find a CEV-grade alignment solution that solves the self-and-other alignment problems in humans as well, such that this CEV can be run on any arbitrary chunk of matter and discover its "true wants, needs, and hopes for the future"?

AI companies don't seem to be shy about copying RLHF though. Llama, Gemini, and Grok are all explicitly labeled as using RLHF.

It's also not clear to me that most of the value of AI will accrue to them. I'm confused about this though.

I'm also uncertain, and it's another reason for going long a broad index instead. I would go even broader than the S&P 500 if I could, but nothing else has option chains going out to 2029.

If indeed OpenAI does restructure to the point where its equity is now genuine, then $150 billion seems way too low as a valuation

Why is OpenAI worth much more than $150B, when Anthropic is currently valued at only $30-40B? Also, loudly broadcasting this reduces OpenAI's cost of equity, which is undesirable if you think OpenAI is a bad actor.

Apparently the current funding round hasn't closed yet and might be in some trouble, and it seems much better for the world if the round was to fail or be done at a significantly lower valuation (in part to send a message to other CEOs not to imitate SamA's recent behavior). Zvi saying that $150B greatly undervalues OpenAI at this time seems like a big unforced error, which I wonder if he could still correct in some way.

AnthonyC
If I had to guess, my answer would center around "Microsoft" for OpenAI and "Maybe actually taking commitments seriously enough to impede expectations of future profit and growth," for Anthropic. While this may or may not happen, if we really expect multi-GW data center demand to be a limiting factor in growth, then "Our backer can get Three Mile Island reopened" is a pretty big value add. It's possible Amazon would be able and willing to do that, but they've got ~4x less invested than Microsoft, and Google would presumably prioritize Gemini and DeepMind over Anthropic.

To clarify, I don't actually want you to scare people this way, because I don't know if people can psychologically handle it or if it's worth the emotional cost. I only bring it up myself to counteract people saying things like "AIs will care a little about humans and therefore keep them alive" or when discussing technical solutions/ideas, etc.

Should have made it much scarier. "Superhappies" caring about humans "not in the specific way that the humans wanted to be cared for" sounds better or at least no worse than death, whereas I'm concerned about s-risks, i.e., risks of worse than death scenarios.

Zack_M_Davis
This is a difficult topic (in more ways than one). I'll try to do a better job of addressing it in a future post.

My reply to Paul at the time:

If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?

From a purely selfish perspective, I think I might prefer that a misaligned AI kill everyone, so that I can take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of

... (read more)
Zack_M_Davis
Was my "An important caveat" parenthetical paragraph sufficient, or do you think I should have made it scarier?

I'm thinking that the most ethical (morally least risky) way to "insure" against a scenario in which AI takes off and property/wealth still matters is to buy long-dated far out of the money S&P 500 calls. (The longest dated and farthest out of the money seems to be Dec 2029 10000-strike SPX calls. Spending $78 today on one of these gives a return of $10000 if SPX goes to 20000 by Dec 2029, for example.)

My reasoning here is that I don't want to provide capital to AI industries or suppliers because that seems wrong given what I judge to be high x-risk th... (read more)
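Taking the figures quoted above at face value, the payoff arithmetic of such a deep out-of-the-money call is simple to sketch (this ignores the SPX contract multiplier, fees, taxes, and the large probability that the option expires worthless):

```python
def call_value_at_expiry(spot, strike):
    # Intrinsic value of a call option at expiry: zero unless the
    # underlying finishes above the strike.
    return max(spot - strike, 0)

premium = 78  # quoted cost of one Dec 2029 10000-strike SPX call

# Scenario from the comment: SPX reaches 20000 by Dec 2029.
payoff = call_value_at_expiry(20000, 10000)
print(payoff)                    # 10000
print(round(payoff / premium))   # roughly a 128x return in that scenario
```

The point of the far-out-of-the-money structure is exactly this asymmetry: a small premium is lost in most worlds, in exchange for a very large payout in the specific worlds where property still matters and equity values have exploded.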

Tao Lin
Do these options have a chance to default / are the sellers stable enough?
Alexander Gietelink Oldenziel
It seems hard to buy AI companies at the moment. The only way is to buy tech giants like Microsoft, Google, and Nvidia, which are already valued very highly - it seems like it's somewhat priced in. It's also not clear to me that most of the value of AI will accrue to them. I'm confused about this though. It would seem one would want to buy Nasdaq rather than SPX? On the other hand, maybe most tech companies will be wiped out by AI - it's the world of atoms that would gain relative value.
Joey KL
This probably does help capitalize AI companies a little bit, demand for call options will create demand for the underlying. This is probably a relatively small effect (?), but I'm not confident in my ability to estimate this at all.
Wei Dai

What is going on with Constitutional AI? Does anyone know why no LLM aside from Claude (at least none that I can find) has used it? One would think that if it works about as well as RLHF (which it seems to), AI companies would be flocking to it to save on the cost of human labor?

Also, apparently ChatGPT doesn't know that Constitutional AI is RLAIF (until I reminded it) and Gemini thinks RLAIF and RLHF are the same thing. (Apparently not a fluke as both models made the same error 2 out of 3 times.)
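For readers unclear on the distinction: the core of RLAIF (and of Constitutional AI's RL phase) is that an AI judge, rather than a human contractor, picks the preferred response in each pair. A minimal schematic sketch, with a stand-in heuristic judge (in practice the judge is itself an LLM prompted with the constitution; the names and heuristic here are hypothetical):

```python
CONSTITUTION = [
    "Choose the response that is more helpful.",
    "Choose the response that is less harmful.",
]

def ai_judge(prompt, response_a, response_b):
    # Stand-in for an LLM asked to judge according to the constitution.
    # Toy heuristic: prefer the more substantive (longer) response.
    return "A" if len(response_a) >= len(response_b) else "B"

def make_preference_pair(prompt, response_a, response_b):
    # Produce a preference record of the same shape RLHF gets from human raters.
    winner = ai_judge(prompt, response_a, response_b)
    chosen, rejected = ((response_a, response_b) if winner == "A"
                        else (response_b, response_a))
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pair = make_preference_pair(
    "How do I fix a flat bicycle tire?",
    "Remove the wheel, locate the puncture, patch the tube, reinflate, refit.",
    "Take it to a shop.",
)
# These pairs then feed a reward model (or DPO) exactly as human-labeled
# pairs would, with no human ranking labor in the loop.
```

The human-vs-AI distinction lives entirely in who produces the preference label, which is why RLHF done by contractors who secretly used ChatGPT is, functionally, RLAIF.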

Wei Dai
As a tangent to my question, I wonder how many AI companies are already using RLAIF and not even aware of it. From a recent WSJ story: So they detected the cheating that time, but in RLHF how would they know if contractors used AI to select which of two AI responses is more preferred? BTW here's a poem(?) I wrote for Twitter, actually before coming across the above story:

Isn't the basic idea of Constitutional AI just having the AI provide its own training feedback using written instructions? My guess is there was a substantial amount of self-evaluation in the o1 training with complicated written instructions, probably kind of similar to a constitution (though this is just a guess).

Vladimir_Nesov
These posts might be relevant:

* A recipe for frontier model post-training
* Futures of the data foundry business model

The details of Constitutional AI seem highly contingent, while the general idea is simply automation of data for post-training, so that the remaining external input is the "constitution". In the original paper there are recipes both for instruction tuning data and for preference data. RLAIF is essentially RLHF that runs on synthetic preference data, maybe together with a recipe for generating it. But preference data could also be used to run DPO or something else, in which case RLAIF becomes a misnomer for describing automation of that preference data.

The Llama 3 report suggests that instruction tuning data can be largely automated, but human preference data is still better. And the data foundry business is still alive, so a lot of human data is at least not widely recognized as useless. But it's unclear if future models won't soon do better than humans at labeling, or possibly already do better at some leading labs. Meta didn't have a GPT-4 level model as a starting point before Llama 3, and then there are the upcoming 5e26 FLOPs models, and o1-like reasoning models.
Nathan Helm-Burger
Maybe others are using it in secret but don't want to admit it for some reason? I can't find any mention of Anthropic having filed a patent on the idea, but maybe other companies would feel too much like it looked like they were second-rate imitators if they said they were copying Anthropic's idea? Just speculating, I don't know. Sure seems like a useful idea to copy.
  1. Once they get into CEV, they may not want to defer to others anymore, or may set things up with a large power/status imbalance between themselves and everyone else which may be detrimental to moral/philosophical progress. There are plenty of seemingly idealistic people in history refusing to give up or share power once they got power. The prudent thing to do seems to never get that much power in the first place, or to share it as soon as possible.
  2. If you're pretty sure you will defer to others once inside CEV, then you might as well do it outside CEV due to #1 in my grandparent comment.
Tamsin Leake
1. I wonder how many of those seemingly idealistic people retained power, when it was available, because they were indeed only pretending to be idealistic. Assuming one is actually initially idealistic but then gets corrupted by having power in some way, one thing someone can do in CEV that you can't do in real life is reuse the CEV process to come up with even better CEV processes, which will be even more likely to retain/recover their just-before-launching-CEV values. Yes, many people would mess this up or fail in some other way in CEV; but we only need one person or group who we'd be somewhat confident would do alright in CEV. Plausibly there are at least a few eg MIRIers who would satisfy this. Importantly, to me, this reduces outer alignment to "find someone smart and reasonable and likely to have good goal-content integrity", which is a matter of social science & psychology that seems to be much smaller than the initial full problem of formal outer alignment / alignment target design.
2. One of the main reasons to do CEV is that we're gonna die of AI soon, and CEV is a way to have infinite time to solve the necessary problems. Another is that even if we don't die of AI, we get eaten by various molochs instead of being able to safely solve the necessary problems at whatever pace is necessary.

The main asymmetries I see are:

  1. Other people not trusting the group to not be corrupted by power and to reflect correctly on their values, or not trusting that they'll decide to share power even after reflecting correctly. Thus "programmers" who decide to not share power from the start invite a lot of conflict. (In other words, CEV is partly just trying to not take power away from people, whereas I think you've been talking about giving AIs more power than they already have. "the sort of influence we imagine intentionally giving to AIs-with-different-valu
... (read more)
Tamsin Leake
It doesn't have to be by themselves; they can defer to others inside CEV, or come up with better schemes than their initial CEV inside CEV and then defer to those. Whatever other solutions than "solve everything on your own inside CEV" might exist, they can figure those out and defer to them from inside CEV. At least that's the case in my own attempts at implementing CEV in math (eg QACI).
Wei Dai

About a week ago FAR.AI posted a bunch of talks at the 2024 Vienna Alignment Workshop to its YouTube channel, including Supervising AI on hard tasks by Jan Leike.

What do you think about my positions on these topics as laid out in Six Plausible Meta-Ethical Alternatives and Ontological Crisis in Humans?

My overall position can be summarized as being uncertain about a lot of things, and wanting (some legitimate/trustworthy group, i.e., not myself as I don't trust myself with that much power) to "grab hold of the whole future" in order to preserve option value, in case grabbing hold of the whole future turns out to be important. (Or some other way of preserving option value, such as preserving the status quo / doin... (read more)

jessicata
re meta-ethical alternatives:

1. roughly my view
2. slight change; opens the question of why the deviations? are the "right things to value" not efficient to value in a competitive setting? mostly I'm trying to talk about those things to value that go along with intelligence, so it wouldn't correspond with a competitive disadvantage in general. so it's still close enough to my view
3. roughly the Yudkowskian view, the main view under which the FAI project even makes sense. I think one can ask basic questions like which changes move towards more rationality on the margin, though such changes would tend to prioritize rationality over preventing value drift. I'm not sure how much there are general facts about how to avoid value drift (it seems like the relevant kind, i.e. value drift as part of becoming more rational/intelligent, only exists from irrational perspectives, in a way dependent on the mind architecture)
4. minimal CEV-realist view. it really seems up to agents how much they care about their reflected preferences. maybe changing preferences too often leads to money pumps, or something?
5. basically says "there are irrational and rational agents, rationality doesn't apply to irrational agents"; seems somewhat like how people treat animals (we don't generally consider uplifting normative with respect to animals)
6. at this point you're at something like ecology / evolutionary game theory; it's a matter of which things tend to survive/reproduce and there aren't general decision theories that succeed

re human ontological crises: basically agree, I think it's reasonably similar to what I wrote. roughly my reason for thinking that it's hard to solve is that the ideal case would be something like a universal algebra homomorphism (where the new ontology actually agrees with the old one but is more detailed), yet historical cases like physics aren't homomorphic to previous ontologies in this way, so there is some warping necessary. you could try putting a metric on the wa

As long as all mature superintelligences in our universe don't necessarily have (end up with) the same values, and only some such values can be identified with our values or what our values should be, AI alignment seems as important as ever. You mention "complications" from obliqueness, but haven't people like Eliezer recognized similar complications pretty early, with ideas such as CEV?

It seems to me that from a practical perspective, as far as what we should do, your view is much closer to Eliezer's view than to Land's view (which implies that alignment ... (read more)

"as important as ever": no, because our potential influence is lower, and the influence isn't on things shaped like our values, there has to be a translation, and the translation is different from the original.

CEV: while it addresses "extrapolation" it seems broadly based on assuming the extrapolation is ontologically easy, and "our CEV" is an unproblematic object we can talk about (even though it's not mathematically formalized, any formalization would be subject to doubt, and even if formalized, we need logical uncertainty over it, and logical induction ... (read more)

I think the relevant implication from the thought experiment is that thinking a bunch about metaethics and so on will in practice change your values

I don't think that's necessarily true. For example some people think about metaethics and decide that anti-realism is correct and they should just keep their current values. I think that's overconfident but it does show that we don't know whether correct thinking about metaethics necessarily leads one to change one's values. (Under some other metaethical possibilities the same is also true.)

Also, even if it ... (read more)

5lumpenspace
I'm trying to understand where the source of disagreement lies, since I don't really see much "overconfidence" - ie, i don't see much of a probabilistic claim at all. Let me know if one of these suggestions points somewhere close to the right direction:

* The texts cited were mostly a response to the putative inevitability of orthogonalism. Once that was (i think effectively) dispatched, one might consider that part of the argument closed. After that, one could excuse him for being less rigorous/have more fun with the rest; the goal there was not to debate but to allow the reader to experience what something akin to will-to-think would be like (im aware this is frowned upon in some circles);
* The crux of the matter, imo, is not that thinking a lot about meta-ethics changes your values. Rather, that an increase in intelligence does - and namely, it changes them in the direction of greater appreciation for complexity and desire for thinking, and this change takes forms unintelligible to those one rung below. Of course, here the argument is either inductive/empirical or kinda neoplatonic. I will spare you the latter version, but the former would look something like:
  - Imagine a fairly uncontroversial intelligence-sorted line-up, going: thermostat → mosquito → rat(🐭) → chimp → median human → rat(Ω)
  - Notice how intelligence grows together with the desire for more complexity, with curiosity, and ultimately with the drive towards increasing intelligence, per se: and notice also how morality evolves to accommodate those drives (one really wouldn't want those on the left of wherever one stands to impose their moral code to those on the right).

While I agree these sorts of arguments don't cut it for a typical post-analytical, lesswrong-type debate, I still think that, at the very least, Occam's razor should strongly slash their way - unless there's some implicit counterargument i missed. (As for the opportunity cost of deepening your familiarity wi

This made me curious enough to read Land's posts on the orthogonality thesis. Unfortunately I got a pretty negative impression from them. From what I've read, Land tends to be overconfident in his claims and fails to notice obvious flaws in his arguments. Links for people who want to judge for themselves (I had to dig up archive.org links as the original site has disappeared):

... (read more)
4lumpenspace
I'm not sure I agree - in the original thought experiment, it was a given that increasing intelligence would lead to changes in values in ways that the agent, at t=0, would not understand or share. At this point, one could decide whether to go for it or hold back - and we should all consider ourselves lucky that our early sapiens predecessors didn't take the second option. (btw, I'm very curious to know what you make of this other Land text: https://etscrivner.github.io/cryptocurrent/)

I think the relevant implication from the thought experiment is that thinking a bunch about metaethics and so on will in practice change your values; the pill itself is not very realistic, but thinking can make people smarter and will cause value changes. I would agree Land is overconfident (I think orthogonal and diagonal are both wrong models).

  1. You can also make an argument for not taking over the world on consequentialist grounds, which is that nobody should trust themselves to not be corrupted by that much power. (Seems a bit strange that you only talk about the non-consequentialist arguments in footnote 1.)
  2. I wish this post also mentioned the downsides of decentralized or less centralized AI (such as externalities and race dynamics reducing investment into safety, potential offense/defense imbalances, which in my mind are just as worrisome as the downsides of centralized AI), even if you don'
... (read more)
2owencb
1. Yeah. As well as another consequentialist argument, which is just that it will be bad for other people to be dominated. Somehow the arguments feel less natively consequentialist, and so it seems somehow easier to hold them in these other frames, and then translate them into consequentialist ontology if that's relevant; but also it would be very reasonable to mention them in the footnote.
2. My first reaction was that I do mention the downsides. But I realise that that was a bit buried in the text, and I can see that that could be misleading about my overall view. I've now edited the second paragraph of the post to be more explicit about this. I appreciate the pushback.

I'm actually pretty confused about what they did exactly. From the Safety section of Learning to Reason with LLMs:

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1-preview achieved substantially im

... (read more)
4RogerDearnaley
I suspect the reason for hiding the chain of thought is some blend of:

a) the model might swear or otherwise do a bad thing, hold a discussion with itself, and then decide it shouldn't have done that bad thing, and they're more confident that they can avoid the bad thing getting into the summary than that they can backtrack and figure out exactly which part of the CoT needs to be redacted, and

b) they don't want other people (especially open-source fine-tuners) to be able to fine-tune on their CoT and distill their very-expensive-to-train reasoning traces

I will be interested to see how fast jailbreakers make headway on exposing either a) or b)
5eggsyntax
I was puzzled by that latter section (my thoughts in shortform here). Buck suggests that it may be mostly a smokescreen around 'We don't want to show the CoT because competitors would fine-tune on it'. That's my guess (without any inside information): the model knows the safety rules and can think-out-loud about them in the CoT (just as it can think about anything else) but they're not fine-tuning on CoT content for ‘policy compliance or user preferences’.

If the other player is a stone with “Threat” written on it, you should do the same thing, even if it looks like the stone’s behavior doesn’t depend on what you’ll do in response. Responding to actions and ignoring the internals when threatened means you’ll get a lot fewer stones thrown at you.

In order to "do the same thing" you either need the other's player's payoffs, or according to the next section "If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat!" So if all you see is a stone, then presuma... (read more)
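The incentive logic being debated here can be sketched with a toy expected-value model (the payoff numbers and function names are my illustrative assumptions, not from the post): carrying out a threat is costly for the threatener, so issuing threats only pays if the target gives in often enough, and a target credibly committed to never giving in makes threatening negative-EV.

```python
# Toy model of the threat game discussed above (illustrative payoffs,
# not from the original post). Carrying out a threat costs the
# threatener something, so threatening only pays if the target gives
# in often enough; a committed never-give-in target makes threats -EV.

GAIN_IF_TARGET_GIVES_IN = 5.0  # threatener's payoff when the target complies
COST_OF_CARRYING_OUT = 2.0     # threatener's cost of executing the threat

def threatener_ev(p_give_in, p_carry_out=1.0):
    """Expected value of issuing a threat, given the target's probability
    of giving in and the threatener's probability of following through
    when refused."""
    return (p_give_in * GAIN_IF_TARGET_GIVES_IN
            - (1 - p_give_in) * p_carry_out * COST_OF_CARRYING_OUT)

print(threatener_ev(1.0))  # target always gives in: threatening pays (5.0)
print(threatener_ev(0.0))  # committed never-give-in target: threats are -EV (-2.0)
```

Note this model takes the threatener's responsiveness to incentives as given, which is exactly what the stone example puts in question: a stone's "p_carry_out" does not depend on your policy, so the argument for not giving in has to route through how stones come to be thrown at you at all.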

1Christopher King
There is a no free lunch theorem for this. LDT (and everything else) can be irrational
3Mikhail Samin
By a stone, I meant a player with very deterministic behavior in a game with known payoffs, named this way after the idea of cooperate-stones in prisoner’s dilemma (with known payoffs).

I think to the extent there’s no relationship between giving in to a boulder/implementing some particular decision theory and having this and other boulders thrown at you, UDT and FDT by default swerve (and probably don't consider the boulders to be threatening them, and it’s not very clear in what sense this is “giving in”); to the extent it sends more boulders their way, they don’t swerve. If making decisions some way incentivizes other agents to become less like LDTs and more like uncooperative boulders, you can simply not make decisions that way. (If some agents actually have an ability to turn into animals and you can’t distinguish the causes behind an animal running at you, you can sometimes probabilistically take out your anti-animal gun and put them to sleep.)

Do you maybe have a realistic example where this would realistically be a problem? I’d be moderately surprised if UDT/FDT consider something to be a better policy than what’s described in the post.

Edit: to add, LDTs don't swerve to boulders that were created to influence the LDT agent's responses. If you turn into a boulder because you expect some agents among all possible agents to swerve, this is a threat, and LDTs don't give in to those boulders (and it doesn't matter whether or not you tried to predict the behavior of LDTs in particular). If you believed LDT agents or agents in general would swerve against a boulder, and that made you become a boulder, LDT agents obviously don't swerve to that boulder. They might swerve to boulders that are actually natural boulders caused by the very simple physics no one influenced to cause the agents to do something. They also pay their rent- because they'd be evicted otherwise, not for the reason of getting rent from them under the threat of eviction but for the reason of g
6Thomas Kwa
I recall Eliezer saying this was an open problem, at a party about a year ago.

#1 has obviously happened. Nordstream 1 was blown up within weeks of my OP, and AFAIK Russia hasn't substantially expanded its other energy exports. Less sure about #2 and #3, as it's hard to find post-2022 energy statistics. My sense is that the answers are probably "yes" but I don't know how to back that up without doing a lot of research.

However coal stocks (BTU, AMR, CEIX, ARCH being the main pure play US coal stocks) haven't done as well as I had expected (the basket is roughly flat from Aug 2022 to today) for two other reasons: A. There have been tw... (read more)

Wei DaiΩ174111

Unfortunately this ignores 3 major issues:

  1. race dynamics (also pointed out by Akash)
  2. human safety problems - given that alignment is defined "in the narrow sense of making sure AI developers can confidently steer the behavior of the AI systems they deploy", why should we believe that AI developers and/or parts of governments that can coerce AI developers will steer the AI systems in a good direction? E.g., that they won't be corrupted by power or persuasion or distributional shift, and are benevolent to begin with.
  3. philosophical errors or bottlenecks - th
... (read more)

Just riffing on this rather than starting a different comment chain:

If alignment₁ is "get AI to follow instructions" (as typically construed in a "good enough" sort of way) and alignment₂ is "get AI to do good things and not bad things," (also in a "good enough" sort of way, but with more assumed philosophical sophistication) I basically don't care about anyone's safety plan to get alignment₁ except insofar as it's part of a plan to get alignment₂.

Philosophical errors/bottlenecks can mean you don't know how to go from 1 to 2. Human safety pr... (read more)

I think there’s a steady stream of philosophy getting interested in various questions in metaphilosophy

Thanks for this info and the references. I guess by "metaphilosophy" I meant something more meta than metaethics or metaepistemology, i.e., a field that tries to understand all philosophical reasoning in some unified or systematic way, including reasoning used in metaethics and metaepistemology, and metaphilosophy itself. (This may differ from standard academic terminology, in which case please let me know if there's a preferred term for the concept I'... (read more)

Sadly, I don't have any really good answers for you.

Thanks, it's actually very interesting and important information.

I don't know of specific cases, but for example I think it is quite common for people to start studying meta-ethics because of frustration at finding answers to questions in normative ethics.

I've noticed (and stated in the OP) that normative ethics seems to be an exception where it's common to express uncertainty/confusion/difficulty. But I think, from both my inside and outside views, that this should be common in most philosophical ... (read more)

3Simon Goldstein
I think there's a steady stream of philosophy getting interested in various questions in metaphilosophy; metaethics is just the most salient to me. One example is the recent trend towards conceptual engineering (https://philpapers.org/browse/conceptual-engineering). Metametaphysics has also gotten a lot of attention in the last 10-20 years https://www.oxfordbibliographies.com/display/document/obo-9780195396577/obo-9780195396577-0217.xml. There is also some recent work in metaepistemology, but maybe less so because the debates tend to recapitulate previous work in metaethics https://plato.stanford.edu/entries/metaepistemology/.

Sorry for being unclear, I meant that calling for a pause seems useless because it won't happen. I think calling for the pause has opportunity cost because of limited attention and limited signalling value; reputation can only be used so many times; better to channel pressure towards asks that could plausibly get done.
4Ben Pace
FTR I'd probably be up for helping out logistically with such an open letter (e.g. making the website and any other parts of it). I previously made this open letter.

Thank you for your view from inside academia. Some questions to help me get a better sense of what you see:

  1. Do you know any philosophers who switched from non-meta-philosophy to metaphilosophy because they become convinced that the problems they were trying to solve are too hard and they needed to develop a better understanding of philosophical reasoning or better intellectual tools in general? (Or what's the closest to this that you're aware of?)
  2. Do you know any philosophers who have expressed an interest in ensuring that future AIs will be philosophical
... (read more)
9William D'Alessandro
Another academic philosopher, directed here by @Simon Goldstein. Hello Wei!

1. It's not common to switch entirely to metaphilosophy, but I think lots of us get more interested in the foundations and methodology of at least our chosen subfields as we gain experience, see where progress is(n't) being made, start noticing deep disagreements about the quality of different kinds of work, and so on. It seems fair to describe this as awakening to a need for better tools and a greater understanding of methods. I recently wrote a paper about the methodology of one of my research areas, philosophy of mathematical practice, for pretty much these reasons.
2. Current LLMs are pretty awful at discussing the recent philosophy literature, so I think anyone who'd like AI tools to serve as useful research assistants would be happy to see at least some improvement here! I'm personally also excited about the prospects of using language models with bigger context windows for better corpus analysis work in empirical and practice-oriented parts of philosophy.
3. I basically agree with Simon on this.
4. I don't think this is uncommon. You might not see these reversals in print often, because nobody wants to publish and few people want to read a paper that just says "I retract my previous claims and no longer have a confident positive view to offer". But my sense is that philosophers often give up on projects because the problems are piling up and they no longer see an appealing way forward. Sometimes this happens more publicly. Hilary Putnam, one of the most influential philosophers of the later 20th century, was famous for changing his mind about scientific realism and other basic metaphysical issues. Wesley Salmon gave up his influential "mark transmission" account of causal explanation due to counterexamples raised by Kitcher (as you can read here). It would be easy enough to find more examples.
4Simon Goldstein
Great questions. Sadly, I don't have any really good answers for you.

1. I don't know of specific cases, but for example I think it is quite common for people to start studying meta-ethics because of frustration at finding answers to questions in normative ethics.
2. I do not, except for the end of Superintelligence.
3. Many of the philosophers I know who work on AI safety would love for there to be an AI pause, in part because they think alignment is very difficult. But I don't know if any of us have explicitly called for an AI pause, in part because it seems useless, but may have opportunity cost.
4. I think few of my friends in philosophy have ardently abandoned a research project they once pursued because they decided it wasn't the right approach. I suspect few researchers do that. In my own case, I used to work in an area called 'dynamic semantics', and one reason I've stopped working on that research project is that I became pessimistic that it had significant advantages over its competitors.

My understanding of what happened (from reading this) is that you wanted to explore in a new direction very different from the then preferred approach of the AF team, but couldn't convince them (or someone else) to join you. To me this doesn't clearly have much to do with streetlighting, and my current guess is that it was probably reasonable of them to not be convinced. It was also perfectly reasonable of you to want to explore a different approach, but it seems unreasonable to claim without giving any details that it would have produced better results if... (read more)

If you say to someone

Ok, so, there's this thing about AGI killing everyone. And there's this idea of avoiding that by making AGI that's useful like an AGI but doesn't kill everyone and does stuff we like. And you say you're working on that, or want to work on that. And what you're doing day to day is {some math thing, some programming thing, something about decision theory, ...}. What is the connection between these things?

and then you listen to what they say, and reask the question and interrogate their answers, IME what it very often grounds out into... (read more)

(Upvoted since your questions seem reasonable and I'm not sure why you got downvoted.)

I see two ways to achieve some justifiable confidence in philosophical answers produced by superintelligent AI:

  1. Solve metaphilosophy well enough that we achieve an understanding of philosophical reasoning on par with mathematical reasoning, and have ideas/systems analogous to formal proofs and mechanical proof checkers that we can use to check the ASI's arguments.
  2. We increase our own intelligence and philosophical competence until we can verify the ASI's reasoning ourselves.

Having worked on some of the problems myself (e.g. decision theory), I think the underlying problems are just very hard. Why do you think they could have done "so much more, much more intently, and much sooner"?

5TsviBT
The type of fundamental problem that proper speculative philosophy is supposed to solve is the sort where streetlighting doesn't work (or isn't working, or isn't working fast enough). But nearly all of the alignment field after like 2004 was still basically streetlighting. It was maybe a reasonable thing to have some hope in prospectively, but retrospectively it was too much investment in streetlighting, and retrospectively I can make arguments about why one should have maybe guessed that at the time.

By 2018 IIRC, or certainly by 2019, I was vociferously arguing for that in AF team meetings--but the rest of the team either disagreed with me or didn't understand me, and on my own I'm just not that good a thinker, and I didn't find anyone else to try it with.

I think they have good thoughts, but are nevertheless mostly streetlighting--i.e. not trying to take step after step of thinking at the level of speculative philosophy AND aimed at getting the understanding needed for alignment.

I've had this tweet pinned to my Twitter profile for a while, hoping to find some like-minded people, but with 13k views so far I've yet to get a positive answer (or find someone expressing this sentiment independently):

Among my first reactions upon hearing "artificial superintelligence" were "I can finally get answers to my favorite philosophical problems" followed by "How do I make sure the ASI actually answers them correctly?"

Anyone else reacted like this?

This aside, there are some people around LW/rationality who seem more cautious/modest/self-crit... (read more)

3M. Y. Zuo
What makes you think there are any such ‘answers’, renderable in a form that you could identify? And even if they do exist, why do you think a human being could fully grasp the explanation in finite time?

Edit: It seems quite possible that even the simplest such ‘answers’ could require many years of full time effort to understand, putting it beyond most if not all human memory capacity. i.e. By the end even those who ‘learned’ it will have forgotten many parts near the beginning.
4TsviBT
Yeah that was not my reaction. (More like "that's going to be the most beautiful thing ever" and "I want to be that too".) No, if anything the job loss resulted from not doing so much more, much more intently, and much sooner.

"Signal group membership" may be true of the fields you mentioned (political philosophy and philosophy of religion), but seems false of many other fields such as philosophy of math, philosophy of mind, decision theory, anthropic reasoning. Hard to see what group membership someone is signaling by supporting one solution to Sleeping Beauty vs another, for example.

1Mateusz Bagiński
Here are some axes along which I think there's some group membership signaling in philosophy (IDK about the extent and it's hard to disentangle it from other stuff):

* Math: platonism/intuitionism/computationalism (i.e. what is math?), interpretations of probability, foundations of math (set theory vs univalent foundations)
* Mind: externalism/internalism (about whatever), consciousness (de-facto-dualisms (e.g. Chalmers) vs reductive realism vs illusionism), language of thought vs 4E cognition, determinism vs compatibilism vs voluntarism
* Metaphysics/ontology: are chairs, minds, and galaxies real? (this is somewhat value-laden for many people)
* Biology: gene's-eye-view/modern synthesis vs extended evolutionary synthesis

I'm increasingly worried that philosophers tend to underestimate the difficulty of philosophy. I've previously criticized Eliezer for this, but it seems to be a more general phenomenon.

Observations:

  1. Low expressed interest in metaphilosophy (in relation to either AI or humans)
  2. Low expressed interest in AI philosophical competence (either concern that it might be low, or desire/excitement for supercompetent AI philosophers with Jupiter-sized brains)
  3. Low concern that philosophical difficulty will be a blocker of AI alignment or cause of AI risk
  4. High confiden
... (read more)
4Seth Herd
Hm. I think modern academic philosophy is a raging shitshow, but I thought philosophy on LW was quite good. I haven't been a regular LW user until a couple of years ago, and the philosophical takes here, particularly Eliezer's, converge with my own conclusions after a half lifetime of looking at philosophical questions through the lens of science, particularly neuroscience and psychology.

So: what do you see as the limitations in LW/Yudkowskian philosophy? Perhaps I've overlooked them.

I am currently skeptical that we need better philosophy for good AGI outcomes, vs. better practical work on technical AGI alignment (a category that barely exists) and PR work to put the likely personal intent aligned AGI into the hands of people that give half a crap about understanding or implementing ethics. Deciding on the long term future will be a matter of a long contemplation if we get AGI into good hands. We should decide if that logic is right, and if so, plan the victory party after we've won the war.

I did read your metaphilosophy post and remain unconvinced that there's something big the rest of us are missing. I'm happy to be corrected (I love becoming less wrong, and I'm aware of many of my biases that might prevent it). Here's how it currently looks to me:

Ethics are ultimately a matter of preference, the rest is game theory and science (including the science of human preferences). Philosophical questions boil down to scientific questions in most cases, so epistemology is metaphilosophy for the most part.

Change my mind! Seriously, I'll listen. It's been years since I've thought about philosophy hard.
4Steven Byrnes
I was just reading Daniel Dennett’s memoir for no reason in particular, it had some interesting glimpses into how professional philosophers actually practice philosophy. Like I guess there’s a thing where one person reads their paper (word-for-word!) and then someone else is the designated criticizer? I forget the details. Extremely different from my experience in physics academia though!! (Obviously, reading that memoir is probably not the most time-efficient way to learn about the day-to-day practice of academic philosophy.)

(Oh, there was another funny anecdote in the memoir where the American professional philosopher association basically had a consensus against some school of philosophy, and everyone was putting it behind them and moving on, but then there was a rebellion where the people who still liked that school of philosophy did a hostile takeover of the association’s leadership!)

A non-ethics example that jumps to my mind is David Chalmers on the Hard Problem of Consciousness here: “So if I’m giving my overall credences, I’m going to give, 10% to illusionism, 30% to panpsychism, 30% to dualism, and maybe the other 30% to, I don’t know what else could be true, but maybe there’s something else out there.” That’s the only example I can think of but I read very very little philosophy.
3Shmi
What are the issues that are "difficult" in philosophy, in your opinion? What makes them difficult? I remember you and others talking about the need to "solve philosophy", but I was never sure what it meant by that.
9Simon Goldstein
I think most academic philosophers take the difficulty of philosophy quite seriously. Metaphilosophy is a flourishing subfield of philosophy; you can find recent papers on the topic here https://philpapers.org/browse/metaphilosophy.

There is also a growing group of academic philosophers working on AI safety and alignment; you can find some recent work here https://link.springer.com/collections/cadgidecih. I think that sometimes the tone of specific papers sounds confident; but that is more stylistic convention than a reflection of the underlying credences.

Finally, I think that uncertainty / decision theory is a persistent theme in recent philosophical work on AI safety and other issues in philosophy of AI; see for example this paper, which is quite sensitive to issues about chances of welfare https://link.springer.com/article/10.1007/s43681-023-00379-1.
2Vladimir_Nesov
I blame science, math, engineering, entrepreneurship. Philosophy is the practice of the esoteric method, meaning it can't be made truly legible for very long stretches of investigation. This results in accumulation of anti-epistemic hazards, which science doesn't particularly need to have tools for dealing with, because it can filter its reasoning through frequent transitions into legibility. Philosophy can't rely on such filtering through legibility, it has to maintain sanity the hard way. But as philosophy enviously looks at the more successful endeavors of science, it doesn't see respect for such methods of maintaining sanity in its reasoning, instead it sees that merely moving fast and breaking things works very well. And so the enthusiasm for their development wanes, instead philosophy remains content with the object level questions that investigate particular truths, rather than methods for getting better at telling which cognitive algorithms can more robustly arrive at truths (rationality, metaphilosophy).

Philosophy is frequently (probably most of the time) done in order to signal group membership rather than as an attempt to accurately model the world. Just look at political philosophy or philosophy of religion. Most of the observations you note can be explained by philosophers operating at simulacrum level 3 instead of level 1.

3TsviBT
To whom does this not apply? Most people who "work on AI alignment" don't even think that thinking is a thing.

I have a lot of disagreements with section 6. Not sure where the main crux is, so I'll just write down a couple of things.

One intuition pump here is: in the current, everyday world, basically no one goes around with much of a sense of what people’s “values on reflection” are, or where they lead.

This only works because we're not currently often in danger of subjecting other people to major distributional shifts. See Two Neglected Problems in Human-AI Safety.

That is, ultimately, there is just the empirical pattern of: what you would think/feel/value gi

... (read more)

Crossposting from X:

High population may actually be a problem, because it allows the AI transition to occur at low average human intelligence, hampering its governance. Low fertility/population would force humans to increase average intelligence before creating our successor, perhaps a good thing!

This assumes that it's possible to create better or worse successors, and that higher average human intelligence would lead to smarter/better politicians and policies, increasing our likelihood of building better successors.

Some worry about low fertility leading t... (read more)

4Viliam
I find the idea interesting: To achieve a certain value of "total genius", we either need a large population with a small fraction of geniuses, or a small population with a large fraction of geniuses. (A third option is a small population with a small fraction of geniuses... and it takes a lot of time. The geniuses read each other's books, rather than talk to each other directly. I think it was like this in the past. Very inefficient, because the information transfer by reading books is one-sided; does not allow collaboration in real time.)

I wonder how the heritability of IQ works, versus the reversion to the mean. Despite Pol Pot's dystopian project, the average IQ in Cambodia seems to be... average.

What would happen to a country where let's say half of the children are produced by artificial insemination, and half of the sperm comes from fathers with IQ 130 and above? If the mother is average, the child is likely to be an average between 100 and 130, so 115. On one hand, nothing exceptional; on the other hand, if the baseline is now slightly higher, then the next generation... and here comes the question how exactly the reversion to the mean works, and whether the constant injections of IQ 130 genes in the population could outrun it.
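The generational question at the end of Viliam's comment can be sketched with a toy additive model (all parameters here are illustrative assumptions, not established figures): each child regresses toward the population mean by a heritability-like factor, while selecting half the fathers from above an IQ threshold keeps injecting a positive deviation each generation.

```python
import random

# Toy model of the scenario above: each generation, half the children
# get fathers selected for IQ >= 130 and half get random fathers.
# Child IQ regresses toward the population mean by a factor H2.
# All numbers (H2, noise scale, cutoff) are illustrative assumptions.

random.seed(0)
MEAN, SD, H2 = 100.0, 15.0, 0.5

def child_iq(father, mother):
    midparent = (father + mother) / 2
    # regression toward the mean plus non-genetic noise
    return MEAN + H2 * (midparent - MEAN) + random.gauss(0, SD * 0.7)

def next_generation(pop, cutoff=130.0):
    smart_fathers = [x for x in pop if x >= cutoff] or [cutoff]
    children = []
    for i, mother in enumerate(pop):
        father = random.choice(smart_fathers) if i % 2 == 0 else random.choice(pop)
        children.append(child_iq(father, mother))
    return children

pop = [random.gauss(MEAN, SD) for _ in range(10_000)]
for gen in range(5):
    pop = next_generation(pop)
    print(gen, round(sum(pop) / len(pop), 1))
```

Under these assumptions the repeated injection does outrun regression to the mean, because regression pulls each cohort toward the *current* population mean while selection keeps shifting that mean upward; whether the real genetics works this way is exactly the open question the comment raises.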

Social media sites are already getting overwhelmed by spam, fake images, fake videos, blackmail attempts, phishing, etc. The only way to counteract the speed and volume of massive AI-driven attacks is with AI-powered defenses. These defenses need rules. If those rules aren't formal and proven robust, then they will likely be hacked and exploited by adversarial AIs. So at the most basic level, we need infrastructure rules which are provably robust against classes of attacks. What those attack classes are and what properties those rules guarantee is part o

... (read more)

It seems confusing/unexpected that a user has to click on "Personal Blog" to see organisational announcements (which are not "personal"). Also, why is it important or useful to keep timeful posts out of the front page by default?

If it's because they'll become less relevant/interesting over time, and you want to reduce the chances of them being shown to users in the future, it seems like that could be accomplished with another mechanism.

I guess another possibility is that timeful content is more likely to be politically/socially sensitive, and you want to ... (read more)

5kave
To the extent you're saying that the "Personal" name for the category is confusing, I agree. I'm not sure what a better name is, but I'd like to use one.

Your last paragraph is in the right ballpark, but by my lights the central concern isn't so much about LessWrong mods getting involved in fights over what goes on the frontpage. It's more about keeping the frontpage free of certain kinds of context requirements and social forces. LessWrong is meant for thinking and communicating about rationality, AI x-risk, and related ideas. It shouldn't require familiarity with the social scenes around those topics. Organisations aren't exactly "a social scene", and they are relevant to modeling the space's development. But I think there are two reasons to keep information about those organisations off the frontpage.

1. While relevant to the development of ideas, that information is not the same as the development of those ideas. We can focus on orgs' contributions to the ideas without focusing on organisational changes.
2. It helps limit certain social forces. My model for why LessWrong keeps politics off the frontpage is to minimize the risk of coöption by mainstream political forces and fights. Similarly, I think keeping org updates off the frontpage helps prevent LessWrong from overly identifying with particular movements or orgs.

I'm afraid this would muck up our truth-seeking. Powerful, high-status organizations can easily warp discourse. "Everyone knows that they're basically right about stuff." I think this already happens to some degree – comments from staff at MIRI, ARC, Redwood, and Lightcone seem to me to gain momentum solely from who wrote them. Though of course it's hard to be sure, as the comments are often also pretty good on their merits.

As AI news heats up, I do think our categories are straining a bit. There's a lot of relevant but news-y content. I still feel good about keeping things like Zvi's AI newsletters off the frontpage, but I worry that putting them