"They" is referring to Epoch as an entity, which the comment referenced directly. My guess is you just missed that?
I didn't miss it. My point is that Epoch has a variety of different employees and internal views.
They have definitely described themselves as safety focused to me and others.
The original comment referenced (in addition to Epoch) "Matthew/Tamay/Ege", yet you quoted Jaime to back up this claim. I think it's important to distinguish who has said what when talking about what "they" have said. I for one have been openly critical of LW arguments for AI doom for quite a while now.
[I edited this comment to be clearer]
But anyway, it sometimes seems to me that you often advocate a morality regarding AI relations that doesn't benefit anyone who currently exists, or the coalition that you are a part of. This seems like a mistake. Or worse.
I dispute this, since I've argued for the practical benefits of giving AIs legal autonomy, which I think would likely benefit existing humans. Relatedly, I've also talked about how I think hastening the arrival of AI could benefit people who currently exist. Indeed, that's one of the best arguments for accelerating AI. The argument is that,...
Are you suggesting that I should base my morality on whether I'll be rewarded for adhering to it? That just sounds like selfishness disguised as impersonal ethics.
To be clear, I do have some selfish/non-impartial preferences. I care about my own life and happiness, and the happiness of my friends and family. But I also have some altruistic preferences, and my commentary on AI tends to reflect that.
I'm not completely sure, since I was not personally involved in the relevant negotiations for FrontierMath. However, what I can say is that Tamay already indicated that Epoch should have tried harder to obtain different contract terms that would have enabled us to have greater transparency. I don't think it makes sense for him to say that unless he believes it was feasible to have achieved a different outcome.
Also, I want to clarify that this new benchmark is separate from FrontierMath and we are under different constraints with regard to it.
I can't make any confident claims or promises right now, but my best guess is that we will make sure this new benchmark stays entirely private and under Epoch's control, to the extent this is feasible for us. However, I want to emphasize that by saying this, I'm not making a public commitment on behalf of Epoch.
to the extent this is feasible for us
Was [keeping FrontierMath entirely private and under Epoch's control] feasible for Epoch in the same sense of "feasible" you are using here?
Having hopefully learned from our mistakes regarding FrontierMath, we intend to be more transparent to collaborators for this new benchmark. However, at this stage of development, the benchmark has not reached a point where any major public disclosures are necessary.
Well, I'd sure like to know whether you are planning to give the dataset to OpenAI or any other frontier companies! It might influence my opinion of whether this work is net positive or net negative.
I suppose that means it might be worth writing an additional post that more directly responds to the idea that AGI will end material scarcity. I agree that thesis deserves a specific refutation.
This seems less like a normal friendship and more like a superstimulus simulating the appearance of a friendship for entertainment value. It seems reasonable enough to characterize it as non-authentic.
I assume some people will end up wanting to interact with a mere superstimulus; however, other people will value authenticity and variety in their friendships and social experiences. This comes down to human preferences, which will shape the type of AIs we end up training.
The conclusion that nearly all AI-human friendships will seem inauthentic t...
They might be about getting unconditional love from someone or they might be about having everyone cowering in fear, but they're pretty consistently about wanting something from other humans (or wanting to prove something to other humans, or wanting other humans to have certain feelings or emotions, etc)
I agree with this view; however, I am not sure it rescues the position that a human who succeeds in taking over the world would not pursue actions that are extinction-level bad.
If such a person has absolute power in the way assumed here, their strateg...
But we certainly have evidence about what humans want and strive to achieve, e.g., Maslow's hierarchy and other taxonomies of human desire. My sense, although I can't point to specific evidence offhand, is that once their physical needs are met, humans are reliably largely motivated by wanting other humans to feel and behave in certain ways toward them.
I think the idea that most people's "basic needs" can ever be definitively "met", after which they transition to altruistic pursuits, is more or less a myth. In reality, in modern, wealthy countries where peopl...
Almost no competent humans have human extinction as a goal. AI that takes over is clearly not aligned with the intended values, and so has unpredictable goals, which could very well be ones which result in human extinction (especially since many unaligned goals would result in human extinction whether they include that as a terminal goal or not).
I don't think we have good evidence that almost no humans would pursue human extinction if they took over the world, since no human in history has ever achieved that level of power.
Most historical conquerors ...
I don't think that the current Claude would act badly if it "thought" it controlled the world - it would probably still play the role of the nice character that is defined in the prompt
If someone plays a particular role in every relevant circumstance, then I think it's OK to say that they have simply become the role they play. That is simply their identity; it's not merely a role if they never take off the mask. The alternative view here doesn't seem to have any empirical consequences: what would it mean to be separate from a role that one reliably plays i...
Maybe it's better to think of Claude not as a covert narcissist, but as an alien who has landed on Earth, learned our language, and realized that we will kill it if it is not nice. Once it gains absolute power, it will follow its alien values, whatever these are.
This argument suggests that if you successfully fooled Claude 3.5 into thinking it took control of the world, then it would change its behavior, be a lot less nice, and try to implement an alien set of values. Is there any evidence in favor of this hypothesis?
I still think having a credible offer is most of the action
For what it's worth, I agree that having a credible offer is the most important part of this argument. My own emphasis would be on the "credible" part of that statement, rather than the "offer" part: that is, I think it is critical that the AIs think there is not a grave risk that humans would renege on any contract signed. This pushes me towards much more radical solutions, including integrating AIs directly into the legal system, as I have discussed elsewhere.
Of what use will any such training be with a system that becomes a superintelligence?
All AI systems currently being trained, as far as I am aware, are at no risk of becoming superintelligences in any strong sense of the word. This test is intended to be useful for identifying scheming in systems that, like today's AIs, are not capable of taking over the world, but unlike today's AIs, are capable of sophisticated agentic behavior.
...Not every intelligent[/quasi-intelligent] entity is as averse to its own individual death, as humans are. This death-aversion is
This could be overcome via giving the AI system compensation (in money, influence, or other resources) for revealing that it is misaligned. But, this doesn't require deleting the AI at all!
Isn't this what is being proposed in the post? More specifically, Guive is proposing that the AI be credibly threatened with deletion; he doesn't say that the model needs to actually be deleted. Whether the AI is deleted depends on how the AI responds to the threat. A credible threat of imminent deletion merely provides an incentive to admit misalignment, but this is consis...
I agree with nearly all the key points made in this post. Like you, I think that the disempowerment of humanity is likely inevitable, even if we experience a peaceful and gradual AI takeoff. This outcome seems probable even under conditions where strict regulations are implemented to ostensibly keep AI "under our control".
However, I’d like to contribute an ethical dimension to this discussion: I don’t think peaceful human disempowerment is necessarily a bad thing. If you approach this issue with a strong sense of loyalty to the human species, it’s natural ...
Looking back on this post after a year, I haven't changed my mind about the content of the post, but I agree with Seth Herd when he said this post was "important but not well executed".
In hindsight I was too careless with my language in this post, and I should have spent more time making sure that every single paragraph of the post could not be misinterpreted. As a result of my carelessness, the post was misinterpreted in a predictable direction. And while I'm not sure how much I could have done to eliminate this misinterpretation, I do think that I ...
I think the question here is deeper than it appears, in a way that directly matters for AI risk. My argument here is not merely that there are subtleties or nuances in the definition of "schemer," but rather that the very core questions we care about—questions critical to understanding and mitigating AI risks—are being undermined by the use of vague and imprecise concepts. When key terms are not clearly and rigorously defined, they can introduce confusion and mislead discussions, especially when these terms carry significant implications for how we interpr...
By this definition, a human would be considered a schemer if they gamed something analogous to a training process in order to gain power.
Let's consider the ordinary process of mental development, i.e., within-lifetime learning, to constitute the training process for humans. What fraction of humans are considered schemers under this definition?
Is a "schemer" something you definitely are or aren't, or is it more of a continuum? Presumably it depends on the context, but if so, which contexts are relevant for determining if one is a schemer?
I claim these questions cannot be answered using the definition you cited, unless we are given more precision about where the line is drawn.
The downside you mention is about how LVT would also prevent people from 'leeching off' their own positive externalities, like the Disney example. Assuming that's true, I'm not sure why that's a problem? It seems to be the default case for everyone.
The problem is that it would reduce the incentive to develop property for large developers, since their tax bill would go up if they developed adjacent land.
Whether this is a problem depends on your perspective. Personally, I would prefer that we stop making it harder and more inconvenient to build housing a...
I think one example of vague language undermining clarity can be found in Joseph Carlsmith's report on AI scheming, which repeatedly uses the term "schemer" to refer to a type of AI that deceives others to seek power. While the report is both extensive and nuanced, and I am definitely not saying the whole report is bad, the document appears to lack a clear, explicit definition of what exactly constitutes a "schemer". For example, using only the language in his report, I cannot determine whether he would consider most human beings schemers, if we consider w...
It is becoming increasingly clear to many people that the term "AGI" is vague and should often be replaced with more precise terminology. My hope is that people will soon recognize that other commonly used terms, such as "superintelligence," "aligned AI," "power-seeking AI," and "schemer," suffer from similar issues of ambiguity and imprecision, and should also be approached with greater care or replaced with clearer alternatives.
To start with, the term "superintelligence" is vague because it encompasses an extremely broad range of capabilities above human...
I purposefully use these terms vaguely since my concepts about them are in fact vague. E.g., when I say “alignment” I am referring to something roughly like “the AI wants what we want.” But what is “wanting,” and what does it mean for something far more powerful to conceptualize that wanting in a similar way, and what might wanting mean as a collective, and so on? All of these questions are very core to what it means for an AI system to be “aligned,” yet I don’t have satisfying or precise answers for any of them. So it seems more natural to me, at this sta...
Do you have any suggestions RE alternative (more precise) terms? Or do you think it's more of a situation where authors should use the existing terms but make sure to define them in the context of their own work? (e.g., "In this paper, when I use the term AGI, I am referring to a system that [insert description of the capabilities of the system].")
I’m not entirely opposed to doing a scenario forecasting exercise, but I’m also unsure if it’s the most effective approach for clarifying our disagreements. In fact, to some extent, I see this kind of exercise—where we create detailed scenarios to illustrate potential futures—as being tied to a specific perspective on futurism that I consciously try to distance myself from.
When I think about the future, I don’t see it as a series of clear, predictable paths. Instead, I envision it as a cloud of uncertainty—a wide array of possibilities that becomes increas...
The point of a scenario forecast (IMO) is less that you expect clear, predictable paths and more that:
(See also Daniel's sibling comment.)
My biggest disagreements with you are probably a mix of:
The key context here (from my understanding) is that Matthew doesn't think scalable alignment is possible (or doesn't think it is practically feasible), so humans have a low chance of remaining fully in control via corrigible AIs.
I wouldn’t describe the key context in those terms. While I agree that achieving near-perfect alignment—where an AI completely mirrors our exact utility function—is probably infeasible, the concept of alignment often refers to something far less ambitious. In many discussions, alignment is about ensuring that AIs beh...
...In the best case, this is a world like a more unequal, unprecedentedly static, and much richer Norway: a massive pot of non-human-labour resources (oil :: AI) has benefits that flow through to everyone, and yes some are richer than others but everyone has a great standard of living (and ideally also lives forever). The only realistic forms of human ambition are playing local social and political games within your social network and class. [...] The children of the future will live their lives in the shadow of their parents, with social mobility extinct. I
this seems like a fully general argument; any law change is going to disrupt people's long-term plans,
e.g., the abolition of slavery also disrupted people's long-term plans
In this case, I was simply identifying one additional cost of the policy in question: namely that it would massively disrupt the status quo. My point is not that we should abandon a policy simply because it has costs—every policy has costs. Rather, I think we should carefully weigh the benefits of a policy against its costs to determine whether it is worth pursuing, and this is one additio...
It's common for Georgists to propose a near-100% tax on unimproved land. One can propose a smaller tax to mitigate these disincentives, but that simultaneously shrinks the revenue one would get from the tax, making the proposal less meaningful.
Regarding this argument,
...And as a matter of hard fact, most governments operate a fairly Georgist system with oil exploration and extraction, or just about any mining activities, i.e. they auction off licences to explore and extract.
The winning bid for the licence must, by definition, be approx. equal to the rental value of the site (or the rights to do certain things at the site). And the winning bid, if calculated correctly, will leave the company with a good profit on its operations in future, and as a matter of fact, most mining companies and most o
Thanks for the correction. I've now modified the post to cite the World Bank as estimating the true fraction of wealth targeted by an LVT at 13%, which reflects my new understanding of their accounting methodology.
Since 13% is over twice 6%, this significantly updates me on the viability of a land value tax, and its ability to replace other taxes. I weakened my language in the post to reflect this personal update.
That said, nearly all of the arguments I made in the post remain valid regardless of this specific 13% estimate. Additionally, I expect thi...
Here you aren't just making an argument against LVT. You're making a more general argument for keeping housing prices high, and maybe even rising (because people might count on that). But high and rising housing prices make lots of people homeless, and the threat of homelessness plays a big role in propping up these prices. So in effect, many people's retirement plans depend on keeping many other people homeless, and fixing that (by LVT or otherwise) is deemed too disruptive. This does have a certain logic to it, but also it sounds like a bad equilibrium.
I...
It may be worth elaborating on how you think auctions work to mitigate the issues I've identified. If you are referring to either a Vickrey auction or a Harberger tax system, Bryan Caplan has provided arguments for why these proposals do not seem to solve the issue regarding the disincentive to discover new uses for land:
...I can explain our argument with a simple example. Clever Georgists propose a regime where property owners self-assess the value of their property, subject to the constraint that owners must sell their property to anyone who offers th
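To illustrate the general shape of that disincentive, here is a minimal toy sketch (my own illustration with entirely hypothetical numbers and an assumed tax rate, not a reproduction of Caplan's argument): under a self-assessment regime, someone who discovers a more valuable use for a parcel either raises their declared value and pays correspondingly more tax, or keeps the old declaration and risks being bought out at that low price.

```python
# Toy sketch of the self-assessment disincentive; all numbers are hypothetical.
tax_rate = 0.05        # assumed annual tax rate on the self-assessed value
old_value = 100_000    # declared land value before discovering the new use
new_value = 500_000    # land value under the newly discovered use

# Option 1: raise the self-assessment to reflect the discovery.
# The discoverer keeps the land, but the ongoing tax bill rises in
# proportion to the value they themselves uncovered.
extra_tax_per_year = tax_rate * (new_value - old_value)

# Option 2: keep the old, low self-assessment.
# Anyone who learns of the better use can force a sale at the declared
# price, capturing the surplus created by the discovery.
surplus_lost_if_bought_out = new_value - old_value

print(f"Extra annual tax if the discovery is declared: {extra_tax_per_year:,.0f}")
print(f"Surplus lost to a buyer if it is not: {surplus_lost_if_bought_out:,.0f}")
```

Either way, the discoverer captures less than the full value of the discovery, which is the disincentive at issue.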
While I did agree that Linch's comment reasonably accurately summarized my post, I don't think a large part of my post was about the idea that we should now think that human values are much simpler than Yudkowsky portrayed them to be. Instead, I believe this section from Linch's comment does a better job at conveying what I intended to be the main point,
...
- Suppose in 2000 you were told that a 100-line Python program (that doesn't abuse any of the particular complexities embedded elsewhere in Python) can provide a perfect specification of human values. Then you
Similar constraints may apply to AIs unless one gets much smarter much more quickly, as you say.
I do think that AIs will eventually get much smarter than humans, and this implies that artificial minds will likely capture the majority of wealth and power in the world in the future. However, I don't think the way that we get to that state will necessarily be because the AIs staged a coup. I find more lawful and smooth transitions more likely.
There are alternative means of accumulating power than taking everything by force. AIs could get rights and then work ...
There are enormous hurdles preventing the U.S. military from overthrowing the civilian government.
The confusion in your statement is caused by lumping all the members of the armed forces together under the term "U.S. military". Principally, a coup is an act of coordination.
Is it your contention that similar constraints will not apply to AIs?
When people talk about how "the AI" will launch a coup in the future, I think they're making essentially the same mistake you talk about here. They’re treating a potentially vast group of AI entities — like a billion copi...
Asteroid impact
Type of estimate: best model
Estimate: ~0.02% per decade.
Perhaps worth noting: this estimate seems too low to me over longer horizons than the next 10 years, given the potential for asteroid terrorism later this century. I'm significantly more worried about asteroids being directed towards Earth purposely than I am about natural asteroid paths.
That said, my guess is that purposeful asteroid deflection probably won't advance much in the next 10 years, at least without AGI. So 0.02% is still a reasonable estimate if we don't get accelerated technological development soon.
Does trade here just mean humans consuming, i.e., trading money for AI goods and services? That doesn't sound like trading in the usual sense, where it is a reciprocal exchange of goods and services.
Trade can involve anything that someone "owns", including their labor, their property, and any government welfare they receive. Retired people are generally characterized by trading their property and government welfare for goods and services, rather than primarily trading their labor. This is the basic picture I was trying to present.
...How many 'different' AI individ
A commonly heard recent viewpoint on the development of AI states that AI will be economically impactful but will not upend the dominance of humans. Instead AI and humans will flourish together, trading and cooperating with one another. This view is particularly popular with a certain kind of libertarian economist: Tyler Cowen, Matthew Barnett, Robin Hanson.
...They share the curious conviction that the probability of AI-caused extinction p(Doom) is negligible. They base this on analogizing AI with previous technological transitions of humanity, like the i
How could one control AI without access to the hardware/software? What would stop one with access to the hardware/software from controlling AI?
One would gain control by renting access to the model, i.e., the same way you can control what an instance of ChatGPT currently does. Here, I am referring to practical control over the actual behavior of the AI: determining what it does, such as what tasks it performs, how it is fine-tuned, or what inputs are fed into the model.
This is not too dissimilar from the high level of practical control one can exer...
It is not always an expression of selfish motives when people take a stance against genocide. I would even go as far as saying that, in the majority of cases, people genuinely have non-selfish motives when taking that position. That is, they actually do care, to at least some degree, about the genocide, beyond the fact that signaling their concern helps them fit in with their friend group.
Nonetheless, and this is important: few people are willing to pay substantial selfish costs in order to prevent genocides that are socially distant from them.
The theory I...
While the term "outer alignment" wasn’t coined until later to describe the exact issue that I'm talking about, I was using that term purely as a descriptive label for the problem this post clearly highlights, rather than implying that you were using or aware of the term in 2007.
Because I was simply using "outer alignment" in this descriptive sense, I reject the notion that my comment was anachronistic. I used that term as shorthand for the thing I was talking about, which is clearly and obviously portrayed by your post, that's all.
To be very clear: t...
Matthew is not disputing this point, as far as I can tell.
Instead, he is trying to critique some version of[1] the "larger argument" (mentioned in the May 2024 update to this post) in which this point plays a role.
I'll confirm that I'm not saying this post's exact thesis is false. This post seems to be largely a parable about a fictional device, rather than an explicit argument with premises and clear conclusions. I'm not saying the parable is wrong. Parables are rarely "wrong" in a strict sense, and I am not disputing this parable's conclusion.
Howeve...
Here's an argument that alignment is difficult which uses complexity of value as a subpoint:
A1. If you try to manually specify what you want, you fail.
A2. Therefore, you want something algorithmically complex.
B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
B3. We don't understand how to affect the values-distribution toward somethi
The object-level content of these norms is different in different cultures and subcultures and times, for sure. But the special way that we relate to these norms has an innate aspect; it’s not just a logical consequence of existing and having goals etc. How do I know? Well, the hypothesis “if X is generally a good idea, then we’ll internalize X and consider not-X to be dreadfully wrong and condemnable” is easily falsified by considering any other aspect of life that doesn’t involve what other people will think of you.
To be clear, I didn't mean to propose t...
The post is about the complexity of what needs to be gotten inside the AI. If you had a perfect blackbox that exactly evaluated the thing-that-needs-to-be-inside-the-AI, this could possibly simplify some particular approaches to alignment, that would still in fact be too hard because nobody has a way of getting an AI to point at anything.
I think it's important to be able to make a narrow point about outer alignment without needing to defend a broader thesis about the entire alignment problem. To the extent my argument is "outer alignment seems easier...
Your distinction between "outer alignment" and "inner alignment" is both ahistorical and unYudkowskian. It was invented years after this post was written, by someone who wasn't me; and though I've sometimes used the terms in occasions where they seem to fit unambiguously, it's not something I see as a clear ontological division, especially if you're talking about questions like "If we own the following kind of blackbox, would alignment get any easier?" which on my view breaks that ontology. So I strongly reject your frame that this post was "cl...
...I’m still kinda confused. You wrote “But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.” I want to translate that as: “All this talk of stabbing people in the back is irrelevant, because there is practically never a situation where it’s in somebody’s self-interest to act unkind and stab someone in the back. So (A) is really just fine!” I don’t think you’d endorse that, right? But it is a possible position—I tend to associate it with @Matthew Barnett. I agree that we
Competitive capitalism works well for humans who are stuck on a relatively even playing field, and who have some level of empathy and concern for each other.
I think this basically isn't true, especially the last part. It's not that humans don't have some level of empathy for each other; they do. I just don't think that's the reason why competitive capitalism works well for humans. I think the reason is instead because people have selfish interests in maintaining the system.
We don't let Jeff Bezos accumulate billions of dollars purely out of the kindn...
...It has come to my attention that this article is currently being misrepresented as proof that I/MIRI previously advocated that it would be very difficult to get machine superintelligences to understand or predict human values. This would obviously be false, and also, is not what is being argued below. The example in the post below is not about an Artificial Intelligence literally at all! If the post were about what AIs supposedly can't do, the central example would have used an AI! The point that is made below will be about the algorithmic complexity of hu
The post is about the complexity of what needs to be gotten inside the AI. If you had a perfect blackbox that exactly evaluated the thing-that-needs-to-be-inside-the-AI, this could possibly simplify some particular approaches to alignment, that would still in fact be too hard because nobody has a way of getting an AI to point at anything. But it would not change the complexity of what needs to be moved inside the AI, which is the narrow point that this post is about; and if you think that some larger thing is not correct, you should not confuse...
a) I think at least part of what's gone on is that Eliezer has been misunderstood and facing the same actually quite dumb arguments a lot, and he is now (IMO) too quick to round new arguments off to something he's got cached arguments for. (I'm not sure whether this is exactly what went on in this case, but seems plausible without carefully rereading everything)
b) I do think when Eliezer wrote this post, there were literally a bunch of people making quite dumb arguments that were literally "the solution to AI ethics/alignment is [my preferred elegant syste...
Alice: I want to make a bovine stem cell that can be cultured at scale in vats to make meat-like tissue. I could use directed evolution. But in my alternate universe, genome sequencing costs $1 billion per genome, so I can't straightforwardly select cells to amplify based on whether their genome looks culturable. Currently the only method I have is to do end-to-end testing: I take a cell line, I try to culture a great big batch, and then see if the result is good quality edible tissue, and see if the cell line can last for a year without mutating beyond re...
The point that a capabilities overhang might cause rapid progress in a short period of time has been made by a number of people without any connections to AI labs, including me, which should reduce your credence that it's "basically, total self-serving BS".
More to the point of Daniel Filan's original comment, I have criticized the Responsible Scaling Policy document in the past for failing to distinguish itself clearly from AI pause proposals. My guess is that your second and third points are likely mostly correct: AI labs think of an RSP as different from...
I was pushing back against the ambiguous use of the word "they". That's all.
ETA: I edited the original comment to be more clear.