A potentially somewhat important thing which I haven't seen discussed:
Thank you for (being one of the horrifyingly few people) doing sane reporting on these crucially important topics.
Typo: "And humanity needs all the help we it can get."
Out of (1)-(3), I think (3)[1] is clearly most probable:
(Of course one could also come up with other possibilities besides (...
Note that our light cone with zero value might also eclipse other light cones that might've had value if we didn't let our AGI go rogue to avoid s-risk.
That's a good thing to consider! However, taking Earth's situation as a prior for other "cradles of intelligence", I think that consideration returns to the question of "should we expect Earth's lightcone to be better or worse than zero-value (conditional on corrigibility)?"
To me, those odds each seem optimistic by a factor of about 1000, but ~reasonable relative to each other.
(I don't see any low-cost way to find out why we disagree so strongly, though. Moving on, I guess.)
But this isn't any worse to me than being killed [...]
Makes sense (given your low odds for bad outcomes).
Do you also care about minds that are not you, though? Do you expect most future minds/persons that are brought into existence to have nice lives, if (say) Donald "Grab Them By The Pussy" Trump became god-emperor (and was the one deciding what persons/minds get to exist)?
IIUC, your model would (at least tentatively) predict that
If so, how do you reconcile that with e.g. non-sadistic serial killers, rapists, or child abusers? Or non-sadistic narcissists in whose ideal world everyone else would be their worshipful subject/slave?
That last point also raises the question: Would you prefer the existence of lo...
It seems like you're assuming people won't build AGI if they don't have reliable ways to control it, or else that sovereign (uncontrolled) AGI would be likely to be friendly to humanity.
I'm assuming neither. I agree with you that both seem (very) unlikely. [1]
It seems like you're assuming that any humans succeeding in controlling AGI is (on expectation) preferable to extinction? If so, that seems like a crux: if I agreed with that, then I'd also agree with "publish all corrigibility results".
I expect that unaligned ASI would lead to extinction, and
It's more important to defuse the bomb than it is to prevent someone you dislike from holding it.
I think there is a key disanalogy to the situation with AGI: The analogy would be stronger if the bomb was likely to kill everyone, but also had some (perhaps very small) probability of conferring godlike power to whomever holds it. I.e., there is a tradeoff: decrease the probability of dying, at the expense of increasing the probability of S-risks from corrupt(ible) humans gaining godlike power.
If you agree that there exists that kind of tradeoff, I'm cur...
Taking a stab at answering my own question; an almost-certainly non-exhaustive list:
Would the results be applicable to deep-learning-based AGIs?[1] If I think not, how can I be confident they couldn't be made applicable?
Do the corrigibility results provide (indirect) insights into other aspects of engineering (rather than SGD'ing) AGIs?
How much weight one gives to avoiding x-risks vs s-risks.[2]
Who actually needs to know of the results? Would sharing the results with the whole Internet lead to better outcomes than (e.g.) sharing the results wit
I think the main value of that operationalization is enabling more concrete thinking/forecasting about how AI might progress. It models some of the relevant causal structure of reality, at a reasonable level of abstraction: not too nitty-gritty[1], not too abstract[2].
which would lead to "losing the forest for the trees", make the abstraction too effortful to use in practice, and/or risk making it irrelevant as soon as something changes in the world of AI ↩︎
e.g. a higher-level abstraction like "AI that speeds up AI development by a factor of N" might at
I think this approach to thinking about AI capabilities is quite pertinent. Could be worth including "Nx AI R&D labor AIs" in the list?
Cogent framing; thanks for writing it. I'd be very interested to read your framing for the problem of "how do we get to a good future for humanity, conditional on the first attractor state for AGI alignment?"[1]
Would you frame it as "the AGI lab leadership alignment problem"? Or a governance problem? Or something else? ↩︎
Here is a brainstorm of the big problems that remain once we successfully get into the first attractor state:
Thanks for the answer. It's nice to get data about how other people think about this subject.
the concern that the more sociopathic people wind up in positions of power is the big concern.
Agreed!
Do I understand correctly: You'd guess that
If so, then I'm curious -- and somewhat bewildered! -- as to how you arrived at those guesses/...
I'd be interested to see that draft as a post!
What fraction of humans in set X would you guess have a "positive empathy-sadism balance", for
I agree that the social environment / circumstances could have a large effect on whether someone ends up wielding power selfishly or benevolently. I wonder if there's any way anyone concerned about x/s-risks could meaningfully affect those conditions.
I'm guessing[1] I'm quite a bit more pessimistic than you about what fraction of humans would...
I agree that "strengthening democracy" sounds nice, and also that it's too vague to be actionable. Also, what exactly would be the causal chain from "stronger democracy" (whatever that means) to "command structure in the nationalized AGI project is trustworthy and robustly aligned to the common good"?
If you have any more concrete ideas in this domain, I'd be interested to read about them!
Pushing for nationalization or not might affect when it's done, giving some modicum of control.
I notice that I have almost no concrete model of what that sentence means. A couple of salient questions[1] I'd be very curious to hear answers to:
What concrete ways exist for affecting when (and how) nationalization is done? (How, concretely, does one "push" for/against nationalization of AGI?)
By what concrete causal mechanism could pushing for nationalization confer a modicum of control; and control over what exactly, and to whom?
Other questions
make their models sufficiently safe
What does "safe" mean, in this post?
Do you mean something like "effectively controllable"? If yes: controlled by whom? Suppose AGI were controlled by some high-ranking people at (e.g.) the NSA; with what probability do you think that would be "safe" for most people?
Doing nationalization right
I think this post (or the models/thinking that generated it) might be missing an important consideration[1]: "Is it possible to ensure that the nationalized AGI project does not end up de facto controlled by not-good people? If yes, how?"
Relevant quote from Yudkowsky's Six Dimensions of Operational Adequacy in AGI Projects (emphasis added):
...Opsec [...] Military-grade or national-security-grade security. (It's hard to see how attempts to get this could avoid being counterproductive, considering the difficulty of obtaining tru
A related pattern-in-reality that I've had on my todo-list to investigate is something like "cooperation-enforcing structures". Things like
I'd been approaching this from a perspective of "how defeating Moloch can happen in general" and "how might we steer Earth to be less Moloch-fucked"; not so much AI safety directly.
Do you think a good theory of hierarchical agency would subsume those kinds of patterns-in-reality? If yes: I wonder if their inclusion could be used as a criterion/heuristic for narrowing down the search for a good theory?
find some way to argue that "generally intelligent world-optimizing agents" and "subjects of AGI-doom arguments" are not the exact same type of system
We could maybe weaken this requirement? Perhaps it would suffice to show/argue that it's feasible[1] to build any kind of "acute risk period -ending AI"[2] that is not a "subject of AGI-doom arguments"?
I'd be (very) curious to see such arguments. [3]
I think this is an important subject and I agree with much of this post. However, I think the framing/perspective might be subtly but importantly wrong-or-confused.
To illustrate:
How much of the issue here is about the very singular nature of the One dominant project, vs centralization more generally into a small number of projects?
Seems to me that centralization of power per se is not the problem.
I think the problem is something more like
we want to give as much power as possible to "good" processes, e.g. a process that robustly pursues humanity's CE
Upvoted and disagreed. [1]
One thing in particular that stands out to me: The whole framing seems useless unless Premise 1 is modified to include a condition like
[...] we can select a curriculum and reinforcement signal which [...] and which makes the model highly "useful/capable".
Otherwise, Premise 1 is trivially true: We could (e.g.) set all the model's weights to 0.0, thereby guaranteeing the non-entrainment of any ("bad") circuits.
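To make the degenerate case concrete, here's a minimal sketch (assuming a PyTorch-style model; the architecture is arbitrary and purely illustrative) of a "curriculum/reinforcement signal" that guarantees no bad circuits get entrained, by producing no circuits at all:

```python
# Minimal sketch (illustrative only): a model with all weights set to 0.0
# trivially satisfies "no bad circuits entrained", while being useless.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

with torch.no_grad():
    for p in model.parameters():
        p.zero_()  # every weight and bias set to 0.0

x = torch.randn(8, 16)
print(model(x))  # all-zero outputs: no circuits at all, "bad" or otherwise
```

The guarantee holds only because the model can no longer do anything, which is why some "useful/capable" condition seems load-bearing for Premise 1.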
I'm curious: what do you think would be a good (...useful?) operationalization of "useful/capable"?
Another issue: K and ...
In Fig 1, is the vertical axis P(world) ?
Possibly a nitpick, but:
The development and deployment of AGI, or similarly advanced systems, could constitute a transformation rivaling those of the agricultural and industrial revolutions.
seems like a very strong understatement. Maybe replace "rivaling" with e.g. "(vastly) exceeding"?
Referring to the quote-picture from the Nvidia GTC keynote talk: I searched the talk's transcript, and could not find anything like the quote.
Could someone point out time-stamps of where Huang says (or implies) anything like the quote? Or is the quote entirely made up?
That clarifies a bunch of things. Thanks!
I'm not sure I understand what the post's central claim/conclusion is. I'm curious to understand it better. To focus on the Summary:
So overall, evolution is the source of ethics,
Do you mean: Evolution is the process that produced humans, and strongly influenced humans' ethics? Or are you claiming that (humans') evolution-induced ethics are what any reasonable agent ought to adhere to? Or something else?
and sapient evolved agents inherently have a dramatically different ethical status than any well-designed created agents [...]
...according to some h...
I wonder how much work it'd take to implement a system that incrementally generates a graph of the entire conversation. (Vertices would be sub-topics, represented as e.g. a thumbnail image + a short text summary.) Would require the GPT to be able to (i.a.) understand the logical content of the discussion, and detect when a topic is revisited, etc. Could be useful for improving clarity/productivity of conversations.
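As a rough sketch of what I have in mind (all names and fields below are hypothetical; the actual hard part, getting a GPT to segment topics and detect revisits reliably, is only gestured at in a comment):

```python
# Rough sketch (hypothetical names/fields) of the conversation-graph data
# structure described above: vertices are sub-topics, edges record how the
# discussion moves between (and revisits) them.
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    topic_id: str
    summary: str                          # short text summary of the sub-topic
    thumbnail_path: str | None = None     # optional thumbnail image
    message_ids: list[str] = field(default_factory=list)  # messages touching this topic

@dataclass
class ConversationGraph:
    nodes: dict[str, TopicNode] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (from_topic, to_topic)

    def add_message(self, message_id: str, topic_id: str, summary: str,
                    previous_topic_id: str | None = None) -> None:
        """Attach a new message to its (possibly new) sub-topic. A GPT-based
        classifier would have to supply topic_id, i.e. decide whether this is
        a new topic or a revisit of an existing one."""
        node = self.nodes.setdefault(topic_id, TopicNode(topic_id, summary))
        node.message_ids.append(message_id)
        if previous_topic_id is not None and previous_topic_id != topic_id:
            self.edges.append((previous_topic_id, topic_id))
```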
One of the main questions on which I'd like to understand others' views is something like: Conditional on sentient/conscious humans[1] continuing to exist in an x-risk scenario[2], with what probability do you think they will be in an inescapable dystopia[3]?
(My own current guess is that dystopia is very likely.)
That makes sense; but:
so far outside the realm of human reckoning that I'm not sure it's reasonable to call them dystopian.
setting aside the question of what to call such scenarios, with what probability do you think the humans[1] in those scenarios would (strongly) prefer to not exist?
or non-human minds, other than the machines/Minds that are in control ↩︎
non-extinction AI x-risk scenarios are unlikely
Many people disagreed with that. So, apparently many people believe that inescapable dystopias are not-unlikely? (If you're one of the people who disagreed with the quote, I'm curious to hear your thoughts on this.)
(Ah. Seems we were using the terms "(alignment) success/failure" differently. Thanks for noting it.)
In-retrospect-obvious key question I should've already asked: Conditional on (some representative group of) humans succeeding at aligning ASI, what fraction of the maximum possible value-from-Evolution's-perspective do you expect the future to attain? [1]
My modal guess is that the future would attain ~1% of maximum possible "Evolution-value".[2]
...If tech evolution is similar enough to bio evolution then we should roughly expect tech evolution to have a simil
In general I think maximum values are weird because they are potentially nearly unbounded, but it sounds like we may then be in agreement absent terminology.
But in general I do not think of anything "less than 1% of the maximum value" as failure in most endeavors. For example the maximum attainable wealth is perhaps $100T or something, but I don't think it'd be normal/useful to describe the world's wealthiest people as failures at being wealthy because they only have ~$100B or whatever.
And regardless the standard doom arguments from EY/MIRI etc are very much "AI will kill us all!", and not "AI will prevent us from attaining over 1% of maximum future utility!"
Evolution has succeeded at aligning homo sapiens brains to date
I'm guessing we agree on the following:
Evolution shaped humans to have various context-dependent drives (call them Shards) and the ability to mentally represent and pursue complex goals. Those Shards were good proxies for IGF in the EEA[1].
Those Shards were also good[2] enough to produce billions of humans in the modern environment. However, it is also the case that most modern humans spend at least part of their optimization power on things orthogonal to IGF.
I think our disagreemen...
vast computation some of which is applied to ancestral simulations
I agree that a successful post-human world would probably involve a large amount[1] of resources spent on simulating (or physically instantiating) things like humans engaging in play, sex, adventure, violence, etc. IOW, engaging in the things for which Evolution installed Shards in us. However, I think that is not the same as [whatever Evolution would care about, if Evolution could care about anything]. For the post-human future to be a success from Evolution's perspective, I think it wou...
Humans have not put an end to biological life.
Yup. I, too, have noticed that.
Your doom[1] predictions [...]
C'mon, man, that's obviously a misrepresentation of what I was saying. Or maybe my earlier comment failed badly at communication? In case that's so, here's an attempted clarification (bolded parts added):
...If Evolution had a lot more time (than I expect it to have) to align humans to relative-gene-replication-count, before humans put an end to biological life, as they seem to me to be on track to do, based on things I have observed in the past
evolution did in fact find some weird way to create humans who rather obviously consciously optimize for IGF! [...]
If Evolution had a lot more time to align humans to relative-gene-replication-count, before humans put an end to biological life, then sure, seems plausible that Evolution might be able to align humans very robustly. But Evolution does not have infinite time or "retries" --- humanity is in the process of executing something like a "sharp left turn", and seems likely to succeed long before the human gene pool is taken over by sperm bank donors and such.
The utility function is fitness: gene replication count (of the human defining genes) [1]
Seems like humans are soon going to put an end to DNA-based organisms, or at best relegate them to some small fraction of all "life". I.e., seems to me that the future is going to score very poorly on the gene-replication-count utility function, relative to what it would score if humanity (or individual humans) were actually aligned to gene-replication-count.
Do you disagree? (Do you expect the post-ASI future to be tiled with human DNA?)
Obviously Evolution doesn
I mostly agree.
I also think that impact is very unevenly distributed over people; the most impactful 5% of people probably account for >70% of the impact. [1]
And if so, then the difference in positive impact between {informing the top 5%} and {broadcasting to the field in general on the open Internet} is probably not very large. [2]
Possibly also worth considering: Would (e.g.) writing a public post actually reach those few key people more effectively than (e.g.) sending a handful of direct/targeted emails? [3]
Talking about AI (alignment) here, but I
If {the reasoning for why AGI might not be near} comprises {a list of missing capabilities}, then my current guess is that the least-bad option would be to share that reasoning in private with a small number of relevant (and sufficiently trustworthy) people[1].
(More generally, my priors strongly suggest keeping any pointers to AGI-enabling capabilities private.)
E.g. the most capable alignment researchers who seem (to you) to be making bad strategic decisions due to not having considered {the reasoning for why AGI might not be near}. ↩︎
I can't critique your plan, because I can't parse your writing. My suggestion would be to put some effort into improving the clarity of your writing. [1]
Even basic things, such as the avoidance of long sentences, sometimes with side notes included and separated from the main sentence by commas, rather than e.g. em dashes, and making the scopes of various syntactic structures unambiguous, could go a long way towards making your text more legible. ↩︎
[...] bridge the "gap" between (less-precise proofs backed by advanced intuition) and (precise proofs simple enough for basically anyone to technically "follow").
Meta: Please consider using curly or square brackets ({} or []) for conceptual/grammatic grouping; please avoid overloading parentheses.
Thumbs up for trying to think of novel approaches to solving the alignment problem.
Every time the model does something that harms the utility function of the dumber models, it gets a loss function.
A few confusions:
Some p...
[...] iteratively align superintelligence.
To align the first automated alignment researcher, [...]
To validate the alignment of our systems, [...]
What do they mean by "aligned"?
How do we ensure AI systems much smarter than humans follow human intent?
OK. Assuming that
To what extent would you expect the government's or general populace's responses to "Robots with guns" to be helpful (or harmful) for mitigating risks from superintelligence? (Would getting them worried about robots actually help with x-risks?)
Right; that would be a silly thing to think.
My intended message might've been better worded as follows:
If staring into abysses is difficult/rough, then adequately staring into the darker abysses might require counter-intuitively large amounts of effort/agency. And yet, I think it might be necessary to grok those darker abysses, if we are to avoid falling into them. That makes me worried.
OTOH, you seem exceptionally reflective, so perhaps that worry is completely unfounded in your case. Anyway, I'm grateful for the work you do; I wish there were more peo...
When people call things like this post "rough to write/read", and consider them to require a content warning, I wonder if most people are able to think clearly (or at all) about actually terrible scenarios, and worry that they aren't. (I'm especially worried if those people have influence in a domain where there might be a tradeoff between mitigating X-risks vs mitigating S-risks.)
I liked the description of the good future, though. Thanks for the reminder that things can (maybe) go well, too.
Whenever people are sad for any reason except s-risk, I wonder if they're able to think at all about important issues. /s
Thanks for the response.
To the extent that I understand your models here, I suspect they don't meaningfully bind/correspond to reality. (Of course, I don't understand your models at all well, and I don't have the energy to process the whole post, so this doesn't really provide you with much evidence; sorry.)
I wonder how one could test whether or not the models bind to reality? E.g. maybe there are case examples (of agents/people behaving in instrumentally rational ways) one could look at, and see if the models postdict the actual outcomes in those examples?
Yes. Also unclear whether the 90% could coordinate to take any effective action, or whether any effective action would be available to them. (Might be hard to coordinate when AIs control/influence the information landscape; might be hard to rise up against e.g. robotic law enforcement or bioweapons.)
Good point! I guess one way to frame that would be as