flandry39
flandry39 has not written any posts yet.

I really liked your quote and remarks. So much so that I made an edited version of them as a new post here: http://mflb.com/ai_alignment_1/d_250207_insufficient_paranoia_gld.html
The only general remarks that I want to make
are in regard to your question about
the model of 150-year-long vaccine testing
over some sort of sample group and control group.
I notice that there is nothing exponential assumed
about this test object, and therefore, at most,
the effects are probably multiplicative, if not linear.
Hence there are lots of questions about power dynamics
that we can safely ignore, as a simplification,
which is in marked contrast to anything involving ASI.
If we assume, as you requested, "no side effects" observed,
in any test group, for any of the things
that we happened to think to even look for,
then for any linear system, that is probably "good enough".
But... (read 391 more words →)
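As a purely illustrative sketch of the linear-versus-exponential contrast drawn above (my own notation, not taken from the original comment): let ε be a small per-step effect that the test failed to observe, and t the number of time steps.

```latex
% Illustrative only; \epsilon, c, and t are assumptions for this sketch.
% Linear accumulation (the non-exponential test object): impact grows only as \epsilon t.
x_t = x_0 + \epsilon t
% Fixed multiplicative scaling: impact stays bounded relative to the starting state.
x_t = c \, x_0
% Exponential self-amplification (the ASI-like case): impact compounds without bound.
x_t = x_0 (1 + \epsilon)^t \approx x_0 \, e^{\epsilon t}
```

On this reading, a 150-year test with "no observed side effects" bounds ε well enough for the first two regimes, but says much less about the third, where effects below the detection threshold can still compound.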
> Humans do things in a monolithic way,
> not as "assemblies of discrete parts".
Organic human brains have multiple aspects.
Have you ever had more than one opinion?
Have you ever been severely depressed?
> If you are asking "can a powerful ASI prevent
> /all/ relevant classes of harm (to the organic)
> caused by its inherently artificial existence?",
> then I agree that the answer is probably "no".
> But then almost nothing can perfectly do that,
> so therefore your question becomes
> seemingly trivial and uninteresting.
The level of x-risk harm and consequence
potentially caused by even one single mistake
of your angelic, super-powerfully enabled ASI
is far from "trivial" or "uninteresting".
Even one single bad relevant mistake
can be an x-risk when ultimate... (read 386 more words →)
> Our ASI would use its superhuman capabilities
> to prevent any other ASIs from being built.
This feels like a "just so" fairy tale.
No matter what objection is raised,
the magic white knight always saves the day.
> Also, the ASI can just decide
> to turn itself into a monolith.
No more subsystems?
So we are to try to imagine
a complex learning machine
without any parts/components?
> Your same SNC reasoning could just as well
> be applied to humans too.
No, not really, insofar as the power
assumed and presumed to be afforded to the ASI
is very much greater than that assumed
applicable to any mere mortal human.
Especially and exactly because the nature of ASI
is inherently artificial and thus, in key ways,
inherently incompatible... (read 430 more words →)
> Let's assume that a presumed aligned ASI
> chooses to spend only 20 years on Earth
> helping humanity in whatever various ways
> and it then (for sure!) destroys itself,
> so as to prevent a/any/the/all of the
> longer term SNC evolutionary concerns
> from being at all, in any way, relevant.
> What then?
I notice that it is probably harder for us
to assume that there is only exactly one ASI,
for if there were multiple, the chance that
one of them might not destroy itself, for whatever reason,
becomes its own class of significant concern.
Let's leave that aside, without further discussion,
for now.
Similarly, if the ASI itself
is not fully and absolutely monolithic --
if it has any sub-systems or components
which are... (read 366 more words →)
To save space here, my complete reply is at http://mflb.com/2476
Included below, for your convenience, are just a few (much shortened) highlight excerpts of the new content.
> Are you saying "there are good theoretical reasons
> to reasonably think that ASI cannot 100% predict
> all future outcomes"?
> Does that sound like a fair summary?
The rephrased version of the quote added
these two qualifiers: "100%" and "all".
Adding these has the net effect
of making the modified claim irrelevant,
for the reasons you (correctly) stated in your reply,
insofar as we do not actually need 100% prediction,
nor do we need to predict absolutely all things,
nor does it matter if it takes infinitely long.
We only need to predict... (read 515 more words →)
Since a number of these posts are already very long, and rather than take up more space here, I wrote up some of my questions, and a few clarifying notes regarding SNC in response to Dakara's remarks above, at [this link](http://mflb.com/ai_alignment_1/d_250126_snc_redox_gld.html).
Simplified Claim: an AGI is 'not-aligned' *if* its continued existence for sure eventually results in changes to all of this planet's habitable zones that are so far outside the ranges any existing mammals could survive in that the human race itself (along with most of the other planetary life) is prematurely forced to go extinct.
Can this definition of 'non-alignment' be formalized sufficiently well that the claim 'It is impossible to align AGI with human interests' can be well supported, with sound reasons, logic, argument, etc.?
The term 'exist', as in the assertion "X exists in domain Y" being either true or false, is a formal notion. Something similar can be done for... (read 2133 more words →)
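As one purely illustrative sketch (my own symbols, not the formalization from the linked full text), the simplified claim above might be schematized roughly as follows, where Exists(A, Y) is the true/false assertion that AGI A continues to exist in domain Y, Z is the set of the planet's habitable zones, and Habitable(z, t) says zone z is within mammal-survivable ranges at time t.

```latex
% Illustrative schema only; all predicates here are assumptions for this sketch.
\mathrm{NotAligned}(A) \;\equiv\;
  \mathrm{Exists}(A, Y) \;\rightarrow\;
  \exists\, t_0 \;\, \forall\, t > t_0 \;\, \forall\, z \in Z :\; \neg\, \mathrm{Habitable}(z, t)
```

Premature extinction then follows from every zone being forced out of survivable range; whether such a schema can be made precise enough to support the impossibility claim is exactly the formalization question being asked.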
> The summary that Will just posted posits in its own title that alignment is overall plausible: "even ASI alignment might not be enough". Since the central claim is that "even if we align ASI, it will still go wrong", I can operate on the premise of an aligned ASI.
The title is a statement of outcome -- not the primary central claim. The central claim of the summary is this: that every ASI is in an attraction basin, where it is irresistibly pulled towards causing unsafe conditions over time.
Note that there is no requirement to presume any kind of prior ASI alignment for Will to make the overall summary points 1 through 9. The summary is... (read more)
There are a lot of issues with the article cited above. Due to the need for more specific text formatting, I wrote up my notes, comments, and objections here:
http://mflb.com/ai_alignment_1/d_250206_asi_policies_gld.html