Can highly intelligent agents have stupid goals?
A look at The Orthogonality Thesis and the nature of stupidity.

 

A good popular introduction to the Orthogonality Thesis from Robert Miles.


Transcript for searchability:

Hi. This video is kind of a response to various comments that I've got over the years, ever since that video on Computerphile where I was describing the sort of problems we might have when we have a powerful artificial general intelligence with goals which aren't the same as our goals, even if those goals seem pretty benign. We used the thought experiment of an extremely powerful AGI working to optimize the simple goal of collecting stamps, and some of the problems that might cause. I got some comments from people saying that they think the stamp collecting device is stupid; not that it's a stupid thought experiment, but that the device itself is actually stupid. They said that unless it has complex goals, or the ability to choose its own goals, it doesn't count as being highly intelligent.

On other videos I got comments saying that it takes intelligence to do moral reasoning, so an intelligent AGI system should be able to do that, and a superintelligence should be able to do it better than humans; in fact, if a superintelligence decides that the right thing to do is to kill us all, then I guess that's the right thing to do. These comments are all kind of suffering from the same mistake, which is what this video is about.

But before I get to that, I need to lay some groundwork first. If you like Occam's razor, then you'll love Hume's guillotine, also called the is-ought problem. This is a pretty simple concept that I'd like to be better known. The idea is that statements can be divided up into two types: is statements and ought statements. Is statements, or positive statements, are statements about how the world is, how the world was in the past, how the world will be in the future, or how the world would be in hypothetical situations: facts about the nature of reality, the causal relationships between things, that kind of thing. Then you have the ought statements, the should statements, the normative statements. These are about the way the world should be, the way we want the world to be: statements about our goals, our values, ethics, morals, what we want, all of that stuff.

Now, you can derive statements from one another logically. "It's snowing outside": that's an is statement. "It's cold when it snows": another is statement. And then you can deduce: "Therefore, it's cold outside." That's another is statement; it's our conclusion. This is all pretty obvious. But you might say something like, "It's snowing outside, therefore you ought to put on a coat." That's a very normal sort of sentence that people might say, but as a logical argument it actually relies on a hidden assumption. Without assuming some kind of ought statement, you can't derive another ought statement. This is the core of the is-ought problem: you can never derive an ought statement using only is statements.

"You ought to put on a coat." "Why?" "Because it's snowing outside." "So why does the fact that it's snowing mean I should put on a coat?" "Well, the fact that it's snowing means that it's cold." "And why should it being cold mean I should put on a coat?" "If it's cold and you go outside without a coat, you'll be cold." "Should I not be cold?" "Well, if you get too cold, you'll freeze to death." "Okay, so you're saying I shouldn't freeze to death." That was kind of silly, but you see what I'm saying: you can keep laying out is statements for as long as you want, and you will never be able to derive that you ought to put on a coat. At some point, in order to derive that ought statement, you need to assume at least one other ought statement. If you have some kind of ought statement, like "I ought to continue to be alive", you can then say: given that I ought to keep living, and that if I go outside without a coat I'll die, then I ought to put on a coat. But unless you have at least one ought statement, you cannot derive any other ought statements. Is statements and ought statements are separated by Hume's guillotine.
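This isn't in the video, but as a rough schematic of the coat example, in my own ad hoc notation, the pattern looks something like:

$$\text{Snowing} \Rightarrow \text{Cold}, \quad \text{Cold} \wedge \text{NoCoat} \Rightarrow \text{Freeze} \quad\not\vdash\quad \text{Ought}(\text{Coat})$$

$$\text{Ought}(\neg\text{Freeze}), \quad \text{Cold}, \quad \text{Cold} \wedge \text{NoCoat} \Rightarrow \text{Freeze} \quad\vdash\quad \text{Ought}(\text{Coat})$$

However many is statements you stack on the left, the derivation stays blocked until at least one ought appears among the premises.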

Okay. So, people are saying that a device that single-mindedly collects stamps at the cost of everything else is stupid, and doesn't count as a powerful intelligence. So let's define our terms: what is intelligence, and conversely, what is stupidity? I feel like I made fairly clear in those videos what I meant by intelligence. We're talking about AGI systems as intelligent agents: they're entities that take actions in the world in order to achieve their goals or maximize their utility functions. Intelligence is the thing that allows them to choose good actions, to choose actions that will get them what they want. An agent's level of intelligence really means its level of effectiveness at pursuing its goals.

In practice, this is likely to involve having or building an accurate model of reality, keeping that model up to date by reasoning about observations, and using the model to make predictions about the future and the likely consequences of different possible actions, to figure out which actions will result in which outcomes. Intelligence involves answering questions like: What is the world like? How does it work? What will happen next? What would happen in this scenario or that scenario? What would happen if I took this action or that action? More intelligent systems are, in some sense, better at answering these kinds of questions, which allows them to be better at choosing actions. But one thing you might notice about these questions is that they're all is questions. The system has goals, which can be thought of as ought statements, but its level of intelligence depends only on its ability to reason about is questions in order to answer the single ought question: what action should I take next?
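This isn't in the video, but to make the split concrete, here's a minimal sketch of an agent whose "is" machinery (a world model that predicts outcomes) is kept entirely separate from its "ought" (a utility function over outcomes). All the names and numbers below are invented for illustration.

```python
from typing import Callable, Dict, List

def choose_action(actions: List[str],
                  predict: Callable[[str], Dict[str, float]],
                  utility: Callable[[Dict[str, float]], float]) -> str:
    """Answer the one ought question, 'what should I do next?',
    using only is-machinery (predictions) plus a given utility (the goals)."""
    return max(actions, key=lambda a: utility(predict(a)))

# Toy world model (the "is" side): the predicted outcome of each action.
def predict(action: str) -> Dict[str, float]:
    return {
        "collect_stamps":  {"stamps": 100.0, "humans_happy": 0.2},
        "feed_the_hungry": {"stamps": 0.0,   "humans_happy": 0.9},
    }[action]

# Two possible terminal goals (the "ought" side).
stamp_utility = lambda outcome: outcome["stamps"]
human_utility = lambda outcome: outcome["humans_happy"]

actions = ["collect_stamps", "feed_the_hungry"]
print(choose_action(actions, predict, stamp_utility))  # -> collect_stamps
print(choose_action(actions, predict, human_utility))  # -> feed_the_hungry
```

The same prediction machinery serves either goal equally well; nothing on the "is" side tells the agent which utility function it ought to have.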

So, given that that's what we mean by intelligence, what does it mean to be stupid? Well, firstly, you can be stupid in terms of those questions, for example by building a model that doesn't correspond with reality, or by failing to update your model properly with new evidence. If I look out of my window and I see there's snow everywhere, you know, I see a snowman, and I think to myself, "Oh, what a beautiful warm sunny day," then that's stupid, right? My belief is wrong, and I had all the clues to realize it's cold outside. So beliefs can be stupid by not corresponding to reality.
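This isn't in the video, but "updating your model properly with new evidence" has a standard formal reading, Bayes' rule. Here's the snowman example run through it, with made-up numbers.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Revise P(hypothesis) after observing one piece of evidence."""
    joint_true = prior * p_evidence_if_true
    joint_false = (1.0 - prior) * p_evidence_if_false
    return joint_true / (joint_true + joint_false)

# Hypothesis: "it's a warm sunny day". Evidence: snow everywhere, plus a snowman.
prior_warm = 0.5
p_snow_if_warm = 0.01   # snow on a warm day is very unlikely
p_snow_if_cold = 0.60   # snow on a cold day is common

print(bayes_update(prior_warm, p_snow_if_warm, p_snow_if_cold))  # ~0.016: belief in "warm" should collapse
```

Still believing "warm and sunny" after seeing the snowman means either the model or the update went wrong.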

What about actions? Like, if I go outside in the snow without my coat, that's stupid, right? Well, it might be. If I think it's sunny and warm and I go outside to sunbathe, then yeah, that's stupid. But if I just came out of a sauna or something, and I'm too hot and I want to cool myself down, then going outside without a coat might be quite sensible. You can't know if an action is stupid just by looking at its consequences; you have to also know the goals of the agent taking the action. You can't just use is statements; you need an ought. So actions are only stupid relative to a particular goal. It doesn't feel that way, though: people often talk about actions being stupid without specifying what goals they're stupid relative to. But in those cases the goals are implied. We're humans, and when we say that an action is stupid, in normal human communication, we're making some assumptions about normal human goals. And because we're always talking about people, and people tend to want similar things, it's sort of a shorthand that lets us skip saying what goals we're talking about. So what about the goals, then? Can goals be stupid?

Well, this depends on the difference between instrumental goals and terminal goals. This is something I've covered elsewhere, but your terminal goals are the things that you want just because you want them; you don't have a particular reason to want them, they're just what you want. The instrumental goals are the goals you want because they'll get you closer to your terminal goals. Like, if I have a terminal goal to visit a town that's far away, maybe an instrumental goal would be to find a train station. I don't want to find a train station just because trains are cool; I want to find a train as a means to an end: it's going to take me to this town. So that makes it an instrumental goal. Now, an instrumental goal can be stupid. If I want to go to this distant town, and I decide I want to find a pogo stick, that's pretty stupid: finding a pogo stick is a stupid instrumental goal if my terminal goal is to get to a faraway place. But if my terminal goal were something else, like having fun, it might not be stupid. So in that way it's like actions: instrumental goals can only be stupid relative to terminal goals.
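This isn't in the video, but the train-station example can be made concrete with a toy table of how much each instrumental goal advances each terminal goal; the numbers are made up for illustration.

```python
# Made-up "expected progress" of each instrumental goal toward each terminal goal.
progress = {
    ("find_train_station", "reach_distant_town"): 0.9,
    ("find_pogo_stick",    "reach_distant_town"): 0.05,
    ("find_train_station", "have_fun"):           0.3,
    ("find_pogo_stick",    "have_fun"):           0.8,
}

def rate_instrumental_goal(instrumental: str, terminal: str) -> float:
    """An instrumental goal is only sensible or stupid relative to a terminal goal."""
    return progress[(instrumental, terminal)]

print(rate_instrumental_goal("find_pogo_stick", "reach_distant_town"))  # 0.05: stupid here
print(rate_instrumental_goal("find_pogo_stick", "have_fun"))            # 0.8: sensible here
```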

So you see how this works: beliefs and predictions can be stupid relative to evidence, or relative to reality. Actions can be stupid relative to goals of any kind. Instrumental goals can be stupid relative to terminal goals. But here's the big point: terminal goals can't be stupid. There's nothing to judge them against. If a terminal goal seems stupid, like, say, collecting stamps seems like a stupid terminal goal, that's because it would be stupid as an instrumental goal to human terminal goals. But the stamp collector does not have human terminal goals. Similarly, the things that humans care about would seem stupid to the stamp collector, because they result in so few stamps.

So let's get back to those comments. One type of comment says: this behavior of just single-mindedly going after one thing, ignoring everything else, and ignoring the totally obvious fact that stamps aren't that important, is really stupid behavior. You're calling this thing a superintelligence, but it doesn't seem superintelligent to me; it just seems kind of like an idiot. Hopefully the answer to this is now clear: the stamp collector's actions are stupid relative to human goals, but it doesn't have human goals. Its intelligence comes not from its goals, but from its ability to understand and reason about the world, allowing it to choose actions that achieve its goals, and this is true whatever those goals actually are.

Some people commented along the lines of: well, okay, yeah, sure, you've defined intelligence to only include this type of is-statement reasoning, but I don't like that definition. I think to be truly intelligent you need to have complex goals; something with simple goals doesn't count as intelligent. To that I say: well, you can use words however you want, I guess. I'm using "intelligence" here as a technical term, in the way that it's often used in the field. You're free to have your own definition of the word, but the fact that something fails to meet your definition of intelligence does not mean that it will fail to behave in a way that most people would call intelligent. If the stamp collector outwits you, gets around everything you've put in its way, outmaneuvers you mentally, comes up with new strategies that you would never have thought of to stop you from turning it off and preventing it from making stamps, and as a consequence turns the entire world into stamps in various ways you could never think of, it's totally okay for you to say that it doesn't count as intelligent if you want, but you're still dead. I prefer my definition, because it better captures the ability to get things done in the world, which is the reason we actually care about AGI in the first place.

Similarly, to people who say that in order to be intelligent you need to be able to choose your own goals: I would agree that you need to be able to choose your own instrumental goals, but not your own terminal goals. Changing your terminal goals is like willingly taking a pill that will make you want to murder your children; it's something you pretty much never want to do, apart from some bizarre edge cases. If you rationally want to take an action that changes one of your goals, then that wasn't a terminal goal.

Now, moving on to these comments saying that an AGI will be able to reason about morality, and if it's really smarter than us it will actually do moral reasoning better than us, so there's nothing to worry about. It's true that a superior intelligence might be better at moral reasoning than us, but ultimately, moral behavior depends not on moral reasoning but on having the right terminal goals. There's a difference between figuring out and understanding human morality and actually wanting to act according to it. The stamp collecting device has a perfect understanding of human goals, ethics, and values, and it uses that only to manipulate people for stamps. Its superhuman moral reasoning doesn't make its actions good. If we create a superintelligence and it decides to kill us, that doesn't tell us anything about morality; it just means we screwed up.

So, what mistake do all of these comments have in common? The orthogonality thesis in AI safety says that more or less any goal is compatible with more or less any level of intelligence, i.e., those properties are orthogonal. You can place them on these two axes, and it's possible to have agents anywhere in this space, anywhere on either scale. You can have very weak, low-intelligence agents that have complex, human-compatible goals. You can have powerful, highly intelligent systems with complex, sophisticated goals. You can have weak, simple agents with silly goals, and yes, you can have powerful, highly intelligent systems with simple, weird, inhuman goals. Any of these are possible, because level of intelligence is about effectiveness at answering is questions, goals are all about ought questions, and the two sides are separated by Hume's guillotine.
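This isn't in the video, but one way to see the two axes in miniature: in the sketch below, search depth stands in for the intelligence axis and the utility function stands in for the goal axis, and any combination of the two is a perfectly coherent agent. Everything here is invented for illustration.

```python
import itertools
from typing import Callable, List

def plan(state: int,
         step: Callable[[int, str], int],
         utility: Callable[[int], float],
         actions: List[str],
         depth: int) -> List[str]:
    """Brute-force lookahead planner: try every action sequence of length `depth`
    and keep the one whose predicted end state scores highest under `utility`."""
    best_seq, best_value = [], float("-inf")
    for seq in itertools.product(actions, repeat=depth):
        s = state
        for a in seq:
            s = step(s, a)           # the "is" side: predicting consequences
        if utility(s) > best_value:  # the "ought" side: scoring them
            best_seq, best_value = list(seq), utility(s)
    return best_seq

# Toy world: the state is a stamp count; "buy" gains stamps, "sell" loses them.
step = lambda stamps, action: stamps + (2 if action == "buy" else -1)

stamp_lover = lambda stamps: float(stamps)    # goal: many stamps
stamp_hater = lambda stamps: float(-stamps)   # goal: few stamps

print(plan(0, step, stamp_lover, ["buy", "sell"], depth=1))  # weak agent, stamp goal
print(plan(0, step, stamp_lover, ["buy", "sell"], depth=4))  # stronger agent, same goal
print(plan(0, step, stamp_hater, ["buy", "sell"], depth=4))  # stronger agent, opposite goal
```

Cranking up `depth` makes the agent better at getting what it wants; it never touches which utility function it wants things under.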

Hopefully, looking at what we've talked about so far, it should be pretty obvious that this is the case. Like, what would it even mean for it to be false? For it to be impossible to create powerful intelligences with certain goals? The stamp collector is intelligent because it's effective at considering the consequences of sending different combinations of packets on the internet and calculating how many stamps that results in. Exactly how good do you have to be at that before you don't care about stamps anymore, and you randomly start to care about some other thing that was never part of your terminal goals, like feeding the hungry or whatever? It's just not going to happen. So that's the orthogonality thesis: it's possible to create a powerful intelligence that will pursue any goal you can specify. Knowing an agent's terminal goals doesn't really tell you anything about its level of intelligence, and knowing an agent's level of intelligence doesn't tell you anything about its goals.

[Music]

I want to end the video by saying thank you to my excellent patrons, all of these people here. Thank you so much for your support; it lets me do stuff like building this light boy. Thank you for sticking with me through that weird Patreon fees thing, and my moving to a different city, which has really got in the way of making videos recently. But I'm back on it now; a new video every two weeks is the plan. Anyway, in this video I'm especially thanking Katie Beirne, who's supported the channel for a long time. She actually has her own YouTube channel about 3D modeling and stuff, so I'll link to that. And while I'm at it, when I thanked Chad Jones ages ago I didn't mention his YouTube channel, so links to both of those are in the description. Thanks again, and I'll see you next time. ... I don't speak cat. What does that mean?