Introduction: Bias in Evaluating AGI X-Risks

Remmelt; flandry19

27th Dec 2022

The rationality community has a tradition of checking for biases, particularly when it comes to evaluating the non-intuitive risks of general AI.

We thought you might like this list, adapted from a 2015 essay by Forrest Landry^[1]. Many of the bias names listed may already be familiar to you. If you "boggle" more at the text, you will find curious new connections to evaluating upcoming risks of AI developments.

About Forrest Landry

Forrest is a polymath working on civilisation design and mitigating risks of auto-scaling/catalysing technology (e.g. the Dark Fire scenario). About 15 years ago, he started researching how to build deep existential alignment into the internals of AGI, applying his deep understanding of programming, embodied ethics, and metaphysics. Then, Forrest discovered the substrate-needs convergence argument (as distinct from, yet much enabled by, instrumental convergence). Unfortunately, because of substrate-needs convergence, any approach to aligning any AI at the embedded level turned out to be unsound in practice (and moreover, inconsistent with physical theory). To inquire further, see this project.

Introduction

Note on unusual formatting: Sentences are split into lines so you can parse parts precisely.

Ideally,
in any individual or group decision making,
there would be some means, processes,
and procedures in place to ensure that
the kinds of distortions and inaccuracies
introduced by individual and collective
psychological and social bias
do not lead to incorrect results,
and thus poor (risk-prone) choices,
with potentially catastrophic outcomes.

While many types of bias
are known to science
and have been observed
to be common to all people
and all social groups, the world over
in all working contexts, regardless
of their background, training, etc.,
they are also largely unconscious,
being 'built-in' by long-term
evolutionary processes.

These unconscious cognitive biases,
while they are adaptive for the purposes
of our being able to survive in
non-technological environments,
do not serve us equally well
when we attempt to survive in our
current technological contexts.

The changes in our
commonly experienced world
continue to occur far too fast
for our existing evolutionary
and cognitive adaptations
to adjust naturally.
We will therefore need to
add the necessary corrections to
our thinking and choice-making process,
our own evolution, 'manually'.
The hope is that
these 'adjustments' might
make it possible to mitigate
the distortions and inaccuracies
introduced by the human condition
to the maximum extent possible.

Bear in mind
that these types of bias
do not just affect individuals –
they also arise due to specific
interpersonal and trans-personal effects
seen only in larger groups.^[2]
These bias aspects affect
all of us, and in all sorts of ways,
many of which are complex.
It is important for everyone involved
in critical decisions and projects
to be aware of these general
and mutual concerns.

We all run on corrupted hardware.
Our minds are composed of many modules,
and the modules that evolved to make
us seem impressive and gather allies
also evolved to subvert the ones
holding our conscious beliefs.

Even when we believe that
we are working on something
that may ultimately determine
the fate of humanity, our signaling
modules may hijack our goals so as
to optimize for persuading outsiders
that we are working on the goal,
instead of optimizing for
achieving the goal.

What is intended herein
is to make some of these
unconscious processes conscious,
to provide a basis, and
to identify the need,
for clear conversation
about these topics.
Hopefully, as a result
of these conversations,
and with the possibility
of a reasonable consensus reached,
we will be able to identify (or create)
a good general practice of decision making,
which when implemented both
individually and collectively
(though perhaps not easily),
can materially improve
our mutual situation.

The need for these practices
of accuracy, precision, and correctness
in decision making is especially
acute in proportion to the degree
that we all find ourselves faced
with a seemingly ever-increasing
number of situations for
which our evolution has
not yet prepared us.
Where the true goal
is making rational, realistic,
and reasonably good choices
about matters that may
potentially involve many people,
larger groups and tribes, etc.,
many specific and strong
cognitive and social biases
will need to be compensated for.

Particularly with regard
to category 1 and 2 extinction risks,
nothing less than complete and full
compensation for all bias,
and the complete application
of correct reason
can be allowed for.

This sequence will not attempt
to outline or validate any of the
specific risk possibilities and outcomes
for which there is significant concern
(this is done elsewhere).
Nor can it attempt to outline or define
which or what means, processes, or procedures
should be used for effective individual
or group decision making.

As the 'general problem of governance',
the main issue remains one of the identification,
development, and testing/refining of
such means and methods by which
all bias can be compensated for,
and a basis for clear reason
thereby created.
Hopefully this will
lead to real techniques of
group decision making –
and high quality decisions –
that can be realistically defined,
outlined, and implemented.

A Partial List Of Affecting Bias...

The next posts cover a list
of some of the known types of bias
that have a significant and real potential to
harmfully affect the accuracy and correctness
of extinction risk assessments.

Each bias will be given its
common/accepted consensus name,
along with relevant links to Wikipedia
articles with more details.^[3]
Each bias will be briefly described
with particular regard to its potential impact
on risk assessment in an existential context.^[4]

^[1]
Some of the remarks and observations herein
have been derived from content posted
to the website LessWrong.com –
no claim of content originality by
this author is implied or intended.
Content has been duplicated
and edited/expanded here for
informational and research purposes only.
^[2]
Nothing herein is intended
to implicate or impugn any
specific individual, group, or institution.
The author has not specifically encountered
these sorts of issues in regard to
just one person or project.

Most people are actually well-intentioned.
Unfortunately,
'good intentions' is not equivalent to
(nor necessarily yielding of)
'good results', particularly where
the possibility of extinction risks
is concerned.
^[3]
All of the descriptive notations regarding
the specific characteristics of each bias
have been derived from Wikipedia.
^[4]
These descriptions, explanations, and discussions
are not intended to be comprehensive
or authoritative – they are merely
indicative for the purposes of stimulating
relevant/appropriate conversation.