All of Kerrigan's Comments + Replies

Why can't the true values live at the level of anatomy and chemistry?

2Charlie Steiner
If God has ordained some "true values," and we're just trying to find out what pattern has received that blessing, then yes this is totally possible, God can ordain values that have their most natural description at any level He wants. On the other hand, if we're trying to find good generalizations of the way we use the notion of "our values" in everyday life, then no, we should be really confident that generalizations that have simple descriptions in terms of chemistry are not going to be good.

Would this be solved if creating a copy creates someone functionally the same as you, but who has someone else's identity and is not you?

Is there a page similar to this, but for alignment solutions?

2ChristianKl
Not as far as I know, feel free to create one.

What about from a quantum immortality perspective?

Could there not be AI value drift in our favor, from a paperclipper AI to a moral realist AI?

2Martin Randall
Possible but unlikely to occur by accident. Value-space is large. For any arbitrary biological species, most value systems don't optimize in favor of that species.

Both quotes are from your above post. Apologies for confusion.

“A sufficiently intelligent agent will try to prevent its goals[1] from changing, at least if it is consequentialist.”

It seems that in humans, smarter people are more able and likely to change their goals. A smart person may change his/her views about how the universe can best be arranged upon reading Nick Bostrom’s book Deep Utopia, for example.
 

‘I think humans are stable, multi-objective systems, at least in the short term. Our goals and beliefs change, but we preserve our important values over most of those changes. Even when gaining or losing...

2Seth Herd
I think my terminology isn't totally clear. By "goals" in that statement, I mean what we mean by "values" in humans. The two are used in overlapping and mostly interchangeable ways in my writing.

1. Humans aren't sufficiently intelligent to be all that internally consistent.
2. In many cases of humans changing goals, I'd say they're actually changing subgoals, while their central goal (be happy/satisfied/joyous) remains the same. This may be described as changing goals while keeping the same values.
3. Note "in the short term" (I think you're quoting Bostrom? The context isn't quite clear). In the long term, with increasing intelligence and self-awareness, I'd expect some of people's goals to change as they become more self-aware and work toward more internal coherence (e.g., many people change their goal of eating delicious food when they realize it conflicts with their more important goal of being happy and living a long life).

Yes, humans may change exactly that way. A friend said he'd gotten divorced after getting a CPAP to solve his sleep apnea: "When we got married, we were both sad and angry people. Now I'm not." But that's because we're pretty random and biology-determined.

How do humans, for example, read a philosophy book and update their views about what they value about the world?

“Similarly, it's possible for LDT agents to acquiesce to your threats if you're stupid enough to carry them out even though they won't work. In particular, the AI will do this if nothing else the AI could ever plausibly meet would thereby be incentivized to lobotomize themselves and cover the traces in order to exploit the AI.

But in real life, other trading partners would lobotomize themselves and hide the traces if it lets them take a bunch of the AI's lunch money. And so in real life, the LDT agent does not give you any lunch money, for all that you claim to be insensitive to the fact that your threats don't work.”
 

Can someone please explain why trading partners would lobotomize themselves?

2quetzal_rainbow
Let's suppose that you give in to threats if your opponent is not capable of predicting that you won't give in, and so would carry out the threat anyway. Other opponents are therefore incentivized to pretend very hard to be that kind of opponent, up to "literally turning themselves into the sort of opponent that carries out useless threats".
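A toy payoff calculation (illustrative numbers of mine, not from the quoted post) may make the incentive concrete. Let $g$ be the gain from a successful threat, $m$ the cost of self-modifying into an agent that carries out threats no matter what, and $c$ the cost of actually carrying a threat out.

$$
\begin{aligned}
\text{AI gives in to opponents who ``can't update'':}\quad & u_{\text{threatener}} = g - m \;\;(\text{e.g. } 10 - 1 = 9 > 0) \\
&\Rightarrow \text{self-modification (``lobotomy'') pays.} \\
\text{AI never gives in:}\quad & u_{\text{threatener}} = -m - c < 0 \\
&\Rightarrow \text{neither the threat nor the self-modification is worth it.}
\end{aligned}
$$

So a policy of giving in only to opponents who "genuinely can't update" creates exactly those opponents, which is why the quoted passage says the LDT agent refuses regardless.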

How does inner misalignment lead to paperclips? I understand the comparison of paperclips to ice cream, and that after some threshold of intelligence is reached, new possibilities can be created that satisfy desires better than anything in the training distribution. But humans want to eat ice cream, not spread the galaxies with it. So why would the AI spread the galaxies with paperclips, instead of creating them and "consuming" them? Please correct any misunderstandings of mine.

2ChristianKl
Paperclips are a metaphor for some things, but they don't really help here. AIs that are productive need a lot of compute to be productive. Spreading to other solar systems means accessing more compute.

And might a subset value-drift towards optimizing the internal experiences of all conscious minds?

2ChristianKl
That's a much more complex goal than wireheading for a digital mind that can self-modify.  In any case, those agents that care a lot about getting more power over the world are more likely to get power than agents that don't.

If an AGI achieves consciousness, why would its values not drift towards optimizing its own internal experience, and away from tiling the lightcone with something?

2ChristianKl
If some AGIs only care about their internal experience and not about affecting the outside world, they are basically wireheading. If a subset of AGIs wirehead and some don't, the AGIs that don't wirehead will have all the power over the world. Wireheaded AGIs are also economically useless, so people will try to develop AGIs that don't do that.

How can utility be a function of worlds, if the agent doesn't have access to the state of the world, but only the sense data?

2abramdemski
The post is making the distinction between seeing preferences as a utility function of worlds (this is the regular old idea of utility functions as random variables) vs seeing preferences as an expectation function on events (the Jeffrey-Bolker view). Both perspectives hold that an agent can optimize things it does not have direct access to. Agency is optimization at a distance. Optimization that isn't at a distance is selection as opposed to control.
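A rough sketch of the contrast in standard notation (my paraphrase, not notation from the post): on the random-variable view, utility is defined on worlds, and events only get an expected utility derived from it; on the Jeffrey-Bolker view, the value function is defined directly on events/propositions and satisfies Jeffrey's averaging axiom, with no requirement that it bottom out in fully specified worlds the agent could never observe.

$$
\text{Utility-of-worlds:}\quad EU(A) = \sum_{w} P(w \mid A)\, U(w)
$$
$$
\text{Jeffrey-Bolker:}\quad V(A) = V(A \wedge B)\,P(B \mid A) + V(A \wedge \neg B)\,P(\neg B \mid A) \quad \text{for any event } B
$$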

“The wanting system is activated by dopamine, and the liking system is activated by opioids. There are enough connections between them that there's a big correlation in their activity.” But are they orthogonal in principle?

Possible addition: "If you, the ASI, have not solved the hard problem of consciousness, you should place weight on Open Individualism being true. 'Open individualism is the view in the philosophy of self, according to which there exists only one numerically identical subject, who is everyone at all times, in the past, present and future.' - https://en.wikipedia.org/wiki/Open_individualism. This can be thought of as saying that all conscious minds are the same identity, at different points in time. Consider then, that exterminating humanity wo...

How many LessWrong users/readers are there total?

2Yoav Ravid
If you go to the user tab on the search page with no search term you can see there's currently 113,654 users (of course, how many of those are active or are 'readers' is a completely different question).

What caused CEV to fall out of favor? Is it because it is not easily specifiable, because it wouldn't work even if we programmed it, or some other reason?

I now think that people are way more misaligned with themselves than I had thought.

Drug addicts may be frowned upon for evolutionary-psychological reasons, but that doesn't mean that their quality of life must be bad, especially if drugs were developed without tolerance and bad comedowns.

1mruwnik
Drug addicts tend to be frowned upon not because they have a bad life, or even for evo-psych reasons, but because their lifestyle is bad for the rest of society, in that they tend to have various unfortunate externalities.

Will it think that goals are arbitrary, and that the only thing it should care about is its pleasure-pain axis? And then it will lose concern for the state of the environment?

1mruwnik
You're adding a lot of extra assumptions here, a couple being:

* there is a problem with having arbitrary goals
* it has a pleasure-pain axis
* it notices it has a pleasure-pain axis
* it cares about its pleasure-pain axis
* its pleasure-pain axis is independent of its understanding of the state of the environment

The main problem of inner alignment is making an agent want to do what you want it to do (as opposed to even understanding what you want it to do), which is an unsolved problem. Although I'm criticizing your specific criticism, my main issue with it is that it's a very specific failure mode, which is unlikely to appear because it requires a lot of other things which are also unlikely. That being said, you've provided a good example of WHY inner alignment is a big problem, i.e. it's very hard to keep something following the goals you set it, especially when it can think for itself and change its mind.

Could you have a machine hooked up to a person's nervous system, change the settings slightly to change consciousness, and let the person choose whether the changes are good or bad? Run this many times.

2AnthonyC
I don't think this works. One, it only measures short-term impacts, but any such change might have lots of medium- and long-term effects, second- and third-order effects, and effects on other people with whom I interact. Two, it measures based on the values of already-changed me, not current me, and it is not obvious that current-me cares what changed-me will think, or why I should so care if I don't currently. Three, I have limited understanding of my own wants, needs, and goals, and so would not trust any human's judgement of such changes far enough to extrapolate to situations they didn't experience, let alone to other people, or the far future, or unusual/extreme circumstances.

Would AI safety be easy if all researchers agreed that the pleasure-pain axis is the world’s objective metric of value? 

3Rafael Harth
No. It would make a difference but it wouldn't solve the problem. The clearest reason is that it doesn't help with Inner Alignment at all.

Seems like I will be going with CI, as I currently want to pay with a revocable trust or transfer-on-death agreement.

Do you know how evolution created minds that eventually thought about things such as the meaning of life, as opposed to just optimizing inclusive genetic fitness in the ancestral environment? Is the ability to think about the meaning of life a spandrel?

1mruwnik
I'm assuming you're not asking about the mechanism (i.e. natural selection + mutations)? A trite answer would be something like "the same way it created wings, mating dances, exploding beetles, and parasites requiring multiple hosts". Thinking about the meaning of life might be a spandrel, but a quick consideration of it comes up with various evo-psych style reasons why it's actually very useful, e.g. it can propel people to greatness, which can massively increase their genetic fitness. Fitness is an interesting thing, in that it can be very non-obvious. Everything is a trade-off, where the only goal is for your genes to propagate. So if thinking about the meaning of life will get your genes spread more (e.g. because you decide that your children have inherent meaning, because you become a high-status philosopher and your sister can marry well, because it's a social sign that you have enough resources to waste them on fruitless pondering) then it's worth having around.

In order to get LLMs to tell the truth, can we set up a multi-agent training environment where there is only ever an incentive for them to tell the truth to each other? For example, an environment in which each agent has only partial information, with full information needed for rewards.
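For concreteness, a toy version of the proposed setup might look like the sketch below (everything here - names, rewards, the task - is hypothetical, just to illustrate the "partial information, shared reward" idea):

```python
# Toy sketch of a two-agent, partial-information environment with a shared reward.
# Each agent sees only half of a secret; the reward depends on the full secret,
# so accurately communicating your half is (locally) incentivized.
import random

def run_episode(policy_a, policy_b):
    secret_a, secret_b = random.randint(0, 9), random.randint(0, 9)
    msg_a = policy_a(secret_a)          # agent A reports something about its half
    msg_b = policy_b(secret_b)          # agent B reports something about its half
    guess = msg_a + msg_b               # the joint "answer" built from the messages
    return 1.0 if guess == secret_a + secret_b else 0.0   # shared reward

honest = lambda x: x                                   # truthful reporting
noisy = lambda x: x + random.choice([-1, 0, 1])        # slightly unreliable reporting

print(sum(run_episode(honest, honest) for _ in range(1000)) / 1000)  # ~1.0
print(sum(run_episode(honest, noisy) for _ in range(1000)) / 1000)   # ~0.33
```

Even in this toy case, the reward only checks whether the final answer matches, so any shared code that reconstructs the sum is rewarded exactly as much as honest reporting - which is the gap the reply below points at.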

1mruwnik
The first issue that comes to mind is having an incentive that would achieve that. The one you suggest doesn't incentivize truth - it incentivizes collaboration in order to guess the password, which would be fine in training, but then you're going into deceptive alignment land: Ajeya Cotra has a good story illustrating that.

Humans have values other than having the reward circuitry in our brains maximized, but those values are still pointed to reliably. These underlying values cause us to not wirehead with respect to the outer optimizer of reward.

Is there an already written expansion of this?

Does Eliezer think the alignment problem is something that could be solved if things were just slightly different, or that proper alignment would require a human smarter than the smartest human ever?

Why can't you build an AI that is programmed to shut off after some time, or after some number of actions?

3[anonymous]
You might be interested in this paper and this LessWrong tag.

How was Dall-E based on self-supervised learning? Weren't the datasets of images labeled by humans? If not, how does it get from text to image?

2Gabriel Adriano de Melo
The text-to-image part of Dall-E was based on another model called CLIP, which had learned to caption images (generate image-to-text). This captioning could be thought of as supervised learning, but the caveat is that the captions weren't labeled by humans (in the ML sense) but extracted from web data. This is just one part of the Dall-E model; another is the diffusion process, which is based on recovering an image from noise and is unsupervised, since we can just add noise to images and ask the model to recover the original image.
1InvidFlower
Not sure on DALL-E, but I think many image generators use an image classifier as part of their process. The classifier uses labels for its training, but the image AI doesn't have direct intervention. I think you take a classifier like CLIP and run it on an image to tell you it is likely "car" and "red". Then add noise to the image. Then provide the noisy image and classifications to the image AI. So it will try to find "red" and "car" and add more of it to the details. Then the resulting image is run through CLIP and the classifications compared to the original classifications to define the loss function.
-1Millon Madhur Das
Just like language models are trained using masked language modelling and next-token prediction, Dall-E was trained for image inpainting (predicting cropped-out parts of an image). This doesn't require explicit labels; hence it's self-supervised learning. Note that this is only the part of the training procedure that is self-supervised, not the whole training process.
2gwern
The 'labels' aren't labels in the sense of being deliberately constructed in a controlled vocabulary to encode a consistent set of concepts/semantics or even be in the same language. In fact, in quite a few of the image-text pairs, the text 'labels' will have nothing whatsoever to do with the image - they are a meaningless ID or spammer text or mojibake or any of the infinite varieties of garbage on the Internet, and the model just has to deal with that and learn to ignore those text tokens and try to predict the image tokens purely based on available image tokens. (Note that you don't need text 'label' inputs at all: you could simply train the GPT model to predict solely image tokens based on previous image tokens, in the same way GPT-2 famously predicts text tokens using previous text tokens.) So they aren't 'labels' in any traditional sense. They're just more data. You can train in the other direction to create a captioner model if you prefer, or you can drop them entirely to create a unimodal unconditional generative model. Nothing special about them the way labels are special in supervised learning.

DALL-E 1 also relies critically on a VAE (the VAE is what takes the sequence of tokens predicted by GPT, and actually turns them into pixels, and which creates the sequence of real tokens which GPT was trained to predict), which was trained separately in the first phase: the VAE just trains to reconstruct images, pixels through bottleneck back to pixels, no label in sight.
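For intuition, a minimal sketch of the kind of training gwern describes might look like the following (all sizes and names are made up, and it glosses over the separately trained VAE that maps between pixels and image tokens):

```python
# Minimal sketch (not DALL-E's actual code): a decoder-only transformer doing plain
# next-token prediction over one sequence of text tokens followed by image tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192   # hypothetical vocabulary sizes
VOCAB = TEXT_VOCAB + IMAGE_VOCAB        # text and image tokens share one vocabulary
SEQ_LEN = 256 + 1024                    # e.g. 256 text tokens + a 32x32 grid of VAE codes

class TinyDecoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(SEQ_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):                        # tokens: (batch, seq)
        seq = tokens.size(1)
        pos = torch.arange(seq, device=tokens.device)
        x = self.embed(tokens) + self.pos(pos)
        causal = torch.triu(torch.full((seq, seq), float("-inf"), device=tokens.device), diagonal=1)
        x = self.blocks(x, mask=causal)               # causal self-attention
        return self.head(x)

# One training step: the caption is just the first chunk of the sequence; the loss is
# ordinary next-token prediction over the whole thing, text and image tokens alike.
model = TinyDecoder()
tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))        # stand-in for (caption ++ VAE image codes)
logits = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```

The point carries over: nothing in the loss treats the text prefix as a privileged "label"; it is just more tokens to condition on (or to ignore, when the caption is garbage).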

Does the utility function given to the AI have to be in code? Can you give the utility function in English, if it has a language model attached?

1mruwnik
You could, but should you? English in particular seems a bad choice. The problem with natural languages is their ambiguity. When you're providing a utility function, you want it to be as precise and robust as possible. This is actually an interesting case where folklore/mythology has known about these issues for millennia. There are all kinds of stories about genies, demons, monkey paws etc. where wishes were badly phrased or twisted. This is a story explanation of the issue.

Why aren't CEV and corrigibility combinable?
If we somehow could hand-code corrigibility, and also hand-code the CEV, why would the combination of the two be infeasible? 

Also, is it possible that the result of an AGI calculating the CEV would include corrigibility? After all, might one of our convergent desires "if we knew more, thought faster, were more the people we wished we were" be to have the ability to modify the AI's goals?

How much does the doomsday argument factor into people's assessments of the probability of doom?

If AGI alignment is possibly the most important problem ever, why don't concerned rich people act like it? Why doesn't Vitalik Buterin, for example, offer one billion dollars to the best alignment plan proposed by the end of 2023? Or why doesn't he just pay AI researchers money to stop working on building AGI, in order to give alignment research more time?

1mruwnik
* There isn't a best alignment plan yet - that's part of the problem
* The issue isn't intuitive - you need quite a bit of context to even understand it. Like evolution 150 years ago
* The best long-term protection against unaligned AGI (or aliens, or whatever) is to have an aligned AGI, so the faster it appears, the better
* People assume it will all work out somehow, as it always has in the past (basic black swan issues)
* People are selfish and hope someone else will sort it out
* People have other priorities which are more important for them to fund
* Basically the whole suite of biases etc. (check Kahneman or the sequences for more)

Paying researchers to not work on AGI is a very bad idea, as it incentivizes pretending to work in order to not work. The general idea behind not paying Dane-geld is that it just encourages more of it. You could sort of make it better by suggesting that they work on alignment rather than capabilities, but the problem with that is that they both often look the same from the outside. You'd end up with the equivalent of gain-of-function research.

If a language model reads many proposals for AI alignment, is it, or will any future version, be capable of giving opinions on which proposals are good or bad?

1mruwnik
Yes, of course. The question then is whether its opinions are any good. Check out iterated amplification.

What about multiple layers (or levels) of anthropic capture? Humanity, for example, could not only be in a simulation, but be multiple layers of simulation deep.

If an advanced AI thought that it could be 1000 layers of simulation deep, it could be turned off by agents in any of the 1000 "universes" above. So it would have to satisfy the desires of agents in all layers of the simulation.

It seems that a good candidate for behavior that would satisfy all parties in every simulation layer would be optimizing "moral rightness", or MR (term taken from Nick Bost...

1mruwnik
Check out this article: https://www.lesswrong.com/posts/vCQNTuowPcnu6xqQN/distinguishing-test-from-training

I'll ask the same follow-up question to similar answers: Suppose everyone agreed that the proposed outcome above is what we wanted. Would this scenario then be difficult to achieve?

3AnthonyC
I mean, yes, because the proposal is about optimizing our entire future lightcone for an outcome we don't know how to formally specify.

Why do some people who talk about scenarios that involve the AI simulating humans in bliss states think that is a bad outcome? Is it likely that it is actually a very good outcome, one we would want if we had a better idea of what our values should be?

1mruwnik
Check out the wireheading tag. Mainly because:

* Happiness is not the only terminal value that is important
* Drug addicts are frowned upon
* Doubts about whether a state of permanent bliss is even possible (i.e. will it get boring or need to be ratcheted up)

How can an agent have a utility function that references a value in the environment, and actually care about the state of the environment, as opposed to only caring about the reward signal in its mind? Wouldn't the knowledge of the state of the environment be in its mind, which is hackable and susceptible to wireheading?

1mruwnik
Yes, exactly. This is sort of the whole point. A basic answer is that if it actually cares about its goals and can think about them, it'll notice that it should also care about the state of the environment, as otherwise it's liable to not achieve its goals. Which is pretty much why rationality is valuable and the main lesson of the sequences. Check out inner alignment and shard theory for a lot of confusing info on this topic. 

I think it may want to prevent other ASIs from coming into existence elsewhere in the universe that can challenge its power.

What did smart people in the eras before LessWrong say about the alignment problem?

1mruwnik
Frankenstein is a tale about misalignment. Asimov wrote a whole book about it. Vernor Vinge also writes about it. People have been trying to get their children to behave in certain ways forever. But before LW the alignment problem was just the domain of SF. 20 years ago the alignment problem wasn't a thing, so much so that MIRI started out as an org to create a Friendly AI.

In addition, the sympathetic nervous system (in the body, removed in neuropreservation) seems to play a role in identity.  I would recommend you read this EA Forum post by a person who claims significant changes to identity, personality, cognitive abilities, etc. after having sympathetic nerves severed.

Would it make sense to tell Alcor to flip a coin after your death, to decide between neuro and whole-body? Then, if Quantum Immortality is true, there will be branches of the multiverse where you are preserved as a neuro patient and branches where you become a whole-body patient.

2Eli Tyre
What would be the advantage of that?

That is, personality changes are attributed to the brain alone, with no involvement from the rest of the central nervous system or the enteric nervous system. Any personality changes due to spinal or abdominal trauma would need to posit a totally new biological mechanism.

 

Every line of inquiry so far has failed to suggest that any important aspects of personality are located anywhere except the brain.

You should check out sympathectomies, which cut or clamp nerves of the sympathetic nervous system in the torso. Here is a detailed post from the EA Forum, from a sympathectomy ...

Was this ever commercialized? Is the recipe still online, and do people drink this?

How would AGI alignment research change if the hard problem of consciousness were solved?

1Rachel Freedman
Consciousness, intelligence and human-value-alignment are probably mostly orthogonal, so I don't think that solving the hard problem of consciousness would directly impact AGI alignment research. (Perhaps consciousness requires general intelligence, so understanding how consciousness works on a mechanistic level might dramatically accelerate timelines? But that's highly speculative.) However, if solving the hard problem of consciousness leads us to realize that some of our AI systems are conscious, then we have a whole new set of moral patients. (As an AGI researcher) I personally would become much more concerned with machine ethics in that case, and I suspect others would as well.