quetzal_rainbow

Posts

10 points · Linkpost: Predicting Empirical AI Research Outcomes with Language Models · 6mo · 1 comment
21 points · Definition of alignment science I like · 10mo · 0 comments
15 points · How do you shut down an escaped model? [Question] · 1y · 8 comments
15 points · Training of superintelligence is secretly adversarial · 2y · 2 comments
8 points · There is no sharp boundary between deontology and consequentialism · 2y · 2 comments
17 points · Where Does Adversarial Pressure Come From? · 2y · 1 comment
7 points · Predictable Defect-Cooperate? · 2y · 1 comment
61 points · They are made of repeating patterns · 2y · 4 comments
10 points · How to model uncertainty about preferences? [Question] · 3y · 2 comments
3 points · What literature on the neuroscience of decision making can you recommend? [Question] · 3y · 0 comments
1 point · quetzal_rainbow's Shortform · 3y · 163 comments

Comments

Your Clone Wants to Kill You Because You Assumed Too Much
quetzal_rainbow · 3d

Another thing is a narrow self-concept.

In the original thread, people often write about things they have and their clone would want, like family. They fail to think about the things they don't have because they have families, like cocaine orgies, or volunteering for a war for a just cause, or a monastic life in search of enlightenment, so that they could flip a coin and go pursue the alternative life in 50% of cases. I suspect it's because thinking about desirable things you won't have on the best available course of your life is very sour-grapes-flavored.

Don't use the phrase "human values"
quetzal_rainbow · 4d

When I say "human values" without a reference, I mean "the type of things that a human-like mind can want, and their extrapolations". For example, a person blind from birth can want their vision restored, even if they have a sufficiently accommodating environment and other ways to orient, like echolocation. An able-bodied human can notice this and extrapolate it into possible new modalities of perception. You can be a non-vengeful person, but the concept of revenge makes sense to almost any human, unlike the concept of paperclip-maximization.

Vladimir_Nesov's Shortform
quetzal_rainbow · 5d

It's a nice ideal to strive for, but sometimes you need to make a judgement call based on things you can't explain.

Human Values ≠ Goodness
quetzal_rainbow · 6d

Okay, but yumminess is not values. If we pick an ML analogy, yumminess is a reward signal or some other training hyperparameter.

My personal operationalization of values is "the thing that helps you navigate trade-offs". You can have yummy feelings about saving the life of your son or about saving the lives of ten strangers, but we can't say what you value until you consider a situation where you have to choose between the two. And, conversely, if you have good feelings about both parties and reading books, your values direct which you choose.

Choice in the case of real, value-laden trade-offs is usually determined by a significant amount of reflection about values, and the memetic ambience supplies known summaries of such past reflection.

Eric Neyman's Shortform
quetzal_rainbow · 15d

This reason only makes sense if you expect the first actor to develop AGI to create a singleton which takes over the world and locks in pre-installed values, which, again, I find not very compatible with a low p(doom). What prevents the scenario "AGI developers look around for a year after the creation of AGI and decide that they can do better", if not misaligned takeover and not suboptimal value lock-in?

Eric Neyman's Shortform
quetzal_rainbow · 15d

The reason to work on preventing AI takeover now, as opposed to working on it with already-invented AGI in the future, is the first-try problem: if you have an unaligned takeover-capable AGI, takeover just happens and you don't get to iterate. The same applies to the problem of an extremely good future only if you believe that the main surviving scenario is "an aligned-with-developer-intention singleton takes over the world very quickly, locking in pre-installed values". People who believe in such a scenario usually have a very high p(doom), so I assume you are not one of them.

What exactly prevents your strategy here from being "wait for aligned AGI, ask it how to make the future extremely good, and save some opportunity cost"?

Decision theory when you can't make decisions
quetzal_rainbow · 17d

Sure, the set of available options is defined in the problem setup. It's "one-box" and "two-box".

Decision theory when you can't make decisions
quetzal_rainbow · 17d

I feel like this is a confusion about the type signature of decision theory? Decision theory talks about mappings from observations and probabilistic models to actions. In the case of humans, actions are motor outputs. Decision theory asks "what sort of motor output is best?" and answers "the one that leads you to leave with one box". You are allowed to be really indecisive in the process and cry "it feels wrong to leave the second box!"; Omega in this scenario doesn't care.
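A minimal sketch of that type signature (my illustration, not part of the original comment): a decision procedure is a function from a probabilistic model of the situation to an action, and its output is the action itself, not a report of how choosing felt. The names PredictorModel and decide, and the simple expected-value rule standing in for one's preferred decision theory, are assumptions made for the example; the payoffs are the standard Newcomb setup.

```python
from dataclasses import dataclass


@dataclass
class PredictorModel:
    accuracy: float  # probability that Omega predicted your action correctly


def expected_payoff(action: str, model: PredictorModel) -> float:
    # Opaque box contains $1,000,000 iff Omega predicted one-boxing;
    # the transparent box always contains $1,000.
    if action == "one-box":
        return model.accuracy * 1_000_000
    return (1 - model.accuracy) * 1_000_000 + 1_000


def decide(model: PredictorModel) -> str:
    # The return value is an action (ultimately a motor output),
    # not a feeling about the action.
    return max(["one-box", "two-box"], key=lambda a: expected_payoff(a, model))


print(decide(PredictorModel(accuracy=0.99)))  # "one-box"
```

With any reasonably accurate predictor, the mapping returns "one-box" no matter how much indecision happened along the way.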

Supervillain Monologues Are Unrealistic
quetzal_rainbow · 18d

I think the difference between reality and fiction is that fiction contains heroes: high-agency people in very personal relationships with villains.

The unrealistic part of being a hero to a designated villain is that in real life enemies are screened off from each other by many layers of social reality. Roosevelt was kind of a hero to Hitler's villainy, but Roosevelt mostly didn't spend his nights thinking about what Hitler might be thinking, because 1) FDR was busy managing his country, and 2) anything Hitler could do was constrained by the capabilities of Germany, so it was more productive to think about what Germany as a whole could do. As a result, there is no interpersonal drama into which a supervillain monologue fits.

Mo Putera's Shortform
quetzal_rainbow · 19d

Stanislaw Lem once wrote (my translation from Russian, because I can't find another openly available source):

...sometimes it seems incredible to me that there are so many billions of people, many thousands of whom are gifted with imagination and expertise in the field of hypothesis generation, but no one takes on what I do. How is it that in some remote corner of Slavic land, between the North Pole and the Balkans, one madman suffers in solitude? Even from my favored statistical point of view, it follows that there must be at least a few such Quasi-Lems, Anti-Lems, Para-Lems, or Proto-Lems in the world, but here there are none.
