LESSWRONG
LW

153
quetzal_rainbow
2690Ω3137200
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
1quetzal_rainbow's Shortform
3y
163
Don't use the phrase "human values"
quetzal_rainbow13h20

When I say "human values" without reference I mean "type of things that human-like mind can want and their extrapolations". Like, blind from birth person can want their vision restored, even if they have sufficiently accommodating environment and other ways to orient, like echolocation. Able-bodied human can notice this and extrapolate this into possible new modalities of perception. You can be not vengeful person, but concept of revenge makes sense to almost any human, unlike concept of paperclip-maximization.

Reply
Vladimir_Nesov's Shortform
quetzal_rainbow1d55

It's nice ideal to strive for, but sometimes you need to make judgement call based on things you can't explain.

Reply
Human Values ≠ Goodness
quetzal_rainbow2d50

Okay, but yumminess is not values. If we pick ML analogy, yumminess is reward signal or some other training hyperparameter.

My personal operationalization of values is "the thing that helps you to navigate trade-offs". You can have yummi feelings about saving life of your son or about saving life of ten strangers, but we can't say what you value until you consider situation where you need to choose between two. And, conversely, if you have good feelings about parties and reading books, your values direct what you choose.

Choice in case of real, value-laden trade-offs is usually defined by significant amount of reflection about values and memetic ambience supplies known summaries of such reflection in the past.

Reply1
Eric Neyman's Shortform
quetzal_rainbow11d20

This reason only makes sense if you expect first person to develop AGI to create singleton which takes over the world and locks in pre-installed values, which, again, I find not very compatible with low p(doom). What prevents scenario "AGI developers look around for a year after creation of AGI and decide that they can do better" if not misaligned takeover and not suboptimal value lock-in?

Reply
Eric Neyman's Shortform
quetzal_rainbow12d20

The reason to work on preventing AI takeover now, as opposed to working on already invented AGI in the future, is the first try problem: if you have unaligned takeover-capable AGI, takover just happens and you don't get to iterate. The same happens with problem of extremely good future only if you believe that the main surviving scenario is "aligned-with-developer-intention singleton takes over the world very quickly, locking in pre-installed values". People who believe in such scenario usually have very high p(doom), so I assume you are not one of them.

What exactly prevents your strategy here from being "wait for aligned AGI, ask it how to make future extremely good and save some opportunity cost"?

Reply
Decision theory when you can't make decisions
quetzal_rainbow14d40

Sure, set of available options is defined in problem setup. It's "one-box" and "two-box".

Reply
Decision theory when you can't make decisions
quetzal_rainbow14d20

I feel like it's confusion about type signature of decision theory? Decision theory talks about mappings from observations and probabilistic models to actions. In case of humans, actions are motor outputs. Decision theory asks "what sort of motor output is the best?" and answers "which leads you to leave with one box". You are allowed to be really indecisive in process and cry "it feels wrong to leave the second box!", Omega in this scenario doesn't care.

Reply
Supervillain Monologues Are Unrealistic
quetzal_rainbow15d120

I think the difference between reality and fiction is that fiction contains heroes - high-agency people in very personal relationships with villains.

Unrealistic part of being hero to designated villain is that in real life enemies are screened off from each other by many layers of social reality. Roosevelt was kinda hero to Hitler's villainy, but Roosevelt mostly didn't spend his nights thinking about Hitler could think, because 1) FDR was busy managing his country, 2) anything that Hitler could do was overconstrained by abilities of Germany, so it was more productive to think about what Germany as a whole could do. As a result, there is no interpersonal drama in which supervillain monologue fits.

Reply
Mo Putera's Shortform
quetzal_rainbow16d170

Stanislaw Lem once wrote (I translated it from Russian, because I can't find another source in the open):

...sometimes it seems incredible to me that there are so many billions of people, many thousands of whom are gifted with imagination and expertise in the field of hypothesis generation, but no one takes on what I do. How is it that in some remote corner of Slavic land, between the North Pole and the Balkans, one madman suffers in solitude? Even from my favored statistical point of view, it follows that there must be at least a few such Quasi-Lems, Anti-Lems, Para-Lems, or Proto-Lems in the world, but here there are none.

Reply1
Recontextualization Mitigates Specification Gaming Without Modifying the Specification
quetzal_rainbow1mo30

Alternative hypothesis: when you recontextualize, you train model "you will get away with X, given this signal (encouragement to do X in system prompt)". When model sees no such signal, it decides that it won't get away with bad behavior and acts good.

I wonder what would happen if you recontextualize with pair of prompts "your actions and thoughts will be recorded and evaluated" vs "you are acting absolutely autonomously, no oversight".

Reply
Load More
10Linkpost: Predicting Empirical AI Research Outcomes with Language Models
5mo
1
21Definition of alignment science I like
10mo
0
15How do you shut down an escaped model?
Q
1y
Q
8
15Training of superintelligence is secretly adversarial
2y
2
8There is no sharp boundary between deontology and consequentialism
2y
2
17Where Does Adversarial Pressure Come From?
2y
1
7Predictable Defect-Cooperate?
2y
1
61They are made of repeating patterns
2y
4
10How to model uncertainty about preferences?
Q
3y
Q
2
3What literature on the neuroscience of decision making can you recommend?
Q
3y
Q
0
Load More