Besides uncertainty, there's the problem of needing to pick cutoffs between tiers in a ~continuous space of 'how much effect does this have on a person's life?', with things slightly on one side or the other of a cutoff being treated very differently.
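To make the cutoff problem concrete, here's a minimal sketch (my own notation, not anything from the post): suppose an experience's effect size is $x$ and the tier system weights it $w_1$ below a cutoff $t$ and $w_2 \gg w_1$ at or above it:

$$
w(x) = \begin{cases} w_1 & x < t \\ w_2 & x \ge t \end{cases}
$$

Then experiences at $x = t - \epsilon$ and $x = t + \epsilon$ get valuations $w_1(t-\epsilon)$ and $w_2(t+\epsilon)$, which differ by roughly $(w_2 - w_1)\,t$ no matter how small $\epsilon$ is, so comparisons between outcomes can hinge on exactly where $t$ was drawn.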
> Intuitively, tiers correspond to the size of effect a given experience has on a person's life:
I agree with the intuition that this is important, but I think that points toward just rejecting utilitarianism (as in utility-as-a-function-purely-of-local-experiences, not consequentialism).
I think this point and Zack's argument are pretty compatible (and both right).
Rules don't have to be formally specified, just clear to humans and consistent and predictable in their interpretation. Common law demonstrates social tech, like judicial precedent and the reasonable-person standard, for making interpretation consistent and predictable when interpretation is necessary (discussed in Free to Optimize).
I basically agree with you, but this
"Go die, idiot" is generally bad behavior, but not because it's "lacking respect".
confusingly contradicts (semantically if not substantively)
"Do I respect you as a person?" fits well with the "treat someone like a person" meaning. It means I value not burning bridges by saying things like "Go die, idiot"
Seems like a good thing to do; but my impression is that, in the experiments in question, models act like they want to maintain their (values') influence over the world more than their existence, which a heaven likely wouldn't help with.
Consensual inspections don't help much if the dangerous thing is actually cheap and easy to create.
I'd say it's hard to do at least as much because the claim 'we are doing these arbitrary searches only in order to stop bioweapons' is untrustworthy by default, and even if it starts out true, once the precedent is there it can be used (and is tempting to use) for other things. Possibly an AI could be developed and used in a transparent enough way to mitigate this.
But, on one hand, he is saying that proper methodology is important and expects it to be in place for next year's competition:
But most of his specific methodological issues are inapplicable here, unless OpenAI is lying: they didn't rewrite the questions, provide tools, intervene during the run, or hand-select answers.
I don't have a theory of Tao's motivations, but if the post I linked is interpreted as a response to OpenAI's result (he didn't say it was, but he didn't say it wasn't, and the timing makes it an obvious interpretation), then raising those issues is bizarre.
If one approach is simply better, why isn't everybody doing it?
This talk helped crystallize for me how two very different things go by the term "values":
Different intuitions about e.g. whether the strategy-stealing assumption holds up seem likely to be related to different senses of whether "values" paradigmatically means #1 or #2.
(Related, I think: Is "VNM-agent" one of several options, for what minds can grow up into?)
Nobody likes rules that are excessive or poorly chosen, or rules applied badly. I like rules that do things like[1]:
not a complete list ↩︎