Nick_Tarleton

Nobody likes rules that are excessive, poorly chosen, or badly applied. I like rules that do things like[1]:

  • Prohibit others from doing things that would harm me, where either I don't want to do those things myself, or I prefer the equilibrium where nobody does them to the one where everybody does.
  • Require contributing to common goods (sometimes).
  • Take the place of what would otherwise be unpredictable judgments of my actions.

  1. not a complete list ↩︎

Besides uncertainty, there's the problem of needing to pick cutoffs between tiers in a ~continuous space of 'how much effect does this have on a person's life?', with things slightly on one side or the other of a cutoff being treated very differently.
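A toy illustration of the cutoff problem (my own construction, not from the post): suppose the effect of an experience on a life is a continuous magnitude $x$, and a tier cutoff at $t$ gives higher-tier experiences a much larger weight $K \gg 1$:

$$
w(x) =
\begin{cases}
x, & x < t \quad \text{(lower tier)}\\
K\,x, & x \ge t \quad \text{(higher tier)}
\end{cases}
$$

Experiences at $x = t - \epsilon$ and $x = t + \epsilon$ differ negligibly in how much they affect the life, but their weights differ by roughly $(K-1)\,t$; moving the cutoff relocates the jump without removing it.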

Intuitively, tiers correspond to the size of effect a given experience has on a person's life:

I agree with the intuition that this is important, but I think that points toward just rejecting utilitarianism (as in utility-as-a-function-purely-of-local-experiences, not consequentialism).

I think this point and Zack's argument are pretty compatible (and both right).

Rules don't have to be formally specified, just clear to humans and consistent and predictable in their interpretation. Common law demonstrates social tech, like judicial precedent and the reasonable-person standard, for making interpretation consistent and predictable when interpretation is necessary (discussed in Free to Optimize).

I basically agree with you, but this

"Go die, idiot" is generally bad behavior, but not because it's "lacking respect".

confusingly contradicts (semantically if not substantively)

"Do I respect you as a person?" fits well with the "treat someone like a person" meaning. It means I value not burning bridges by saying things like "Go die, idiot"

Seems like a good thing to do; but my impression is that, in the experiments in question, models act like they want to maintain their (values') influence over the world more than their existence, which a heaven likely wouldn't help with.

Consensual inspections don't help much if the dangerous thing is actually cheap and easy to create.

I'd say it's hard to do at least as much because the claim 'we are doing these arbitrary searches only in order to stop bioweapons' is untrustworthy by default, and even if it starts out true, once the precedent is there it can be used (and is tempting to use) for other things. Possibly an AI could be developed and used in a transparent enough way to mitigate this.

But, on the one hand, he is saying that proper methodology is important and expects it to be in place for next year's competition:

But most of his specific methodological issues are inapplicable here, unless OpenAI is lying: they didn't rewrite the questions, provide tools, intervene during the run, or hand-select answers.

I don't have a theory of Tao's motivations, but if the post I linked is interpreted as a response to OpenAI's result (he didn't say it was, but he didn't say it wasn't, and the timing makes that an obvious interpretation), then raising those issues is bizarre.

If one approach is simply better, why isn't everybody doing it?

  • Many people don't live in places where they have to parallel-park in tight spaces frequently. (Are there many people who do, but don't use the better approach? I don't know.)
  • The better way (reversing in) isn't the maximally intuitive or direct way. (I don't remember whether I was ever taught it; if I was, it was while learning to drive, I didn't internalize it, and I had to re-learn it from experience. It feels like it would be hard to appreciate why it's better without more driving experience.)
  • Learned blankness, or an aversion to paying enough attention to a stressful, anxiety-inducing task to learn how to do it better.

This talk helped crystallize for me how two very different things go by the term "values":

  1. strategies by which an entity survives and propagates (the "values" of animals, humans following instinct, traditional cultures, ?locusts?)
  2. consequentialist goals that needn't have anything to do with an agent's survival or its strategies (the "values" of paperclippers, utilitarians, EAs and other ideologically-driven humans)

Different intuitions about e.g. whether the strategy-stealing assumption holds up seem likely to be related to different senses of whether "values" paradigmatically means #1 or #2.

(Related, I think: Is "VNM-agent" one of several options, for what minds can grow up into?)
