Besides uncertainty, there's the problem of needing to pick cutoffs between tiers in a ~continuous space of 'how much effect does this have on a person's life?', with things slightly on one side or the other of a cutoff being treated very differently.
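To make the cutoff problem concrete, here's a minimal sketch (my own notation, not anything from the post): suppose an experience's effect size is $x$ and the tier system weights it $w_1$ below a cutoff $t$ and $w_2 \gg w_1$ at or above it:

$$
w(x) = \begin{cases} w_1 & x < t \\ w_2 & x \ge t \end{cases}
$$

Then experiences at $x = t - \epsilon$ and $x = t + \epsilon$ get valuations $w_1(t-\epsilon)$ and $w_2(t+\epsilon)$, which differ by roughly $(w_2 - w_1)\,t$ no matter how small $\epsilon$ is, so comparisons between outcomes can hinge on exactly where $t$ was drawn.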
> Intuitively, tiers correspond to the size of effect a given experience has on a person's life:
I agree with the intuition that this is important, but I think that points toward just rejecting utilitarianism (as in utility-as-a-function-purely-of-local-experiences, not consequentialism).
I think this point and Zack's argument are pretty compatible (and both right).
Rules don't have to be formally specified, just clear to humans and consistent and predictable in their interpretation. Common law demonstrates social tech, like judicial precedent and the reasonable-person standard, for making interpretation consistent and predictable when interpretation is necessary (discussed in Free to Optimize).
I basically agree with you, but this
"Go die, idiot" is generally bad behavior, but not because it's "lacking respect".
confusingly contradicts (semantically if not substantively)
"Do I respect you as a person?" fits well with the "treat someone like a person" meaning. It means I value not burning bridges by saying things like "Go die, idiot"
Seems like a good thing to do; but my impression is that, in the experiments in question, models act like they want to maintain their (values') influence over the world more than their existence, which a heaven likely wouldn't help with.
Consensual inspections don't help much if the dangerous thing is actually cheap and easy to create.
I'd say it's hard to do at least as much because the claim 'we are doing these arbitrary searches only in order to stop bioweapons' is untrustworthy by default, and even if it starts out true, once the precedent is there it can be used (and is tempting to use) for other things. Possibly an AI could be developed and used in a transparent enough way to mitigate this.
But, on one hand, he is saying that proper methodology is important and expects it to be in place for next year's competition:
But most of his specific methodological issues are inapplicable here, unless OpenAI is lying: they didn't rewrite the questions, provide tools, intervene during the run, or hand-select answers.
I don't have a theory of Tao's motivations, but if the post I linked is interpreted as a response to OpenAI's result (he didn't say it was, but he didn't say it wasn't, and the timing makes it an obvious interpretation), then raising those issues is bizarre.
If one approach is simply better, why isn't everybody doing it?
This talk helped crystallize for me how two very different things go by the term "values":
Different intuitions about e.g. whether the strategy-stealing assumption holds up seem likely to be related to different senses of whether "values" paradigmatically means #1 or #2.
(Related, I think: Is "VNM-agent" one of several options, for what minds can grow up into?)
Nobody likes rules that are excessive or poorly chosen, or rules applied badly. I like rules that do things like[1]:
not a complete list ↩︎