Comments

Wei Dai · 1d · 140

> If you think there’s something mysterious or unknown about what happens when you make two copies of yourself

Eliezer talked about some puzzles related to copying and anticipation in The Anthropic Trilemma that still seem quite mysterious to me. See also my comment on that post.

I think the way morality seems to work in humans is that we have a set of potential moral values, determined by our genes, which culture can then emphasize or de-emphasize. Altruism seems to be one of these potential values, and it has perhaps been emphasized more in recent times, in certain cultures. I think altruism isn't directly evolutionarily connected to power; the connection is more like "act morally (according to local culture) while that's helpful for gaining power", which translates to "act altruistically while that's helpful for gaining power" in cultures that emphasize altruism. Does this make more sense?

Wei Dai · 2d · 289

What are some failure modes of such an agency for Paul and others to look out for? (I shared one anecdote with him, about how a NIST standard for "crypto modules" made my open source cryptography library less secure: it had a requirement whose side effect was that the library could only be certified as standard-compliant if it was distributed in executable form, forcing people to trust me not to have inserted a backdoor into the executable binary, and NIST would not budge when we tried to get an exception to this requirement.)

Wei Dai · 16d · 42

> The only way to win is not to play.

Seems like a lot of people are doing exactly this, but interpreting it as "not having kids" instead of "having kids but not trying to compete with others in terms of educational investment/signaling". As a parent myself, I think this is pretty understandable in terms of risk-aversion, i.e., being worried that one's unconventional parenting strategy might not work out well in terms of conventional success, and that one would then get a lot of guilt / blame / status loss because of it.

> Given it is a dystopian status competition hell, pay for it seems terrible, but if we have 98% participation now and 94% financial hardship, then this could be a way to justify a huge de facto transfer to parents.

I don't understand how this justifies paying. Wouldn't a big transfer to parents just cause more educational investment/signaling and leave the overall picture largely unchanged?

Wei Dai · 16d · 50

Trying to draw some general lessons from this:

  1. We are bad at governance, even on issues/problems that emerge/change slowly relative to human thinking (unlike, e.g., COVID-19). I think people who are optimistic about x-risk governance should be a bit more pessimistic based on this.
  2. Nobody had the foresight to think about status dynamics in relation to fertility and parental investment. Academic theories about this are lagging the empirical phenomena by a lot. What important dynamics will we miss with AI? (Nobody seems to be thinking about status and AI, which is one obvious candidate.)

Wei Dai · 18d · 31

It seems that humans, starting from a philosophically confused state, are liable to find multiple incompatible philosophies highly plausible in a path-dependent way; see, for example, analytic vs. continental vs. non-Western philosophy. I think this means that if we train an AI to optimize directly for plausibility, there's little assurance that we actually end up with philosophical truth.

A better plan is to train the AI in some way that does not optimize directly for plausibility, have some independent reason to think that the AI will be philosophically competent, and then use plausibility only as a test to detect errors in this process. I've written in the past that ideally we would first solve metaphilosophy, so that we can design the AI and the training process with a good understanding of the nature of philosophy and philosophical reasoning in mind, but failing that, I think some of the ideas in your list are still better than directly optimizing for plausibility.

> You can do something like train it with RL in an environment where doing good philosophy is instrumentally useful and then hope it becomes competent via this mechanism.

This is an interesting idea. If it were otherwise feasible / safe / a good idea, we could perhaps train AI in a variety of RL environments, see which ones produce AIs that end up doing something like philosophy, and then see if we can detect any patterns or otherwise use the results to think about next steps.

Wei Dai · 22d · 112

I'm guessing you're not being serious, but just in case you are, or in case someone misinterprets you now or in the future: I think we probably do not want to train AIs to give us answers optimized to sound plausible to humans, since that would make it even harder to determine whether or not the AI is actually competent at philosophy. (Not totally sure, as I'm confused about the nature of philosophy and philosophical reasoning, but I think we definitely don't want to do that in our current epistemic state, i.e., unless we had some really good arguments that say it's actually a good idea.)

Wei Dai · 22d · 159

Many comments pointed out that the NYT does not in fact have a consistent policy of always revealing people's true names. There's even a news editorial about this, which I point out in case you trust the fact-checking of the NY Post more.

I think that leaves 3 possible explanations of what happened:

  1. NYT has a general policy of revealing people's true names, which it doesn't consistently apply but ended up applying in this case for no particular reason.
  2. There's an inconsistently applied policy, and Cade Metz's (and/or his editors') dislike of Scott contributed (consciously or subconsciously) to insistence on applying the policy in this particular case.
  3. There is no policy and it was a purely personal decision.

In my view, most rationalists seem to be operating under a reasonable probability distribution over these hypotheses, informed by evidence such as Metz's mention of Charles Murray, lack of a public written policy about revealing real names, and lack of evidence that a private written policy exists.

Wei Dai · 22d · 257

While reading this, I got a flash-forward of what my life (our lives) may be like in a few years, i.e., desperately trying to understand and evaluate complex philosophical constructs presented to us by superintelligent AI, which may or may not be actually competent at philosophy.

Wei Dai · 23d · Ω340

I gave this explanation at the start of the UDT1.1 post:

> When describing UDT1 solutions to various sample problems, I've often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.
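To make the distinction concrete, here is a minimal toy sketch in Python; the inputs, outputs, payoff table, and function names are all illustrative assumptions rather than the formalism from the post. It shows UDT1.1's "optimize the whole strategy S, then return S*(X)", in a problem where the payoff depends jointly on the outputs chosen for both possible inputs, so per-input optimization alone does not obviously recover the global optimum.

```python
# Toy sketch (illustrative assumptions only): a two-input decision problem where
# the payoff depends jointly on the outputs chosen for both inputs, so optimizing
# the global strategy S (UDT1.1) differs from optimizing each input's output in
# isolation (UDT1's original specification).

from itertools import product

INPUTS = ["X1", "X2"]   # possible inputs the agent might receive
OUTPUTS = ["A", "B"]    # possible outputs

def utility(strategy):
    """Toy stand-in for the agent's preferences over the world program P,
    evaluated on a complete input->output mapping (a strategy S)."""
    payoff = {
        (("X1", "A"), ("X2", "A")): 10,
        (("X1", "A"), ("X2", "B")): 0,
        (("X1", "B"), ("X2", "A")): 0,
        (("X1", "B"), ("X2", "B")): 6,
    }
    return payoff[tuple(sorted(strategy.items()))]

def udt1_1(x):
    """UDT1.1: search over whole strategies, pick the best S*, return S*(x)."""
    strategies = [dict(zip(INPUTS, outs))
                  for outs in product(OUTPUTS, repeat=len(INPUTS))]
    best = max(strategies, key=utility)
    return best[x]

# udt1_1("X1") == udt1_1("X2") == "A": the jointly optimal strategy.
# UDT1, as originally specified, would instead optimize the output for the
# received input alone, relying on logical correlations between its instances
# to recover the global optimum; the quoted passage says that assumption
# turned out to have an apparent counterexample.
```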
