All of CharlesRW's Comments + Replies

strictly weaker

add "than Condorcet" in this sentence since its only implied but not said

I think we agree modulo terminology with respect to your remarks, up to the part about the Krakovna paper, which I had to sit and think about a little more.

For the Krakovna paper, you're right that it has a different flavor than I remembered - it still seems, though, that the proof relies on having some ratio of recurrent vs. non-recurrent states. So if you did something like increasing the number of terminal states 1000x, the reward function is 1000x less retargetable to recurrent states - I think this is still true even if the new terminal states are entirely ... (read more)
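To illustrate the ratio point with a toy calculation (my own sketch, not the paper's construction): with an argmax decision-maker choosing among one recurrent option and n terminal options under i.i.d. uniform rewards, the fraction of reward functions whose optimum is the recurrent option is 1/(n+1), so 1000x-ing the terminal states divides retargetability-to-recurrent-states by roughly 1000.

```python
import numpy as np

# Toy illustration (my construction, not from the Krakovna paper): an agent
# picks whichever of its options has the highest sampled reward. With one
# recurrent option and n_terminal terminal options, the fraction of i.i.d.
# random reward draws under which the optimal choice is the recurrent
# option shrinks as 1 / (n_terminal + 1).

rng = np.random.default_rng(0)

def frac_recurrent_optimal(n_terminal: int, n_samples: int = 100_000) -> float:
    """Fraction of random reward functions whose argmax is the recurrent option."""
    # Column 0 is the recurrent option; the remaining columns are terminal options.
    rewards = rng.uniform(size=(n_samples, 1 + n_terminal))
    return float(np.mean(rewards.argmax(axis=1) == 0))

for n in [1, 10, 1000]:
    print(f"{n:>5} terminal states -> P(recurrent optimal) ~ {frac_recurrent_optimal(n):.4f}")
```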

Hi, thanks for the response :) I'm not sure what distinction you're making between utility and reward functions - as far as I can tell we're referring to the same object, the thing which is changed in the 'retargeting' process, the parameters theta. Feel free to correct me if the paper distinguishes between these in a way I'm forgetting; I'll be using "utility function", "reward function", and "parameters theta" interchangeably, but will correct this if so.

I think perhaps we're just calling different objects "agents" - I mean p(__ | theta) for ... (read more)

Jannes Elstner
For me, utility functions are about decision-making, e.g. utility-maximization, while the reward functions are the theta, i.e. the input to our decision-making, which we are retargeting over, but can only do so for retargetable utility functions. I think the theta is not a property of the agent but of the training procedure. Actually, Parametrically retargetable decision-makers tend to seek power is not about trained agents in the first place, so I'd say we were never talking about different agents in the first place.

I agree with this if we constrain ourselves to Turner's work. V. Krakovna's work still depends on the option-variegation, but we're not picking random reward functions, which is a nice improvement. Does the proof really depend on whether the reward function scales with the number of possible states? It seems to me that you just need some reward from the reward function that the agent has not seen during training, so that we can retarget by swapping the rewards. For example, if our reward function is a CNN, we just need images which haven't been seen during training, which I don't think is a strong assumption since we're usually not training over all possible combinations of pixels. Do you agree with this?

If you have concrete suggestions that you'd like me to change, you can click on the edit button at the article and leave a comment on the underlying Google Doc; I'd appreciate it :) Maybe it's also useless to discuss this...
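For what it's worth, here is a minimal sketch of the swap-based retargeting described above (the outcome names and reward values are made up for illustration): swapping the rewards theta assigns to two outcomes redirects an argmax decision-maker, without theta needing to scale with the number of states.

```python
# Hypothetical sketch of "retargeting by swapping rewards": an argmax
# decision-maker p(action | theta) picks the outcome theta rates highest.
# Swapping theta's rewards for two outcomes redirects the agent.

theta = {"stay": 0.2, "exit_A": 0.9, "exit_B": 0.5}  # reward per outcome

def act(theta):
    # The retargetable decision rule: pick the highest-reward outcome.
    return max(theta, key=theta.get)

print(act(theta))  # -> 'exit_A'

# Retarget: swap the rewards of 'exit_A' and 'stay'.
theta_swapped = dict(theta, stay=theta["exit_A"], exit_A=theta["stay"])
print(act(theta_swapped))  # -> 'stay'
```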

Remarks on the Slow Boring post about "Kritiks" in debate, copied here for a friend who wanted to reference them, and lightly edited to fit the shortform format:

In the debate format I do - "British Parliamentary" (BP), my favorite, though I've also done "Public Forum" - this sort of thing is sorta-kinda explicitly not allowed. Ironically, I think BP is where this would be most appropriate: because you don't have the 'motion' until 15 minutes prior to start, the focus on ground-facts is low (only relatively, compared to other research-heavy styles)... (read more)

Quinn
(I was the one who asked Charles to write up his inside view, as reading the article is the only serious information I've ever gathered about debate culture: https://www.slowboring.com/p/how-critical-theory-is-radicalizing)

Hi, just a quick comment regarding the power-seeking theorems post: the definition you give of "power" as the expected utility of optimal behavior is not the same as the one used in the power-seeking theorems.

The theorems are not about any particular agent, but are statements about processes which produce agents. The definition of power is more about the number of states an agent can access. Colloquially, they're more of the flavor "for a given optimizing-process, training it on most utility functions will cause the agent to take actions which give it access to ... (read more)

Jannes Elstner
I'm the author. This refers to the fact that most utility functions are retargetable. But the most important part of the power-seeking theorems is the actual power-seeking, which is proven in the appendix of Parametrically Retargetable Decision-Makers Tend To Seek Power, so I don't agree with your summary.

There is no averaging over utility functions happening; the averaging is over reward functions. From Parametrically Retargetable Decision-Makers Tend To Seek Power: "a trained policy π seeks power when π's actions navigate to states with high average optimal value (with the average taken over a wide range of reward functions)." This matches what I wrote in the article.

I do agree that utility functions are missing from the post, but they aren't averaged over. They relate to the decision-making of the agent, and thus to the condition of retargetability that the theorems require.
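To unpack "average optimal value" concretely, here's a minimal sketch (my own toy construction following the quoted definition, not the paper's setup): estimate a state's "power" as the optimal value achievable from it, averaged over many randomly drawn reward functions. States from which more outcomes remain reachable score higher, which is the sense in which navigating to high-option states is power-seeking.

```python
import numpy as np

# Toy estimate of "power" as average optimal value over random reward
# functions. The state names and reachability structure are hypothetical.
# For k reachable outcomes with i.i.d. uniform rewards, the expected
# optimal value is k / (k + 1), so more options -> more power.

rng = np.random.default_rng(0)
N_OUTCOMES, N_REWARD_FNS = 6, 100_000

# Which outcomes each state can still reach.
reachable = {
    "narrow_state": [0],
    "medium_state": [0, 1, 2],
    "broad_state": list(range(N_OUTCOMES)),
}

rewards = rng.uniform(size=(N_REWARD_FNS, N_OUTCOMES))  # one row per reward fn

for state, outs in reachable.items():
    power = rewards[:, outs].max(axis=1).mean()  # average optimal value
    print(f"{state:>12}: average optimal value ~ {power:.3f}")
```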

Tl;dr: your argument doesn't meaningfully engage the counterproposition, and I think this not only harms your argument but severely limits the extent to which the discussion in the comments can be productive. I'll confess that the wall of text below was written because you made me angry, not because I'm so invested in epistemic virtue - that said, I hope it will be taken as constructive criticism that helps the comments section be more valuable for discussion :)

  • Missing argument pieces: you lack an argument for why higher fertility rates ar

... (read more)
Roman Leventov
First of all, as our society and civilisation get more complex, "18 is an adult" is more and more comically low and inadequate. Second, I think a better reference class is decisions that may have irreversible consequences. E.g., the minimum age of voluntary human sterilisation is 25, 35, and even 40 years in some countries (but is apparently just 18 in the US, which is a joke). I cannot easily find statistics on the minimum age at which a single person can adopt a child, but it appears to be 30 years in the US. If the rationale behind this policy was about financial stability only, why can't rich, single 25-year-olds adopt? I think it's better to compare entering an AI relationship with these policies than with drinking alcohol or watching porn or having sex with humans (individual cases of which, for the most part, don't change human lives irreversibly, if practiced safely; and yes, it would be prudent to ban unprotected sex for unmarried people under 25, but alas, such a policy would be unenforceable).

I don't think any mental condition disqualifies a person from having a human relationship, but I think it shifts the balance in the other direction. E.g., if a person has bouts of uncontrollable aggression and a history of domestic abuse and violence, it makes much less sense to bar him from AI partners and thus compel him to find new potential human victims (although he/she is not prohibited from doing that, unless jailed).

No, this is not what I meant, see above. All these things are at least mildly bad for society; I think this is very uncontroversial. What is much more doubtful (including for me) is how the effects of these things on the individual weigh against their effects on society. The balance may be different for different things and is also different from the respective balance for AI partners.

First, the discussion of the ban of porn is unproductive because it's completely unenforceable. Online dating is a very complicated matter and I don't want to discus
Answer by CharlesRW

(Personal bias: heavily towards the upskilling side of the scale.) There are three big advantages to "problem first, fundamentals later":

  1. You get experience doing research directly
  2. You save time
  3. Anytime you go to learn something for the problem, you will always have the context of “what does this mean in my case?”

3 is a mixed bag - sometimes this will be useful because it brings together ideas from far away in idea-space; other times it will make it harder for you to learn things on their own terms - you may end up basing your understanding on non-cent... (read more)

Nicholas / Heather Kross
Thank you, this makes sense currently! (Right now I'm on Pearl's Causality)

Hi! I have a pretty good amount of experience with playing this game - I have a Google spreadsheet in which I've been collecting all sorts of data (food, exercise, habits, etc.) for quite a while. I've had some solid successes (I'd say improved quality-of-life by 100x, but starting from an unrepresentatively low baseline), but can also share a few difficulties I've had with this approach; I'll just note some general difficulties and then talk about how this might translate into a useful app sorta thing that one could use.

1. It's h... (read more)
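(For what it's worth, here's a minimal sketch of the kind of analysis such a spreadsheet - or an app built on top of it - might support. The file and column names below are hypothetical, just to show the shape of it:)

```python
import pandas as pd

# Hypothetical daily log exported from a tracking spreadsheet, e.g. with
# columns: date, sleep_hours, exercise_min, caffeine_mg, mood_1_to_10.
df = pd.read_csv("daily_log.csv", parse_dates=["date"])

# Crude first pass: how does each tracked variable correlate with a
# self-rated quality-of-life score? A rough signal for what to experiment on.
correlations = (
    df.drop(columns=["date"])
      .corr(numeric_only=True)["mood_1_to_10"]
      .drop("mood_1_to_10")
      .sort_values()
)
print(correlations)
```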

Fwiw, my experience with MOOCs/OCW has been extremely positive (mainly in math and physics). Regarding the issue of insufficient depth: for a given 'popular topic' like ML, there are indeed strong incentives for lots of folks to put out courses, so there'll be lots to choose from, albeit with wide variance in quality - cf. all the calculus courses on edX.

That said, I find that as you move away from 'intro level', courses are generally of more consistent quality: they tend to rely a lot less on a gimmicky MOOC structure and follow ... (read more)