All of electroswing's Comments + Replies

I wrote what I believe to be a simpler explanation of this post here. Things I tried to do differently: 

  1. More clearly explaining what Nash equilibrium means for infinitely repeated games -- it's a little subtle, and if you go into it just with intuition, it's not clear why the "everyone puts 99" situation can be a Nash equilibrium 
  2. Noting that just because something is a Nash equilibrium doesn't mean it's what the game is going to converge to 
  3. Less emphasis on minimax stuff (it's just boilerplate, not really the main point of folk theorems) 

The strategy profile I describe is where each person has the following strategy (call it "Strategy A"): 

  • If empty history, play 99 
  • If history consists only of 99s from all other people, play 99 
  • If any other player's history contains a choice which is not 99, play 100 

The strategy profile you are describing is the following (call it "Strategy B"): 

  • If empty history, play 99
  • If history consists only of 99s from all other people, play 99 
  • If any other player's history contains a choice which is not 99, play 30 
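
For concreteness, here is a minimal Python sketch of the two decision rules above. The history representation (a list of past rounds, each mapping the other players' names to the numbers they chose) is my own assumption for illustration, not something specified in the post:

```python
def strategy_a(history):
    """Strategy A: play 99 until any other player has ever deviated from 99,
    then switch to playing 100."""
    for round_choices in history:  # each round: {other_player_name: number_chosen}
        if any(choice != 99 for choice in round_choices.values()):
            return 100  # punish: some other player's history contains a non-99 choice
    return 99  # empty history, or all other players have always played 99


def strategy_b(history):
    """Strategy B: identical to Strategy A, except the punishment action is 30."""
    for round_choices in history:
        if any(choice != 99 for choice in round_choices.values()):
            return 30
    return 99


# On an empty history both play 99; after a round in which another player
# chose 100, Strategy A punishes with 100 and Strategy B punishes with 30.
history = [{"player_2": 99, "player_3": 99}, {"player_2": 100, "player_3": 99}]
assert strategy_a([]) == 99 and strategy_b([]) == 99
assert strategy_a(history) == 100 and strategy_b(history) == 30
```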

I agree Strategy B... (read more)

I'm not sure what the author intended, but my best guess is they wanted to say "punishment is bad because there exist really bad equilibria which use punishment, by folk theorems". Some evidence from the post (emphasis mine): 

Rowan: "If we succeed in making aligned AGI, we should punish those who committed cosmic crimes that decreased the chance of an positive singularity sufficiently."

Neal: "Punishment seems like a bad idea. It's pessimizing another agent's utility function. You could get a pretty bad equilibrium if you're saying agents sho

... (read more)
3Seth Herd
That's right, I think that was the original point. But this example seems to be a bad one for making that point, because it's punishing pro-social behavior. If you could show how punishing antisocial, defecting behavior had bad consequences, that would be interesting.

I have 2 separate claims:

  1. Any researcher, inside or outside of academia, might consider emulating attributes successful professors have in order to boost personal research productivity. 
  2. AI safety researchers outside of academia should try harder to make their work legible to academics, as a cheap way to get more good researchers thinking about AI safety. 

What I'm questioning is the implicit assumption in your post that AI safety research will inevitably take place in an academic environment [...]

This assumption is not implicit; you're putting together ... (read more)

There are lots of ways a researcher can choose to adopt new productivity habits. They include:

  1. Inside view, reasoning from first principles 
  2. Outside view, copying what successful researchers do

The purpose of this post is to list, from an outside-view perspective, what a class of researchers (professors) does, a class which happens to operate very differently from the AI safety community.

Once again, I am not claiming to have an inside view argument in favor of the adoption of each of these attributes. I do not have empirics. I am not claiming to have an airtight causal model.... (read more)

What I'm questioning is the implicit assumption in your post that AI safety research will inevitably take place in an academic environment, and therefore productivity practices derived from other academic settings will be helpful. Why should this be the case when, over the past few years, most of the AI capabilities research has occurred in corporate research labs?

Some of your suggestions, of course, work equally well in either environment. But not all, and even the ones which do work would require a shift in emphasis. For example, when you say professors ... (read more)

Overall, it seems like your argument is that AI safety researchers should behave more like traditional academia for a bunch of reasons that have mostly to do with social prestige.

 

That is not what I am saying. I am saying that successful professors are highly successful researchers, that they share many qualities (most of which, by the way, have nothing to do with social prestige), and that AI safety researchers might consider emulating these qualities. 

Furthermore, I would note that traditional academia has been moving away from these practices, to a

... (read more)

I am saying that successful professors are highly successful researchers

Are they? That's why I'm focusing on empirics. How do you know that these people are highly successful researchers? What impressive research findings have they developed, and how did e.g. networking and selling their work enable them to get to these findings? Similarly, with regards to bureaucracy, how did successfully navigating the bureaucracy of academia enable these researchers to improve their work?

The way it stands right now, what you're doing is pointing at some traits that c... (read more)

However, I'd still like to know where you're drawing these observations from? Is it personal observation?

 

Yes, personal observation, across quite a few US institutions. 

And if so, how have you determined whether a professor is successful or not?

One crude way of doing it is saying that a professor is successful if they are a professor at a top 10-ish university. Academia is hypercompetitive, so this is a good filter. Additionally, my personal observations are skewed toward people who I think do good research, so "successful" here also means ... (read more)

quanticle1310

One crude way of doing it is saying that a professor is successful if they are a professor at a top 10-ish university.

But why should that be the case? Academia is hypercompetitive, but it does not select solely on the quality of one's research. Choosing the trendiest fields has a huge impact. Perhaps the professors that are chosen by prestigious universities are the ones that the prestigious universities think are the best at drawing in grant money and getting publications into high-impact journals, such as Nature or Science.


Specifically I th

... (read more)

I think the obvious answer here is AutoPay -- this should hedge against the situations you are describing. 

The costs of making a mistake are certainly high, since it's a permanent hit to your credit report. I am not super knowledgeable about how late payments affect credit scores (other than that the effect has a negative sign); this is an interesting question.

1Brendan Long
It's not permanent. Late payments stay on your credit report for up to 7 years, but they have very little effect after a year or two (assuming it's a one-time thing). There are also ways to get late payments removed from a credit report, especially if you pay the bill as soon as possible after realizing it's late. https://www.creditkarma.com/credit-cards/i/how-long-do-late-payments-stay-on-credit-report I agree that anyone using a credit card should have autopay enabled though.

Hmmm...the orthogonality thesis is pretty simple to state, so I don't necessarily think it has been grossly misunderstood. The bad reasoning in Fallacy 4 seems to come from a more general phenomenon with classic AI Safety arguments, where they do hold up, but only with some caveats and/or more precise phrasing. So I guess "bad coverage" could apply to the extent that popular sources don't go into enough depth. 

I do think the author presented good summaries of Bostrom's and Russell's viewpoints. But then they immediately jump to a "special sauce" ty... (read more)

1AM
Good points! Yes, this snippet is particularly nonsensical to me. It sounds like their experience with computers has involved them having a lot of "basic humanlike common sense," which is a pretty crazy experience in this case. When I explain what programming is like to kids, I usually say something like: "The computer will do exactly, exactly, exactly what you tell it to, extremely fast. You can't rely on any basic sense checking, common sense, or understanding from it; if you can't define what you want specifically enough, the computer will fail in a (to you) very stupid way, very quickly."

I mean...sure...but again, this does not affect the validity of my counterargument. Like I said, I'm making the counterargument as strong as possible by saying that even if the non-brain parts of the body were to add 2-100x computing power, this would not restrict our ability to scale up NNs to get human-level cognition. Obviously this still holds if we replace "2-100x" with "1x". 

The advantage of "2-100x" is that it is extraordinarily charitable to the "embodied cognition" theory—if (and I consider this to be extremely low probability) embodied cogni... (read more)

This claim is false (as in, the probability that it is true is vanishingly close to zero, unless the human brain uses supernatural elements). All of the motor drivers except for the most primitive reflexes (certain spinal reflexes) are in the brain. You can say that for all practical purposes, 100% of the computational power the body has is in the brain.

I agree with your intuition here, but this doesn't really affect the validity of my counterargument. I should have stated more clearly that I was computing a rough upper bound. So saying... (read more)

1[anonymous]
Yes, and first of all, why are you even attempting to add "2x"? A reasonable argument would be "~1x", as in, the total storage of all state outside the brain is so small it can be neglected.

What do you do to keep up with AI Safety / ML / theoretical CS research, to the extent that you do? And how much time do you spend on this? For example, do you browse arXiv, Twitter, ...? 

A broader question I'd also be interested in (if you're willing to share) is how you allocate your working hours in general. 

6paulfchristiano
Mostly word of mouth (i.e. I know the authors, or someone sends a link to a paper either to me directly or to a slack channel I'm on or...). I sometimes browse conference proceedings or arxiv but rarely find that much valuable. Sometimes I'm curious if anyone has made progress on issue X so search for it, or more often I'm curious about what some people have been up to so check if I've missed a paper. I've been keeping up with things less well since leaving OpenAI.

What's your take on "AI Ethics", as it appears in large tech companies such as Google or Facebook? Is it helping or hurting the general AI safety movement? 

I think "AI ethics" is pretty broad and have different feelings of different parts. I'm generally supportive of work that makes AI better for humanity or non-human animals, even when it's not focused on the long-term. Sometimes I'm afraid about work in AI ethics that doesn't seem pass any reasonable cost-benefit analysis, and that it will annoy people in AI and make it harder to get traction with pro-social policies that are better-motivated (I'm also sometimes concerned about this for work in AI safety). I don't have a strong view about the net effect of ... (read more)

You've appeared on the 80,000 Hours podcast two times. To the extent that you remember what you said in 2018-19, are there any views you communicated then which you no longer hold now? Another way of asking this question is—do you still consider those episodes to be accurate reflections of your views? 

3paulfchristiano
I don't remember anything in particular where my view changed but I don't really remember what I said (happy to answer particular questions about my views). I'd guess they are still roughly accurate but that like 5% of the things I said I'd now disagree with and 5% I'd feel ambivalent about?

According to your internal model of the problem of AI safety, what are the main axes of disagreement researchers have? 

The three that first come to mind:

  • How much are ML systems likely to learn to "look good to the person training them" in a way that will generalize scarily to novel test-time situations, vs learning to straightforwardly do what we are trying to train them to do?
  • How much alien knowledge are ML systems likely to have? Will humans be able to basically understand what they are doing with some effort, or will it quickly become completely beyond us?
  • How much time will we have to adapt gradually as AI systems improve, and how fast will we be able to adapt? How similar will the problems that arise be to the ones we can anticipate now?