All of jylin04's Comments + Replies

Thanks for the comment! I agree with this characterization. I think one of the main points I was trying to make in this piece was that as long as the prior for "amount of special sauce in the brain (or needed for TAI)" is a free parameter, the uncertainty in this parameter may dominate the timelines conversation (e.g. people who think that it is big may be basically unmoved by the Bio-Anchors calculation), so I'd love to see more work aimed at estimating it. (Then the rest of my post was an attempt to give some preliminary intuition pumps for why this parameter...

Daniel Kokotajlo
Nice. I agree with your point about how uncertainty in this parameter may dominate timelines conversation. If you do write more about why the prior on special sauce should be large (e.g. whether TAI is in the space of current algorithms), I'd be interested to read them! Though don't feel like you have to do this--maybe you have more important projects to do. (No rush! This sort of conversation doesn't have a time limit, so it's not hurting me at all to wait even months before replying. I'm glad you like LW. :) )

Interesting, thanks! I stand corrected (and will read your paper)...

Thanks Rohin! Agree with and appreciate the summary as I mentioned before. 

> I don’t agree with motivation 1 as much: if I wanted to improve AI timeline forecasts, there are a lot of other aspects I would investigate first. (Specifically, I’d improve estimates of inputs into this report (Draft report on AI timelines).) Part of this is that I am less uncertain than the author about the cruxes that transparency could help with, and so see less value in investigating them further.

I'm curious: does this mean that you're on board with the as...

Rohin Shah
Yes, with the exception that I don't know if compute will be the bottleneck (that is my best guess; I think Ajeya's report makes a good case for it; but I could see it being other factors as well). I think the case for the scaling hypothesis is basically "we see a bunch of very predictable performance lines; seems like they'll continue to go up". But more importantly, I don't know of any compelling counterpoints; the usual argument seems to be "but we don't see any causal reasoning / abstraction / <insert property here> yet", which I think is perfectly compatible with the scaling hypothesis (see e.g. this comment).

I see, that makes sense, and I think it does make sense as an intuition pump for what the "ML paradigm" is trying to do (though as you sort of mentioned I don't expect that we can just do the motivation / cognition decomposition).

Definitely depends on how powerful you're expecting the AI system to be. It seems like if you want to make the argument that AI will go well by default, you need the research accelerator to be quite powerful (or you have to combine it with some argument like "AI alignment will be easy to solve").

I don't think papers, books, etc. are a "relatively well-defined training set". They're a good source of knowledge, but if you imitate papers and books, you get a research accelerator that is limited by the capabilities of human scientists (well, actually much more limited, since it can't run experiments). They might be a good source of pretraining data, but there would still be a lot of work to do to get a very powerful research accelerator.

Fwiw, I'm not convinced that we avoid catastrophic deception either, but my thoughts here are pretty nebulous, and I think that "we don't know of a path to catastrophic deception" is a defensible position.

Thanks for the comment! Naively, I feel like dropout would make things worse for the reason that you mentioned, and anti-dropout would make them better, but I’m definitely not an expert on this stuff.

I’m not sure I totally understand your first idea. Is it something like the following (rough code sketch after the list)?

- Feed some images through a NN and record which neurons have high average activation on them

- Randomly pick some of those neurons and record which dataset examples cause them to have a high average activation

- Pick some subset of those images and iterate until convergence?
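
To make sure I'm picturing it right, here's a rough sketch of that loop (my own hypothetical code, not something from your comment; it assumes a helper `layer_acts(model, images)` that returns one layer's activations with shape `[batch, n_neurons]`, and a dataset stored as a single image tensor):

```python
import torch

def high_activation_neurons(model, images, layer_acts, top_k=10):
    """Indices of the neurons with the highest mean activation on `images`."""
    with torch.no_grad():
        acts = layer_acts(model, images)            # [n_images, n_neurons]
    return acts.mean(dim=0).topk(top_k).indices

def high_activation_images(model, dataset, layer_acts, neuron_ids, top_k=100):
    """Dataset examples that most strongly activate the chosen neurons."""
    with torch.no_grad():
        acts = layer_acts(model, dataset)           # [n_examples, n_neurons]
    scores = acts[:, neuron_ids].mean(dim=1)        # mean over the chosen neurons
    return dataset[scores.topk(top_k).indices]

def iterate(model, seed_images, dataset, layer_acts, n_steps=10):
    """Alternate between neuron sets and image sets for a fixed number of steps.

    (A real version would presumably stop once the two sets stop changing.)
    """
    images = seed_images
    for _ in range(n_steps):
        neurons = high_activation_neurons(model, images, layer_acts)
        keep = torch.randperm(len(neurons))[: max(1, len(neurons) // 2)]
        neurons = neurons[keep]                     # random subset of those neurons
        images = high_activation_images(model, dataset, layer_acts, neurons)
    return neurons, images
```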

Charlie Steiner
Dropout makes interpretation easier because it disincentivizes complicated features where you can only understand the function of the parts in terms of their high-order correlations with other parts. This is because if a feature relies on such correlations, it will be fragile to some of the pieces being dropped out. Anti-dropout promotes consolidation of similar features into one, but it also incentivizes that one feature to be maximally complicated and fragile.

Re: first idea. Yeah, something like that. Basically just an attempt at formalization of "functionally similar neurons," so that when you go to drop out a neuron, you actually drop out all functionally similar ones.
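
(Not part of either comment, just a minimal hypothetical sketch of what "drop out all functionally similar ones" could look like at a single layer, assuming the groups of similar neurons have already been found, e.g. by a loop like the one sketched above; neurons outside every group are simply never dropped in this sketch:)

```python
import torch
import torch.nn as nn

class GroupDropout(nn.Module):
    """Dropout that zeroes a whole group of 'functionally similar' neurons at once."""

    def __init__(self, groups, n_neurons, p=0.5):
        super().__init__()
        self.groups = groups          # list of lists of neuron indices
        self.n_neurons = n_neurons
        self.p = p

    def forward(self, x):             # x: [batch, n_neurons]
        if not self.training:
            return x
        mask = torch.ones(self.n_neurons, device=x.device)
        for group in self.groups:
            if torch.rand(1).item() < self.p:   # one coin flip per group...
                mask[group] = 0.0               # ...drops every member of the group
        return x * mask / (1 - self.p)          # usual inverted-dropout rescaling
```
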
DanielFilan
One interesting fact my group discovered is that dropout seems to increase the extent to which a network is modular. We have some results on the topic here, but a more comprehensive paper should be out soon.

> Thanks a lot for all the effort you put into this post! I don't agree with anything, but reading and commenting it was very stimulating, and probably useful for my own research.

 Likewise, thanks for taking the time to write such a long comment! And hoping that's a typo in the second sentence :)

> I'm quite curious about why you wrote this post. If it's for convincing researchers in AI Safety that transparency is useful and important for AI Alignment, my impression is that many researchers do agree, and those who don't tend to have thought about it for quite...

adamShimi
You're welcome. And yes, this was a typo that I corrected. ^^

My take is that a lot of people around here agree that transparency is at least useful, and maybe necessary. And the main reason why people are not working on it is a mix of personal fit, and the fact that without research in AI Alignment proper, transparency doesn't seem that useful (if we don't know what to look for).

Well, transparency is doing some work, but it's totally unable to prove anything. That's a big part of the approach I'm proposing.

That being said, I agree that this doesn't look like scaling the current way. You're right that I was thinking of a more online system that could update its weights during deployment. Yet even with frozen weights, I definitely expect the model to make plans involving things that were not involved in training. For example, it might not have a bio-weapon feature, but it might reach the relevant subfeatures to build one through quite local rules that don't look like a plan to build a bio-weapon.

That seems reasonable.