All of carboniferous_umbraculum's Comments + Replies

I for one would find it helpful if you included a link to at least one place that Eliezer had made this claim just so we can be sure we're on the same page. 

Roughly speaking, what I have in mind is that there are at least two possible claims. One is that 'we can't get AI to do our alignment homework' because by the time we have a very powerful AI that can solve alignment homework, it is already too dangerous to use the fact it can solve the homework as a safety plan. And the other is the claim that there's some sort of 'intrinsic' reason why an AI built by humans could never solve alignment homework.

You refer a couple of times to the fact that evals are often used with the aim of upper bounding capabilities. To my mind this is an essential difficulty that acts as a point of disanalogy with things like aviation. I’m obviously no expert but in the case of aviation, I would have thought that you want to give positive answers to questions like “can this plane safely do X thousand miles?” - i.e. produce absolutely guaranteed lower bounds on ‘capabilities’. You don’t need to find something like the approximately smallest number Y such that it could never under any circumstances ever fly more than Y million miles.

Hmm it might be questionable to suggest that it is "non-AI" though? It's based on symbolic and algebraic deduction engines and afaict it sounds like it might be the sort of thing that used to be very much mainstream "AI" i.e. symbolic AI + some hard-coded human heuristics?

2ryan_greenblatt
Sure, just seems like a very non-central example of AI from the typical perspective of LW readers.

FWIW I did not interpret Thane as necessarily having "high confidence" in "architecture / internal composition" of AGI. It seemed to me that they were merely (and ~accurately) describing what the canonical views were most worried about. (And I think a discussion about whether or not being able to "model the world" counts as a statement about "internal composition" is sort of beside the point/beyond the scope of what's really being said)

It's fair enough if you would say things differently(!) but in some sense isn't it just pointing out: 'I would emphasize d... (read more)

Newtonian mechanics was systematized as a special case of general relativity.

One of the things I found confusing early on in this post was that systemization is said to be about representing the previous thing as an example or special case of some other thing that is both simpler and more broadly-scoped. 

In my opinion, it's easy to give examples where the 'other thing' is more broadly-scoped and this is because 'increasing scope' corresponds to the usual way we think of generalisation, i.e. the latter thing applies to more settings or it is 'about a wi... (read more)

2Richard_Ngo
Yeah, good point. The intuition I want to point at here is "general relativity was simpler than Newtonian mechanics + ad-hoc adjustments for Mercury's orbit". But I do think it's a little tricky to pin down the sense in which it's simpler. E.g. what if you didn't actually have any candidate explanations for why Mercury's orbit was a bit off? (But you'd perhaps always have some hypothesis like "experimental error", I guess.) I'm currently playing around with the notion that, instead of simplicity, we're actually optimizing for something like "well-foundedness", i.e. the ability to derive everything from a small set of premises. But this feels close enough to simplicity that maybe I should just think of this as one version of simplicity.

OK I think this will be my last message in this exchange but I'm still confused. I'll try one more time to explain what I'm getting at. 

I'm interested in what your precise definition of subjective probability is. 

One relevant thing I saw was the following sentence:

If I say that a coin is 50% likely to come up heads, that's me saying that I don't know the exact initial conditions of the coin well enough to have any meaningful knowledge of how it's going to land, and I can't distinguish between the two options.

It seems to give something like a defi... (read more)

So my point is still: What is that thing? I think yes I actually am trying to push proponents of this view down to the metaphysics - If they say "there's a 40% chance that it will rain tomorrow", I want to know things like what it is that they are attributing 40%-ness to.  And what it means to say that that thing "has probability 40%".  That's why I fixated on that sentence in particular because it's the closest thing I could find to an actual definition of subjective probability in this post.

 

2TAG
Which view? Subjective probability? Subjective probability is a credence, a level of belief.

I have in mind very simple examples.  Suppose that first I roll a die. If it doesn't land on a 6, I then flip a biased coin that lands on heads 3/5 of the time.  If it does land on a 6 I just record the result as 'tails'. What is the probability that I get heads? 

This is contrived so that the probability of heads is 

5/6 x 3/5 = 1/2.

But do you think that, in saying this, I mean something like "I don't know the exact initial conditions... well enough to have any meaningful knowledge of how it's going to land, and I can't distinguish be... (read more)
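For what it's worth, a quick Monte Carlo check of the arithmetic above (a minimal sketch, assuming a fair die and the stated 3/5 heads bias):

```python
import random

def trial():
    # Roll a fair die; on a 6 record 'tails', otherwise flip a coin that
    # lands heads 3/5 of the time.
    if random.randint(1, 6) == 6:
        return "tails"
    return "heads" if random.random() < 3 / 5 else "tails"

n = 1_000_000
heads = sum(trial() == "heads" for _ in range(n))
print(heads / n)  # ~0.5, matching 5/6 x 3/5 = 1/2
```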

1Isaac King
I don't understand how either of those are supposed to be a counterexample. If I don't know what seat is going to be chosen randomly each time, then I don't have enough information to distinguish between the outcomes. All other information about the problem (like the fact that this is happening on a plane rather than a bus) is irrelevant to the outcome I care about. This does strike me as somewhat tautological, since I'm effectively defining "irrelevant information" as "information that doesn't change the probability of the outcome I care about". I'm not sure how to resolve this; it certainly seems like I should be able to identify that the type of vehicle is irrelevant to the question posed and discard that information.

We might be using "meaning" differently then!

I'm fine with something being subjective, but what I'm getting at is more like: Is there something we can agree on about which we are expressing a subjective view? 

2TAG
Sure, if we are observing the same things and ignorant about the same things. Subjective doesn't mean necessarily different.

I'm kind of confused what you're asking me - like which bit is "accurate" etc.. Sorry, I'll try to re-state my question again:

- Do you think that when someone says something has "a 50% probability" then they are saying that they do not have any meaningful knowledge that allows them to distinguish between two options?

I'm suggesting that you can't possibly think that, because there are obviously other ways things can end up 50/50. e.g. maybe it's just a very specific calculation, using lots of specific information, that ends up with the value 0.5 at the end.... (read more)

1Isaac King
No, I think what I said was correct? What's an example that you think conflicts with that interpretation?

Presumably you are not claiming that saying

...I don't know the exact initial conditions of the coin well enough to have any meaningful knowledge of how it's going to land, and I can't distinguish between the two options...

is actually necessarily what it means whenever someone says something has a 50% probability? Because there are obviously myriad ways something can have a 50% probability and this kind of 'exact symmetry between two outcomes' + no other information is only one very special way that it can happen. 

So what does it mean exactly when you say something is 50% likely?

1Isaac King
I think that's accurate, yeah. What's your objection to it?
2TAG
It doesn't have to have a single meaning. Objective probability and subjective probability can co-exist, and if you are just trying to calculate a probability, you don't have to worry about the metaphysics.

The traditional interpretation of probability is known as frequentist probability. Under this interpretation, items have some intrinsic "quality" of being some % likely to do one thing vs. another. For example, a coin has a fundamental probabilistic essence of being 50% likely to come up heads when flipped.

Is this right? I would have said that what you describe is a more like the classical, logical view of probability, which isn't the same as the frequentist view. Even the wiki page you've linked seems to disagree with what you've written, i.e. it describe... (read more)

1Isaac King
Yeah that was a mistake, I mixed frequentism and propensity together.
2bideup
Sounds like the propensity interpretation of probability.

My rejoinder to this is that, analogously to how a causal model can be re-implemented as a more complex non-causal model[2], a learning algorithm that looks at data that in some ways is saying something about causality, be it because the data contains information-decision-action-outcome units generated by agents, because the learning thing can execute actions itself and reflectively process the information of having done such actions, or because the data contains an abstract description of causality, can surely learn causality.

Short comment/feedback just t... (read more)

1Mo Putera
I had to read this sentence a few times to grok the author's point...

Ah OK, I think I've worked out where some of my confusion is coming from:  I don't really see any argument for why mathematical work may be useful, relative to other kinds of foundational conceptual work. e.g. you write (with my emphasis): "Current mathematical research could play a similar role in the coming years..." But why might it? Isn't that where you need to be arguing? 

The examples seem to be of cases where people have done some kind of conceptual foundational work which has later gone on to influence/inspire ML work. But early work on deception or Goodhart was not mathematical work; that's why I don't understand how these are examples.

 

2Davidmanheim
I think the dispute here is that you're interpreting mathematical too narrowly, and almost all of the work happening in agent foundations and similar is exactly what was being worked on by "mathematical AI research" 5-7 years ago. The argument was that those approaches have been fruitful, and we should expect them to continue to be so - if you want to call that "foundational conceptual research" instead of "Mathematical AI research," that's fine.

Thanks for the comment Rohin, that's interesting (though I haven't looked at the paper you linked).

I'll just record some confusion I had after reading your comment that stopped me replying initially: I was confused by the distinction between modular and non-modular because I kept thinking: If I add a bunch of numbers  and  and don't do any modding, then it is equivalent to doing modular addition modulo some large number (i.e. at least as large as the largest sum you get). And otoh if I tell you I'm doing 'addition modulo 113', but I o... (read more)

3Rohin Shah
I agree -- the point is that if you train on addition examples without any modular wraparound (whether you think of that as regular addition or modular addition with a large prime, doesn't super matter), then there is at least some evidence that you get a different representation than the one Nanda et al found.
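To make the equivalence mentioned above concrete, here is a tiny sketch (with hypothetical numbers): as long as every sum in the data stays below the modulus, 'addition mod p' and plain addition assign identical labels, so the two framings of the task coincide.

```python
# If all sums stay below the modulus p, the reduction mod p never fires,
# so "addition mod p" and plain addition give the same training labels.
p = 100_003  # a modulus larger than any sum below
pairs = [(12, 88), (113, 113), (40_000, 59_999)]

for a, b in pairs:
    assert (a + b) % p == a + b  # wraparound never triggers
    print(a, b, a + b)

# With a small modulus like 113 the wraparound does fire and the labels change:
print((100 + 50) % 113)  # 37, not 150
```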

I'm still not sure I buy the examples. In the early parts of the post you seem to contrast 'machine learning research agendas' with 'foundational and mathematical'/'agent foundations' type stuff. Mechanistic interpretability can be quite mathematical but surely it falls into the former category? i.e. it is essentially ML work as opposed to constituting an example of people doing "mathematical and foundational" work. 

I can't say much about the Goodhart's Law comment but it seems at best unclear that its link to goal misgeneralization is an example of t... (read more)

2Davidmanheim
I'm not really clear what you mean by not buying the example. You certainly seem to understand the distinction I'm drawing - mechanistic interpretability is definitely not what I mean by "mathematical AI safety," though I agree there is math involved. And I think the work on goal misgeneralization was conceptualized in ways directly related to Goodhart, and this type of problem inspired a number of research projects, including quantilizers, which is certainly agent-foundations work. I'll point here for more places the agent foundations people think it is relevant.

Strongly upvoted.

I roughly think that a few examples showing that this statement is true will 100% make OP's case. And that without such examples, it's very easy to remain skeptical.

Currently, it takes a very long time to get an understanding of who is doing what in the field of AI Alignment and how good each plan is, what the problems are, etc.
 

Is this not ~normal for a field that is maturing? And by normal I also mean approximately unavoidable or 'essential'. Like I could say 'it sure takes a long time to get an understanding of who is doing what in the field of... computer science', but I have no reason to believe that I can substantially 'fix' this situation in the space of a few months. It just really is because there is lot... (read more)

6Seth Herd
Sure, but that's no reason not to try to make it easier!
2Iknownothing
Thank you, I think there's an error in my phrasing.  I should have said:  Currently, it takes a very long time to get an idea of who is doing what in the field of AI Alignment and how good each plan is, what the problems are, etc.
1Iknownothing
not just that. It's because the field isn't organized at all. 

I think that perhaps as a result of a balance of pros and cons, I initially was not very motivated to comment (and haven't been very motivated to engage much with ARC's recent work).  But I decided maybe it's best to comment in a way that gives a better signal than silence. 

I've generally been pretty confused about Formalizing the presumption of Independence and, as the post sort of implies, this is sort of the main advert that ARC have at the moment for the type of conceptual work that they are doing, so most of what I have to say is meta stuff ... (read more)

I think this is a reasonable perception and opinion. We’ve written a little bit about how heuristic estimators might help with ELK (MAD and ELK and finding gliders), but that writing is not particularly clear and doesn’t present a complete picture.

We’ve mostly been focused on finding heuristic estimators, because I am fairly convinced they would be helpful and think that designing them is our key technical risk. But now that we are hiring again I think it’s important for us to explain publicly why they would be valuable, and to generally motivate and situa... (read more)

1Quinn
I can't say anything rigorous, sophisticated, or credible. I can just say that the paper was a very welcome spigot of energy and optimism in my own model of why "formal verification" -style assurances and QA demands are ill-suited to models (either behavioral evals or reasoning about the output of decompilers).

Have you seen https://www.alignment.org/blog/mechanistic-anomaly-detection-and-elk/ and any of the other recent posts on https://www.alignment.org/blog/? I don't think they make it obvious that formalizing the presumption of independence would lead to alignment solutions, but they do give a much more detailed explanation of why you might hope so than the paper.

How exactly can an org like this help solve what many people see as one of the main bottlenecks: the issue of mentorship? How would Catalyze actually tip the scales when it comes to 'mentor matching'?

(e.g. see Richard Ngo's first high-level point in this career advice post)

Hi Garrett, 

OK so just being completely honest, I don't know if it's just me but I'm getting a slightly weird or snarky vibe from this comment? I guess I will assume there is a good faith underlying point being made to which I can reply. So just to be clear:

  • I did not use any words such as "trivial", "obvious" or "simple". Stories like the one you recount are obviously making fun of mathematicians, some of whom do think it's cool to say things are trivial/simple/obvious after they understand them. I often strongly disagree and generally dislike this beh
... (read more)
4Garrett Baker
Sorry about that. On a re-read, I can see how the comment could be seen as snarky, but I was going more for critical via illustrative analogy. Oh the perils of the lack of inflection and facial expressions. I think your criticisms of my thought in the above comment are right-on, and you've changed my mind on how useful your post was. I do think that lots of progress can be made in understanding stuff by just finding the right frame by which the result seems natural, and your post is doing this. Thanks!

Interesting thoughts!

It reminds me (not only of my own writing on a similar theme) but of another one of these viewpoints/axes along which to carve interpretability work that is mentioned in this post by jylin04:


...a dream for interpretability research would be if we could reverse-engineer our future AI systems into human-understandable code. If we take this dream seriously, it may be helpful to split it into two parts: first understanding what "programming language" an architecture + learning algorithm will end up using at the end of training, and then wh

... (read more)

At the start you write

3. Unnecessarily diluting the field’s epistemics by introducing too many naive or overly deferent viewpoints.

And later Claim 3 is:


Scholars might defer to their mentors and fail to critically analyze important assumptions, decreasing the average epistemic integrity of the field
 

It seems to me there might be two things being pointed to?

A) Unnecessary dilution: Via too many naive viewpoints;
B) Excessive deference: Perhaps resulting in too few viewpoints or at least no new ones;

And arguably these two things are in tension, in the fol... (read more)

2Ryan Kidd
Mentorship is critical to MATS. We generally haven't accepted mentorless scholars because we believe that mentors' accumulated knowledge is extremely useful for bootstrapping strong, original researchers. Let me explain my chain of thought better: 1. A first-order failure mode would be "no one downloads experts' models, and we grow a field of naive, overconfident takes." In this scenario, we have maximized exploration at the cost of accumulated knowledge transmission (and probably useful originality, as novices might make the same basic mistakes). We patch this by creating a mechanism by which scholars are selected for their ability to download mentors' models (and encouraged to do so). 2. A second-order failure mode would be "everyone downloads and defers to mentors' models, and we grow a field of paradigm-locked, non-critical takes." In this scenario, we have maximized the exploitation of existing paradigms at the cost of epistemic diversity or critical analysis. We patch this by creating mechanisms for scholars to critically examine their assumptions and debate with peers.

Hey Joseph, thanks for the substantial reply and the questions!

 

 

Why call this a theory of interpretability as opposed to a theory of neural networks? 

Yeah this is something I am unsure about myself (I wrote: "something that I'm clumsily thinking of as 'the mathematics of (the interpretability of) deep learning-based AI'"). But I think I was imagining that a 'theory of neural networks' would be definitely broader than what I have in mind as being useful for not-kill-everyoneism. I suppose I imagine it including lots of things that are intere... (read more)

1Joseph Bloom
Thanks Spencer! I'd love to respond in detail but alas, I lack the time at the moment.  Some quick points: 1. I'm also really excited about SLT work.  I'm curious to what degree there's value in looking at toy models (such as Neel's grokking work) and exploring them via SLT or to what extent reasoning in SLT might be reinvigorated by integrating experimental ideas/methodology from MI (such as progress measures). It feels plausible to me that there just haven't been enough people in any of a number of intersections look at stuff and this is a good example. Not sure if you're planning on going to this: https://www.lesswrong.com/posts/HtxLbGvD7htCybLmZ/singularities-against-the-singularity-announcing-workshop-on but it's probably not in the cards for me. I'm wondering if promoting it to people with MI experience could be good. 2. I totally get what you're saying about toy model in sense A or B doesn't necessarily equate to a toy model  being a version of the hard part of the problem. This explanation helped a lot, thank you!  3. I hear what you are saying about next steps being challenging for logistical and coordination issues and because the problem is just really hard! I guess the recourse we have is something like: Look for opportunities/chances that might justify giving something like this more attention or coordination. I'm also wondering if there might be ways of dramatically lowering the bar for doing work in related areas (eg: the same way Neel writing TransformerLens got a lot more people into MI).  Looking forward to more discussions on this in the future, all the best!

I spent some time trying to formulate a good response to this that analyzed the distinction between (1) and (2) (in particular how it may map onto types of pseudo alignment described in RFLO here) but (and hopefully this doesn't sound too glib) it started to seem like it genuinely mattered whether humans in separate individual heavily-defended cells being pumped full of opiates have in fact been made to be 'happy' or not?

I think because if so, it is at least some evidence that the pseudo-alignment during training is for instrumental reasons (i.e. maybe it ... (read more)

This is a very strong endorsement but I'm finding it hard to separate the general picture from RFLO:


mesa-optimization occurs when a base optimizer...finds a model that is itself an optimizer,

where 

a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.

i.e. a mesa-optimizer is a learned model that 'performs inference' (i.e. evalua... (read more)

I've always found it a bit odd that Alignment Forum submissions are automatically posted to LW. 

If you apply some of these norms, then imo there are questionable implications, i.e. it seems weird to say that one should have read the sequences in order to post about mechanistic interpretability on the Alignment Forum.

5habryka
The AI Alignment Forum was never intended as the central place for all AI Alignment discussion. It was founded at a time when basically everyone involved in AI Alignment had read the sequences, and the goal was to just have any public place for any alignment discussion.  Now that the field is much bigger, I actually kind of wish there was another forum where AI Alignment people could go to, so we would have more freedom in shaping a culture and a set of background assumptions that allow people to make further strides and create a stronger environment of trust.  I personally am much more interested in reading about mechanistic interpretability from people who have read the sequences. That one in-particular is actually one of the ones where a good understanding of probability theory, causality and philosophy of science seems particularly important (again, it's not that important that someone has acquired that understanding via the sequences instead of some other means, but it does actually really benefit from a bunch of skills that are not standard in the ML or general scientific community).  I expect we will make some changes here in the coming months, maybe by renaming the forum or starting off a broader forum that can stand more on its own, or maybe just shutting down the AI Alignment Forum completely and letting other people fill that niche. 
4the gears to ascension
similarly, I've been frustrated that medium quality posts on lesswrong about ai often get missed in the noise. I want alignmentforum longform scratchpad, not either lesswrong or alignmentforum. I'm not even allowed to post on alignmentforum! some recent posts I've been frustrated to see get few votes and generally less discussion: * https://www.lesswrong.com/posts/JqWQxTyWxig8Ltd2p/relative-abstracted-agency - this one deserves at least 35 imo * www.lesswrong.com/posts/fzGbKHbSytXH5SKTN/penalize-model-complexity-via-self-distillation * https://www.lesswrong.com/posts/bNpqBNvfgCWixB2MT/towards-empathy-in-rl-agents-and-beyond-insights-from-1 * https://www.lesswrong.com/posts/LsqvMKnFRBQh4L3Rs/steering-systems * ... many more open in tabs I'm unsure about.

I really like this post and found it very interesting, particularly because I'm generally interested in the relationship between the rationality side of the AI Alignment community and academia, and I wanted to register some thoughts. Sorry for the long comment on an old post and I hope this doesn't come across as pernickety. If anything I sort of feel like TurnTrout is being hard on himself. 

I think the tl;dr for my comment is sort of that to me the social dynamics "mistakes" don't really seem like mistakes - or at least not ones that were actually ma... (read more)

I've only skimmed this, but my main confusions with the whole thing are still on a fairly fundamental level. 

You spend some time saying what abstractions are, but when I see the hypothesis written down, most of my confusion is on what "cognitive systems" are and what one means by "most". Afaict it really is a kind of empirical question to do with "most cognitive systems". Do we have in mind something like 'animal brains and artificial neural networks'? If so then surely let's just say that and make the whole thing more concrete; so I suspect not....bu... (read more)

1Jonas Hallgren
(My attempt at an explanation:) In short, we care about the class of observers/agents that get redundant information in a similar way. I think we can look at the specific dynamics of the systems described here to actually get a better perspective on whether the NAH should hold or not: * * I think you can think of the redundant information between you and the thing you care about as a function of all the steps in between for that information to reach you. * If we look at the question, we have a certain amount of necessary things for the (current implementation of) NAH to hold: * 1. Redundant information is rare * To see if this is the case you will want to look at each of the individual interactions and analyse to what degree redundant information is passed on. * I guess the question of "how brutal is the local optimisation environment" might be good to estimate each information redundancy (A,B,C,D in the picture). Another question is, "what level of noise do I expect to be formed at each transition?" as that would tell you to what degree the redundant information is lost in noise. (they pointed this out as the current hypothesis for usefulness in the post in section 2d.) * 2. The way we access said information is similar * If you can determine to what extent the information flow between two agents is similar, you can estimate a probability of natural abstractions occurring in the same way. * For example, if we use vision versus hearing, we get two different information channels & so the abstractions will most likely change. (Causal proximity of the individual functions is changed with regards to the flow of redundant information) * Based on this I would say that the question isn't really if it is true for NNs & brains in general but that it's rather more helpful to ask what information is abstracted with specific capabilities such as vision or access to language. * So it's more about the class of agents that follow these constraints

Something ~ like 'make it legit' has been and possibly will continue to be a personal interest of mine.

I'm posting this after Rohin entered this discussion - so Rohin, I hope you don't mind me quoting you like this, but fwiw I was significantly influenced by this comment on Buck's old talk transcript 'My personal cruxes for working on AI safety'. (Rohin's comment repeated here in full and please bear in mind this is 3 years old; his views I'm sure have developed and potentially moved a lot since then:)


I enjoyed this post, it was good to see this all laid o

... (read more)
3Rohin Shah
I still endorse that comment, though I'll note that it argues for the much weaker claims of * I would not stop working on alignment research if it turned out I wasn't solving the technical alignment problem * There are useful impacts of alignment research other than solving the technical alignment problem (As opposed to something more like "the main thing you should work on is 'make alignment legit'".) (Also I'm glad to hear my comments are useful (or at least influential), thanks for letting me know!)

Certainly it's not a necessarily good thing either. I would posit isolation is usually not good. I can personally attest to being confused and limited by the difference in terminology here.  And I think that when it comes to intrinsic interpretability work in particular, the disentanglement literature has produced a number of methods of value while TAISIC has not.

Ok it sounds to me like maybe there's at least two things being talked about here. One situation is

 A) Where a community includes different groups working on the same topic, and where th... (read more)

Re: e.g. superposition/entanglement: 

I think people should try to understand the wider context into which they are writing, but I don't see it as necessarily a bad thing if two groups of researchers are working on the same idea under different names. In fact I'd say this happens all the time and generally people can just hold in their minds that another group has another name for it.  Naturally, the two groups will have slightly different perspectives and this a) Is often good, i.e. the interference can be constructive and b) Can be a reason in f... (read more)

1scasper
Thanks for the comment and pointing these things out.  --- Certainly it's not a necessarily good thing either. I would posit isolation is usually not good. I can personally attest to being confused and limited by the difference in terminology here.  And I think that when it comes to intrinsic interpretability work in particular, the disentanglement literature has produced a number of methods of value while TAISIC has not.  I don't know what we benefit from in this particular case with polysemanticity, superposition, and entanglement. Do you have a steelman for this more specific to these literatures?  --- Good point. I would not say that the issue with the feature visualization and zoom in papers were merely failing to cite related work. I would say that the issue is how they started a line of research that is causing confusion and redundant work. My stance here is based on how I see the isolation between the two types of work as needless. --- Thanks for pointing out these posts. They are examples of discussing a similar idea to MI's dependency on programmatic hypothesis generation, but they don't act on it. But they both serve to draw analogies instead of providing methods.  The thing in the front of my mind when I talk about how TAISIC has not sufficiently engaged with neurosymbolic work is the kind of thing I mentioned in the paragraph about existing work outside of TAISIC. I pasted it below for convenience :)
2[anonymous]
The main problem on this site is that despite people having widely varying levels of understanding of different subjects, nobody wants to look like an idiot on here. A lot of the comments and articles are basically nothing burgers. People often focus on insignificant points to argue about and waste their time on the social aspect of learning rather than actually learning about a subject themselves. This made me wonder: do actual researchers who have values and substance to offer and question not participate in online discussions? The closest I've found is wordpress blogs by various people with huge comment chains. The only other form of communication seems to be through formal papers, which is pretty much as organized as it gets in terms of format. I've learned that people who actually have deeper understanding and knowledge of value to offer don't waste their time on here. But I can't find any other platform that these people participate in. My guess is that they don't participate in any public discourse, only private conversations with other people who have things of value to offer and discuss.

Thanks very much for the comments I think you've asked a bunch of very good questions. I'll try to give some thoughts:

Deep learning as a field isn't exactly known for its rigor. I don't know of any rigorous theory that isn't as you say purely 'reactive', with none of it leading to any significant 'real world' results. As far as I can tell this isn't for a lack of trying either. This has made me doubt its mathematical tractability, whether it's because our current mathematical understanding is lacking or something else (DL not being as 'reductionist' as oth

... (read more)

Ah thanks very much Daniel. Yes now that you mention it I remember being worried about this a few days ago but then either forgot or (perhaps mistakenly) decided it wasn't worth expanding on. But yeah I guess you don't get a well-defined map until you actually fix how the tokenization happens with another separate algorithm. I will add to list of things to fix/expand on in an edit.
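As a toy illustration of the well-definedness point (a sketch with a made-up three-token vocabulary, not the actual tokenizer in question): the same string can admit more than one valid token sequence, so the map from strings to token sequences is only pinned down once a specific tokenization algorithm is fixed.

```python
# Hypothetical vocabulary in which the string "ab" segments in two different ways.
vocab = {"a", "b", "ab"}
candidate_tokenizations = [["ab"], ["a", "b"]]

for tokens in candidate_tokenizations:
    # Both are valid segmentations of "ab" under this vocabulary...
    assert all(t in vocab for t in tokens) and "".join(tokens) == "ab"
    print(tokens)

# ...so "string -> token sequence" is not a function until an algorithm is fixed,
# e.g. greedy longest-match, which would always pick ["ab"] here.
```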

>There is no difference between natural phenomena and DNNs (LLMs, whatever). DNNs are 100% natural

I mean "natural" as opposed to "man made". i.e. something like "occurs in nature without being built by something or someone else". So in that sense, DNNs are obviously not natural in the way that the laws of physics are.

I don't see information and computation as only mathematical; in fact in my analogies I describe the mathematical abstractions we build as being separate from the things that one wants to describe or make predictions about.  And this... (read more)

-1Roman Leventov
I feel that you are redefining terms. Writing down mathematical equations (or defining other mathematical structures that are not equations, e.g., automata), describing natural phenomena, and proving some properties of these, i.e., deriving some mathematical conjectures/theorems, -- that's exactly what physicists do, and they call it "doing physics" or "doing science" rather than "doing mathematics".  I wonder how would you draw the boundary between "man-made" and "non-man-made", the boundary that would have a bearing on such a fundamental qualitative distinction of phenomena as the amenability to mathematical description. According to Fields et al.'s theory of semantics and observation ("quantum theory […] is increasingly viewed as a theory of the process of observation itself"), which is also consistent with predictive processing and Seth's controlled hallucination theory which is a descendant of predictive processing, any observer's phenomenology is what makes mathematical sense by construction. Also, here Wolfram calls approximately the same thing "coherence". Of course, there are infinite phenomena both in "nature" and "among man-made things" the mathematical description of which would not fit our brains yet, but this also means that we cannot spot these phenomena. We can extend the capacity of our brains (e.g., through cyborgisation, or mind upload), as well as equip ourselves with more powerful theories that allow us to compress reality more efficiently and thus spot patterns that were not spottable before, but this automatically means that these patterns become mathematically describable. This, of course, implies that we ought to make our minds stronger (through technical means or developing science) precisely to timely spot the phenomena that are about to "catch us". This is the central point of Deutsch's "The Beginning of Infinity". Anyway, there is no point in arguing this point fiercely because I'm kind of on "your side" here, arguing that your worr

I may come back to comment more or incorporate this post into something else I write but wanted to record my initial reaction which is that I basically believe the claim. I also think that the 'unrelated bonus reason' at the end is potentially important and probably deserves more thought.

1Arthur Conmy
Disclaimer: I work on interpretability at Redwood Research. I am also very interested in hearing a fleshed-out version of this critique. To me, this is related to the critique of Redwood's interpretability approach here, another example of "recruiting resources outside of the model alone". (however, it doesn't seem obvious to me that interpretability can't or won't work in such settings)

Interesting idea. I think it’s possible that a prize is the wrong thing for getting the best final result (but also possible that getting a half decent result is more important than a high variance attempt at optimising for the best result). My thinking is: To do what you’re suggesting to a high standard could take months of serious effort. The idea of someone really competent doing so just for the chance at some prize money doesn’t quite seem right to me… I think there could be people out there who in principle could do it excellently but who would want to know that they’d ‘got the job’ as it were before spending serious effort on it.

I think I would support Joe's view here that clarity and rigour are significantly different... but maybe - David - your comments are supposed to be specific to alignment work? e.g. I can think of plenty of times I have read books or articles in other areas and fields that contain zero formal definitions, proofs, or experiments but are obviously "clear", well-explained, well-argued etc. So by your definitions is that not a useful and widespread form of rigour-less clarity? (One that we would want to 'allow' in alignment work?) Or would you instead maintain ... (read more)

I agree that the space  may well miss important concepts and perspectives. As I say, it is not my suggestion to look at it, but rather just something that was implicitly being done in another post. The space  may well be a more natural one. (It's of course the space of functions , and so a space in which 'model space' naturally sits in some sense. )

You're right about the loss thing; it isn't as important as I first thought it might be. 

It's an example computation for a network with scalar outputs, yes. The math should stay the same for multi-dimensional outputs though. You should just get higher dimensional tensors instead of matrices.
 

I'm sorry but the fact that it is scalar output isn't explained and a network with a single neuron in the final layer is not the norm. More importantly, I am trying to explain that I think the math does not stay the same in the case where the network output is a vector (which is the usual situation in deep learning) and the loss is some unspecified fu... (read more)

4Lucius Bushnaq
Fair enough, should probably add a footnote. Do any practically used loss functions actually have cross terms that lead to off-diagonals like that? Because so long as the matrix stays diagonal, you're effectively just adding extra norm to features in one part of the output over the others. Which makes sense, if your loss function is paying more attention to one part of the output than others, then perturbations to the weights of features of that part are going to have an outsized effect. The perturbative series evaluates the network at particular values of Θ. If your network has many layers that slowly build up an approximation of the function cos(x), to use in the final layer, it will effectively enter the behavioural gradient as cos(x), even though its construction evolves many parameters in previous layers.
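One practically used loss with cross terms (treating the raw logits as the 'behaviour') is softmax cross-entropy: its Hessian with respect to the logits is diag(p) - p pᵀ, which has nonzero off-diagonal entries. A quick numerical sketch, with arbitrary logit values chosen for illustration:

```python
import numpy as np

# Softmax cross-entropy on raw logits: l(z) = -log softmax(z)[target].
# Its Hessian w.r.t. the logits is diag(p) - p p^T, so the off-diagonal
# entries -p_i * p_j are nonzero and a "diagonal Hess(l)" assumption fails
# if the behaviour is taken to be the raw logits rather than the probabilities.
z = np.array([1.0, 0.5, -0.3])        # arbitrary logits
p = np.exp(z) / np.exp(z).sum()       # softmax probabilities
hessian = np.diag(p) - np.outer(p, p)
print(np.round(hessian, 3))           # off-diagonal entries are nonzero
```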

I'm not at liberty to share it directly but I am aware that Anthropic have a draft of small toy models with hand-coded synthetic data showing superposition very cleanly. They go as far as saying that searching for an interpretable basis may essentially be mistaken.
 

I wrote out the Hessian computation in a comment to one of Vivek's posts. I actually had a few concerns with his version and I could be wrong but I also think that there are some issues here. (My notation is slightly different because  for me the sum over  was included in the function I called "", but it doesn't affect my main point).

I think the most concrete thing is that the function  - i.e. the `input-output' function of a neural network - should in general have a vector output, but you write things like 

witho... (read more)

4Lucius Bushnaq
It's an example computation for a network with scalar outputs, yes. The math should stay the same for multi-dimensional outputs though. You should just get higher dimensional tensors instead of matrices.   In theory, a loss function that explicitly depends on network parameters would behave differently than is assumed in this derivation, yes. But that's not how standard loss functions usually work. If a loss function did have terms like that, you should indeed get out somewhat different results.  But that seems like a thing to deal with later to me, once we've worked out the behaviour for really simple cases more. A feature to me is the same kind of thing it is to e.g. Chris Olah. It's the function mapping network input to the activations of some neurons, or linear combination of neurons, in the network. I'm not assuming that the function is linear in \Theta. If it was, this whole thing wouldn't just be an approximation within second order Taylor expansion distance, it'd hold everywhere.  In multi-layer networks, what the behavioural gradient is showing you is essentially what the network would look like if you approximated it for very small parameter changes, as one big linear layer.  You're calculating how the effects of changes to weights in previous layers "propagate through" with the chain rule to change what the corresponding feature would "look like" if  it was in the final layer.   Obviously, that can't be quite the right way to do things outside this narrow context of interpreting the meaning of the basin near optima. Which is why we're going to try out building orthogonal sets layer by layer instead. To be clear, none of this is a derivation showing that the L2 norm perspective is the right thing to do in any capacity. It's just a suggestive hint that it might be. We've been searching for the right definition of "feature independence" or "non-redundancy of computations" in neural networks for a while now, to get an elementary unit of neural network

Thanks very much Geoffrey; glad you liked the post. And thanks for the interesting extra remarks.

Thanks for the nice reply. 
 

I do buy the explanations I listed in the OP (and other, complementary explanations, like the ones in Inadequate Equilibria), and I think they're sufficient to ~fully make sense of what's going on. So I don't feel confused about the situation anymore. By "shocking" I meant something more like "calls for an explanation", not "calls for an explanation, and I don't have an explanation that feels adequate". (With added overtones of "horrifying".)


Yeah, OK, I think that helps clarify things for me.



As someone who was working a

... (read more)
5Rob Bensinger
Oh, I do think Superintelligence was extremely important. I think Superintelligence has an academic tone (and, e.g., hedges a lot), but its actual contents are almost maximally sci-fi weirdo -- the vast majority of public AI risk discussion today, especially when it comes to intro resources, is much less willing to blithely discuss crazy sci-fi scenarios.

I'm a little sheepish about trying to make a useful contribution to this discussion without spending a lot of time thinking things through but I'll give it a go anyway. There's a fair amount that I agree with here, including that there are by now a lot of introductory resources. But regarding the following:


(I do think it's possible to create a much better intro resource than any that exist today, but 'we can do much better' is compatible with 'it's shocking that the existing material hasn't already finished the job'.)

I feel like I want to ask: Do you really... (read more)

No need to be sheepish, IMO. :) Welcome to the conversation!

Do you really find it "shocking"?

I think it's the largest mistake humanity has ever made, and I think it implies a lower level of seriousness than the seriousness humanity applied to nuclear weapons, asteroids, climate change, and a number of other risks in the 20th century. So I think it calls for some special explanation beyond 'this is how humanity always handles everything'.

I do buy the explanations I listed in the OP (and other, complementary explanations, like the ones in Inadequate Equilbri... (read more)

Thanks again for the reply.

In my notation, something like   or  are functions in and of themselves. The function  evaluates to zero at local minima of 

In my notation, there isn't any such thing as .

But look, I think that this is perhaps getting a little too bogged down for me to want to try to neatly resolve in the comment section, and I expect to be away from work for the next few days so may not check back for a while. Personally, I would just recommend going back and slowly going through the mathe... (read more)

Thanks for the substantive reply.

First some more specific/detailed comments: Regarding the relationship with the loss and with the Hessian of the loss, my concern sort of stems from the fact that the domains/codomains are different and so I think it deserves to be spelled out.  The loss of a model with parameters θ can be described by introducing the actual function l that maps the behavior to the real numbers, right? i.e. given some actual function l we have:

L(θ) = l(f(θ))

i.e. it's l that might be something ... (read more)

1Vivek Hebbar
I will split this into a math reply, and a reply about the big picture / info loss interpretation. Math reply: Thanks for fleshing out the calculus rigorously; admittedly, I had not done this. Rather, I simply assumed MSE loss and proceeded largely through visual intuition. This is still false! Edit: I am now confused, I don't know if it is false or not. You are conflating ∇_f l(f(θ)) and ∇_θ l(f(θ)). Adding disambiguation, we have: ∇_θ L(θ) = (∇_f l(f(θ))) J_θ f(θ) and Hess_θ(L)(θ) = J_θ f(θ)^T [Hess_f(l)(f(θ))] J_θ f(θ) + ∇_f l(f(θ)) D²_θ f(θ). So we see that the second term disappears if ∇_f l(f(θ)) = 0. But the critical point condition is ∇_θ l(f(θ)) = 0. From chain rule, we have: ∇_θ l(f(θ)) = (∇_f l(f(θ))) J_θ f(θ). So it is possible to have a local minimum where ∇_f l(f(θ)) ≠ 0, if ∇_f l(f(θ)) is in the left null-space of J_θ f(θ). There is a nice qualitative interpretation as well, but I don't have energy/time to explain it. However, if we are at a perfect-behavior global minimum of a regression task, then ∇_f l(f(θ)) is definitely zero. A few points about rank equality at a perfect-behavior global min: 1. rank(Hess(L)) = rank(J_f) holds as long as Hess_f(l)(f(θ)) is a diagonal matrix. It need not be a multiple of the identity. 2. Hence, rank equality holds anytime the loss is a sum of functions s.t. each function only looks at a single component of the behavior. 3. If the network output is 1d (as assumed in the post), this just means that the loss is a sum over losses on individual inputs. 4. We can extend to larger outputs by having the behavior f be the flattened concatenation of outputs. The rank equality condition is still satisfied for MSE, Binary Cross Entropy, and Cross Entropy over a probability vector. It is not satisfied if we consider the behavior to be raw logits (before the softmax) and softmax+CrossEntropy as the loss function. But we can easily fix that by considering probability (after softmax) as behavior instead of raw logits.
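As a numerical sanity check of the rank-equality claim (a minimal sketch, assuming a linear model and MSE loss with realizable targets, so we sit at a perfect-behavior global minimum; the linear case is of course degenerate since the Hessian is constant, but it illustrates rank(Hess(L)) = rank(J_f); the numbers are arbitrary):

```python
import numpy as np

# Linear "network": behavior f(theta) = X @ theta on a fixed training set.
# Loss L(theta) = ||X @ theta - y||^2, i.e. MSE up to a constant factor.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4)) @ rng.normal(size=(4, 10))  # rank-4 behavior Jacobian, 10 params
theta_star = rng.normal(size=10)
y = X @ theta_star              # realizable targets: theta_star is a global minimum

jacobian = X                    # J_theta f(theta) is constant for a linear model
hessian = 2 * X.T @ X           # Hess_theta L(theta) for this quadratic loss

print(np.linalg.matrix_rank(jacobian))  # 4
print(np.linalg.matrix_rank(hessian))   # 4 -- ranks agree at the global minimum
```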

This was pretty interesting and I like the general direction that the analysis goes in. I feel it ought to be pointed out that what is referred to here as the key result is a standard fact in differential geometry called (something like) the submersion theorem, which in turn is essentially an application of the implicit function theorem.

I think that your setup is essentially that there is an -dimensional parameter space, let's call it  say, and then for each element  of the training set, we can consider the function ... (read more)

1Vivek Hebbar
Thanks for this reply, its quite helpful. Ah nice, didn't know what it was called / what field it's from.  I should clarify that "key result" here just meant "key result of the math so far -- pay attention", not "key result of the whole post" or "profound/original". Yeah, you're right.  Previously I thought G was the Jacobian, because I had the Jacobian transposed in my head.  I only realized that G has a standard name fairly late (as I was writing the post I think), and decided to keep the non-standard notation since I was used to it, and just add a footnote. Yes; this is the whole point of the post.  The math is just a preliminary to get there. Good catch -- it is technically possible at a local minimum, although probably extremely rare.  At a global minimum of a regression task it is not possible, since there is only one behavior vector corresponding to zero loss.  Note that behavior in this post was defined specifically on the training set.  At global minima, "Rank(Hessian(Loss))=Rank(G)" should be true without exception. In  "Flat basin  ≈  Low-rank Hessian  =  Low-rank G  ≈  High manifold dimension": The first "≈" is a correlation.  The second "≈" is the implication "High manifold dimension => Low-rank G". (Based on what you pointed out, this only works at global minima). "Indicates" here should be taken as slightly softened from "implies", like "strongly suggests but can't be proven to imply".  Can you think of plausible mechanisms for causing low rank G which don't involve information loss?