LESSWRONG
LW

All of Kaarel's Comments + Replies

make humans (who are) better at thinking (imo maybe like continuing this way $\approx$ forever, not until humans can "solve AI alignment")
think well. do math, philosophy, etc.. learn stuff. become better at thinking
live a good life

A few quick observations (each with like $90 %$ confidence; I won't provide detailed arguments atm, but feel free to LW-msg me for more details):

Any finite number of iterates just gives you the solomonoff distribution up to at most a const multiplicative difference (with the const depending on how many iterates you do). My other points will be about the limit as we iterate many times.
The quines will have mass at least their prior, upweighted by some const because of programs which do not produce an infinite output string. They will generally have more mass

... (read more)

Jesse Hoogland's Shortform

Kaarel10d64

I think AlphaProof is pretty far from being just RL from scratch:

they use a pretrained language model; I think the model is trained on human math in particular ( https://archive.is/Cwngq#selection-1257.0-1272.0:~:text=Dr. Hubert’s team,frequency was reduced. )
do we have good reason to think they didn't specifically train it on human lean proofs? it seems plausible to me that they did but idk
the curriculum of human problems teaches it human tricks
lean sorta "knows" a bunch of human tricks

We could argue about whether AlphaProof "is mostly human imitat... (read more)

2Jesse Hoogland9d

Okay, great, then we just have to wait a year for AlphaProofZero to get a perfect score on the IMO.

Views on when AGI comes and on strategy to reduce existential risk

Kaarel23d30

I didn't express this clearly, but yea I meant no pretraining on human text at all, and also nothing computer-generated which "uses human mathematical ideas" (beyond what is in base ZFC), but I'd probably allow something like the synthetic data generation used for AlphaGeometry (Fig. 3) except in base ZFC and giving away very little human math inside the deduction engine. I agree this would be very crazy to see. The version with pretraining on non-mathy text is also interesting and would still be totally crazy to see. I agree it would probably imply your "... (read more)

TsviBT23d101

I'd probably allow something like the synthetic data generation used for AlphaGeometry (Fig. 3) except in base ZFC and giving away very little human math inside the deduction engine

IIUC yeah, that definitely seems fair; I'd probably also allow various other substantial "quasi-mathematical meta-ideas" to seep in, e.g. other tricks for self-generating a curriculum of training data.

But I wouldn't be surprised if like >20% of the people on LW who think A[G/S]I happens in like 2-3 years thought that my thing could totally happen in 2025 if the labs were

Kaarel23d10

¿ thoughts on the following:

solving >95% of IMO problems while never seeing any human proofs, problems, or math libraries (before being given IMO problems in base ZFC at test time). like alphaproof except not starting from a pretrained language model and without having a curriculum of human problems and in base ZFC with no given libraries (instead of being in lean), and getting to IMO combos

3TsviBT23d

(I'm not sure whether I'm supposed to nitpick. If I were nitpicking I'd ask things like: Wait are you allowing it to see preexisting computer-generated proofs? What counts as computer generated? Are you allowing it to see the parts of papers where humans state and discuss propositions and just cutting out the proofs? Is this system somehow trained on a giant human text corpus, but just without the math proofs?) But if you mean basically "the AI has no access to human math content except a minimal game environment of formal logic, plus whatever abstract priors seep in via the training algorithm+prior, plus whatever general thinking patterns in [human text that's definitely not mathy, e.g. blog post about apricots]", then yeah, this would be really crazy to see. My points are trying to be, not minimally hard, but at least easier-ish in some sense. Your thing seems significantly harder (though nicely much more operationalized); I think it'd probably imply my "come up with interesting math concepts"? (Note that I would not necessary say the same thing if it was >25% of IMO problems; there I'd be significantly more unsure, and would defer to you / Sam, or someone who has a sense for the complexity of the full proofs there and the canonicalness of the necessary lemmas and so on.)

Deep Learning is cheap Solomonoff induction?

Kaarel2mo*40

some afaik-open problems relating to bridging parametrized bayes with sth like solomonoff induction

I think that for each NN architecture+prior+task/loss, conditioning the initialization prior on train data (or doing some other bayesian thing) is typically basically a completely different learning algorithm than (S)GD-learning, because local learning is a very different thing, which is one reason I doubt the story in the slides as an explanation of generalization in deep learning^[1].^[2] But setting this aside (though I will touch on it again briefly in the... (read more)

mesaoptimizer's Shortform

Kaarel3mo30

you say "Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments." and you link https://tsvibt.blogspot.com/2024/04/koan-divining-alien-datastructures-from.html for "too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments"

i feel like that post and that statement are in contradiction/tension or at best orthogonal

3Mateusz Bagiński3mo

I think Mesa is saying something like "The missing pieces are too alien for us to expect to discover them by thinking/theorizing but we'll brute-force the AI into finding/growing those missing pieces by dumping more compute into it anyway." and Tsvi's koan post is meant to illustrate how difficult it would be to think oneself into those missing pieces.

Raemon's Shortform

Kaarel3mo2-1

there's imo probably not any (even-nearly-implementable) ceiling for basically any rich (thinking-)skill at all^[1] — no cognitive system will ever be well-thought-of as getting close to a ceiling at such a skill — it's always possible to do any rich skill very much better (I mean these things for finite minds in general, but also when restricting the scope to current humans)

(that said, (1) of course, it is common for people to become better at particular skills up to some time and to become worse later, but i think this has nothing to do with having r... (read more)

kh's Shortform

Kaarel3mo330

a few thoughts on hyperparams for a better learning theory (for understanding what happens when a neural net is trained with gradient descent)

Having found myself repeating the same points/claims in various conversations about what NN learning is like (especially around singular learning theory), I figured it's worth writing some of them down. My typical confidence in a claim below is like 95%^[1]. I'm not claiming anything here is significantly novel. The claims/points:

local learning (eg gradient descent) strongly does not find global optima. insofar as

... (read more)

9Alexander Gietelink Oldenziel3mo

Simon-Pepin Lehalleur weighs in on the DevInterp Discord: I think his overall position requires taking degeneracies seriously: he seems to be claiming that there is a lot of path dependency in weight space, but very little in function space 😄 In general his position seems broadly compatible with DevInterp: * models learn circuits/algorithmic structure incrementally * the development of structures is controlled by loss landscape geometry * and also possibly in more complicated cases by the landscapes of "effective losses" corresponding to subcircuits... This perspective certainly is incompatible with a naive SGD = Bayes = Watanabe's global SLT learning process, but I don't think anyone has (ever? for a long time?) made that claim for non toy models. It seems that the difference with DevInterp is that * we are more optimistic that it is possible to understand which geometric observables of the landscape control the incremental development of circuits * we expect, based on local SLT considerations, that those observables have to do with the singularity theory of the loss and also of sub/effective losses, with the LLC being the most important but not the only one * we dream that it is possible to bootstrap this to a full fledged S4 correspondence, or at least to get as close as we can. Ok, no pb. You can also add the following : I am sympathetic but also unsatisfied with a strong empiricist position about deep learning. It seems to me that it is based on a slightly misapplied physical, and specifically thermodynamical intuition. Namely that we can just observe a neural network and see/easily guess what the relevant "thermodynamic variables" of the system. For ordinary 3d physical systems, we tend to know or easily discover those thermodynamic variables through simple interactions/observations. But a neural network is an extremely high-dimensional system which we can only "observe" through mathematical tools. The loss is clearly one such thermodynamic var

2Alexander Gietelink Oldenziel3mo

I'd be curious about hearing your intuition re " i'm further guessing that most structures basically have 'one way' to descend into them"

leogao's Shortform

Kaarel4mo*30

a thing i think is probably happening and significant in such cases: developing good 'concepts/ideas' to handle a problem, 'getting a feel for what's going on in a (conceptual) situation'

a plausibly analogous thing in humanity(-seen-as-a-single-thinker): humanity states a conjecture in mathematics, spends centuries playing around with related things (tho paying some attention to that conjecture), building up mathematical machinery/understanding, until a proof of the conjecture almost just falls out of the machinery/understanding

Why I’m not a Bayesian

Kaarel4mo187

I find it surprising/confusing/confused/jarring that you speak of models-in-the-sense-of-mathematical-logic=:L-models as the same thing as (or as a precise version of) models-as-conceptions-of-situations=:C-models. To explain why these look to me like two pretty much entirely distinct meanings of the word 'model', let me start by giving some first brushes of a picture of C-models. When one employs a C-model, one likens a situation/object/etc of interest to a situation/object/etc that is already understood (perhaps a mathematical/abstract one), that one exp... (read more)

Momentum of Light in Glass

Kaarel4mo*60

3Ben4mo

I think the point about angular momentum is a very good way of gesturing at how its possibly different. Angular momentum is conserved, but an isolated system can still rotate itself, by spinning up and then stopping a flywheel (moving the "center of rotation"). Thank for finding that book and screenshot. Equation 12.72 is directly claiming that momentum is proportional to energy flow (and in the same direction). I am very curious how that intersects with claims common in metamaterials (https://journals.aps.org/pra/abstract/10.1103/PhysRevA.75.053810 ) that the two can flow in opposite directions.

Momentum of Light in Glass

Kaarel4mo40

here's a picture from https://hansandcassady.org/David%20J.%20Griffiths-Introduction%20to%20Electrodynamics-Addison-Wesley%20(2012).pdf :

Given 12.72, uniform motion of the center of energy is equivalent to conservation of momentum, right? P is const <=> dR_e/dt is const.

(I'm guessing 12.72 is in fact correct here, but I guess we can doubt it — I haven't thought much about how to prove it when fields and relativistic and quantum things are involved. From a cursory look at his comment, Lubos Motl seems to consider it invalid lol ( in https://physics.st... (read more)

3Ben4mo

In my post the way I cited Lubos Motl's comment implicitly rounded it off to "Minkowski is just right" (option [6]), which is indeed his headline and emphasis. But if we are zooming in on him I should admit that his full position is a little more nuanced. My understanding is that he makes 3 points: (1) - Option [1] is correct. (Abraham gives kinetic momentum, Minkowski the canonical momentum) (2) - In his opinion the kinetic momentum is pointless and gross, and that true physics only concerns itself with the canonical momentum. (3) - As a result of the kinetic momentum being worthless its basically correct to say Minkowski was "just right"(option [6]). This means that the paper proposing option [1] was a waste of time (much ado about nothing), because the difference between believing [1] and believing [6] only matters when doing kinetics, which he doesn't care about. Finally, having decided that Minkowski was correct in the only way that he thinks matters, he goes off into a nasty side-thing about how Abraham was supposedly incompetent. So his actual position is sort of [1] and [6] at the same time (because he considers the difference between them inconsequential, as it only applies to kinetics). If he leans more on the [1] side he can consider 12.72 to be valid. But why would he bother? 12.72 is saying something about kinetics, it might as well be invalid. He doesn't care either way. He goes on to explicitly say that he thinks 12.72 is invalid. Although I think his logic on this is flawed. He says the glass block breaks the symmetry, which is true for the photon. However, the composite system (photon + glass block) still has translation and boost symmetry, and it is the uniform motion of the center of mass of the composite system that is at stake.

6Kaarel4mo

That said, the hypothetical you give is cool and I agree the two principles decouple there! (I intuitively want to save that case by saying the COM is only stationary in a covering space where the train has in fact moved a bunch by the time it stops, but idk how to make this make sense for a different arrangement of portals.) I guess another thing that seems a bit compelling for the two decoupling is that conservation of angular momentum is analogous to conservation of momentum but there's no angular analogue to the center of mass (that's rotating uniformly, anyway). I guess another thing that's a bit compelling is that there's no nice notion of a center of energy once we view spacetime as being curved ( https://physics.stackexchange.com/a/269273 ). I think I've become convinced that conservation of momentum is a significantly bigger principle :). But still, the two seem equivalent to me before one gets to general relativity. (I guess this actually depends a bit on what the proof of 12.72 is like — in particular, if that proof basically uses the conservation of momentum, then I'd be more happy to say that the two aren't equivalent already for relativity/fields.)

Momentum of Light in Glass

Kaarel4mo10

The microscopic picture that Mark Mitchison gives in the comments to this answer seems pretty: https://physics.stackexchange.com/a/44533 — though idk if I trust it. The picture seems to be to think of glass as being sparse, with the photon mostly just moving with its vacuum velocity and momentum, but with a sorta-collision between the photon and an electron happening every once in a while. I guess each collision somehow takes a certain amount of time but leaves the photon unchanged otherwise, and presumably bumps that single electron a tiny bit to the righ... (read more)

2Ben4mo

I presented the redshift calculation in terms of a single photon, but actually, the exact same derivation goes through unchanged if you replace every instance of ℏω0 with E0 and ℏω with E . Where E0 and E are the energy of a light pulse before and after it enters the glass. There is no need to specify whether the light pulse is a single photon a big flash of classical light or anything else. Something linear in the distance travelled would not be a cumulatively increasing red shift, but instead an increasing loss of amplitude (essentially a higher cumulative probability of being absorbed). This is represented using a complex valued refractive index (or dielectric constant) where the real part is how much the wave slows down and the imaginary part is how much it attenuates per distance. There is no reason in principle why the losses cannot be arbitrarily close to zero at the wavelength we are using. (Interestingly, the losses have to be nonzero at some wavelength due to something called the Kramers Kronig relation, but we can assume they are negligible at our wavelength).

Momentum of Light in Glass

Kaarel4mo40

Anyway, in your argument for the redshift as the photon enters the block, I worry about the following:

can we really think of 1 photon entering the block becoming 1 photon inside the block, as opposed to needing to think about some wave thing that might translate to photons in some other way or maybe not translate to ordinary photons at all inside the materi

... (read more)

1Kaarel4mo

Momentum of Light in Glass

Kaarel4mo10

re redshift: Sorry, I should have been clearer, but I meant to talk about redshift (or another kind of energy loss) of the light that comes out of the block on the right compared to the light that went in from the left, which would cause issues with going from there being a uniformly-moving stationary center of mass to the conclusion about the location of the block. (I'm guessing you were right when you assumed in your argument that redshift is 0 for our purposes, but I don't understand light in materials well enough atm to see this at a glance atm.)

4Kaarel4mo

And the loss mechanism I was imagining was more like something linear in the distance traveled, like causing electrons to oscillate but not completely elastically wrt the 'photon' inside the material. Anyway, in your argument for the redshift as the photon enters the block, I worry about the following: 1. can we really think of 1 photon entering the block becoming 1 photon inside the block, as opposed to needing to think about some wave thing that might translate to photons in some other way or maybe not translate to ordinary photons at all inside the material (this is also my second worry from earlier)? 2. do we know that this photon-inside-the-material has energy ℏω?

Momentum of Light in Glass

Kaarel4mo15

Note however, that the principle being broken (uniform motion of centre of mass) is not at all one of the "big principles" of physics, especially not with the extra step of converting the photon energy to mass. I had not previously heard of the principle, and don't think it is anywhere near the weight class of things like momentum conservation.

I found these sentences surprising. To me, the COM moving at constant velocity (in an inertial frame) is Newton's first law, which is one of the big principles (and I also have a mental equality between that an... (read more)

4Ben4mo

I consider momentum conservation a "big principle.", and Newtons 3 laws indeed set out momentum conservation. However, I believe uniform centre of mass motion to be an importantly distinct principle. The drive loop thing would conserve momentum even if it were possible. Indeed momentum conservation is the principle underpinning the assumed reaction forces that make it work in the first place. To take a different example, if you had a pair of portals (like from the game "portal") on board your spaceship, and ran a train between them, you could drive the train backwards, propelling your ship forwards, and thereby move while conserving total momentum, only to later put the train's breaks on and stop. I am not asking you to believe in portals, I am just trying to motivate that weird hypotheticals can be cooked up where the principle of momentum conservation decouples from the principle of uniform centre of mass motion. The two are distinct principles. Abraham supporters do indeed think you can use conservation of momentum to work out which way the glass block moves in that thought experiment, showing that (because the photon momentum goes down) the block must move to the right. Minkowksi supporters also think you can use conservation of momentum to work out how the glass block moves, but because they think the photon momentum goes up the block must move to the left. The thing that is at issue is the question of what expression to use to calculate the momentum, both sides agree that whatever the momentum is it is conserved. As a side point, a photon has nonzero momentum in all reference frames, and that is not an aspect of relativity that is sensibly ignored. You are actually correct that the photon does have to red-shift very slightly as it enters the glass block. If the glass was initially at rest, then after the photon has entered the photon has either gained or lost momentum (depending on Abraham or Minkowski), in either case imparting the momentum difference onto

DanielFilan's Shortform Feed

Kaarel5mo111

It additionally seems likely to me that we are presently missing major parts of a decent language for talking about minds/models, and developing such a language requires (and would constitute) significant philosophical progress. There are ways to 'understand the algorithm a model is' that are highly insufficient/inadequate for doing what we want to do in alignment — for instance, even if one gets from where interpretability is currently to being able to replace a neural net by a somewhat smaller boolean (or whatever) circuit and is thus able to transl... (read more)

Alexander Gietelink Oldenziel's Shortform

Kaarel5mo90

Confusion #2: Why couldn't we make similar counting arguments for Turing machines?

I guess a central issue with separating NP from P with a counting argument is that (roughly speaking) there are equally many problems in NP and P. Each problem in NP has a polynomial-time verifier, so we can index the problems in NP by polytime algorithms, just like the problems in P.

in a bit more detail: We could try to use a counting argument to show that there is some problem with a (say) $< n^{2}$ time verifier which does not have any (say) $< n^{1000}$ time solver. To do th... (read more)

3Alexander Gietelink Oldenziel5mo

Thank you Kaarel - this the kind of answer I was after.

What are your cruxes for imprecise probabilities / decision rules?

Kaarel6mo10

To clarify, I think in this context I've only said that the claim "The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret" (and maybe the claim after it) was "false/nonsense" — in particular, because it doesn't make sense to talk about a distribution that induces maximum regret (without reference to a particular action) — which I'm guessing you agree with.

I wanted to say that I endorse the following:

Neither of the two decision rules you mentioned is (in general

Kaarel7mo10

Oh ok yea that's a nice setup and I think I know how to prove that claim — the convex optimization argument I mentioned should give that. I still endorse the branch of my previous comment that comes after considering roughly that option though:

That said, if we conceive of the decision rule as picking out a single action to perform, then because the decision rule at least takes Pareto improvements, I think a convex optimization argument says that the single action it picks is indeed the maximal EV one according to some distribution ~~(though not necess~~

... (read more)

1Anthony DiGiovanni6mo

The branch that's about sequential decision-making, you mean? I'm unconvinced by this too, see e.g. here — I'd appreciate more explicit arguments for this being "nonsense."

What are your cruxes for imprecise probabilities / decision rules?

Kaarel7mo10

Sorry, I feel like the point I wanted to make with my original bullet point is somewhat vaguer/different than what you're responding to. Let me try to clarify what I wanted to do with that argument with a caricatured version of the present argument-branch from my point of view:

your original question (caricatured): "The Sun prayer decision rule is as follows: you pray to the Sun; this makes a certain set of actions seem auspicious to you. Why not endorse the Sun prayer decision rule?"

my bullet point: "Bayesian expected utility maximization has this big red ... (read more)

1Anthony DiGiovanni6mo

I don't really understand your point, sorry. "Big red arrows towards X" only are a problem for doing Y if (1) they tell me that doing Y is inconsistent with doing [the form of X that's necessary to avoid leaving value on the table]. And these arrows aren't action-guiding for me unless (2) they tell me which particular variant of X to do. I've argued that there is no sense in which either (1) or (2) is true. Further, I think there are various big green arrows towards Y, as sketched in the SEP article and Mogensen paper I linked in the OP, though I understand if these aren't fully satisfying positive arguments. (I tentatively plan to write such positive arguments up elsewhere.) I'm just not swayed by vibes-level "arrows" if there isn't an argument that my approach is leaving value on the table by my lights, or that you have a particular approach that doesn't do so.

1Anthony DiGiovanni7mo

Oops sorry, my claim had the implicit assumptions that (1) your representor includes all the convex combinations, and (2) you can use mixed strategies. ((2) is standard in decision theory, and I think (1) is a reasonable assumption — if I feel clueless as to how much I endorse distribution p vs distribution q, it seems weird for me to still be confident that I don't endorse a mixture of the two.) If those assumptions hold, I think you can show that the max-regret-minimizing action maximizes EV w.r.t. some distribution in your representor. I don't have a proof on hand but would welcome counterexamples. In your example, you can check that either the uniformly fine action does best on a mixture distribution, or a mix of the other actions does best (lmk if spelling this out would be helpful).

What are your cruxes for imprecise probabilities / decision rules?

Kaarel7mo10

But the CCT only says that if you satisfy [blah], your policy is consistent with precise EV maximization. This doesn't imply your policy is inconsistent with Maximality, nor (as far as I know) does it tell you what distribution with respect to which you should maximize precise EV in order to satisfy [blah] (or even that such a distribution is unique). So I don’t see a positive case here for precise EV maximization [ETA: as a procedure to guide your decisions, that is]. (This is my also response to your remark below about “equivalent to "act consistently w

... (read more)

1Anthony DiGiovanni6mo

As an aspiring rational agent, I'm faced with lots of options. What do I do? Ideally I'd like to just be able to say which option is "best" and do that. If I have a complete ordering over the expected utilities of the options, then clearly the best option is the expected utility-maximizing one. If I don't have such a complete ordering, things are messier. I start by ruling out dominated options (as Maximality does). The options in the remaining set are all "permissible" in the sense that I haven't yet found a reason to rule them out. I do of course need to choose an action eventually. But I have some decision-theoretic uncertainty. So, given the time to do so, I want to deliberate about which ways of narrowing down this set of options further seem most reasonable (i.e., satisfy principles of rational choice I find compelling). (Basically I think EU maximization is a special case of “narrow down the permissible set as much as you can via principles of rational choice,[1] then just pick something from whatever remains.” It’s so straightforward in this case that we don’t even recognize we’re identifying a (singleton) “permissible set.”) Now, maybe you'd just want to model this situation like: "For embedded agents, 'deliberation' is just an option like any other. Your revealed strict preference is to deliberate about rational choice." I might be fine with this model.[2] But: * For the purposes of discussing how {the VOI of deliberation about rational choice} compares to {the value of going with our current “best guess” in some sense}, I find it conceptually helpful to think of “choosing to deliberate about rational choice” as qualitatively different from other choices. * The procedure I use to decide to deliberate about rational choice principles is not “I maximize EV w.r.t. some beliefs,” it’s “I see that my permissible set is not a singleton, I want more action-guidance, so I look for more action-guidance.” 1. ^ "Achieve Pareto-efficiency" (as per the

1Anthony DiGiovanni6mo

My claim is that your notion of "utter disaster" presumes that a consequentialist under deep uncertainty has some sense of what to do, such that they don't consider ~everything permissible. This begs the question against severe imprecision. I don't really see why we should expect our pretheoretic intuitions about the verdicts of a value system as weird as impartial longtermist consequentialism, under uncertainty as severe as ours, to be a guide to our epistemics. I agree that intuitively it's a very strange and disturbing verdict that ~everything is permissible! But that seems to be the fault of impartial longtermist consequentialism, not imprecise beliefs.

1Anthony DiGiovanni7mo

No, you have an argument that {anything that cannot be represented after the fact as precise EV maximization, with respect to some utility function and distribution} is bad. This doesn't imply that an agent who maintains imprecise beliefs will do badly. Maybe you're thinking something like: "The CCT says that my policy is guaranteed to be Pareto-efficient iff it maximizes EV w.r.t. some distribution. So even if I don't know which distribution to choose, and even though I'm not guaranteed not to be Pareto-efficient if I follow Maximality, I at least know I don't violate Pareto-efficiency if do precise EV maximization"? If so: I'd say that there are several imprecise decision rules that can be represented after the fact as precise EV max w.r.t. some distributions, so the CCT doesn't rule them out. E.g.: * The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret. * The maximin rule (sec 5.4.1) is equivalent to EV max w.r.t. the most pessimistic distribution. You might say "Then why not just do precise EV max w.r.t. those distributions?" But the whole problem you face as a decision-maker is, how do you decide which distribution? Different distributions recommend different policies. If you endorse precise beliefs, it seems you'll commit to one distribution that you think best represents your epistemic state. Whereas someone with imprecise beliefs will say: "My epistemic state is not represented by just one distribution. I'll evaluate the imprecise decision rules based on which decision-theoretic desiderata they satisfy, then apply the most appealing decision rule (or some way of aggregating them) w.r.t. my imprecise beliefs." If the decision procedure you follow is psychologically equivalent to my previous sentence, then I have no objection to your procedure — I just think it would be misleading to say you endorse precise beliefs in that case.

What are your cruxes for imprecise probabilities / decision rules?

Answer by KaarelAug 02, 2024*7-1

Here are some brief reasons why I dislike things like imprecise probabilities and maximality rules (somewhat strongly stated, medium-strongly held because I've thought a significant amount about this kind of thing, but unfortunately quite sloppily justified in this comment; also, sorry if some things below approach being insufficiently on-topic):

I like the canonical arguments for bayesian expected utility maximization ( https://www.alignmentforum.org/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations ; also https://web.stanford.edu/~hamm

... (read more)

6Anthony DiGiovanni7mo

Thanks for the detailed answer! I won't have time to respond to everything here, but: But the CCT only says that if you satisfy [blah], your policy is consistent with precise EV maximization. This doesn't imply your policy is inconsistent with Maximality, nor (as far as I know) does it tell you what distribution with respect to which you should maximize precise EV in order to satisfy [blah] (or even that such a distribution is unique). So I don’t see a positive case here for precise EV maximization [ETA: as a procedure to guide your decisions, that is]. (This is my also response to your remark below about “equivalent to "act consistently with being an expected utility maximizer".”) Could you expand on this with an example? I don’t follow. Maximality and imprecision don’t make any reference to “default actions,” so I’m confused. I also don’t understand what’s unnatural/unprincipled/confused about permissibility or preferential gaps. They seem quite principled to me: I have a strict preference for taking action A over B (/ B is impermissible) only if I’m justified in beliefs according to which I expect A to do better than B. This is a much longer conversation, but briefly: I think it’s ad hoc / putting the cart before the horse to shape our epistemology to fit our intuitions about what decision guidance we should have.

I found >800 orthogonal "write code" steering vectors

Kaarel7mo*367

I think most of the quantitative claims in the current version of the above comment are false/nonsense/[using terms non-standardly]. (Caveat: I only skimmed the original post.)

"if your first vector has cosine similarity 0.6 with d, then to be orthogonal to the first vector but still high cosine similarity with d, it's easier if you have a larger magnitude"

If by 'cosine similarity' you mean what's usually meant, which I take to be the cosine of the angle between two vectors, then the cosine only depends on the directions of vectors, not their magnitudes... (read more)

1[comment deleted]7mo

7Fabien Roger7mo

You're right, I mixed intuitions and math about the inner product and cosines similarity, which resulted in many errors. I added a disclaimer at the top of my comment. Sorry for my sloppy math, and thank you for pointing it out. I think my math is right if only looking at the inner product between d and theta, not about the cosine similarity. So I think my original intuition still hold.

StefanHex7mo112

Hmm, with that we'd need $δ \leq 0.05$ to get 800 orthogonal vectors.^[1] This seems pretty workable. If we take the MELBO vector magnitude change (7 -> 20) as an indication of how much the cosine similarity changes, then this is consistent with $δ = 0.15$ for the original vector. This seems plausible for a steering vector?

^{^}
Thanks to @Lucius Bushnaq for correcting my earlier wrong number

Formal verification, heuristic explanations and surprise accounting

Kaarel8mo*50

how many times did the explanation just "work out" for no apparent reason

From the examples later in your post, it seems like it might be clearer to say something more like "how many things need to hold about the circuit for the explanation to describe the circuit"? More precisely, I'm objecting to your "how many times" because it could plausibly mean "on how many inputs" which I don't think is what you mean, and I'm objecting to your "for no apparent reason" because I don't see what it would mean for an explanation to hold for a reason in this case.

5Jacob_Hilton8mo

Yes, that's a clearer way of putting it in the case of the circuit in the worked example. The reason I said "for no apparent reason" is that there could be some redundancy in the explanation. For example, if you already had an explanation for the output of some subcircuit, you shouldn't pay additional surprise if you then check the output of that subcircuit in some particular case. But perhaps this was a distracting technicality.

kh's Shortform

Kaarel11mo*20

The Deep Neural Feature Ansatz

@misc{radhakrishnan2023mechanism, title={Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features}, author={Adityanarayanan Radhakrishnan and Daniel Beaglehole and Parthe Pandit and Mikhail Belkin}, year={2023}, url = { https://arxiv.org/pdf/2212.13881.pdf } }

The ansatz from the paper

Let $h_{i} (x) \in R^{k}$ denote the activation vector in layer $i$ on input $x \in R^{d}$ , with the input layer being at index $i = 1$ , so $h_{1} (x) = x$ . Let $W_{i}$ be the weight matrix after activation layer $i$ . Let $f_{i}$ be t... (read more)

kh's Shortform

Kaarel11mo40

A thread into which I'll occasionally post notes on some ML(?) papers I'm reading

I think the world would probably be much better if everyone made a bunch more of their notes public. I intend to occasionally copy some personal notes on ML(?) papers into this thread. While I hope that the notes which I'll end up selecting for being posted here will be of interest to some people, and that people will sometimes comment with their thoughts on the same paper and on my thoughts (please do tell me how I'm wrong, etc.), I expect that the notes here will not be sig... (read more)

2Kaarel11mo

The Deep Neural Feature Ansatz @misc{radhakrishnan2023mechanism, title={Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features}, author={Adityanarayanan Radhakrishnan and Daniel Beaglehole and Parthe Pandit and Mikhail Belkin}, year={2023}, url = { https://arxiv.org/pdf/2212.13881.pdf } } The ansatz from the paper Let hi(x)∈Rk denote the activation vector in layer i on input x∈Rd, with the input layer being at index i=1, so h1(x)=x. Let Wi be the weight matrix after activation layer i. Let fi be the function that maps from the ith activation layer to the output. Then their Deep Neural Feature Ansatz says that WTiWi∝∼1|D|∑x∈D∇fi(hi(x))∇fi(hi(x))T (I'm somewhat confused here about them not mentioning the loss function at all — are they claiming this is reasonable for any reasonable loss function? Maybe just MSE? MSE seems to be the only loss function mentioned in the paper; I think they leave the loss unspecified in a bunch of places though.) A singular vector version of the ansatz Letting Wi=UΣVT be a SVD of Wi, we note that this is equivalent to VΣ2VT∝∼1|D|∑x∈D∇fi(hi(x))∇fi(hi(x))T, i.e., that the eigenvectors of the matrix M on the RHS are the right singular vectors. By the variational characterization of eigenvectors and eigenvalues (Courant-Fischer or whatever), this is the same as saying that right singular vectors of Wi are the highest orthonormal vTMv directions for the matrix M on the RHS. Plugging in the definition of M, this is equivalent to saying that the right singular vectors are the sequence of highest-variance directions of the data set of gradients ∇fi(hi(x)). (I have assumed here that the linearity is precise, whereas really it is approximate. It's probably true though that with some assumptions, the approximate initial statement implies an approximate conclusion too? Getting approx the same vecs out probably requires some assumption about gaps in singular values being big enough, b

Why does generalization work?

Kaarel1y*80

I'd be very interested in a concrete construction of a (mathematical) universe in which, in some reasonable sense that remains to be made precise, two 'orthogonal pattern-universes' (preferably each containing 'agents' or 'sophisticated computational systems') live on 'the same fundamental substrate'. One of the many reasons I'm struggling to make this precise is that I want there to be some condition which meaningfully rules out trivial constructions in which the low-level specification of such a universe can be decomposed into a pair $(s_{1}, s_{2})$ such that $s_{1}$ ... (read more)

2Martín Soto1y

I think that's the right next question! The way I was thinking about it, the mathematical toy model would literally have the structure of microstates and macrostates. What we need is a set of (lawfully, deterministically) evolving microstates in which certain macrostate partitions (macroscopic regularities, like pressure) are statistically maintained throughout the evolution. And then, for my point, we'd need two different macrostate partitions (or sets of macrostate partitions) such that each one is statistically preserved. That is, complex macroscopic patterns it self-replicate (a human tends to stay in the macrostate partition of "the human being alive"). And they are mostly independent (humans can't easily learn about the completely different partition, otherwise they'd already be in the same partition). In the direction of "not making it trivial", I think there's an irresolvable tension. If by "not making it trivial" you mean "s1 and s2 don't obviously look independent to us", then we can get this, but it's pretty arbitrary. I think the true name of "whether s1 and s2 are independent" is "statistical mutual information (of the macrostates)". And then, them being independent is exactly what we're searching for. That is, it wouldn't make sense to ask for "independent pattern-universes coexisting on the same substrate", while at the same time for "the pattern-universes (macrostate partitions) not to be truly independent". I think this successfully captures the fact that my point/realization is, at its heart, trivial. And still, possibly deconfusing about the observer-dependence of world-modelling.

More Hyphenation

Kaarel1y1815

I find [the use of square brackets to show the merge structure of [a linguistic entity that might otherwise be confusing to parse]] delightful :)

leogao1y156

hot take: if you find that your sentences can't be parsed reliably without brackets, that's a sign you should probably refactor your writing to be clearer

8Arjun Panickssery1y

Powerful

Does davidad's uploading moonshot work?

Kaarel1y72

I'd be quite interested in elaboration on getting faster alignment researchers not being alignment-hard — it currently seems likely to me that a research community of unupgraded alignment researchers with a hundred years is capable of solving alignment (conditional on alignment being solvable). (And having faster general researchers, a goal that seems roughly equivalent, is surely alignment-hard (again, conditional on alignment being solvable), because we can then get the researchers to quickly do whatever it is that we could do — e.g., upgrading?)

Eliezer Yudkowsky1y192

I currently guess that a research community of non-upgraded alignment researchers with a hundred years to work, picks out a plausible-sounding non-solution and kills everyone at the end of the hundred years.

AI Regulation May Be More Important Than AI Alignment For Existential Safety

Kaarel1y20

I was just claiming that your description of pivotal acts / of people that support pivotal acts was incorrect in a way that people that think pivotal acts are worth considering would consider very significant and in a way that significantly reduces the power of your argument as applying to what people mean by pivotal acts — I don't see anything in your comment as a response to that claim. I would like it to be a separate discussion whether pivotal acts are a good idea with this in mind.

Now, in this separate discussion: I agree that executing a pivotal act ... (read more)

2dr_s1y

I disagree, I think in many ways the current race already seems motivated by something of the sort - "if I don't get to it first, they will, and they're sure to fuck it up". Though with no apparent planning for pivotal acts in sight (but who knows). Oh, agreed. It's a choice between shitty options all around.

AI Regulation May Be More Important Than AI Alignment For Existential Safety

Kaarel1y30

In this comment, I will be assuming that you intended to talk of "pivotal acts" in the standard (distribution of) sense(s) people use the term — if your comment is better described as using a different definition of "pivotal act", including when "pivotal act" is used by the people in the dialogue you present, then my present comment applies less.

I think that this is a significant mischaracterization of what most (? or definitely at least a substantial fraction of) pivotal activists mean by "pivotal act" (in particular, I think this is a significant mischar... (read more)

5dr_s1y

I don't think this revolutionises my argument. First, there's a lot of talking about example possible pivotal acts and they're mostly just not that believable on their own. The typical "melt all GPUs" is obviously incredibly hostile and disruptive, but yes, of course, it's only an example. The problem is that without an actual outline for what a perfect pivotal act is, you can't even hope to do it with "just" a narrow superintelligence, because in that case, you need to work out the details yourself, and the details are likely horribly complicated. But the core, fundamental problem with the "pivotal act" notion is that it tries to turn a political problem into a technological one. "Do not build AGIs" is fundamentally a political problem: it's about restricting human freedom. Now you can either do that voluntarily, by consensus, with some enforcement mechanism for the majority to impose its will on the minority, or you can do that by force, with a minority using overwhelming power to make the majority go along even against their will. That's it. A pivotal act is just a nice name for the latter thing. The essence of the notion is "we can't get everyone on board quickly enough; therefore, we should just build some kind of superweapon that allows us to stop everyone else from building unsafe AGI as we define it, whether they like it or not". It's not a lethal weapon, and you can argue the utilitarian trade-off from your viewpoint is quite good, but it is undeniably a weapon. And therefore it's just not something that can be politically acceptable because people don't like to have weapons pointed at them, not even when the person making the weapon assures them it's for their own good. If "pivotal act" became the main paradigm the race dynamics would only intensify because then everyone knows they'll only have one shot and they won't trust the others to either get it right or actually limit themselves to just the pivotal act once they're the only ones with AI power in th

Polysemanticity and Capacity in Neural Networks

Kaarel2yΩ240

A few notes/questions about things that seem like errors in the paper (or maybe I'm confused — anyway, none of this invalidates any conclusions of the paper, but if I'm right or at least justifiably confused, then these do probably significantly hinder reading the paper; I'm partly posting this comment to possibly prevent some readers in the future from wasting a lot of time on the same issues):

1) The formula for $~ y$ here seems incorrect:

This is because W_i is a feature corresponding to the i'th coordinate of x (this is not evident from the screen... (read more)

3Buck2y

Thanks for this careful review! And sorry for wasting your time with these, assuming you're right. We'll hopefully look into this at some point soon.

Question for Prediction Market people: where is the money supposed to come from?

Kaarel2y1-2

At least ignoring legislation, an exchange could offer a contract with the same return as S&P 500 (for the aggregate of a pair of traders entering a Kalshi-style event contract); mechanistically, this index-tracking could be supported by just using the money put into a prediction market to buy VOO and selling when the market settles. (I think.)

kh's Shortform

Kaarel2y*40

An attempt at a specification of virtue ethics

I will be appropriating terminology from the Waluigi post. I hereby put forward the hypothesis that virtue ethics endorses an action iff it is what the better one of Luigi and Waluigi would do, where Luigi and Waluigi are the ones given by the posterior semiotic measure in the given situation, and "better" is defined according to what some [possibly vaguely specified] consequentialist theory thinks about the long-term expected effects of this particular Luigi vs the long-term effects of this particular Waluigi... (read more)

kh's Shortform

Kaarel2y30

A small observation about the AI arms race in conditions of good infosec and collaboration

Suppose we are in a world where most top AI capabilities organizations are refraining from publishing their work (this could be the case because of safety concerns, or because of profit motives) + have strong infosec which prevents them from leaking insights about capabilities in other ways. In this world, it seems sort of plausible that the union of the capabilities insights of people at top labs would allow one to train significantly more capable models than the in... (read more)

How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Kaarel2y21

First, suppose GPT-n literally just has a “what a human would say” feature and a “what do I [as GPT-n] actually believe” feature, and those are the only two consistently useful truth-like features that it represents, and that using our method we can find both of them. This means we literally only need one more bit of information to identify the model’s beliefs.
One difference between “what a human would say” and “what GPT-n believes” is that humans will know less than GPT-n. In particular, there should be hard inputs that only a superhuman model

... (read more)

3Bogdan Ionut Cirstea2y

It might be useful to have a look at Language models show human-like content effects on reasoning, they empirically test for human-like incoherences / biases in LMs performing some logical reasoning tasks (twitter summary thread; video presentation)

Finite Factored Sets in Pictures

Kaarel2y51

I think $W$ does not have to be a variable which we can observe, i.e. it is not necessarily the case that we can deterministically infer the value of $W$ from the values of $X$ and $Y$ . For example, let's say the two binary variables we observe are $X = [whether smoke is coming out of the kitchen window of a given house]$ and $Y = [whether screams are emanating from the house]$ . We'd intuitively want to consider a causal model where $W = [whether the house is on fire]$ is causing both, but in a way that makes all triples of variable values have nonzero probability (which is t... (read more)

2Magdalena Wache2y

I see! You are right, then my argument wasn't correct! I edited the post partially based on your argument above. New version:

2David Johnston2y

I basically agree with this: ruling out unobserved variables is an unusual way to use causal graphical models. Also, taking the set of variables that are allowed to be in the graph to be the set of variables defined on a given sample space makes the notion of "intervention" more difficult to parse (what happens to F:=(X,Y) after you intervene on X?), though it might be possible with cyclic causal relationships. So basically, "causal variables" in acyclic graphical models are neither a subset nor a superset of observed random variables.

Finite Factored Sets in Pictures

Kaarel2y*41

I agree with you regarding 0 lebesgue. My impression is that the Pearl paradigm has some [statistics -> causal graph] inference rules which basically do the job of ruling out causal graphs for which having certain properties seen in the data has 0 lebesgue measure. (The inference from two variables being independent to them having no common ancestors in the underlying causal graph, stated earlier in the post, is also of this kind.) So I think it's correct to say "X has to cause Y", where this is understood as a valid inference inside the Pearl (or Garra... (read more)

Finite Factored Sets in Pictures

Kaarel2y61

I don't understand why 1 is true – in general, couldn't the variable $W$ be defined on a more refined sample space? Also, I think all $4$ conditions are technically satisfied if you set $W=X$ (or well, maybe it's better to think of it as a copy of $X$).

I think the following argument works though. Note that the distribution of $X$ given $(Z,Y,W)$ is just the deterministic distribution $X=Y \xor Z$ (this follows from the definition of Z). By the structure of the causal graph, the distribution of $X$ given $(Z,Y,W)$ must be the same as the distribution of $X$... (read more)

6David Johnston2y

I agree that 1. is unjustified (and would cause lots of problems for graphical causal models if it was). Further, I’m pretty sure the result is not “X has to cause Y” but “this distribution has measure 0 WRT lebesgue in models where X does not cause Y” (and deterministic relationships satisfy this) Finally, you can enable markdown comments on account settings (I believe)

Why bet Kelly?

Kaarel2y62

I took the main point of the post to be that there are fairly general conditions (on the utility function and on the bets you are offered) in which you should place each bet like your utility is linear, and fairly general conditions in which you should place each bet like your utility is logarithmic. In particular, the conditions are much weaker than your utility actually being linear, or than your utility actually being logarithmic, respectively, and I think this is a cool point. I don't see the post as saying anything beyond what's implied by this about Kelly betting vs max-linear-EV betting in general.

Quantum Suicide and Aumann's Agreement Theorem

Kaarel2y10

(By the way, I'm pretty sure the position I outline is compatible with changing usual forecasting procedures in the presence of observer selection effects, in cases where secondary evidence which does not kill us is available. E.g. one can probably still justify [looking at the base rate of near misses to understand the probability of nuclear war instead of relying solely on the observed rate of nuclear war itself].)

Quantum Suicide and Aumann's Agreement Theorem

Kaarel2y*10

I'm inside-view fairly confident that Bob should be putting a probability of 0.01% on surviving conditional on many worlds being true, but it seems possible I'm missing some crucial considerations having to do with observer selection stuff in general, so I'll phrase the rest of this as more of a question.

What's wrong with saying that Bob should put a probability of 0.01% of surviving conditional on many-worlds being true – doesn't this just follow from the usual way that a many-worlder would put probabilities on things, or at least the simplest way for doi... (read more)

1Kaarel2y

Superintelligent AI is necessary for an amazing future, but far from sufficient

Kaarel2y30

A big chunk of my uncertainty about whether at least 95% of the future’s potential value is realized comes from uncertainty about "the order of magnitude at which utility is bounded". That is, if unbounded total utilitarianism is roughly true, I think there is a <1% chance in any of these scenarios that >95% of the future's potential value would be realized. If decreasing marginal returns in the [amount of hedonium -> utility] conversion kick in fast enough for 10^20 slightly conscious humans on heroin for a million years to yield 95% of max utili... (read more)

Possible miracles

Kaarel2y20

Great post, thanks for writing this! In the version of "Alignment might be easier than we expect" in my head, I also have the following:

Value might not be that fragile. We might "get sufficiently many bits in the value specification right" sort of by default to have an imperfect but still really valuable future.
- For instance, maybe IRL would just learn something close enough to pCEV-utility from human behavior, and then training an agent with that as the reward would make it close enough to a human-value-maximizer. We'd get some misalignment on both steps (

Kaarel2y*10

I still disagree / am confused. If it's indeed the case that $emb (chocolate ice cream and vanilla ice cream) \neq emb (chocolate ice cream) + emb (vanilla ice cream)$ , then why would we expect $u (emb (chocolate ice cream and vanilla ice cream)) = u (emb (chocolate ice cream)) + u (emb (vanilla ice cream))$ ? (Also, in the second-to-last sentence of your comment, it looks like you say the former is an equality.) Furthermore, if the latter equality is true, wouldn't it imply that the utility we get from [chocolate ice cream and vanilla ice cream] is the sum of the utilit... (read more)

Continental Philosophy as Undergraduate Mathematics

Kaarel2y20

The link in this sentence is broken for me: "Second, it was proven recently that utilitarianism is the “correct” moral philosophy." Unless this is intentional, I'm curious to know where it directed to.

I don't know of a category-theoretic treatment of Heidegger, but here's one of Hegel: https://ncatlab.org/nlab/show/Science+of+Logic. I think it's mostly due to Urs Schreiber, but I'm not sure – in any case, we can be certain it was written by an Absolute madlad :)

A gentle primer on caring, including in strange senses, with applications

Kaarel2y10

Why should I care about similarities to pCEV when valuing people?

It seems to me that this matters in case your metaethical view is that one should do pCEV, or more generally if you think matching pCEV is evidence of moral correctness. If you don't hold such metaethical views, then I might agree that (at least in the instrumentally rational sense, at least conditional on not holding any metametalevel views that contradict these) you shouldn't care.

> Why is the first example explaining why someone could support taking money from people you value less to g... (read more)

kh's Shortform

Kaarel2y10

I proposed a method for detecting cheating in chess; cross-posting it here in the hopes of maybe getting better feedback than on reddit: https://www.reddit.com/r/chess/comments/xrs31z/a_proposal_for_an_experiment_well_data_analysis/