All of Thane Ruthenis's Comments + Replies

Sooo, apparently OpenAI's mysterious breakthrough technique for generalizing RL to hard-to-verify domains that scored them IMO gold is just... "use the LLM as a judge"? Sources: the main one is paywalled, but this seems to capture the main data, and you can also search for various crumbs here and here.

The technical details of how exactly the universal verifier works aren’t yet clear. Essentially, it involves tasking an LLM with the job of checking and grading another model’s answers by using various sources to research them.

My understanding is that they ap... (read more)
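For concreteness, here's roughly the shape I imagine "use the LLM as a judge" taking as an RL reward signal. This is a hedged sketch under my own assumptions: every name and method below (judge_reward, policy.generate, etc.) is hypothetical, standing in for whatever OpenAI actually built, which isn't public.

```python
# Hypothetical sketch: an LLM "universal verifier" used as the reward signal for RL.
# All objects and methods here are placeholders, not a real API.

def judge_reward(question: str, candidate_answer: str, judge) -> float:
    """Ask a judge LLM to research and grade another model's answer; return a scalar reward."""
    rubric = (
        "Research the following question using whatever sources you can, then grade the "
        "candidate answer from 0 to 10 for correctness and rigor. Reply with only a number.\n"
        f"Question: {question}\nCandidate answer: {candidate_answer}\nGrade:"
    )
    grade_text = judge.generate(rubric)      # hypothetical judge-model call
    return float(grade_text.strip()) / 10.0  # a real setup would parse/validate more carefully

def rl_step(policy, judge, question: str) -> None:
    answer = policy.generate(question)       # hypothetical policy-model call
    reward = judge_reward(question, answer, judge)
    policy.update(question, answer, reward)  # e.g. some PPO/GRPO-style update
```

All the interesting questions are hidden inside the judge call: how much it gets to research, how it's kept from being gamed by the policy, and how noisy its grades are.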

3kaiwilliams
One point of information against the "journalists are completely misinterpreting the thing they're reporting on" view is that one of the co-authors is Rocket Drew, who previously worked as a Research Manager at MATS. But I'll definitely be interested to follow this space more.
7Vladimir_Nesov
The full text is on archive.today.

We also observe inscrutable reasoning traces whose uninterpretable sections seem (on a surface reading) less likely to contain any particular semantic meaning.

So, um, not to anthropomorphize too much, but is it, like, dissociating under stress in that screencap or what?

Truly, AGI achieved.

1Jon Garcia
Honestly, I empathize a lot with all those dots. It can be hard to focus and stay on task sometimes. If RL focused on just the final outputs, all sorts of stuff could happen during the covert reasoning phase and still lead to results that received positive reinforcement. That includes all the pathological behavior that happened to coincide with good responses. Sometimes it's useful internal reasoning "codewords"; sometimes it's actual junk. You can see with the "Stop. Focus."-type self-admonishment that it didn't learn how to avoid pathological distractions entirely, only how to detect them and tools to get back on task. I wonder whether it would be helpful to apply reinforcement on the contents of the reasoning phase (maybe scored by some measure of human-readable task-relevance), or whether this would cripple its ability to reason effectively.

That would be my interpretation if I were to steelman him. My actual expectation is that he's lumping Eliezer-style positions with Yampolskiy-style positions, barely differentiating between them. Eliezer has certainly said things along the general lines of "AGI can never be made aligned using the tools of the current paradigm", backing it up by what could be called "logical arguments" from evolution or first principles.

Like, Dario clearly disagrees with Eliezer's position as well, given who he is and what he is doing, so there must be some way he is dismis... (read more)

7yams
I agree that Dario disagrees with Eliezer somewhere. I don't know for sure that you've isolated the part that Dario disagrees with, and it seems plausible to me that Dario thinks we need some more MIRI-esque, principled thing, or an alternative architecture altogether, or for the LLMs to have solved the problem for us, once we cross some capabilities threshold. If he's said something public about this either way, I'd love to know. I also think that some interpretations of Dario's statement are compatible with some interpretations of the section of the IABIED book excerpt above, so we ought to just... all be extra careful not to be too generous to one side or the other, or too critical of one side or the other. I agree that my interpretation errs on the side of giving Dario too much credit here. I'm pretty confused about Dario and don't trust him, but I want to gesture toward some care in the intended targets of some of his stronger statements about 'doomers'. I think he's a pretty careful communicator, and still lean toward my interpretation over yours (although I also expect him to be wrong in his characterization of Eliezer's beliefs, I don't expect him to be quite as wrong as the above). I find the story you're telling here totally plausible, and just genuinely do not know. There's also a meta concern where if you decide that you're the target of some inaccurate statement that's certainly targeted at someone but might not be targeted at you, you've perhaps done more damage to yourself by adopting that mischaracterization of yourself in order to amend it, than by saying something like "Well, you must not be talking about me, because that's just not what I believe."

Recall how Putin has been "putting nuclear forces on high alert" over and over and over again since the start of the war, including during the initial events in February 2022. It never meant anything.

I expect this is the exact same thing. Trump is just joining in on the posturing fun, because he's Putin-like in this regard. I feel fairly confident that neither Putin nor Trump will ever actually nuke over this conflict in its current shape, and you should feel free to ignore all of their nonsense.

Some context here: I'm Russian and I pay some attention to Ru... (read more)

I second @Seth Herd's suggestion: I'm interested in your vision regarding what success would look like. Not just "here's a list of some initiatives and research programs that should be helpful" or "here's a possible optimistic scenario in which things go well, but which we don't actually believe in", but the sketch of an actual end-to-end plan around which you'd want people to coordinate. (Under the understanding that plans are worthless but planning is everything, of course.)

I think I have a thing very similar to John's here, and for me at least, it's mostly orthogonal to "how much you care about this person's well-being". Or, like, as relevant for that as whether that person has a likeable character trait.

The main impact is on the ability to coordinate with/trust/relax around that person. If they're well-modeled as an agent, you can, to wit, model them as a game-theoretic agent: as someone who is going to pay attention to the relevant parts of any given situation and continually make choices within it that are consistent with... (read more)

To expand on that...

In my mental ontology, there's a set of specific concepts and mental motions associated with accountability: viewing people as being responsible for their actions, being disappointed in or impressed by their choices, modeling the assignment of blame/credit as meaningful operations. Implicitly, this requires modeling other people as agents: types of systems which are usefully modeled as having control over their actions. To me, this is a prerequisite for being able to truly connect with someone.

When you apply the not-that-coherent-an-age... (read more)

"Not optimized to be convincing to AI researchers" ≠ "looks like fraud". "Optimized to be convincing to policymakers" might involve research that clearly demonstrates some property of AIs/ML models which is basic knowledge for capability researchers (and for which they already came up with rationalizations why it's totally fine) but isn't well-known outside specialist circles.

E. g., the basic example is the fact that ML models are black boxes trained by an autonomous process which we don't understand, instead of manually coded symbolic programs. This isn't... (read more)

3Guive
What kind of "research" would demonstrate that ML models are not the same as manually coded programs? Why not just link to the Wikipedia article for "machine learning"? 

That seems like a pretty good idea!

(There are projects that stress-test the assumptions behind AGI labs' plans, of course, but I don't think anyone is (1) deliberately picking at the plans AGI labs claim to have, in a basically adversarial manner, (2) optimizing experimental setups and results for legibility to policymakers, rather than for convincingness to other AI researchers. Explicitly setting those priorities might be useful.)

3Kabir Kumar
AI Plans does this
6Buck
People who do research like this are definitely optimizing for legibility to policymakers (always at least a bit, and usually a lot). One problem is that if AI researchers think your work is misleading/scientifically suspect, they get annoyed at you and tell people that your research sucks and you're a dishonest ideologue. This is IMO often a healthy immune response, though it's a bummer when you think that the researchers are wrong and your work is fine. So I think it's pretty costly to give up on convincingness to AI researchers.

@Caleb Biddulph's reply seems right to me. Another tack:

It's like the old "dragon in the garage" parable: the woman is too good at systematically denying the things which would actually help to not have a working model somewhere in there

I think you're still imagining too coherent an agent. Yes, perhaps there is a slice through her mind that contains a working model which, if that model were dropped into the mind of a more coherent agent, could be used to easily comprehend and fix the situation. But this slice doesn't necessarily have executive conscious co... (read more)

7Thane Ruthenis
To expand on that... In my mental ontology, there's a set of specific concepts and mental motions associated with accountability: viewing people as being responsible for their actions, being disappointed in or impressed by their choices, modeling the assignment of blame/credit as meaningful operations. Implicitly, this requires modeling other people as agents: types of systems which are usefully modeled as having control over their actions. To me, this is a prerequisite for being able to truly connect with someone. When you apply the not-that-coherent-an-agent lens, you do lose that. Because, like, which parts of that person's cognition should you interpret as the agent making choices, and which as parts of the malfunctioning exoskeleton the agent has no control over? You can make some decision about that, but this is usually pretty arbitrary. If someone is best modeled like this, they're not well-modeled as an agent, and holding them accountable is a category error. They're a type of system that does what it does. You can still invoke the social rituals of "blame" and "responsibility" if you expect that to change their behavior, but the mental experience of doing so is very different. It's more like calculating the nudges you need to make to prompt the desired mechanistic behavior, rather than as interfacing with a fellow person. In the latter case, you can sort of relax, communicate in a way focused on transferring information, instead of focusing on the form of communication, and trust them to make correct inferences. In the former case, you need to keep precise track of tone/wording/aesthetics/etc., and it's less "communication" and more "optimization". I really dislike thinking of people in this way, and I try to adopt the viewing-them-as-a-person frame whenever it's at all possible. But the other frame does unfortunately seem to be useful in many cases. Trying to do otherwise often feels like reaching out for someone's hand and finding nothing there. If t

When I try to empathize with that woman, what I feel toward her is disgust. If I were in her shoes, I would immediately jump to getting rid of the damn nail, it wouldn’t even occur to me to not fix it. 

You may not be doing enough of putting yourself into her shoes. Specifically, you seem to be putting yourself into her material circumstances, as if you switched minds (and got her memories et cetera), instead of, like... imagining yourself also having her world-model and set of crystallized-intelligence heuristics and cognitive-bandwidth limitatio... (read more)

From the perspective of someone with a nail stuck in their head, the world does not look like there's a nail stuck in their head which they could easily remove in order to improve their life in ways in which they want it to be improved. [...] They're best modeled not as agents who are being willfully obstinate, but as people helplessly trapped in the cognitive equivalents of malfunctioning motorized exoskeletons.

I think this is false. It's like the old "dragon in the garage" parable: the woman is too good at systematically denying the things which would... (read more)

Well, an aligned Singularity would probably be relatively pleasant, since the entities fueling it would consider causing this sort of vast distress a negative and try to avoid it. Indeed, if you trust them not to drown you, there would be no need for this sort of frantic grasping-at-straws.

An unaligned Singularity would probably also be more pleasant, since the entities fueling it would likely try to make it look aligned, with the span of time between the treacherous turn and everyone dying likely being short.

This scenario covers a sort of "neutral-alignme... (read more)

Yes, it's competently executed

Is it?

It certainly signals that the authors have a competent grasp of the AI industry and its mainstream models of what's happening. But is it actually competent AI-policy work, even under the e/acc agenda?

My impression is that no, it's not. It seems to live in an e/acc fanfic about a competent US racing to AGI, not in reality. It vaguely recommends doing a thousand things that would be nontrivial to execute if the Eye of Sauron were looking directly at them, and the Eye is very much not doing that. On the contrary, the wider ... (read more)

7Zvi
I believe that this is a competently executed plan from the perspective of those executing the plan, which is different from the entire policy of the White House being generally competent in ways that those in charge of the plan lacked the power to do anything about (e.g. immigration, attacks on solar power, trade and alliances in general...)
7habryka
Sorry, by executed I meant "competently written". It does seem to me that this piece of policy is more grounded in important bottlenecks and detailed knowledge for AI progress than previous similar things. I find it plausible that it might fail on implementation because it's modeling the Trump administration and America as too much of a thing that could actually execute this plan. I agree with you that it is not unlikely it will fall flat on those grounds, and that does give me some solace.

Yeah, I figured.

If the judge sees that you are a $61 billion market cap company hiring the greatest lawyers in the world, but you're not putting forth your best legal foot when you have lawyers from other companies writing briefs outlining their own defense arguments, the consequences for you and your lawyers will be severe

What would be the actual wrongdoing here, legally speaking?

Federal lawsuits must satisfy the case or controversy requirement of Article 3 of the Constitution. 

A failure to do so (if there is no genuine adversity between the parties in practice because they collude on the result) renders the lawsuit dead on the spot (because the federal court cannot constitutionally exercise jurisdiction over the parties, so there can be no decision on the merits) and exposes the lawyers and parties to punishment and repercussions in case they tried to conceal this from/directly lie to the judge, both because lying to a judici... (read more)

Clearly the heroic thing to do would be to go to trial and then deliberately mess it up very badly in a calculated fashion that sets an awful precedent for the other AGI companies. You might say, "but China!", but if the US cripples itself, then suddenly the USG would be much more interested in reaching some sort of international-AGI-ban deal with China, so it all works out.

(Only half-serious.)

Responding to the serious half only, sandbagging doesn't work in general in the legal system, and in particular it wouldn't work here. That's because you have so much outside attention on the case and (presumably) so many amici briefs describing all the most powerful arguments in the AI companies' favor. If the judge sees that you are a $61 billion market cap company hiring the greatest lawyers in the world, but you're not putting forth your best legal foot when you have lawyers from other companies writing briefs outlining their own defense arguments, the consequences for you and your lawyers will be severe and any notion of "precedent" will be poisoned for all of time.

Yeah, I guess the use-case I had in mind is generally people who don't want LLMs trained on (particular pieces of) their writing, rather than datasets specifically.

Hmm. This approach relies partly on the AGI labs being cooperative and wary of violating the law, and partly on creating minor inconveniences for accessing the data which inconvenience human users as well. In addition, any data shared this way would have to be shared via the download portal, impoverishing the web experience.

I wonder if it's possible to design some method of data protection that (1) would be deployable on arbitrary web pages, (2) would not burden human users, (3) would make AGI labs actively not want to scrape pages protected this way.

Here's... (read more)

2ProgramCrafter
You have to make that poison inactive in accessibility cases, or a person using a screen reader would hear all that. However, if a correctly configured screen reader skips the invisible data, then labs will just use it (assuming they can be bothered with cleaning the dataset at all). Also, training-time jailbreaks are likely quite different from inference-time jailbreaks. The latter will tend to hit Operator-style stuff harder.
1anaguma
This seems like a great idea! However, I think it might degrade the usefulness of the dataset, especially if it’s meant to later be used to evaluate LLMs since any jailbreaks etc. would apply in that setting as well. If you provide utilities to clean up the text before evaluation, these could be used for scraping as well.

Of course, the degree of transmission does depend on the distillation distribution.

Yes, that's what makes it not particularly enlightening here, I think? The theorem says that the student moves in a direction that is at-worst-orthogonal towards the teacher – meaning "orthogonal direction" is the lower bound, right? And it's a pretty weak lower bound. (Or, a statement which I think is approximately equivalent, the student's post-distillation loss on the teacher's loss function is at-worst-equal to its pre-distillation loss.)
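To spell out the first-order picture I have in mind (my own notation, simplified to a squared-error distillation loss and one small step on each side; the paper's exact statement may differ in details): write the teacher as $\theta_T = \theta_0 - \eta\,\nabla L_{\text{task}}(\theta_0) \equiv \theta_0 + \Delta$. A single small distillation step at the shared initialization $\theta_0$, on an input $x$ with loss $\tfrac{1}{2}\lVert f(x;\theta) - f(x;\theta_T)\rVert^2$, moves the student by

$$\delta = -\eta'\,J(x)^\top\big(f(x;\theta_0) - f(x;\theta_T)\big) \approx \eta'\,J(x)^\top J(x)\,\Delta, \qquad \langle\delta,\Delta\rangle \approx \eta'\,\lVert J(x)\,\Delta\rVert^2 \ge 0,$$

where $J(x)$ is the Jacobian of the network's outputs at $\theta_0$. So the guarantee is exactly "non-negative projection onto the teacher's displacement", with equality whenever the distillation input is blind to that displacement, which is why it reads to me as a weak bound.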

Another perspective: consider loo... (read more)

8cloud
I agree the theorem is fairly limited (particularly because it assumes the teacher and student are derived by single steps of GD), but I argue that it is, in fact, enlightening. Three reasons:

1. A priori, I don't think it would be crazy to think that training M to match a similarly parametrized M' on input distribution D could cause M to diverge from M' on some other distribution D'. This probably can happen if M' is behaviorally similar but parametrized differently. So, a justifiable intuition for the true fact would have to incorporate the dependence on the parametrization of M'. Even if this dependence feels obvious upon reflection ("well yeah, the models have to have similarly entangled representations for this to happen"), you'd first have to consider that this dependence existed in the first place. Why did this entanglement have to be path dependent? Could it not have been universal across models? To test the a priori plausibility of the claim, I tried asking o3 and Opus 4. You can see the responses below. (It's unclear to me how much evidence this is.)
2. In a complex system, being able to eliminate half of the outcome space suggests interesting structure. For example, if a theory of physics showed that a butterfly flapping its wings never decreases the probability of a hurricane, that would be a surprising insight into a fundamental property of chaotic systems -- even though it only "lower-bounds" change in hurricane probability at 0.
3. The proof of the theorem actually does quantify transmission. It is given by equation (2) in terms of inner products of teacher and student gradients on the distillation distribution. So, if you are willing to compute or make assumptions about these terms, there are more insights to be had.

That said, I'm with you when I say, armed only with the theorem, I would not have predicted our results!

Prompt: Consider the following machine learning experiment: start with a neural network M. Create a new network, M',

OpenAI has declared ChatGPT Agent as High in Biological and Chemical capabilities under their Preparedness Framework

Huh. They certainly say all the right things here, so this might be a minor positive update on OpenAI for me.

Of course, the way it sounds and the way it is are entirely different things, and it's not clear yet whether the development of all these serious-sounding safeguards was approached with making things actually secure in mind, as opposed to safety-washing. E. g., are they actually going to stop anyone moderately determined?

Hm, it's been ... (read more)

Fascinating. This is the sort of result that makes me curious about how LLMs work irrespective of their importance to any existential risks.

In the paper, we prove a theorem showing that a single, sufficiently small step of gradient descent on any teacher-generated output necessarily moves the student toward the teacher

Hmm, that theorem didn't seem like a very satisfying explanation to me. Unless I'm missing something, it doesn't actually imply anything about the student's features that are seemingly unrelated to the training distribution being moved t... (read more)

The theorem says that the student will become more like the teacher, as measured by whatever loss was used to create the teacher. So if we create the teacher by supervised learning on the text "My favorite animal is the owl," the theorem says the student should have lower loss on this text[1]. This result does not depend on the distillation distribution. (Of course, the degree of transmission does depend on the distillation distribution. If you train the student to imitate the teacher on the input "My favorite animal is", you will get more transmission tha... (read more)
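As a toy numerical illustration of that parameter-space reading (a linear model standing in for both networks, one small gradient step each; this is my own hypothetical setup, not an experiment from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w0 = rng.normal(size=d)  # shared initialization for teacher and student

# Teacher: one GD step on its own task (stand-in for "My favorite animal is the owl")
X_task, y_task = rng.normal(size=(50, d)), rng.normal(size=50)
def task_loss(w):
    return np.mean((X_task @ w - y_task) ** 2)
grad_T = 2 * X_task.T @ (X_task @ w0 - y_task) / len(y_task)
w_teacher = w0 - 0.01 * grad_T

# Student: one small GD step imitating the teacher on unrelated distillation inputs
X_distill = rng.normal(size=(50, d))
grad_S = 2 * X_distill.T @ (X_distill @ (w0 - w_teacher)) / len(X_distill)
w_student = w0 - 0.01 * grad_S

print("inner product with teacher displacement (>= 0):",
      (w_student - w0) @ (w_teacher - w0))
print("teacher-task loss at init:    ", task_loss(w0))
print("teacher-task loss of student: ", task_loss(w_student))  # at worst ~equal, typically lower
```

On runs like this the inner product comes out strictly positive and the student's task loss drops slightly, even though the distillation inputs have nothing to do with the task, which is the "transmission doesn't require task-relevant data" point in miniature.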

Also, it's funny that we laugh at xAI when they say stuff like "we anticipate Grok will uncover new physics and technology within 1-2 years", but when an OpenAI employee goes "I wouldn’t be surprised if by next year models will be deriving new theorems and contributing to original math research", that's somehow more credible. Insert the "know the work rules" meme here.

(FWIW, I consider both claims pretty unlikely but not laughably incredible.)

The ‘barely speak English’ part makes the solution worse in some ways but actually makes me give their claims to be doing something different more credence rather than less

I think people are overupdating on that. My impression is that gibberish like this is the default way RL makes models speak, and that they need to be separately fine-tuned to produce readable outputs. E. g., the DeepSeek-R1 paper repeatedly complained about "poor readability" with regards to DeepSeek-R1-Zero (their cold-start no-SFT training run).

Actually This Seems Like A Big Deal

If we

... (read more)

I think this is overall reasonable if you interpret "hard-to-verify" as "substantially harder to verify" and I think this is probably how many people would read this by default

Not sure about this. The kind of "hard-to-verify" I care about is e. g. agenty behavior in real-world conditions. I assume many other people are also watching out for that specifically, and that capability researchers are deliberately aiming for it.

And I don't think the proofs are any evidence for that. The issue is that there exists, in principle, a way to easily verify math proofs: by... (read more)

Singular Learning Theory and Simplex's work (e. g. this), maybe? Cartesian Frames and Finite Factored Sets might also work, but I'm less sure about those.

It's actually pretty hard to come up with agendas in the intersection of "seems like an alignment-relevant topic it'd be useful to popularize" and "has complicated math which would be insightful and useful to visualize/simulate".

  • Natural abstractions, ARC's ELK, Shard Theory, and general embedded-agency theories are currently better understood by starting from the concepts, not the math.
  • Infrabayesianism, O
... (read more)

what would it even mean to have 10^30 times more shrimp than atoms?

Oh, easy, it just implies you're engaging in acausal trade with a godlike entity residing in some universe dramatically bigger than this one. This interpretation introduces no additional questions or complications whatsoever.

I just really don't buy the whole "let's add up qualia" as any basis of moral calculation

Same, honestly. To me, many of these thought experiments seem decoupled from anything practically relevant. But it still seems to me that people often do argue from those abstracted-out frames I'd outlined, and these arguments are probably sometimes useful for establishing at least some agreement on ethics. (I'm not sure what a full-complexity godshatter-on-godshatter argument would even look like (a fistfight, maybe?), and am very skeptical it'd yield any useful results.)

Anyway, it sounds like we mostly figured out what the initial drastic disconnect between our views here was caused by?

2habryka
Yeah, I think so, though not sure. But I feel good stopping here.

I agree that this is a thing people often like to invoke, but it feels to me a lot like people talking about billionaires and not noticing the classical crazy arithmetic errors like

Isn't it the opposite? It's a defence against providing too-low numbers, it's specifically to ensure that even infinitesimally small preferences are elicited with certainty.

Bundling up all "this seems like a lot" numbers into the same mental bucket, and then failing to recognize when a real number is not actually as high as in your hypothetical, is certainly an error one could m... (read more)

4habryka
I agree I probably implied a bit too much contextualization. Like, I agree the post has a utilitarian bent, but man, I just really don't buy the whole "let's add up qualia" as any basis of moral calculation, to the point that I find attempts at trying to create a "pure qualia shrimp" about as confused and meaningless as trying to argue that 7 bees are more important than a human. "qualia" isn't a thing that exists. The only thing that exists are your values in all of their complexity and godshatteredness. You can't make a "pure qualia shrimp", it doesn't make any philosophical sense, pure qualia isn't real. And I agree that maybe the post was imagining some pure qualia juice, and I don't know, maybe in that case it makes sense to dismiss it by doing a reductio ad absurdum on qualia juice, but I don't currently buy it. I think that would both fail to engage with the good parts of the author's position, and also be kind of a bad step in the discourse (like, the previous step was understanding why it doesn't make sense for 7 bees to be more important than a human, for a lot of different reasons and very robustly; and within that discourse, it's actually quite important to understand why 10^100 shrimp might actually be more important than a human, under at least a lot of reasonable sets of assumptions).

I think there is also a real conversation going on here about whether maybe, even if you isolated each individual shrimp into a tiny pocket universe, and you had no way of ever seeing them or visiting the great shrimp rift (a natural wonder clearly greater than any natural wonder on earth), and all you knew for sure was that it existed somewhere outside of your sphere of causal influence, and the shrimp never did anything more interesting than current alive shrimp, whether it would still be worth it to kill a human

Yeah, that's more what I had in mind. Illu... (read more)

2habryka
I agree that this is a thing people often like to invoke, but it feels to me a lot like people talking about billionaires and not noticing the classical crazy arithmetic errors like:  Like, in those discussions people are almost always trying to invoke numbers like "$1 trillion" as "a number so big that the force of the conclusion must be inevitable", but like most of the time they just fail because the number isn't big enough.  If someone was like "man, are you really that confident that a shrimp does not have morally relevant experience that you wouldn't trade a human for a million shrimp?", my response is "nope, sorry, 1 million isn't big enough, that's just really not that big of a number". But if you give me a number a trillion trillion trillion trillion trillion trillion trillion trillion times bigger, IDK, yeah, that is a much bigger number. And correspondingly, for every thought experiment of this kind, I do think there is often a number that will just rip through your assumptions and your tradeoffs. There are just really very very very big numbers.  Like, sure, we all agree our abstractions break here, and I am not confident you can't find any hardening of the abstractions that makes the tradeoff come out in the direction of the size of the number really absolutely not mattering at all, but I think that would be a violation of the whole point of the exercise. Like, clearly we can agree that we assign a non-zero value to a marginal shrimp. We value that marginal shrimp for a lot of different reasons, but like, you probably value it for reasons that do include things like the richness of its internal experience, and the degree to which it differs from other shrimp, and the degree to which it contributes to an ecosystem, and the degree to which it's an interesting object of trade, and all kinds of reasons. Now, if we want to extrapolate that value to 10^100, those things still are there, we can't just start ignoring them.  Like, I would feel more sympathetic t

One can argue it's meaningless to talk about numbers this big, and while I would dispute that, it's definitely a much more sensible position than trying to take a confident stance to destroy or substantially alter a set of things so large that it vastly eclipses in complexity and volume and mass and energy all that has ever or will ever exist by a trillion-fold.

Okay, while I'm hastily backpedaling from the general claims I made, I am interested in your take on the first half of this post. I think there's a difference between talking about an actual situati... (read more)

4habryka
I agree there is something to this, but when actually thinking about tradeoffs that do actually have orders of magnitude of variance in them, which is ultimately where this kind of reasoning is most useful (not 100 orders of magnitude, but you know 30-50 are not unheard of), this kind of abstraction would mostly lead you astray, and so I don't think it's a good norm for how to take thought experiments like this. Like, I agree there are versions of the hypothetical that are too removed, but ultimately, I think a central lesson of scope sensitivity is that having a lot more of something often means drastic qualitative changes that come with that drastic change in quantity. Having 10 flop/s of computation is qualitatively different to having 10^10 flop/s. I can easily imagine someone before the onset of modern computing saying "look, how many numbers do you really need to add in everyday life? What is even the plausible purpose of having 10^10 flop/s available? For what purpose would you need to possibly perform 10 billion operations per second? This just seems completely absurd. Clearly the value of a marginal flop goes to zero long before that. That is more operations than all computers[1] in the world have ever ever done, in all of history, combined. What could possibly be the point of this?"  And of course, such a person would be sorely mistaken. And framing the thought experiment as "well, no, I think if you want to take this thought experiment seriously you should think about how much you would be willing to pay for the 10 billionth operation of the kind that you are currently doing, which is clearly zero. I don't want you to hypothesize some kind of new art forms or applications or computing infrastructure or human culture, which feel like they are not the point of this exercise, I want you to think about the marginal item in isolation" would be pointless. It would be emptying the exercise and tradeoff of any of its meaning. If we ever face a choice like this

No, being extremely overwhelmingly confident about morality such that even if you are given a choice to drastically alter 99.999999999999999999999% of the matter in the universe, you call the side of not destroying it "insane" for not wanting to give up a single human life, a thing we do routinely for much weaker considerations, is insane.

Hm. Okay, so my reasoning there went as follows:

  • Substitute shrimp for rocks. 10^100 rocks would also be an amount of matter bigger than exists in the observable universe, and we presumably should assign a nonzero
... (read more)
8habryka
You should be able to strike out the text manually and get the same-ish effect, or leave a retraction notice. The text being hard to read is intentional so that it really cannot be the case that someone screenshots it or skims it without noticing that it is retracted.

Edit: Nevermind, evidently I've not thought this through properly. I'm retracting the below.


The naïve formulations of utilitarianism assume that all possible experiences can be mapped to scalar utilities lying on the same, continuous spectrum, and that experiences' utility is additive. I think that's an error.

This is how we get the frankly insane conclusions like "you should save 10^100 shrimps instead of one human" or everyone's perennial favorite, "if you're choosing between one person getting tortured for 50 years or some amount of people ... (read more)

3quetzal_rainbow
We can easily ban speeds above 15 km/h for any vehicles except ambulances. Nobody starves to death in this scenario, it's just very inconvenient. We value the convenience lost in this scenario more than the lives lost in our reality, so we don't ban high-speed vehicles.  Ordinal preferences are bad and insane and they are to be avoided. What's really wrong with utilitarianism is that you can't, actually, sum utilities: it's a type error, because utilities are invariant up to affine transform, so what would their sum mean? The problem, I think, is that humans naturally conflate two types of altruism. The first type is caring about other entities' mental states. The second type is "game-theoretic" or "alignment-theoretic" altruism: a generalized notion of what it means to care about someone else's values. Roughly, I think that a good version of the second type of altruism requires you to do fair bargaining in the interests of the entity you are being altruistic towards.  Let's take the "World Z" thought experiment. The problem from the second-type-altruism perspective is that the total utilitarian gets very large utility from this world, while all inhabitants of this world, by premise, get very small utility per person, which is an unfair division of gains.  One may object: why not create entities who think that a very small share of gains is fair? My answer is that if an entity can be satisfied with an infinitesimal share of gains, it can also be satisfied with an infinitesimal share of anthropic measure, i.e., non-existence, and it's more altruistic to look for more demanding entities to fill the universe with. My general problem with animal welfare from the bargaining perspective is that most animals probably don't have sufficient agency to have any sort of representative in bargaining. We can imagine a CEV of shrimp which is negative utilitarian and wants to kill all shrimp, or positive utilitarian which thinks that even very painful existence is worth it, or a CEV that prefers shrimp swimming in heroin, or something human
4Garrett Baker
Ok, but if you don't drive to the store one day to get your chocolate, then that is not a major pain for you, yes? Why not just decide that next time you want chocolate at the store, you're not going to go out and get it because you may run over a pedestrian? Your decision there doesn't need to impact your other decisions. Then you ought to keep on making that choice until you are right on the edge of those choices adding up to a first-tier experience, but certainly below. This logic generalizes. You will always be pushing the lower tiers of experience as low as they can go before they enter the upper-tiers of experience. I think the fact that your paragraph above is clearly motivated reasoning here (instead of "how can I actually get the most bang for my buck within this moral theory" style reasoning) shows that you agree with me (and many others) that this is flawed.
6Nick_Tarleton
Besides uncertainty, there's the problem of needing to pick cutoffs between tiers in a ~continuous space of 'how much effect does this have on a person's life?', with things slightly on one side or the other of a cutoff being treated very differently. I agree with the intuition that this is important, but I think that points toward just rejecting utilitarianism (as in utility-as-a-function-purely-of-local-experiences, not consequentialism).
-2habryka
Huh, I expected better from you. No, it is absolutely not insane to save 10^100 shrimp instead of one human! I think the case for insanity for the opposite is much stronger! Please, actually think about how big 10^100 is. We are talking about more shrimp than atoms in the universe. Trillions upon trillions of shrimp more than atoms in the universe.  This is a completely different kind of statement than "you should trade off seven bees against a human".  No, being extremely overwhelmingly confident about morality such that even if you are given a choice to drastically alter 99.999999999999999999999% of the matter in the universe, you call the side of not destroying it "insane" for not wanting to give up a single human life, a thing we do routinely for much weaker considerations, is insane. The whole "tier" thing obviously fails. You always end up dominated by spurious effects on the highest tier. In a universe with any appreciable uncertainty you basically just ignore any lower tiers, because you can always tell some causal story of how your actions might infinitesimally affect something, and so you completely ignore it. You might as well just throw away all morality except the highest tier, it will never change any of your actions.
6ryan_greenblatt
It's worth noting that everything funges: some large number of experiences of eating a chocolate bar can be exchanged for avoiding extreme human suffering or death. So, if you lexicographically put higher weight on extreme human suffering or death, then you're willing to make extreme tradeoffs (e.g. 10^30 chocolate bar experiences) in terms of mundane utility for saving a single life. I think this easily leads to extremely unintuitive conclusions, e.g. you shouldn't ever be willing to drive to a nice place. See also Trading off Lives. I find your response to this sort of argument under "Relevance: There's reasoning that goes" in the footnote very uncompelling as it doesn't apply to marginal impacts.

Incidentally, your Intelligence as Privilege Escalation is pretty relevant to that picture. I had it in mind when writing that.

Not necessarily. If humans don't die or end up depowered in the first few weeks of it, it might instead be a continuous high-intensity stress state, because you'll need to be paying attention 24/7 to constant world-upturning developments, frantically figuring out what process/trend/entity you should be hitching your wagon to in order to not be drowned by the ever-rising tide, with the correct choice dynamically changing at an ever-increasing pace.

"Not being depowered" would actually make the Singularity experience massively worse in the short term, precise... (read more)

5S. Alex Bradt
This comment has been tumbling around in my head for a few days now. It seems to be both true and bad. Is there any hope at all that the Singularity could be a pleasant event to live through?

It does sound like it may be a new and in a narrow sense unexpected technical development

I buy that, sure. I even buy that they're as excited about it as they present, that they believe/hope it unlocks generalization to hard-to-verify domains. And yes, they may or may not be right. But I'm skeptical on priors/based on my model of ML, and their excitement isn't very credible evidence, so I've not moved far from said priors.

3Amalthea
Got it! I'm more inclined to generally expect that various half-decent ideas may unlock surprising advances (for no good reason in particular), so I'm less skeptical that this may be true.  Also, while math is of course easy to verify, assuming they haven't significantly used verification in the training process, it makes their claims more reasonable.    

Oh, yeah, he's not superintelligence-pilled or anything. I was implicitly comparing with a relatively low baseline, yes.

Honestly, that thread did initially sound kind of copium-y to me too, which I was surprised by, since his AI takes are usually pretty good[1] and level-headed. But it makes much more sense under the interpretation that this isn't him being in denial about AI performance, but him undermining OpenAI in response to them defecting against IMO. That's why he's pushing the "this isn't a fair human-AI comparison" line.

  1. ^

    Edit: For someone who doesn't "feel the ASI", I mean.

5Amalthea
I would not characterize Tao's usual takes on AI as particularly good (unless you compare with a relatively low baseline). He's been overall pretty conservative and mostly stuck to reasonable claims about current AI. So there's not much to criticize in particular, but it has come at the cost of him not appreciating the possible/likely trajectories of where things are going, which I think misses the forest for the trees. 

The claim I'm squinting at real hard is this one:

We developed new techniques that make LLMs a lot better at hard-to-verify tasks. 

Like, there's some murkiness with them apparently awarding gold to themselves instead of IMO organizers doing it, and with that other competitive-programming contest at which presumably-the-same model did well being OpenAI-funded. But whatever, I'm willing to buy that they have a model that legitimately achieved roughly this performance (even if a fairer set of IMO judges would've docked points to slightly below the unimpor... (read more)

6ryan_greenblatt
When Noam Brown says "hard-to-verify", I think he means that natural language IMO proofs are "substantially harder to verify": he says "proofs are pages long and take experts hours to grade". (Yes, there are also things which are much harder to verify like things that experts strongly disagree about after years of discussion. Also, for IMO problems, "hours to grade" is probably overstated?) Also, I interpreted this as mostly in contrast to cases where outputs are trivial to programmatically verify (or reliably verify with a dumb LLM) in the context of large scale RL. E.g., you can trivially verify the answers to purely numerical math problems (or competitive programming or other programming situations where you have test cases). Indeed, OpenAI LLMs have historically been much better at numerical math problems than proofs, though possibly this gap has now been closed (at least partially). I think this is overall reasonable if you interpret "hard-to-verify" as "substantially harder to verify" and I think this is probably how many people would read this by default. I don't have a strong view about whether this method will actually generalize to other cases where experts can verify things with high agreement in a few hours. (Noam Brown doesn't say anything about competitive programming, so I'm not sure why you mentioned that. Competitive programming is trivial to verify.)
2Amalthea
Sure, math is not an example of a hard-to-verify task, but I think you're getting unnecessarily hung up on these things. It does sound like it may be a new and in a narrow sense unexpected technical development, and it's unclear how significant it is. I wouldn't try to read into their communications much more.

Silently sponsoring FrontierMath and receiving access to the question sets, and, if I remember correctly, o3 and o3-mini performing worse on a later evaluation done on a newer private question set of some sort

IIRC, that worse performance was due to using a worse/less adapted agency scaffold, rather than OpenAI making the numbers up or engaging in any other egregious tampering. Regarding ARC-AGI, the December-2024 o3 and the public o3 are indeed entirely different models, but I don't think it implies the December one was tailored for ARC-AGI.

I'm not saying ... (read more)

3lwreader132
Are you sure? I'm pretty sure that was cited as *one* of the possible reasons, but not confirmed anywhere. I don't know if some minor scaffolding differences could have that much of an effect on the results (-15%?) in a math benchmark, but if they did, that should have been accounted for in the first place. I don't think other models were tested with scaffolds specifically engineered for them getting a higher score. As per Arc Prize and what they said OpenAI told them, the December version ("o3-preview", as Arc Prize named it) had a compute tier above that of any publicly released model. Not only that, they say that the public version of o3 didn't undergo any RL for ARC-AGI, "not even on the train set". That seems suspicious to me, because once you train a model on something, you can't easily untrain it; as per OpenAI, the ARC-AGI train set was "just a tiny fraction of the o3 train set" and, once again, the model used for evaluations is "fully general". This means that either o3-preview was trained on the ARC-AGI train set somewhere close to the end of the training run and OpenAI was easily able to load an earlier checkpoint to undo that, then not train it on that again for unknown reasons, OR that the public version of o3 was retrained from scratch/a very early checkpoint, then again, not trained on the ARC-AGI data again for unknown reasons, OR that o3-preview was somehow specifically tailored towards ARC-AGI. The latter option seems the most likely to me, especially considering the custom compute tier used in the December evaluation.

I'd guess it has something to do with whatever they're using to automatically evaluate the performance in "hard-to-verify domains". My understanding is that, during training, those entire proofs would have been the final outputs which the reward function (or whatever) would have taken in and mapped to training signals. So their shape is precisely what the training loop optimized – and if so, this shape is downstream of some peculiarities on that end, the training loop preferring/enforcing this output format.

Pretty much everybody is looking into test-time compute and RLVR right now. How come (seemingly) nobody else has found out about this "new general-purpose method" before OpenAI?

Well, someone has to be the first, and they got to RLVR itself first last September.

OpenAI has been shown to not be particularly trustworthy when it comes to test and benchmark results

They have? How so?

1lwreader132
Silently sponsoring FrontierMath and receiving access to the question sets, and, if I remember correctly, o3 and o3-mini performing worse on a later evaluation done on a newer private question set of some sort. Also whatever happened with their irreproducible ARC-AGI results and them later explicitly confirming that the model that Arc Prize got access to in December was different from the released versions, with different training and a special compute tier, despite OpenAI employees claiming that the version of o3 used in the evaluations was fully general and not tailored towards specific tasks. Sure, but I'm just quite skeptical that it's specifically the lab known for endless hype that does. Besides, a lot less people were looking into RLVR at the time o1-preview was released, so the situations aren't exactly comparable.

Eh. Scaffolds that involve agents privately iterating on ideas and then outputting a single result are a known approach, see e. g. this, or Deep Research, or possibly o1 pro/o3 pro. I expect it's something along the same lines, except with some trick that makes it work better than ever before... Oh, come to think of it, Noam Brown did have that interview I was meaning to watch, about "scaling test-time compute to multi-agent civilizations". That sounds relevant.

I mean, it can be scary, for sure; no way to be certain until we see the details.

Misunderstood the resolution terms. ARC-AGI-2 submissions that are eligible for prizes are constrained as follows:

Unlike the public leaderboard on arcprize.org, Kaggle rules restrict you from using internet APIs, and you only get ~$50 worth of compute per submission. In order to be eligible for prizes, contestants must open source and share their solution and work into the public domain at the end of the competition.

Grok 4 doesn't count, and whatever frontier model beats it won't count either. The relevant resolution criterion for frontier model performanc... (read more)

Well, that's mildly unpleasant.

gemini-2.5-pro (31.55%)

But not that unpleasant, I guess. I really wonder what people think when they see a benchmark on which LLMs get 30%, and then confidently say that 80% is "years away". Obviously if LLMs already get 30%, it proves they're fundamentally capable of solving that task[1], so the benchmark will be saturated once AI researchers do more of the same. Hell, Gemini 2.5 Pro apparently got 5/7 (71%) on one of the problems, so clearly outputting 5/7-tier answers to IMO problems was a solved problem, so an LLM model g... (read more)

4Rafael Harth
Agreed, I don't really get how this could be all that much of an update. I think the cynical explanation here is probably correct, which is that most pessimism is just vibes based (as well as most optimism).
3Aaron Staley
Note that the likely known SOTA was even higher than 30%. Google never released Gemini 2.5 Pro Deep Think, which they claimed scored 49% on the USAMO (vs. 34.5% for Gemini 2.5-05-06). A little hard to convert this to an implied IMO score (especially because matharena.ai has Gemini 2.5 oddly having a significantly lower USAMO score for the June model (24%), though a similar IMO score (~31.5%)), but my guess is Deep Think would get somewhere between 37% and 45% on the IMO. 81% remains a huge jump of course. Perhaps, perhaps not. Substantial weight was on the "no one bothers" case - no one was reporting such high scores on the USAMO (pretty similar difficulty to IMO) and the market started dropping rapidly after the USAMO date. Note that we were still at 50% odds of IMO gold a week ago -- but the lack of news of anyone trying drove it down to ~26%. Interestingly, I can find write-ups roughly predicting order of AI difficulty. Looking at gemini-2.5 pro's result so far, using alphageometry would have guaranteed problem 2, so assuming Pro Deep Think only boosted performance on the non-geometric problems, we'd be at a 58% using Deep Think + alphageometry, giving Bronze and close to Silver. I think it was reasonable to assume an extra 4+ months (2 months timeline, 2 months labs being ahead of release) + more compute would have given the 2 more points to get Silver. What is surprising is that a generalist LLM got better at combinatorics (problem 1) and learned to solve geometry problems well. I'm neither an AI nor math competition expert, so can't opine whether this is a qualitative gain or just an example of a company targeting these specific problems (lots of training on math + lots of inference).
3Gurkenglas
You sold, what changed your mind?

the bottom 60% or so grift and play status games, but probably weren’t going to contribute much anyway

I disagree with this reasoning. A well-designed system with correct incentives would co-opt these people's desire to grift and play status games for the purposes of extracting useful work from them. Indeed, setting up game-theoretic environments in which agents with random or harmful goals all end up pointed towards some desired optimization target is largely the purpose of having "systems" at all. (See: how capitalism, at its best, harnesses people's self... (read more)

4Cole Wyeth
It’s not a perfectly designed system, but it’s still possible to benefit from it if you want a few years to do research. 
8leogao
well, in academia, if you do quality work anyways and ignore incentives, you'll get a lot less funding to do that quality work, and possibly perish. unfortunately, academia is not a sufficiently well designed system to extract useful work out of grifters.

Tertiarily relevant annoyed rant on terminology:

I will persist in using "AGI" to describe the merely-quite-general AI of today, and use "ASI" for the really dangerous thing that can do almost anything better than humans can, unless you'd prefer to coordinate on some other terminology.

I don't really like referring to The Thing as "ASI" (although I do it too), because I foresee us needing to rename it from that to "AGSI" eventually, same way we had to move from AI to AGI.

Specifically: I expect that AGI labs might start training their models to be superhuman ... (read more)

My takes on those are:

The very first example shows that absolutely arbitrary things (e.g. arbitrary green lines) can be "natural latents". Does it mean that "natural latents" don't capture the intuitive idea of "natural abstractions"?

I think what's arbitrary here isn't the latent, but the objects we're abstracting over. They're unrelated to anything else, useless to reason about.

Imagine, instead, if Alice's green lines were copied not just by Bob, but by a whole lot of artists, founding an art movement whose members drew paintings containing this specific ... (read more)

Ask Omega to introduce you to one the next time it abducts you to a decision-theory experiment.

The most impressed person in early days was Pliny? [...] I don’t know what that means.

Best explanation for this I've seen

 

We’ve created the obsessive toxic AI companion from the famous series of news stories ‘increasing amounts of damage caused by obsessive toxic AI companies.’

This is real life

I know this isn't the class of update any of us ever expected to make, and I know it's uncomfortable to admit this, but I think we should stare at the truth unflinching: the genre of reality is not "hard science fiction", but "futuristic black comedy".

Ask any So... (read more)

3sanxiyn
Do you have any Solomonoff inductor you know? I don't, and I would like an introduction.

Counterarguments:

  • "I’ll assume 10,000 people believe chatbots are God based on the first article I shared" basically assumes the conclusion that it's unimportant. Perhaps instead all 2.25 million delusion-prone LLM users are having their delusions validated and exacerbated by LLMs? After all, their delusions are presumably pretty important to their lives, so there's a high chance they talked to an LLM about them at some point, and perhaps after that they all keep talking.
    • I mean, I also expect it's actually very few people (at least so far), but we don't rea
... (read more)
4RussellThor
Yes, it's too early to tell what the net effect will be. I am following the digital health/therapist product space and there are a lot of chatbots focused on CBT-style interventions. Preliminary indications say they are well received. I think a fair perspective on the current situation is to compare GenAI to previous AI. The Facebook-style algorithms have done pretty massive mental harm. GenAI LLMs at present are not close to that impact. In the future it depends a lot on how companies react - if mass LLM delusion is a thing then I expect LLMs can be trained to detect and stop that, if the will is there. Especially a different flavor of LLM, perhaps. It's clear to me that the majority of social media harm could have been prevented in a different competitive environment. In the future, I am more worried about LLMs being deliberately used to oppress people; NK could be internally invincible if everyone wore ankle-bracelet LLM listeners, etc. We also have yet to see what AI companions will do - that has the potential to cause massive disruption too, and you can't put in a simple check to claim it's failed. I am not so sure that calling LLMs not at all aligned because of this issue is fair. If they are not capable enough then they won't be able to prevent such harm and will appear misaligned. If they are capable of detecting such harm and stopping it, but companies don't bother to put in automatic checks, then yes they are misaligned.