All of utilistrutil's Comments + Replies

MIT Tech Review doesn't break much news. Try Techmeme.

Re "what people are talking about"

Sure, the news is biased toward topics people already think are important because you need readers to click etc etc. But you are people, so you might also think that at least some of those topics are important. Even if the overall news is mostly uncorrelated with your interests, you can filter aggressively.

Re "what they're saying about it"

I think you have in mind articles that are mostly commentary, analysis, opinion. News in the sense I mean it here tells you about som... (read more)

Conditioning as a Crux Finding Device

Say you disagree with someone, e.g. they have low pdoom and you have high pdoom. You might be interested in finding cruxes with them.

You can keep imagining narrower and narrower scenarios in which your beliefs still diverge. Then you can back out properties of the final scenario to identify cruxes.

For example, you start by conditioning on AGI being achieved - both of your pdooms tick up a bit. Then you also condition on that AGI being misaligned, and again your pdooms increase a bit (if the beliefs move in opposite dire... (read more)

(eg. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined intuition)

Do we have evidence that this is what's going on? My understanding is that distilling from CoT is very sensitive—reordering the reasoning, or even pulling out the successful reasoning, causes the student to be unable to learn from it.

I agree o1 creates training data, but that might just be high quality pre-training data for GPT-5.

Because you are training the CoT to look nice, instead of letting it look however is naturally most efficient for conveying information from past-AI-to-future-AI. The hope of Faithful CoT is that if we let it just be whatever's most efficient, it'll end up being relatively easy to interpret, such that insofar as the system is thinking problematic thoughts, they'll just be right there for us to see. By contrast if we train the CoT to look nice, then it'll e.g. learn euphemisms and other roundabout ways of conveying the same information to its future self, that don't trigger any warnings or appear problematic to humans.

My favored version of this project would involve >50% of the work going into the econ literature and models on investor incentives, with attention to

  • Principal-agent problems
  • Information asymmetry
  • Risk preferences
  • Time discounting

And then a smaller fraction of the work would involve looking into AI labs, specifically. I'm curious if this matches your intentions for the project or whether you think there are important lessons about the labs that will not be found in the existing econ literature.

Lucie Philippon
I expect that basic econ models and their implications for investor motivations are already mostly known in the AI safety community, even if only through vague statements like "VCs are more risk tolerant than pension funds". My main point in this post is that it might be the case that AI labs have successfully removed themselves from the influence of investors, so that it actually matters very little what the investors of AI labs want or do. I think determining whether this is the case is important, since in that case our intuitions about how companies generally work would not apply to AI labs.

How does the fiduciary duty of companies to investors work?

OpenAI instructs investors to view their investments "in the spirit of a donation," which might be relevant for this question.

Lucie Philippon
The link does not work. I don't think a written disclaimer would amount to much in a court case without corresponding provisions in the corporate structure.

I would really like to see a post from someone in AI policy on "Grading Possible Comprehensive AI Legislation." The post would lay out what kind of safety stipulations would earn a bill an "A-" vs a "B+", for example. 

I'm imagining a situation where, in the next couple years, a big omnibus AI bill gets passed that contains some safety-relevant components. I don't want to be left wondering "did the safety lobby get everything it asked for, or did it get shafted?" and trying to construct an answer ex-post. 

I don't know how I hadn't seen this post before now! A couple weeks after you published this, I put out my own post arguing against most applications of analogies in explanations of AI risk. I've added a couple references to your post in mine. 

Adult brains are capable of telekinesis, if you fully believe in your ability to move objects with your mind. Adults are generally too jaded to believe such things. Children have the necessary unreserved belief, but their minds are not developed enough to exercise the ability.

Scott Alexander says:

Suppose I notice I am a human on Earth in America. I consider two hypotheses. One is that everything is as it seems. The other is that there is a vast conspiracy to hide the fact that America is much bigger than I think - it actually contains one trillion trillion people. It seems like SIA should prefer the conspiracy theory (if the conspiracy is too implausible, just increase the posited number of people until it cancels out).

I am often confused by the kind of reasoning at play in the text I bolded. Maybe someone can help sort me out.... (read more)

cubefox
This point was recently elaborated on here: Pascal's Mugging and the Order of Quantification
Zane
While you're quite right about numbers on the scale of billions or trillions, I don't think it makes sense in the limit for the prior probability of X people existing in the world to fall faster than X grows in size. Certain series of large numbers grow larger much faster than they grow in complexity. A program that returns 10^(10^(10^10)) takes fewer bits to specify (relative to most reasonable systems of specifying programs) than a program that returns 32758932523657923658936180532035892630581608956901628906849561908236520958326051861018956109328631298061259863298326379326013327851098368965026592086190862390125670192358031278018273063587236832763053870032004364702101004310417647840155719238569120561329853619283561298215693286953190539832693826325980569123856910536312892639082369382562039635910965389032698312569023865938615338298392306583192365981036198536932862390326919328369856390218365991836501590931685390659103658916392090356835906398269120625190856983206532903618936398561980569325698312650389253839527983752938579283589237325987329382571092301928* - even though 10^(10^(10^10)) is by far the larger number. And it only takes a linear increase in complexity to make it 10^(10^(10^(10^(10^(10^10))))) instead. *I produced this number via keyboard-mashing; it's not anything special.   Consider the proposition "A superpowered entity capable of creating unlimited numbers of people ran a program that output the result of a random program out of all possible programs (with their outputs rendered as integers), weighted by the complexity of those programs, and then created that many people." If this happened, the probability that their program outputs at least X would fall much slower than X rises, in the limit. The sum doesn't converge at all; the expected number of people created would be literally infinite. So as long as you assign greater than literally zero probability to that proposition - and there's no such thing as zero probability - there must exist some num
JBlack
There's no principle that says that prior probability of a population exceeding some size N must decrease more quickly than 1/N asymptotically, or any other property of some system. Some priors will have this property, some won't.

My prior for real-world security lines does have this property, though this cheats a little by being largely founded in real-world experience already. Does my prior for population of hypothetical worlds involving Truman Show style conspiracies (or worse!) have this property? I don't know - maybe not? Does it even make sense to have a prior over these? After all a prior still requires some sort of model that you can use to expect things or not, and I have no reasonable models at all for such worlds. A mathematical "universal" prior like Solomonoff is useless since it's theoretically uncomputable, and also in a more practical sense utterly disconnected from the domain of properties such as "America's population".

On the whole though, your point is quite correct that for many priors you can't "integrate the extreme tails" to get a significant effect. The tails of some priors are just too thin.
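A rough way to formalize the exchange above (my sketch; the notation and specific bounds are mine, not from either comment) is to compare how fast the prior's tail falls against how fast the posited population grows:

```latex
% Editor's sketch, not from either comment. N = posited number of people,
% X = a threshold, P = the prior over N (assumed nonnegative-integer-valued).
\begin{align*}
&\mathbb{E}[N] \;=\; \sum_{X \ge 1} P(N \ge X). \\[4pt]
&\text{Thin tails (JBlack's condition): if } P(N \ge X) \le C\,X^{-(1+\epsilon)}
 \text{ for some } \epsilon > 0, \\
&\qquad \text{the sum converges and } X \cdot P(N \ge X) \to 0,
 \text{ so inflating the posited } X \text{ cannot ``cancel out'' the prior.} \\[6pt]
&\text{Complexity-weighted prior (Zane's case): let } T_k = 10\uparrow\uparrow k
 \text{ (a tower of } k \text{ tens).} \\
&\qquad \text{A program printing } T_k \text{ needs only } O(\log k) \text{ bits, so roughly }
 P(N = T_k) \;\gtrsim\; 2^{-c}\,k^{-2}, \\
&\qquad \text{hence } \mathbb{E}[N] \;\ge\; \sum_{k} T_k \cdot 2^{-c} k^{-2} \;=\; \infty,
 \text{ and } X \cdot P(N \ge X) \text{ is unbounded along } X = T_k.
\end{align*}
```

Under the first kind of prior, Scott's "just increase the posited number of people" move fails; under the second, it goes through, which is Zane's point.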

Jacob Steinhardt on predicting emergent capabilities:

There’s two principles I find useful for reasoning about future emergent capabilities:

  1. If a capability would help get lower training loss, it will likely emerge in the future, even if we don’t observe much of it now.
  2. As ML models get larger and are trained on more and better data, simpler heuristics will tend to get replaced by more complex heuristics. . . This points to one general driver of emergence: when one heuristic starts to outcompete another. Usually, a simple heuristic (e.g. answering directly) w
... (read more)
eggsyntax
I think the trouble with that argument is that it seems equally compelling for any useful capabilities, regardless of how achievable they are (eg it works for 'deterministically simulate the behavior of the user' even though that seems awfully unlikely to emerge in the foreseeable future). So I don't think we can take it as evidence that we'll see general reasoning emerge at any nearby scale.

I think you could also push to make government liable as part of this proposal

DanielFilan
You could but (a) it's much harder constitutionally in the US (governments can only be sued if they consent to being sued, maybe unless other governments are suing them) and (b) the reason for thinking this proposal works is modelling affected actors as profit-maximizing, which the government probably isn't.

There might be indirect effects like increasing hype around AI and thus investment, but overall I think those effects are small and I'm not even sure about the sign.

Sign of the effect of open source on hype? Or of hype on timelines? I'm not sure why either would be negative.

Open source --> more capabilities R&D --> more profitable applications --> more profit/investment --> shorter timelines

  • The example I've heard cited is Stable Diffusion leading to LoRA.

There's a countervailing effect of democratizing safety research, which one might think outweighs the above because safety is so much more neglected than capabilities, leaving more low-hanging fruit.

Erik Jenner
By "those effects" I meant a collection of indirect "release weights → capability landscape changes" effects in general, not just hype/investment. And by "sign" I meant whether those effects taken together are good or bad. Sorry, I realize that wasn't very clear. As examples, there might be a mildly bad effect through increased investment, and/or there might be mildly good effects through more products and more continuous takeoff. I agree that releasing weights probably increases hype and investment if anything. I also think that right now, democratizing safety research probably outweighs all those concerns, which is why I'm mainly worried about Meta etc. not having very clear (and reasonable) decision criteria for when they'll stop releasing weights.
Garrett Baker
I take this argument very seriously. It does in fact seem to be the case that very much of the safety research I'm excited about happens on open-source models. Perhaps I'm more plugged into the AI safety research landscape than the capabilities research landscape?

Nonetheless, even setting aside low-hanging-fruit effects, I think there's a big reason to believe open-sourcing your model will have disproportionate safety gains: capabilities research is about how to train your models to be better, but the overall sub-goal of safety research right now seems to be how to verify properties of your model. Framed like this, releasing the end-states of training (or possibly even training checkpoints) seems better suited to the safety research strategy than to the capabilities research strategy.
[comment deleted]

GDP is an absolute quantity. If GDP doubles, then that means something. So readers should be thinking about the distance between the curve and the x-axis.

But 1980 is arbitrary. When comparing 2020 to 2000, all that matters is that they’re 20 years apart. No one cares that “2020 is twice as far from 1980 as 2000” because time did not start in 1980.

This is the difference between a ratio scale and a cardinal scale. In a cardinal scale, the distance between points is meaningful, e.g., "The gap between 2 and 4 is twice as big as the gap between 1 and 2." In a r... (read more)
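In standard measurement-theory terms (my gloss; the original comment doesn't use this framing), the two scales are distinguished by the transformations that preserve their meaning:

```latex
% Editor's gloss using standard measurement-theory definitions, not from the comment.
\begin{align*}
&\text{Ratio scale (e.g.\ GDP): zero is meaningful; admissible maps } x \mapsto a x,\ a > 0; \\
&\qquad \text{ratios of values are invariant, so ``GDP doubled'' says something real.} \\[4pt]
&\text{Interval/cardinal scale (e.g.\ calendar years): zero is arbitrary; admissible maps } x \mapsto a x + b; \\
&\qquad \text{only differences carry meaning, so ``20 years apart'' is fine, while} \\
&\qquad \text{distance from an arbitrary origin like 1980 carries no extra information.}
\end{align*}
```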

I just came across this word from John Koenig's Dictionary of Obscure Sorrows, which nicely captures the thesis of All Debates Are Bravery Debates.

redesis n. a feeling of queasiness while offering someone advice, knowing they might well face a totally different set of constraints and capabilities, any of which might propel them to a wildly different outcome—which makes you wonder if all of your hard-earned wisdom's fundamentally nontransferable, like handing someone a gift card in your name that probably expired years ago.

(and perhaps also reversing some past value-drift due to the structure of civilization and so on)

Can you say more about why this would be desirable? 

jessicata

Most civilizations in the past have had "bad values" by our standards. People have been in preference falsification equilibria where they feel like they have to endorse certain values or face social censure. They probably still are falsifying preferences and our civilizational values are probably still bad. E.g. high incidence of people right now saying they're traumatized. CEV probably tends more towards the values of untraumatized than traumatized humans, even from a somewhat traumatized starting point.

The idea that civilization is "oppressive" and some ... (read more)

A lot of this piece is unique to high school debate formats. In the college context, every judge is themself a current or previous debater, so some of these tricks don't work. (There are of course still times when optimizing for competitive success distracts from truth-seeking.)

Lyrongolem
Hm? Is it? Feel free to correct me if I'm wrong, but in my experience flow judges (who tend to be debaters) tend to grade more on the quality of the arguments as opposed to the quality of the evidence. If you raise a sound rebuttal to a good argument it doesn't score, but if you fail to rebut a bad argument it's still points in your favor.  Is it different in college? 

Here are some responses to Rawls from my debate files:

A2 Rawls

  • Ahistorical
    • Violates property rights
    • Does not account for past injustices eg slavery, just asks what kind of society would you design from scratch. Thus not a useful guide for action in our fucked world.
  • Acontextual
    • Veil of ignorance removes contextual understanding, which makes it impossible to assess different states of the world. Eg from the original position, Rawls prohibits me from using my gender to inform my understanding of gender in different states of the world
    • Identity is not arbitrary! It
... (read more)

1. It’s pretty much a complete guide to action? Maybe there are decisions where it is silent, but that’s true of like every ethical theory like this (“but util doesn’t care about X!”). I don’t think the burden is on him to incorporate all the other concepts that we typically associate with justice. At very least not a problem for “justifying the kind of society he supports”

2. Like the two responses to this are either “Rawls tells you the true conception of the good, ignore the other ones” or “just allow for other-regarding preferences and proceed as usual”... (read more)

What % evals/demos and what % mech interp would you expect to see if there wasn't Goodharting? 1/3 and 1/5 don't seem that high to me, given the value of these agendas and the advantages of touching reality that Ryan named.

Bogdan Ionut Cirstea
Hard to be confident here, but maybe half those numbers or even less (especially for evals/demos)?  If you could choose the perfect portfolio allocation, does it seem reasonable to you that > 1/2 (assuming no overlap) should go to evals/demos and mech interp?

these are the two main ways I would expect MATS to have impact: research output during the program and future research output/career trajectories of scholars.

We expect to achieve more impact through (2) than (1); per the theory of change section above, MATS' mission is to expand the talent pipeline. Of course, we hope that scholars produce great research through the program, but often we are excited about scholars doing research (1) as a means for scholars to become better researchers (2). Other times, these goals are in tension. For example, some mentors ... (read more)

Orpheus16
I would also suspect that #2 (finding/generating good researchers) is more valuable than #1 (generating or accelerating good research during the MATS program itself). One problem with #2 is that it's usually harder to evaluate and takes longer to evaluate. #2 requires projections, often over the course of years. #1 is still difficult to evaluate (what is "good alignment research" anyways?) but seems easier in comparison.

Also, I would expect correlations between #1 and #2. Like, one way to evaluate "how good are we doing at training researchers//who are the best researchers" is to ask "how good is the research they are producing//who produced the best research in this 3-month period?" This process is (of course) imperfect. For example, someone might have great output because their mentor handed them a bunch of ready-to-go projects, but the scholar didn't actually have to learn the important skills of "forming novel ideas" or "figuring out how to prioritize between many different directions." But in general, I think it's a pretty decent way to evaluate things.

If someone has produced high-quality and original research during the MATS program, that sure does seem like a strong signal for their future potential. Likewise, in the opposite extreme, if during the entire summer cohort there were 0 instances of useful original work, that doesn't necessarily mean something is wrong, but it would make me go "hmmm, maybe we should brainstorm possible changes to the program that could make it more likely that we see high-quality original output next time, and then we see how much those proposed changes trade off against other desiderata."

(It seems quite likely to me that the MATS team has already considered all of this; just responding on the off-chance that something here is useful!)

Thanks, Aaron! That's helpful to hear. I think "forgetting" is a good candidate explanation because scholars answered that question right after completing Alignment 201, which is designed for breadth. Especially given the expedited pace of the course, I wouldn't be surprised if people forgot a decent chunk of A201 material over the next couple months. Maybe for those two scholars, forgetting some A201 content outweighed the other sources of breadth they were afforded, like seminars, networking, etc.

Today I am thankful that Bayes' Rule is unintuitive. 

Much ink has been spilled complaining that Bayes' Rule can yield surprising results. As anyone who has taken an introductory statistics class knows, it is difficult to solve a problem that requires an application of Bayes' Rule without plugging values into the formula, at least for a beginner. Eventually, the student of Bayes may gain an intuition for the Rule (perhaps in odds form), but at that point they can be trusted to wield their intuition responsibly because it was won through disciplined pra... (read more)
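As a concrete illustration of the odds form mentioned above (a standard textbook-style example; the numbers are mine, not from the original comment): posterior odds are prior odds times the likelihood ratio.

```latex
% Standard worked example (editor's illustration): disease prevalence 1%,
% test sensitivity 90%, false-positive rate 9%.
\begin{align*}
\underbrace{\frac{P(D \mid +)}{P(\lnot D \mid +)}}_{\text{posterior odds}}
 \;=\;
\underbrace{\frac{P(D)}{P(\lnot D)}}_{\text{prior odds}}
 \times
\underbrace{\frac{P(+ \mid D)}{P(+ \mid \lnot D)}}_{\text{likelihood ratio}}
 \;=\; \frac{1}{99} \times \frac{0.90}{0.09} \;=\; \frac{10}{99},
\qquad
P(D \mid +) \;=\; \frac{10}{10 + 99} \;\approx\; 9\%.
\end{align*}
```

Far lower than the intuitive guess of "about 90%", which is exactly the kind of surprise the comment is pointing at.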

Update: We have finalized our selection of mentors.

I have a friend Balaam who has a very hard time saying no. If I ask him, “Would it bother you if I eat the last slice of pizza?” he will say “Of course that’s fine!” even if it would be somewhat upsetting or costly to him.

 

I think this is a reference to Guess/Ask/Tell Culture, so I'm linking that post for anyone interested :)

This happens in chess all the time!