Is Claude "more aligned" than Llama?
Anthropic seems to be the AI company that cares the most about AI risk, and Meta cares the least. If Anthropic is doing more alignment research than Meta, do the results of that research visibly show up in the behavior of Claude vs. Llama?
I am not sure how you would test this. The first thing that comes to mind is to test how easily different LLMs can be tricked into doing things they were trained not to do, but I don't know if that's a great example of an "alignment failure". You could test model deception but you'd nee...
Hmm I wonder if this is why so many April Fools posts have >200 upvotes. April Fools Day in cahoots with itself?
isn't your squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good?
Yes the model is more about racing than about pausing but I thought it was applicable here. My thinking was that there is a spectrum of development speed with "completely pause" on one end and "race as fast as possible" on the other. Pushing more toward the "pause" side of the spectrum has the ~opposite effect as pushing toward the "race" side.
I wish you'd try modeling this with more granularity than "is alignment hard" or whatever
I think it would probably be bad for the US to unilaterally force all US AI developers to pause if they didn't simultaneously somehow slow down non-US development.
It seems to me that to believe this, you have to believe that all four of these things are true:
That's kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!
The global stockpile of nuclear weapons is down 6x since its peak in 1986. Hard to attribute causality but if the anti-nuclear movement played a part in that, then I'd say it was net positive.
(My guess is it's more attributable to the collapse of the Soviet Union than to anything else, but the anti-nuclear movement probably still played some nonzero role)
Yeah I actually agree with that, I don't think it was sufficient, I just think it was pretty good. I wrote the comment too quickly without thinking about my wording.
I feel kind of silly about supporting PauseAI. Doing ML research or writing long fancy policy reports feels high-status. Public protests feel low-status. I would rather not be seen publicly advocating for doing something low-status. I suspect a good number of other people feel the same way.
(I do in fact support PauseAI US, and I have defended it publicly because I think it's important to do so, but it makes me feel silly whenever I do.)
That's not the only reason why people don't endorse PauseAI, but I think it's an important reason that should be mentioned.
Well -- I'm gonna speak broadly -- if you look at the history of PauseAI, they are marked by the belief that the measures proposed by others are insufficient for Actually Stopping AI -- for instance, that the kind of policy measures proposed by people working at AI companies aren't enough; that the kind of measures proposed by people funded by OpenPhil are often not enough; and so on.
They are correct as far as I can tell. Can you identify a policy measure proposed by an AI company or an OpenPhil-funded org that you think would be sufficient to stop unsafe AI devel...
If you look at the kind of claims that PauseAI makes in their risks page, you might believe that some of them seem exaggerated, or that PauseAI is simply throwing all the negative things they can find about AI into a big list to make it seem bad. If you think that credibility is important to the effort to pause AI, then PauseAI might seem very careless about truth in a way that could backfire.
A couple notes on this:
B. "Pausing AI" is indeed more popular than PauseAI, but it's not clearly possible to make a more popular version of PauseAI that actually does anything; any such organization will have strategy/priorities/asks/comms that alienate many of the people who think "yeah I support pausing AI."
This strikes me as a very strange claim. You're essentially saying that even if a general policy is widely supported, it's practically impossible to implement any specific version of that policy? Why would that be true?
For example I think a better alternative to "nobody fund...
I don't think you could refute it. I believe you could construct a binary polynomial function that gives the correct answer to every example.
For example, it is difficult to reconcile the cases of 3, 12, and 19 using a reasonable-looking function, but you could solve all three cases by defining the puzzle's operator as the left-associative binary operation
f(x, y) = -1/9 x^2 + 32/9 x - 22/9 + y
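To illustrate what "left-associative" means here, a minimal sketch of my own (the operands are made-up placeholders, since the puzzle's actual inputs aren't reproduced above): the operation is applied as f(f(a, b), c), i.e. a left fold.

```typescript
// Hypothetical illustration: f applied left-associatively over a list of operands.
const f = (x: number, y: number): number =>
  (-1 / 9) * x ** 2 + (32 / 9) * x - (22 / 9) + y;

// Left-associative application: f(f(a, b), c), then f(f(f(a, b), c), d), and so on.
const applyLeft = (operands: number[]): number =>
  operands.reduce((acc, y) => f(acc, y));

console.log(applyLeft([1, 4])); // f(1, 4) = 5 (placeholder operands, not the puzzle's)
```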
You could technically say Google is a marketing company, but Google's ability to sell search ads doesn't depend on being good at marketing in the traditional sense. It's not like Google is writing ads themselves and selling the ad copy to companies.
I believe the correct way to do this, at least in theory, is to simply have bets denominated in the risk-free rate—and if anyone wants more risk, they can use leverage to simultaneously invest in equities and prediction markets.
Right now I don't know if it's possible to use margin loans to invest in prediction markets.
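One way to read the leverage point, with toy numbers I made up: borrow against the stock position so that total equity exposure stays the same even though some cash is diverted into prediction-market bets (which earn roughly the risk-free rate in escrow).

```typescript
// Made-up numbers, purely illustrative.
const cash = 100;          // total capital
const betStake = 40;       // placed in prediction markets; escrow earns ~the T-bill rate
const targetEquity = 100;  // desired stock exposure, same as if all cash went to stocks
const marginBorrowed = targetEquity - (cash - betStake); // 40 borrowed against the stock position

console.log({ betStake, targetEquity, marginBorrowed }); // { betStake: 40, targetEquity: 100, marginBorrowed: 40 }
```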
I looked through ChatGPT again and I figured out that I did in fact do it wrong. I found Deep Research by going to the "Explore GPTs" button in the top right, which AFAICT searches through custom modules made by 3rd parties. The OpenAI-brand Deep Research is accessed by clicking the "Deep research" button below the chat input text box.
I don't really get the point in releasing a report that explicitly assumes x-risk doesn't happen. Seems to me that x-risk is the only outcome worth thinking about given the current state of the AI safety field (i.e. given how little funding goes to x-risk). Extinction is so catastrophically worse than any other outcome* that more "normal" problems aren't worth spending time on.
I don't mean this as a strong criticism of Epoch, more that I just don't understand their worldview at all.
*except S-risks but Epoch isn't doing anything related to those AFAIK
Working through a model of the future in a better-understood hypothetical refines gears applicable outside the hypothetical. Exploratory engineering for example is about designing machines that can't be currently built in practice and often never will be worthwhile to build as designed. It still gives a sense of what's possible.
(Attributing value to steps of a useful activity is not always practical. Research is like that, very useful that it's happening overall, but individual efforts are hard to judge, and so acting on attempts to judge them risks the Goodhart curse.)
for example, by having bets denominated in S&P 500 or other stock portfolios rather than $s
Bets should be denominated in the risk-free rate. Prediction markets should invest traders' money into T-bills and pay back the winnings plus interest.
I believe that should be a good enough incentive to make prediction markets a good investment if you can find positive-EV bets that aren't perfectly correlated with equities (or other risky assets).
(For Polymarket the situation is a bit more complicated because it uses crypto.)
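A minimal sketch of what I have in mind, using my own simplification (simple non-compounding interest, not any existing market's actual mechanics): each winning $1 share pays out its face value plus the T-bill interest earned while the money sat in escrow.

```typescript
// Simplified settlement for a winning binary position; numbers are illustrative.
const settle = (winningShares: number, tBillRate: number, years: number): number =>
  winningShares * 1.0 * (1 + tBillRate * years);

console.log(settle(100, 0.04, 0.5)); // 102: $100 face value plus ~2% interest over six months
```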
Thanks, this is helpful! After reading this post I bought ChatGPT Plus and tried a question on Deep Research:
Please find literature reviews / meta-analyses on the best intensity at which to train HIIT (i.e. maximum sustainable speed vs. leaving some in the tank)
I got much worse results than you did:
[Time Value of Money] The Yes people are betting that, later this year, their counterparties (the No bettors) will want cash (to bet on other markets), and so will sell out of their No positions at a higher price.
How does this strategy compare to shorting bonds? Both have the same payoff structure (they make money if the discount rate goes up) but it's not clear to me which is a better deal. I suppose it depends on whether you expect Polymarket investors to have especially high demand for cash.
I'm glad to hear that! I often don't hear much response to my essays so it's good to know you've read some of them :)
I don't have a mistakes page but last year I wrote a one-off post of things I've changed my mind on.
I have a few potential criticisms of this paper. I think my criticisms are probably wrong and the paper's conclusion is right, but I'll just put them out there:
I think this criticism is wrong—if it were true, the across-dataset correlation between time and LLM-difficulty should be higher than the within-dataset correlation, but from eyeballing Figure 4 (page 10), it looks like it's not higher (or at least not much).
It is much higher. I'm not sure how/if I can post images of the graph here, but the R^2 for SWAA only is 0.27, HCAST only is 0.48, and RE-bench only is 0.01.
Also, HCAST R^2 goes down to 0.41 if you exclude the 21/97 data points where the human time source is an estimate. I'm not really sure why t...
Why do you think this narrows the distribution?
I can see an argument for why; tell me if this is what you're thinking:
The biggest reason why the LLM paradigm might never reach AI takeoff is that LLMs can only complete short-term tasks and can't maintain coherence over longer time scales (e.g. if an LLM writes something long, it will often start contradicting itself). And intuitively it seems that scaling up LLMs hasn't fixed this problem. However, this paper shows that LLMs have been getting better at longer-term tasks, so LLMs probably will scale to AGI.
A few miscellaneous thoughts:
This is the belief of basically everyone running a major AGI lab. Obviously all but one of them must be mistaken, but it's natural that they would all share the same delusion.
I agree with this description and I don't think this is sane behavior.
Actions speak louder than words, and their actions are far less sane than these words.
For example, if Demis regularly lies awake at night worrying about how the thing he's building could kill everyone, why is he still putting so much more effort into building it than into making it safe?
I was familiar enough to recognize that it was an edit of something I had seen before, but not familiar enough to remember what the original was
I'm really not convinced that public markets do reliably move in the predictable (downward) direction in response to "bad news" (wars, coups, pandemics, etc).
Also, market movements are hard to detect. How much would Trump violating a court order decrease the total (time-discounted) future value of the US economy? Probably less than 5%? And what is the probability that he violates a court order? Maybe 40%? So the market should move <2%, and evidence about this potential event so far has come in slowly instead of at a single dramatic moment so this <2% drop could have been spread over multiple weeks.
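Spelling out the back-of-the-envelope arithmetic (the 5% and 40% figures are just the rough guesses above):

```typescript
const valueLossIfViolated = 0.05;    // <5% hit to the time-discounted future value of the US economy
const probabilityOfViolation = 0.40; // rough guess
const expectedMove = valueLossIfViolated * probabilityOfViolation; // ≈0.02, i.e. a <2% expected move
```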
If I'm allowed to psychoanalyze funders rather than discussing anything at the object level, I'd speculate that funders like evals because:
I don't know what the regulatory plan is; I was just referring to this poll, which I didn't read in full (I just read the title). Reading it now, it's not so much a plan as a vision, and it's not so much "Musk's vision" as it is a viewpoint (that the poll claims is associated with Musk) in favor of regulating the risks of AI. Which is very different from JD Vance's position; Vance's position is closer to the one that does not poll well.
I guess I'm expressing doubt about the viability of wise or cautious AI strategies, given our new e/acc world order, in which everyone who can, is sprinting uninhibitedly towards superintelligence.
e/acc does not poll well and there is widespread popular support for regulating AI (see AIPI polls). If the current government favors minimal regulations, that's evidence that an AI safety candidate is more likely to succeed, not less.
(Although I'm not sure that follows because I think the non-notkilleveryonism variety of AI safety is more popular. Also Musk's regulatory plan is polling well and I'm not sure if it differs from e.g. Vance's plan.)
Also Musk's regulatory plan is polling well
What plan are you referring to? Is this something AI safety specific?
If you publicly commit to something, taking down the written text does not constitute a de-commitment. Violating a prior commitment is unethical regardless of whether the text of the commitment is still on your website.
(Not that there's any mechanism to hold Google to its commitments, or that these commitments ever meant anything—Google was always going to do whatever it wanted anyway.)
Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train
I don't get this: if frontier(ish) models cost $10M–$100M, why is Nvidia's projected revenue more like $1T–$10T? Is the market projecting 100,000x growth in spending on frontier models within the next few years? I would have guessed more like 100x–1000x growth but at least one of my numbers must be wrong. (Or maybe they're all wrong by ~1 OOM?)
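Just to make the ratio explicit (using the ranges above, which may themselves be off):

```typescript
const modelCost = [10e6, 100e6];     // $10M–$100M per frontier(ish) model
const nvidiaRevenue = [1e12, 10e12]; // $1T–$10T projected
console.log(nvidiaRevenue[0] / modelCost[0], nvidiaRevenue[1] / modelCost[1]); // 100000 100000
```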
This is actually a crazy big effect size? Preventing ~10–50% of a cold in exchange for taking a few pills a day seems like a great deal to me.
Don't push the frontier of capabilities. Obviously this is basically saying that Anthropic should stop making money and therefore stop existing. The more nuanced version is that for Anthropic to justify its existence, each time it pushes the frontier of capabilities, that push should be earned by substantial progress on the other three points.
I think I have a stronger position on this than you do. I don't think Anthropic should push the frontier of capabilities, even given the tradeoff it faces.
If their argument is "we know arms races are bad, but we have to accele...
If lysine is your problem but you don't want to eat beans, you can also buy lysine supplements.
I primarily use a weird ergonomic keyboard (the Kinesis Advantage 2) with custom key bindings. But my laptop keyboard has normal key bindings, so my "normal keyboard" muscle memory still works.
On Linux Mint with Cinnamon, you can do this in system settings by going to Keyboard -> Layouts -> Options -> Caps Lock behavior. (You can also put that line in a shell script and set the script to run at startup.)
I use a Kinesis Advantage keyboard with the keys rebound to look like this (apologies for my poor graphic design skills):
https://i.imgur.com/Mv9FI7a.png
MIRI's communications strategy update published in May explained what they were planning on working on. I emailed them a month or so ago and they said they are continuing to work on the things in that blog post. They are the sorts of things that can take longer than a year so I'm not surprised that they haven't released anything substantial in the way of comms this year.
That's only true if a single GPU (or a small number of GPUs) is sufficient to build a superintelligence, right? I expect it to take many years to go from "it's possible to build superintelligence with a huge multi-billion-dollar project" to "it's possible to build superintelligence on a few consumer GPUs". (Unless of course someone does build a superintelligence which then figures out how to make GPUs many orders of magnitude cheaper, but at that point it's moot.)
I don't think controlling compute would be qualitatively harder than controlling, say, pseudoephedrine.
(I think it would be harder, but not qualitatively harder—the same sorts of strategies would work.)
Cooperative Development (CD) is favored when alignment is easy and timelines are longer. [...]
Strategic Advantage (SA) is more favored when alignment is easy but timelines are short (under 5 years)
I somewhat disagree with this. CD is favored when alignment is easy with extremely high probability. A moratorium is better given even a modest probability that alignment is hard, because the downside to misalignment is so much larger than the downside to a moratorium.[1] The same goes for SA—it's only favored when you are extremely confident about alignment +...
Also, I don't feel that this article adequately addressed the downside of SA that it accelerates an arms race. SA is only favored when alignment is easy with high probability and you're confident that you will win the arms race, and you're confident that it's better for you to win than for the other guy[1], and you're talking about a specific kind of alignment where an "aligned" AI doesn't necessarily behave ethically, it just does what its creator intends.
[1] How likely is a US-controlled (or, more accurately, Sam Altman/Dario Amodei/Mark Zuckerberg-contr...
- I was asking a descriptive question here, not a normative one. Guilt by association, even if weak, is a very commonly used form of argument, and so I would expect it to be used in this case.
I intended my answer to be descriptive. EAs generally avoid making weak arguments (or at least I like to think we do).
I will attempt to answer a few of these.
- Why has EV made many moves in the direction of decentralizing EA, rather than in the direction of centralizing it?
Power within EA is currently highly centralized. It seems very likely that the correct amount of centralization is less than the current amount.
- Why, as an organization aiming to ensure the health of a community that is majority male and includes many people of color, does the CEA Community Health team consist of seven white women, no men, and no people of color?
This sounds like a rhetorical qu...
Thank you for writing about your experiences! I really like reading these posts.
How big an issue do you think the time constraints were? For example, how much better a job could you have done if all the recommenders got twice as much time? And what would it take to set things up so the recommenders could have twice as much time?
Do you think a 3-state dark mode selector is better than a 1-state (where "auto" is the only state)? My website is 1-state, on the assumption that auto will work for almost everyone and it lets me skip the UI clutter of having a lighting toggle that most people won't use.
Also, I don't know if the site has been updated but it looks to me like turntrout.com's two modes aren't dark and light, they're auto and light. When I set Firefox's appearance to dark or auto, turntrout.com's dark mode appears dark, but when I set Firefox to light, turntrout.com appears l...
Do you think a 3-state dark mode selector is better than a 1-state (where “auto” is the only state)? My website is 1-state, on the assumption that auto will work for almost everyone and it lets me skip the UI clutter of having a lighting toggle that most people won’t use.
Gwern discusses this on his “Design Graveyard” page:
...Auto-dark mode: a good idea but “readers are why we can’t have nice things”.
OSes/browsers have defined a ‘global dark mode’ toggle the reader can set if they want dark mode everywhere, and this is available to a web page; if you are
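For what it's worth, a minimal sketch of the 1-state "auto only" approach described above: read the reader's OS/browser preference through the standard prefers-color-scheme media query and react when it changes. (The "dark" class name is just a placeholder of mine, not turntrout.com's or gwern.net's actual implementation.)

```typescript
// Follow the reader's OS/browser dark-mode preference automatically.
const darkQuery = window.matchMedia("(prefers-color-scheme: dark)");

function applyTheme(isDark: boolean): void {
  document.documentElement.classList.toggle("dark", isDark);
}

applyTheme(darkQuery.matches);                                      // initial state
darkQuery.addEventListener("change", (e) => applyTheme(e.matches)); // live updates
```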
OP did the work to collect these emails and put them into a post. When people do work for you, you shouldn't punish them by giving them even more work.
Thanks, that's useful info!
I thought you could post images by dragging and dropping files into the comment box (I seem to recall doing that in the past), but now it doesn't seem to work for me. Maybe that only works for top-level posts?