Drake Thomas

Interested in math puzzles, fermi estimation, strange facts about the world, toy models of weird scenarios, unusual social technologies, and deep dives into the details of random phenomena.

Working on the pretraining team at Anthropic as of October 2024; before that I did independent alignment research of various flavors and worked in quantitative finance.

Posts

Sorted by New

5Drake Thomas's Shortform

9mo

141Auditing language models for hidden objectives

4mo

5Drake Thomas's Shortform

9mo

25Catastrophic Regressional Goodhart: Appendix

180When is Goodhart catastrophic?

Wikitag Contributions

Try Things

(+2/-3)

Comments

Sorted by

Newest

A case for courage, when speaking of AI danger

Drake Thomas2d52

I agree with you that a typical instance of working at an AI lab has worse consequences in expectation than working at a tobacco company, and I think that for a person who shares all your epistemic beliefs to work in a typical role at an AI lab would indeed be a worse failure of moral character than to work at a tobacco company.

I also agree that in many cases people at AI labs have been exposed at least once to arguments which, if they had better epistemics and dedicated more time to thinking about the consequences of their work, could have convinced them that it was bad for the world for them to do such work. And I do think the failure to engage with such arguments and seriously consider them, in situations like these, is a stain on someone's character! But I think it's the sort of ethical failure which a majority of humans will make by default, rather than something indicative of remarkably bad morality.

Tobacco kills about 8 million people a year globally. ASI could kill about 8 billion.

I just don't think this sort of utilitarian calculus makes sense to apply when considering the actions of people who don't share the object-level beliefs at hand! I think people who worked to promulgate communism in the late 19th century were not unusually evil, for instance.

Literature Review: Risks of MDMA

Drake Thomas4d60

I asked Claude to research it and looked at some of the studies it turned up.

Ma et al administer (nothing, MDMA, DXM 5 minutes before MDMA) to 3 monkeys each and use radioisotopes to look for sertonergic abnormality 24 and 30 months after administration. The DXM sure looks like it helps but also there are literally just three monkeys in this image, maybe the middle guy just didn't have very many serotonin transporters to start with?? Maybe I can email the authors and get the raw data.

(Also they give these monkeys a lot of DXM - these are not cough suppressant dosages, the monkeys are probably tripping balls even without the MDMA. And DXM+MDMA is generally considered a dangerous combination - if you look up trip reports on Erowid they reliably describe having a bad time that lasts for days - so I would not try this on humans.)

Malberg et al find that after giving rats high-dose MDMA (recreational dose would be like 1mg/kg and they do 20 or 40) and autopsying brains 2 weeks later, serotonin levels were much lower on the rats who spent their MDMA trip in a high-temperature room than in a low-temperature room. 8 rats for each of the 18 experimental conditions.

Aguirre et al find that alpha-lipolic acid cuts serotonin decrease in half a week after administration, though they used a temperature-controlled room at 22ºC which the above study claims doesn't decrease serotonin, so idk. They also use 8 rats per experimental condition, but they only have 4 such conditions so I trust them a bit less.

Shankaran et al find that vitamin C substantially decreases 2,3-DHBA during the trip but don't look at effects afterwards.

As far as Claude could tell, no one's ever done human RCTs for MDMA mitigations and looked at cognitive or neurochemical effects afterwards.

Literature Review: Risks of MDMA

Drake Thomas4d10

Seems like a very useful animal study would be to compare high-dose or rapidly-repeated-low-dose MDMA with and without various common mitigations and supplements, and look at autopsied brain damage - I assume none of these have been done yet?

A case for courage, when speaking of AI danger

Drake Thomas4d42

Yeah, sorry - I agree that was a bit sloppy of me. I think it is very reasonable to accuse people working at major AI labs of something like negligence / willful ignorance, and I agree that can be a pretty serious moral failing (indeed I think it's plausibly the primary moral failing of many AI lab employees). My objection is more to the way the parent comment is connoting "evil" just from one's employer leading to bad outcomes as if those outcomes are the known intent of such employees.

A case for courage, when speaking of AI danger

Drake Thomas4d101

I think this is a pretty unhelpful frame. Most people working at an AI lab are somewhere between "person of unremarkable moral character who tells themselves a vague story about how they're doing good things" and "deeply principled person trying their best to improve the world as best they can". I think working at an AI lab requires less failure of moral character than, say, working at a tobacco company, for all that the former can have much worse effects on the world.

There are a few people I think it is fair to describe as actively morally bad, and willfully violating deontology - it seems likely to me that this is true of Sam Altman, for instance - but I think "evil" is just not a very helpful word here, will not usefully model the actions of AI lab employees, and will come across as obviously disingenuous to anyone who hears such rhetoric if they actually interact with any of the people you're denigrating. If you had to be evil to end the world, the world would be a lot safer!

I think it's fine and good to concentrate moral opprobrium at specific actions people take that are unprincipled or clear violations of deontology - companies going back on commitments, people taking on roles or supporting positions that violate principles they've previously expressed, people making cowardly statements that don't accurately reflect their beliefs for the sake of currying favor. I think it's also fine and good to try and convince people that what they're doing is harmful, and that they should quit their jobs or turn whistleblower or otherwise change course. But the mere choice of job title is usually not a deontology violation for these people, because they don't think it has the harms to the world you think it does! (I think at this point it is probably somewhat of a deontological violation to work in most roles at OpenAI or Meta AI even under typical x-risk-skeptical worldviews, but only one that indicates ethical mediocrity rather than ethical bankruptcy.)

(For context, I work on capabilities at Anthropic, because I think that reduces existential risk on net; I think there's around a 25% chance that this is a horrible mistake and immensely harmful for the world. I think it's probably quite bad for the world to work on capabilities at other AI labs.)

Drake Thomas's Shortform

Drake Thomas4d120

Assorted followup thoughts:

There are nonzero transaction costs to specifying your price in the first place.
This is probably too complicated to explain to the general population.
In practice the survey-giver doesn't have unbounded bankroll so they'll have to cap payouts at some value and give up on survey-takers who quote prices that are too high. I think it's fine if they do this dynamically based on how much they've had to spend so far?
You can tweak the function from stated price to payment amount and probability of selection here - eg one thing you can do is collect data with probability proportional to for $k > 1$ and pay them $\frac{k Y}{k - 1}$ . I haven't thought a lot about which functions have especially good properties here; it might be possible to improve substantially on the formula I gave here.
You might, as a survey-giver, end up in a situation where you have more than enough datapoints from $10-value respondents and you care about getting high-value-of-time datapoints by a large margin, like you'd happily pay $1000 to get a random $100-value-of-time respondent but you need to pay $5000 in total rewards to get enough people that you locate a $100-value-of-time respondent who you end up actually selecting.
- Intuitively it feels like there should be some way for you to just target these high-value-of-time respondents, but I think this is fundamentally kind of impossible? Let's suppose 5% of respondents hate surveys, and disvalue the survey at $100, while the rest don't mind surveys and would do it for free. Any strategy which ends up collecting a rich respondent has to pay them at least $100, which means that in order to distinguish such respondents you have to make it worth a survey-liker's time to not pretend to be a survey-hater. So every survey-liker needs to be paid at least $100*p(you end up surveying someone who quotes you $100) in expectation, which means you have to dole out at least $1900 to survey-likers before you can get your survey-hater datapoint.
- Having said the above though, I think it might be fair game to look for observed patterns in time value from data you already have (eg maybe you know which zip codes have more high-value-of-time respondents) and disproportionately target those types of respondents? But now you're introducing new sources of bias into your data, which you could again correct with stochastic sampling and inverse weighting, and it'd be a question of which source of bias/noise you're more worried about in your survey.

Drake Thomas's Shortform

Drake Thomas4d1185

Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (eg maybe you want to conduct a 30 minute phone survey but some people really dislike phone calls or have much higher hourly wages this funges against).

One thing you could do is offer to pay everyone dollars for data collection. But this will only capture the people whose cost of providing data is below $X$ , which will distort your sample.

Here's another proposal: ask everyone for their fair price to provide the data. If they quote you $Y$ , pay them $2 Y$ to collect the data with probability ${(\frac{X}{2 Y})}^{2}$ , or $X$ with certainty if they quote you a value less than $X / 2$ . (If your RNG doesn't return yes, do nothing.) Then upweight the data from your randomly-chosen respondents in inverse proportion to the odds that they were selected. You can do a bit of calculus to see that this scheme incentivizes respondents to quote their fair value, and will provide an expected surplus of $max (X^{2} / 4 Y, X - Y)$ dollars to a respondent who disvalues providing data at $Y$ .

Now you have an unbiased sample of your population and you'll pay at most $N X$ dollars in expectation if you reach out to $N$ people. The cost is that you'll have a noisier sample of the high-reluctance population, but that's a lot better than definitely having none of that population in your study.

evhub's Shortform

Drake Thomas17d30

Coming to this comment late, but off the cuff takes:

Seems good.
Unclear to me if the decisionmaking friction of implementing this now is worth the costs; agree there are some worlds where you want to do it when things really heat up.
Not a lawyer but my guess is that there's no way to do this which doesn't let the US legal system access the recordings before 10y are up in the event of litigation.
Kinda torn on this; I think this is what organizations in a sane world should do, but on actual Earth I don't know if it's worth the weirdness points. I care a lot about Anthropic spending down its credibility as a normal serious organization on actions that have a low ROI, and I think the audience that would actually understand and respect this in the right way is pretty small.
This sort of stuff generally seems good, haven't thought about the doc's particular proposal in detail. I think stuff like welfare reports in the Claude 4 model card are moving in this direction already.
Agree you want this option, not sure what concrete actions you think would be good besides "think about the fact that you might do this"?
Yep, would happy about more of this on the margin.
I think as of June 2025 A\ has basically done this?
Agree this would be good.

Racial Dating Preferences and Sexual Racism

Drake Thomas18d40

Thanks for writing this up! Two questions I'm curious if you have good data on:

How do silent preferences line up with stated preferences? Eg "average attractiveness ratings assigned on a survey" vs "what people put in their dating profile". I assume more people will have strong revealed preferences than will have strong stated preferences, but I'm curious by how much and if it ends up being in somewhat different directions (and if there are people who state a preference but seem to be wrong about themselves, or for whom that preference isn't a function of their assessments of individual attractiveness).
I know you deliberately simplified to broad categories for this post (I assume also because a lot of data is on eg dating profiles where people use broad categories to describe their preferences), but if you know of more granular data on how the weird asymmetries among Asian people break down, I'd be curious - I assume the full story is a lot more complicated.

Histograms are to CDFs as calibration plots are to...

Drake Thomas1mo40

I think something's off in the log-odds plot here? It shouldn't be bounded below by 0, log-odds go from -inf to +inf.