Tamsin Leake

I'm Tamsin Leake, co-founder and head of research at Orthogonal, doing agent foundations.

sigh I wish people realized how useless it is to have money when the singularity happens. Either we die or we get a utopia in which it's pretty unlikely that pre-singularity wealth matters. What you want to maximize is not your wealth but your utility function, and you sure as hell are gonna get more from LDT handshakes with aligned superintelligences in saved worlds if you don't help OpenAI reduce the number of saved worlds.

I believe that ChatGPT was not released with the expectation that it would become as popular as it did.

Well, even if that's true, causing such an outcome by accident should still count as evidence of vast irresponsibility imo.

I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.

I made guesses about my values a while ago, here.

but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

Rather, "us": the good alignment researchers who take any care at all about the long-term effects of our actions, unlike capabilities researchers who are happy to accelerate race dynamics and increase p(doom) if they make a quick profit out of it in the short term.

I am a utilitarian and agree with your comment.

The intent of the post was

  • to make people weigh whether to publish or not, because I think some people are not weighing this enough
  • to give some arguments in favor of "you might be systematically overestimating the utility of publishing", because I think some people are doing that

I agree people should take the utilitarianly optimal action; I just think they're doing the utilitarian calculus wrong or not doing the calculus at all.

I think research that is mostly about outer alignment (what to point the AI to) rather than inner alignment (how to point the AI to it) tends to be good: quantilizers, corrigibility, QACI, decision theory, embedded agency, indirect normativity, infra-Bayesianism, things like that. Though I could see some of those backfiring the way RLHF did: in the hands of a very irresponsible org, even research that isn't very capabilities-related can be used to accelerate timelines and increase race dynamics if the org doing it thinks it can get a quick buck out of it.

I don't buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs

I don't think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good, for their publishing less to be a positive thing.

That said, if someone hasn't thought at all about concepts like "differentially advancing safety" or "capabilities externalities," then reading this post would probably be helpful, and I'd endorse thinking about those issues.

That's a lot of what I intend to do with this post, yes. I think a lot of people do not think about the impact of publishing very much and just blurt-out/publish things as a default action, and I would like them to think about their actions more.

One straightforward alternative is to just not do that; I agree it's not very satisfying but it should still be the action that's pursued if it's the one that has more utility.

I wish I had better alternatives, but I don't. But the null action is an alternative.

It certainly is possible! In more decision-theoretic terms, I'd describe this as "it sure would suck if agents in my reference class just optimized for their own happiness; it seems like the instrumental thing for agents in my reference class to do is maximize everyone's happiness". Which is probably correct!

But as per my post, I'd describe this position as "not intrinsically altruistic": you're optimizing for everyone's happiness because "it sure would suck if agents in my reference class didn't do that", not because you intrinsically value that everyone be happy, regardless of reasoning about agents and reference classes and veils of ignorance.
