All of Tamsin Leake's Comments + Replies

$10M sounds like it'd be a lot for PauseAI-type orgs imo, though admittedly this is not a very informed take.

Anyways, I stand by my comment; I expect throwing money at PauseAI-type orgs gets you more utility per dollar than nvidia, even after taking into account that investing in nvidia now to donate to PauseAI later is a possibility.

6Anthony Bailey
Pause AI has a lot of opportunity for growth. Especially the “increase public awareness” lever is hugely underfunded: almost no paid staff or advertising budget. Our game plan is simple but not naive, and is most importantly a disjunctive, value-add bet. Please help us execute it well: explore, join, talk with us, donate whatever combination of time, skills, ideas and funds makes sense. (Excuse the dearth of kudos; I'm not a regular LW person, just an old EA-adjacent nerd who quit Amazon to volunteer full-time for the movement.)
5nc
I do think it's conceptually nicer to donate to PauseAI now rather than rely on the investment appreciating enough to offset the time-delay in donation. Not that it's necessarily the wrong thing to do, but it injects a lot more uncertainty into the model that is difficult to quantify.

Thoroughly agree except for what to do with money. I expect that throwing money at orgs that are trying to slow down AI progress (eg PauseAI, or something better if someone makes it) gets you more utility per dollar than nvidia (and also it's more ethical).

Edit: to be clear, I mean actual utility in your utility function. Even if you're fully self-interested and not altruistic at all, I still think your interests are better served by donating to PauseAI-type orgs than investing in nvidia.

3Nikola Jurkovic
I think if the question is "what do I do with my altruistic budget," then investing some of it to cash out later (with large returns) and donate much more is a valid option (as long as you have systems in place that actually make sure that happens). At small amounts (<$10M), I think the marginal negative effects on AGI timelines and similar factors are basically negligible compared to other factors.

"why would they be doing that?"

same reason people make poor decisions all the time. if they had a clear head and hadn't already sunk some cost into AI, they could see that working on AI might make them wealthy in the short term but it'll increase {the risk that they die soon} enough that they go "not worth it", as they should. but once you're already working in AI stuff, it's tempting and easy to retroactively justify why doing that is safe. or to just not worry about it and enjoy the money, even though if you thought about the impact of your actions on your own survival in the next few years you'd decide to quit.

at least that's my vague best guess.

people usually think of corporations as either {advancing their own interests and also the public's interests} or {advancing their own interests at cost to the public} — ime mostly the latter. what's actually going on with AI frontier labs, i.e. {going against the interests of everyone including themselves}, is very un-memetic and very far from the overton window.

in fiction, the heads of big organizations are either good (making things good for everyone) or evil (worsening everyone else's outcomes, but improving their own). most of the time, just evil. ver... (read more)

4Morphism
Moral Maze dynamics push corporations not just to pursue profit at all costs, but also to be extremely myopic. As long as the death doesn't happen before the end of the quarter, the big labs, being immoral mazes, have no reason to give a shit about x-risk. Of course, every individual member of a big lab has reason to care, but the organization as an egregore does not (and so there is strong selection pressure for these organizations to have people who have low P(doom) and/or don't (think they) value the future lives of themselves and others).

There are plenty of examples in fiction of greed and hubris leading to a disaster that takes down its own architects. The dwarves who mined too deep and awoke the Balrog, the creators of Skynet, Peter Isherwell in "Don't Look Up", Frankenstein and his Creature...


^^ Why wouldn't people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building?

You imply a cyborg tool is a "powerful unaligned AI", it's not, it's a tool to improve bandwidth and throughput between any existing AI (which remains untouched by cyborg research) and the human

I was making a more general argument that applies mainly to powerful AI but also to all other things that might help one build powerful AI (such as: insights about AI, cyborg tools, etc). These things-that-help have the... (read more)

1AtillaYasar
Things I learned/changed my mind about thanks to your reply:
1) Good tools allow experimentation, which yields insights that can (unpredictably) lead to big advancements in AI research. o1 is an example, where basically an insight discovered by someone playing around (Chain Of Thought) made its way into a model's weights 4 (ish?) years later by informing its training.
2) Capabilities overhang getting resolved can be seen as a type of bad event that is preventable.
This is a crux in my opinion: I need to look more into the specifics of AI research and of alignment work and what kind of help a powerful UI actually provides, and hopefully write a post some day. (But my intuition is: the fact that cyborg tools help both capabilities and alignment is bad, and whether I open-source code or not shouldn't hinge on narrowing down this ratio; it should overwhelmingly favor alignment research.)
Cheers.

I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it". It only takes one strong enough actor of the former kind to kill everyone, and the latter are gonna have a very hard time stopping all of them.

By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to... (read more)

1AtillaYasar
(edit: thank you for your comment! I genuinely appreciate it.)

"I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it"."

^^ Why wouldn't people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building? I agree that in general, cyborg tools increase hype about the black boxes and will accelerate timelines. But it still reduces discourse lag. And part of what's bad about accelerating timelines is that you don't have time to talk to people and build institutions --- and reducing discourse lag would help with that.

"By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to be in reasonabler hands (for example they'd also be more likely to think "hmm maybe I shouldn't build unaligned powerful AI")."

^^ You make 3 assumptions that I disagree with:
1) Only reasonable people who think hard about AI safety will understand the power of cyborgs.
2) You imply a cyborg tool is a "powerful unaligned AI"; it's not, it's a tool to improve bandwidth and throughput between any existing AI (which remains untouched by cyborg research) and the human.
3) That people won't eventually find out. One obvious way is that a weak superintelligence will just build it for them. (I should've made this explicit: I believe that capabilities overhang is temporary, that inevitably "the dam will burst", and that then humanity will face a level of power they're unaware of and didn't get a chance to coordinate against. (And again, why assume it would be in the hands of the good guys?))

Even if tool AI is controllable, tool AI can be used to assist in building non-tool AI. A benign superassistant is one query away from outputting world-ending code.

Right, Tamsin: so reasonable safety standards would presumably ban fully unrestricted superassistants too, but allow more limited assistants that could still be incredibly helpful. I'm curious what AI safety standards you'd propose – it's not a hypothetical question, since many politicians would like to know. 

In my opinion the hard part would not be figuring out where to donate to {decrease P(doom) a lot} rather than {decrease P(doom) a little}, but figuring out where to donate to {decrease P(doom)} rather than {increase P(doom)}.

1KvmanThinking
so, don't donate to people who will take my money and go buy OpenAI more supercomputers while thinking that they're doing a good thing? and even if I do donate to some people who work on alignment, they might publish it and make OpenAI even more confident that by the time they finish we'll have it under control? or some other weird way donating might increase P(doom) that I haven't even thought of? that's a good point now i really don't know what to do

(oops, this ended up being fairly long-winded! hope you don't mind. feel free to ask for further clarifications.)

There's a bunch of things wrong with your description, so I'll first try to rewrite it in my own words, but still as close to the way you wrote it (so as to try to bridge the gap to your ontology) as possible. Note that I might post QACI 2 somewhat soon, which simplifies a bunch of QACI by locating the user as {whatever is interacting with the computer the AI is running on} rather than by using a beacon.

A first pass is to correct your descriptio... (read more)

4Cleo Nardo
Thanks Tamsin! Okay, round 2. My current understanding of QACI:

1. We assume a set Ω of hypotheses about the world. We assume the oracle's beliefs are given by a probability distribution μ ∈ ΔΩ.
2. We assume sets Q and A of possible queries and answers respectively. Maybe these are exabyte files, i.e. Q ≅ A ≅ {0,1}^N for N = 2^60.
3. Let Φ be the set of mathematical formulae that Joe might submit. These formulae are given semantics eval(ϕ): Ω×Q → ΔA for each formula ϕ ∈ Φ.[1]
4. We assume a function H: Ω×Q → ΔΦ where H(α,q)(ϕ) ∈ [0,1] is the probability that Joe submits formula ϕ after reading query q, under hypothesis α.[2]
5. We define QACI: Ω×Q → ΔA as follows: sample ϕ ∼ H(α,q), then sample a ∼ eval(ϕ)(α,q), then return a.
6. For a fixed hypothesis α, we can interpret the answer a ∼ QACI(α, "Best utility function?") as a utility function u_α: Π → R via some semantics eval-u: A → (Π → R).
7. Then we define u: Π → R by integrating over μ, i.e. u(π) := ∫ u_α(π) dμ(α).
8. A policy π ∈ Π is optimal if and only if π* ∈ argmax_Π(u).

The hope is that μ, eval, eval-u, and H can be defined mathematically. Then the optimality condition can be defined mathematically.

Question 0: What if there's no policy which maximises u: Π → R? That is, for every policy π there is another policy π′ such that u(π′) > u(π). I suppose this is less worrying, but what if there are multiple policies which maximise u?

Question 1: In Step 7 above, you average all the utility functions together, whereas I suggested sampling a utility function. I think my solution might be safer. Suppose the oracle puts 5% chance on hypotheses such that QACI(α,−) is malign. I think this is pretty conservative, because the Solomonoff predictor is malign, and because of some of the concerns Evhub raises here. And the QACI amplification might not preserve benignancy. It follows that, under your solution, u: Π → R is influenced by a coalition of malign agents, and similarly π* ∈ argmax(u) is influenced by the malign coalition. By contrast, I suggest sampling α ∼ μ and then finding
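To make the averaging-vs-sampling distinction in Question 1 concrete, here is a toy numeric sketch (entirely illustrative: the hypothesis names, the 5% malign weight, and the payoff numbers are made up and merely stand in for the abstract objects above):

    # Toy numeric sketch of Question 1 (illustrative only; names, the 5% weight, and
    # the payoffs are made up). It contrasts step 7, which averages u_alpha over all
    # hypotheses, with the alternative of sampling one hypothesis alpha ~ mu first.
    import random

    hypotheses = ["alpha1", "alpha2", "alpha3"]
    mu = {"alpha1": 0.50, "alpha2": 0.45, "alpha3": 0.05}  # alpha3 plays the "5% malign" role
    policies = ["cooperate", "defect"]

    def u_alpha(alpha, policy):
        # Stand-in for the utility function induced by QACI(alpha, "Best utility function?").
        benign = {"cooperate": 1.0, "defect": 0.0}
        malign = {"cooperate": 0.0, "defect": 100.0}  # extreme payoffs let a small tail dominate averages
        return (malign if alpha == "alpha3" else benign)[policy]

    def u_averaged(policy):
        # Step 7: u(pi) = integral of u_alpha(pi) d mu(alpha); every hypothesis influences the argmax.
        return sum(mu[a] * u_alpha(a, policy) for a in hypotheses)

    def optimal_policy_sampled(rng):
        # The alternative: sample alpha ~ mu once, then optimize against that single u_alpha.
        alpha = rng.choices(hypotheses, weights=[mu[a] for a in hypotheses])[0]
        return max(policies, key=lambda p: u_alpha(alpha, p))

    print(max(policies, key=u_averaged))             # "defect": the 5% malign tail wins the average
    print(optimal_policy_sampled(random.Random(0)))  # usually "cooperate": the tail only gets control 5% of the time

Under averaging, the malign 5% can steer the argmax by reporting extreme utilities; under sampling, it only gets control with probability 5%, which is the safety difference the question points at.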

Hi !

ATA is extremely neglected. The field of ATA is at a very early stage, and currently there does not exist any research project dedicated to ATA. The present post argues that this lack of progress is dangerous and that this neglect is a serious mistake.

I agree it's neglected, but there is in fact at least one research project dedicated to at least designing alignment targets: the part of the formal alignment agenda dedicated to formal outer alignment, which is the design of math problems to which solutions would be world-saving. Our notable attempts ... (read more)

1ThomasCederborg
I think I see your point. Attempting to design a good alignment target could lead to developing intuitions that would be useful for ATA. A project trying to design an alignment target might result in people learning skills that allow them to notice flaws in alignment targets proposed by others. Such projects can therefore contribute to the type of risk mitigation that I think is lacking. I think that this is true. But I do not think that such projects can be a substitute for an ATA project with a risk mitigation focus. Regarding Orthogonal: It is difficult for me to estimate how much effort Orthogonal spends on different types of work. But it seems to me that your published results are mostly about methods for hitting alignment targets. This also seems to me to be the case for your research goals. If you are successful, it seems to me that your methods could be used to hit almost any alignment target (subject to constraints related to finding individuals that want to hit specific alignment targets). I appreciate you engaging on this, and I would be very interested in hearing more about how the work done by Orthogonal could contribute to the type of risk mitigation effort discussed in the post. I would, for example, be very happy to have a voice chat with you about this.
  1. I wonder how many of those seemingly idealistic people retained power when it was available because they were indeed only pretending to be idealistic. Assuming one is actually initially idealistic but then gets corrupted by having power in some way, one thing someone can do in CEV that you can't do in real life is reuse the CEV process to come up with even better CEV processes which will be even more likely to retain/recover their just-before-launching-CEV values. Yes, many people would mess this up or fail in some other way in CEV; but we only need one

... (read more)
4Wei Dai
Why do you think this, and how would you convince skeptics? And there are two separate issues here. One is how to know their CEV won't be corrupted relative to what their values really are or should be, and the other is how to know that their real/normative values are actually highly altruistic. It seems hard to know both of these, and perhaps even harder to persuade others who may be very distrustful of such person/group from the start. Would be interested in understanding your perspective on this better. I feel like aside from AI, our world is not being eaten by molochs very quickly, and I prefer something like stopping AI development and doing (voluntary and subsidized) embryo selection to increase human intelligence for a few generations, then letting the smarter humans decide what to do next. (Please contact me via PM if you want to have a chat about this.)

the main arguments for the programmers including all of [current?] humanity in the CEV "extrapolation base" […] apply symmetrically to AIs-we're-sharing-the-world-with at the time

I think timeless values might possibly help resolve this; if some {AIs that are around at the time} are moral patients, then sure, just like other moral patients around they should get a fair share of the future.

If an AI grabs more resources than is fair, you do the exact same thing as if a human grabs more resources than is fair: satisfy the values of moral patients (including... (read more)

trying to solve morality by themselves

It doesn't have to be by themselves; they can defer to others inside CEV, or come up with better schemes than their initial CEV inside CEV and then defer to that. Whatever other solutions than "solve everything on your own inside CEV" might exist, they can figure those out and defer to them from inside CEV. At least that's the case in my own attempts at implementing CEV in math (eg QACI).

2Wei Dai
1. Once they get into CEV, they may not want to defer to others anymore, or may set things up with a large power/status imbalance between themselves and everyone else which may be detrimental to moral/philosophical progress. There are plenty of seemingly idealistic people in history refusing to give up or share power once they got power. The prudent thing to do seems to never get that much power in the first place, or to share it as soon as possible. 2. If you're pretty sure you will defer to others once inside CEV, then you might as well do it outside CEV due to #1 in my grandparent comment.

Seems really wonky and like there could be a lot of things that could go wrong in hard-to-predict ways, but I guess I sorta get the idea.

I guess one of the main things I'm worried about is that it seems to require that we either:

  • Be really good at timing when we pause it to look at its internals, such that we look at the internals after it's had long enough to think about things that there are indeed such representations, but not long enough that it started optimizing really hard such that we either {die before we get to look at the internals} or {the int
... (read more)

So the formalized concept is Get_Simplest_Concept_Which_Can_Be_Informally_Described_As("QACI is an outer alignment scheme consisting of…")? Is an informal definition written in English?

It seems like "natural latent" here just means "simple (in some simplicity prior)". If I read the first line of your post as:

Has anyone thought about whether QACI could be located in some simplicity prior, by searching the prior for concepts matching (??in some way??) some informal description in English?

It sure sounds like I should read the two posts you linked (perhaps especia... (read more)

6Lucius Bushnaq
More like the formalised concept is the thing you get if you poke through the AGI’s internals searching for its representation of the concept combination pointed to by an english sentence plus simulation code, and then point its values at that concept combination.

To me kinda the whole point of QACI is that it tries to actually be fully formalized. Informal definitions seem very much not robust to superintelligences thinking about them; fully formalized definitions are the only thing I know of that keeps meaning the same thing regardless of what kind of AI looks at it or with what kind of ontology.

I don't really get the whole natural latents ontology at all, and mostly expect it to be too weak for us to be able to get reflectively stable goal-content integrity even as the AI becomes vastly superintelligent. If defi... (read more)

8Lucius Bushnaq
The idea would be that an informal definition of a concept conditioned on that informal definition being a pointer to a natural concept, is ≈ a formal specification of that concept. Where the ≈ is close enough to a = that it'd hold up to basically arbitrary optimization power.

The knightian in IB is related to limits of what hypotheses you can possibly find/write down, not - if i understand so far - about an adversary. The adversary stuff is afaict mostly to make proofs work.

I don't think this makes a difference here? If you say "what's the best not-blacklisted-by-any-knightian-hypothesis action", then it doesn't really matter if you're thinking of your knightian hypotheses as adversaries trying to screw you over by blacklisting actions that are fine, or if you're thinking of your knightian hypotheses as a more abstract worst... (read more)
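A tiny sketch of that point (my own framing, with made-up names and numbers): "pick the best action that no Knightian hypothesis blacklists" is the same computation whether you picture each hypothesis as an adversary vetoing actions or as an abstract worst-case constraint.

    # Toy illustration (not actual infra-bayesianism code): each Knightian hypothesis
    # vetoes some actions; we take the best action vetoed by none of them. Nothing in
    # the computation depends on whether the hypotheses are "adversaries" or not.
    actions = {"a": 1.0, "b": 0.8, "c": 2.0}   # action -> how much you'd like it a priori
    blacklists = [                              # one set of vetoed actions per hypothesis
        {"c"},                                  # hypothesis 1 rules out c
        set(),                                  # hypothesis 2 rules out nothing
    ]
    allowed = [act for act in actions if not any(act in bl for bl in blacklists)]
    best = max(allowed, key=actions.get)        # best not-blacklisted-by-any-hypothesis action
    print(best)                                 # "a": c is vetoed, and a beats b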

This is indeed a meaningful distinction! I'd phrase it as:

  • Values about what the entire cosmos should be like
  • Values about what kind of places one wants one's (future) selves to inhabit (eg, in an internet-like upload-utopia, "what servers does one want to hang out on")

"Global" and "local" is not the worst nomenclature. Maybe "global" vs "personal" values? I dunno.

my best idea is to call the former "global preferences" and the latter "local preferences", but that clashes with the pre-existing notion of locality of preferences as the quality of termina

... (read more)

That is, in fact, a helpful elaboration! When you said

Most people who "work on AI alignment" don't even think that thinking is a thing.

my leading hypotheses for what you could mean were:

  • Using thought, as a tool, has not occurred to most such people
  • Most such people have no concept whatsoever of cognition as being a thing, the way people in the year 1000 had no concept whatsoever of javascript being a thing.

Now, instead, my leading hypothesis is that you mean:

  • Most such people are failing to notice that there's an important process, called "thinking
... (read more)

To be more precise: extrapolated over time, for any undesired selection process or other problem of that kind, either the problem is large enough that it gets exacerbated over time so much that it eats everything — and then that's just extinction, but slower — or it's not large enough to win out, and aligned superintelligence(s) + coordinated human action is enough to stamp it out in the long run, which means it won't be an issue for almost all of the future.

It seems like for a problem to be just large enough that coordination doesn't stamp it away, but ... (read more)
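A toy way to see how narrow that band is (my own gloss on the argument above, with a deliberately crude model): let x_t be the problem's share of resources at time t and suppose it grows or shrinks by a constant factor r each period. Then

    x_{t+1} = r \, x_t \quad\Rightarrow\quad x_t = r^t x_0 \;\longrightarrow\;
    \begin{cases}
      \infty & \text{if } r > 1 \quad (\text{it eats everything, i.e. extinction but slower}) \\
      0      & \text{if } r < 1 \quad (\text{coordination stamps it out})
    \end{cases}

and only the knife-edge case r = 1 keeps the problem at an intermediate scale forever, which is why a problem that persists indefinitely without either winning or dying out requires a fairly special balance.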

single-use

Considering how long it took me to get that by this you mean "not dual-use", I expect some others just won't get it.

Some people who are very concerned about suffering might be considering building an unaligned AI that kills everyone just to avoid the risk of an AI takeover by an AI aligned to values which want some people to suffer.

Let this be me being on the record saying: I believe the probability of {alignment to values that strongly diswant suffering for all moral patients} is high enough, and the probability of {alignment to values that want some moral patients to suffer} is low enough, that this action is not worth it.

I think this applies to approximately anyone w... (read more)

3quila
I've replied to/written my current beliefs about this subject here

sigh I wish people realized how useless it is to have money when the singularity happens. Either we die or we get a utopia in which it's pretty unlikely that pre-singularity wealth matters. What you want to maximize is not your wealth but your utility function, and you sure as hell are gonna get more from LDT handshakes with aligned superintelligences in saved worlds, if you don't help OpenAI reduce the amount of saved worlds.

4Thomas Kwa
I care about my wealth post-singularity and would be willing to make bets consistent with this preference, e.g. I pay 1 share of QQQ now, you pay me 3 shares of QQQ 6 months after the world GDP has 10xed if we are not all dead then.
-4RedMan
Based on your recent post here: https://www.lesswrong.com/posts/55rc6LJcqRmyaEr9T/please-stop-publishing-ideas-insights-research-about-ai Can I mark you down as in favor of AI related NDAs?  In your ideal world, would a perfect solution be for a single large company to hire all the capable AI researchers, give them aggressive non disclosure and non compete agreements, then shut down every part of the company except the legal department that enforces the agreements?

downvote and agree. but being financially ruined makes it harder to do other things, and it's probably pretty aversive to go through even if you expect things to turn out better in expectation because of it. the canaries thing seems pretty reasonable to me in light of this.

TsviBT1610

I wish you would realize that whatever we're looking at, it isn't people not realizing this.

I believe that ChatGPT was not released with the expectation that it would become as popular as it did.

Well, even if that's true, causing such an outcome by accident should still count as evidence of vast irresponsibility imo.

You continue to model OpenAI as this black box monolith instead of trying to unravel the dynamics inside it and understand the incentive structures that lead these things to occur. It's a common pattern I notice in the way you interface with certain parts of reality.

I don't consider OpenAI as responsible for this as much as Paul Christiano and Jan Leike and his team. Back in 2016 or 2017, when they initiated and led research into RLHF, they focused on LLMs because they expected that LLMs would be significantly more amenable to RLHF. This means that instruct... (read more)

I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.

1Shelby Stryker
I disagree. This whole saga has introduced the Effective Altruism movement to people at labs that hadn't thought about alignment. From my understanding openai isn't anywhere close to breaking even from chatgpt and I can't think of any way a chatbot could actually be monetized.
1Ebenezer Dukakis
In the spirit of trying to understand what actually went wrong here -- IIRC, OpenAI didn't expect ChatGPT to blow up the way it did. Seems like they were playing a strategy of "release cool demos" as opposed to "create a giant competitive market".
2Garrett Baker
Who is updating? I haven't seen anyone change their mind yet.
Amalthea161

Half a year ago, I'd have guessed that OpenAI leadership, while likely misguided, was essentially well-meaning and driven by a genuine desire to confront a difficult situation. The recent series of events has made me update significantly against the general trustworthiness and general epistemic reliability of Altman and his circle. While my overall view of OpenAI's strategy hasn't really changed, my likelihood of them possibly "knowing better" has dramatically gone down now.

I still parse that move as devastating the commons in order to make a quick buck.

I believe that ChatGPT was not released with the expectation that it would become as popular as it did. OpenAI pivoted hard when it saw the results.

Also, I think you are misinterpreting the sort of 'updates' people are making here.

I made guesses about my values a while ago, here.

but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

Rather, "us" — the good alignment researchers who will be careful at all about the long term effects of our actions, unlike capabilities researchers who are happy to accelerate race dynamics and increase p(doom) if they make a quick profit out of it in the short term.

I think these judgements would benefit from more concreteness: that rather than proposing a dichotomy of "capabilities research" (them, Bad) and "alignment research" (us, Good), you could be more specific about what kinds of work you want to see more and less of.

I agree that (say) Carmack and Sutton are doing a bad thing by declaring a goal to "build AGI" while dismissing the reasons that this is incredibly dangerous. But the thing that makes infohazard concerns so fraught is that there's a lot of work that potentially affects our civilization's trajectory... (read more)

I am a utilitarian and agree with your comment.

The intent of the post was

  • to make people weigh whether to publish or not, because I think some people are not weighing this enough
  • to give some arguments in favor of "you might be systematically overestimating the utility of publishing", because I think some people are doing that

I agree people should take the utilitarianly optimal action, I just think they're doing the utilitarian calculus wrong or not doing the calculus at all.

I think research that is mostly about outer alignment (what to point the AI to) rather than inner alignment (how to point the AI to it) tends to be good — quantilizers, corrigibility, QACI, decision theory, embedded agency, indirect normativity, infra bayesianism, things like that. Though I could see some of those backfiring the way RLHF did — in the hands of a very irresponsible org, even not very capabilities-related research can be used to accelerate timelines and increase race dynamics if the org doing it thinks it can get a quick buck out of it.

4mako yass
You think that studying agency and infrabayesianism won't make small contributions to capabilities? Even just saying "agency" in the context of AI makes capabilities progress.
3Morphism
I could see embedded agency being harmful though, since an actual implementation of it would be really useful for inner alignment

I don't buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs

I don't think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good, for them to publish less to be a positive thing.

That said, if someone hasn't thought at all about concepts like "differentially advancing safety" or "capabilities externalities," then reading this post would probably be helpful, and I'd endorse thinking about those issues.

That's... (read more)

I don't think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good, for them to publish less to be a positive thing.

There currently seems to be >10x as many people directly trying to build AGI/improve capabilities as trying to improve safety.

Suppose that the safety people have as good ideas and research ability as the capabilities people. (As a simplifying assumption.)

Then, if all the safety people switched to working full time on maximally advancing capabilities, this woul... (read more)

One straightforward alternative is to just not do that; I agree it's not very satisfying but it should still be the action that's pursued if it's the one that has more utility.

I wish I had better alternatives, but I don't. But the null action is an alternative.

It certainly is possible! In more decision-theoretic terms, I'd describe this as "it sure would suck if agents in my reference class just optimized for their own happiness; it seems like the instrumental thing for agents in my reference class to do is to maximize everyone's happiness". Which is probly correct!

But as per my post, I'd describe this position as "not intrinsically altruistic" — you're optimizing for everyone's happiness because "it sure would suck if agents in my reference class didn't do that", not because you intrinsically value that everyone be happy, regardless of reasoning about agents and reference classes and veils of ignorance.

Tamsin Leake4824

decision theory is no substitute for utility function

some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:

my utility function is about what i want for just me; but i'm altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision theoritic cooperation is the true name of altruism.

it's possible that this is true for some people, but in general i expect that to be a mistaken anal... (read more)

1Morphism
What about the following: My utility function is pretty much just my own happiness (in a fun-theoretic rather than purely hedonistic sense). However, my decision theory is updateless with respect to which sentient being I ended up as, so once you factor that in, I'm a multiverse-wide realityfluid-weighted average utilitarian. I'm not sure how correct this is, but it's possible.
7Viliam
ah, it also annoys me when people say that caring about others can only be instrumental. what does it even mean? helping other people makes me feel happy. watching a nice movie makes me feel happy. the argument that I don't "really" care about other people would also prove that I don't "really" care about movies etc. I am happy for the lucky coincidence that decision theories sometimes endorse cooperation, but I would probably do that regardless. for example, if I had an option to donate something useful to million people, or sell it to dozen people, I would probably choose the former option even if it meant no money for me. (and yes, I would hope there would be some win/win solution, such as the million people paying me via Kickstarter. but in the inconvenient universe where Kickstarter is somehow not an option, I am going to donate anyway.)
1MinusGix
I agree, though I haven't seen many proposing that, but also see So8res' Decision theory does not imply that we get to have nice things, though this is coming from the opposite direction (with the start being about people invalidly assuming too much out of LDT cooperation) Though for our morals, I do think there's an active question of which pieces we feel better replacing with the more formal understanding, because there isn't a sharp distinction between our utility function and our decision theory. Some values trump others when given better tools. Though I agree that replacing all the altruism components is many steps farther than is the best solution in that regard.
mako yass*12-1

An interesting question for me is how much true altruism is required to give rise to a generally altruistic society under high quality coordination frameworks. I suspect it's quite small.

Another question is whether building coordination frameworks to any degree requires some background of altruism. I suspect that this is the case. It's the hypothesis I've accreted for explaining the success of post-war economies (guessing that war leads to a boom in nationalistic altruism, generally increased fairness and mutual faith).

Tamsin Leake*112

I would feel better about this if there was something closer to (1) on which to discuss what is probably the most important topic in history (AI alignment). But noted.

7Ruby
Over the years the idea of a closed forum for more sensitive discussion has been raised, but never seemed to quite make sense. Significant issues included:
- It seems really hard or impossible to make it secure from nation state attacks
- It seems that members would likely leak stuff (even if it's via their own devices not being adequately secure or what)
I'm thinking you can get some degree of inconvenience (and therefore delay), but hard to have large shared infrastructure that's that secure from attack.
Tamsin Leake*186

I'm generally not a fan of increasing the amount of illegible selection effects.

On the privacy side, can lesswrong guarantee that, if I never click on Recommended, then recombee will never see an (even anonymized) trace of what I browse on lesswrong?

2kave
I am sad to see you getting so downvoted. I am glad you are bringing this perspective up in the comments.
9Ruby
Typo? Do you mean "click on Recommended"? I think the answer is no; in order to have recommendations for individuals (and everyone), they have browsing data.
1) LessWrong itself doesn't aim for a super high degree of infosec. I don't believe our data is sensitive enough to warrant large security overhead.
2) I trust Recombee with our data about as much as we trust ourselves to not have a security breach. Maybe actually I could imagine LessWrong being of more interest to someone or some group and getting attacked.
It might help to understand what your specific privacy concerns are.
Tamsin Leake14-2

Here the thing that I'm calling evil is pursuing short-term profits at the cost of non-negligibly higher risk that everyone dies.

Tamsin Leake8-16

Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety.

There's just no good reason to do that, except short-term greed at the cost of higher probability that everyone (including people at OpenAI) dies.

(No, "you need huge profits to solve alignment" isn't a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)

2kave
This seems insufficiently argued; the existence of any alignment research that can be done without huge profits is not enough to establish that you don't need huge profits to solve alignment (particularly when considering things like how long timelines are even absent your intervention). To be clear, I agree that OpenAI are doing evil by creating AI hype.
7Morphism
OpenAI is not evil. They are just defecting on an epistemic prisoner's dilemma.
dr_s110

It's generally also very questionable that they started creating models for research, then seamlessly pivoted to commercial exploitation without changing any of their practices. A prototype meant as a proof of concept isn't the same as a safe finished product you can sell. Honestly, only in software and ML do we get people doing such shoddy engineering.

2Zach Stein-Perlman
This is too strong. For example, releasing the product would be correct if someone else would do something similar soon anyway and you're safer than them and releasing first lets you capture more of the free energy. (That's not the case here, but it's not as straightforward as you suggest, especially with your "Regardless of how good their alignment plans are" and your claim "There's just no good reason to do that, except short-term greed".)
1Seth Herd
This doesn't even address their stated reason/excuse for pushing straight for AGI. I don't have a link handy, but Altman has said that short timelines and a slow takeoff is a good scenario for AI safety. Pushing for AGI now raises the odds that, when we get near it, it won't get 100x smarter or more prolific rapidly. And I think that's right, as far as it goes. It needs to be weighed against the argument for more alignment research before approaching AGI, but doing that weighing is not trivial. I don't think there's a clear winner.

Now, Altman pursuing more compute with his "7T investment" push really undercuts that argument being his sincere opinion, at least now (he said a bit about that a while ago, maybe 5 years?). But even if Altman was or is lying, that doesn't make that thesis wrong. This might be the safest route to AGI. I haven't seen anyone even try in good faith to weigh the complexities of the two arguments against each other.

Now, you can still say that this is evil, because the obviously better path is to do decades and generations of alignment work prior to getting anywhere near AGI. But that's simply not going to happen. One reason that goes overlooked is that most human beings are not utilitarians. Even if they realize we're lowering the odds of future humans having an amazing, abundant future, they are pursuing AGI right now because it might prevent them and many of those they love from dying painfully. This is terribly selfish from a utilitarian perspective, but reason does not cross the is/ought gap to make utilitarianism any more rational than selfishness.

I think calling selfishness "evil" is ultimately correct, but it's not obvious. And by that standard, most of humanity is currently evil. And in this case, evil intentions still might have good outcomes. While OpenAI has no good alignment plan, neither does anyone else. Humanity is simply not going to pause all AI work to study alignment for generations, so plans that include substantia
1pathos_bot
IMO the proportion of effort into AI alignment research scales with total AI investment. Lots of AI labs themselves do alignment research and open source/release research on the matter. OpenAI at least ostensibly has a mission. If OpenAI didn't make the moves they did, Google would have their spot, and Google is closer to the "evil self-serving corporation" archetype than OpenAI
Amalthea2317

Unambiguously evil seems unnecessarily strong. Something like "almost certainly misguided" might be more appropriate? (still strong, but arguably defensible)

1O O
Can we quantify the value of theoretical alignment research before and after ChatGPT? For example, mech interp research seems much more practical now. If alignment proves to be more of an engineering problem than a theoretical one, then I don’t see how you can meaningfully make progress without precursor models. Furthermore, given how nearly everyone with a lot of GPUs is getting similar results to OAI, where similar means within 1 OOM, it’s likely that in the future someone would have stumbled upon AGI with the compute of the 2030s. Let’s say their secret sauce gives them the equivalent of 1 extra hardware generation (even this is pretty generous). That’s only ~2-3 years. Meta built a $10B data center to match TikTok’s content algorithm. This datacenter, meant to decide which videos to show to users, happened to catch up to GPT-4! I suspect the “ease” of making GPT-3/4 informed OAI’s choice to publicize their results.

There's also the case of harmful warning shots: for example, if it turns out that, upon seeing an AI do a scary but impressive thing, enough people/orgs/states go "woah, AI is powerful, I should make one!" or "I guess we're doomed anyways, might as well stop thinking about safety and just enjoy making profit with AI while we're still alive", to offset the positive effect. This is totally the kind of thing that could be the case in our civilization.

2cozyfractal
I agree, that's an important point. I probably worry more about your first possibility, as we are already seeing this effect today, and worry less about the second, which would require a level of resignation that I've rarely seen. Entities that are responsible would likely try to do something about it, but the ways this “we're doomed, let's profit” might happen are:
* The warning shot comes from a small player and a bigger player feels urgency or feels threatened, in a situation where they have little control
* There is no clear responsibility and there are many entities at the frontier, who think others are responsible and there's no way to prevent them.
Another case of a harmful warning shot is if the lesson learnt from it is “we need stronger AI systems to prevent this”. This probably goes hand in hand with poor credit assignment.

There could be a difference but only after a certain point in time, which you're trying to predict / plan for.

What you propose, ≈"weigh indices by Kolmogorov complexity", is indeed a way to go about picking indices, but "weigh indices by one over their square" feels a lot more natural to me; a lot simpler than invoking the universal prior twice.

4interstice
I think using the universal prior again is more natural. It's simpler to use the same complexity metric for everything; it's more consistent with Solomonoff induction, in that the weight assigned by Solomonoff induction to a given (world, claw) pair would be approximately the sum of their Kolmogorov complexities; and the universal prior dominates the inverse square measure but the converse doesn't hold.
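A quick formal gloss on the two options (my own note, not from either comment), writing K(n) for the prefix Kolmogorov complexity of the index n:

    \sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty,
    \qquad
    \sum_{n \ge 1} 2^{-K(n)} \le 1,

so both give well-defined (semi)measures over indices once normalized. The sense in which the universal prior dominates the inverse-square measure is that 2^{-K(n)} ≥ c/n² for some constant c > 0 and all n (the normalized inverse-square weights form a computable measure, which the universal prior multiplicatively dominates), while the converse fails: for simple large indices such as n = 2^k, 2^{-K(n)} is far larger than 1/n².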

If you use the UTMs for cartesian-framed inputs/outputs, sure; but if you're running the programs as entire worlds, then you still have the issue of "where are you in time".

Say there's an infinitely growing conway's-game-of-life program, or some universal program, which contains a copy of me at infinitely many locations. How do I weigh which ones are me?

It doesn't matter that the UTM has a fixed amount of weight, there's still infinitely many locations within it.

1quetzal_rainbow
It doesn't matter? Like, if your locations are identical (say, simulations of the entire observable universe and you never find any difference no matter "where" you are), your weight is exactly the weight of the program. If you expect differences, you can select some kind of simplicity prior to weight these differences, because there is basically no difference between "list all programs for this UTM, run in parallel".
3interstice
If you want to pick out locations within some particular computation, you can just use the universal prior again, applied to indices to parts of the computation.
Tamsin Leake18-4

(cross-posted from my blog)

Are quantum phenomena anthropic evidence for BQP=BPP? Is existing evidence against many-worlds?

Suppose I live inside a simulation run by a computer over which I have some control.

  • Scenario 1: I make the computer run the following:

    pause_simulation()
    
    if is_even(calculate_billionth_digit_of_pi()):
        resume_simulation()
    

    Suppose, after running this program, that I observe that I still exist. This is some anthropic evidence for the billionth digit of pi being even.

    Thus, one can get anthropic evidence about logical facts.

  • Scenario 2: I

... (read more)
1robo
Interesting idea. I don't think using a classical Turing machine in this way would be the right prior for the multiverse. Classical Turing machines are a way for ape brains to think about computation using the circuitry we have available ("imagine other apes following these social conventions about marking long tapes of paper"). They aren't the cosmically simplest form of computation. For example, the (microscopic non-coarse-grained) laws of physics are deeply time-reversible, whereas Turing machines are not. I suspect this computation speed prior would lead to Boltzmann-brain problems. Your brain at this moment might be computed at high fidelity, but everything else in the universe would be approximated for the computational speed-up.
3interstice
This isn't true; you can get perfectly fine probabilities and expected utilities from ordinary Solomonoff induction (barring computability issues, ofc). The key here is that SI is defined in terms of a prefix-free UTM whose set of valid programs forms a prefix-free code, which automatically grants probabilities adding up to less than 1, etc. This issue is often glossed over in popular accounts.
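For readers unfamiliar with why prefix-freeness gives this for free, the relevant fact is Kraft's inequality (my gloss, not part of the comment): if no valid program is a prefix of another valid program, then

    \sum_{p \,\text{valid}} 2^{-|p|} \;\le\; 1,

so the Solomonoff prior, which weights each output by the total 2^{-|p|} mass of the programs producing it, is automatically a semimeasure: its probabilities add up to at most 1 without any extra normalization, which is the "adding up to less than 1" mentioned above.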

I didn't see a clear indication in the post about whether the music is AI-generated or not, and I'd like to know; was there an indication I missed?

(I care because I'll want to listen to that music less if it's AI-generated.)

Huh I had the opposite reaction -- I was listening to it and was like "meh these voices are a bit bland, the beats are too but that's fine I guess. Makes sense for an amateur band. Good effort though, and great April Fools joke." Now I'm like "wait this is AI? Cooooooool"

UPDATE: I judged them too harshly. I think the voices and beats are not bland in general, I think just for the first song or two that I happened to listen to. Also, most of the songs are growing on me as I listen to them.

Yoav Ravid121

Yes, it doesn't say so explicitly, but it's very clear from the post that it is.

Unlike on your blog, the images on the lesswrong version of this post are now broken.

Taboo the word "intelligence".

An agent can superhumanly-optimize any utility function. Even if there are objective values, a superhuman-optimizer can ignore them and superhuman-optimize paperclips instead (and then we die because it optimized for that harder than we optimized for what we want).

-4Donatas Lučiūnas
I am familiar with this thinking, but I find it flawed. Could you please read my comment here? Please let me know what you think.

(I'm gonna interpret these disagree-votes as "I also don't think this is the case" rather than "I disagree with you tamsin, I think this is the case".)

Tamsin Leake-1-45

I don't think this is the case, but I'm mentioning this possibility because I'm surprised I've never seen someone suggest it before:

Maybe the reason Sam Altman is taking decisions that increase p(doom) is because he's a pure negative utilitarian (and he doesn't know-about/believe-in acausal trade).

Tamsin Leake*Ω4272

Reposting myself from discord, on the topic of donating $5000 to EA causes.

if you're doing alignment research, even just a bit, then the $5000 is probly better spent on yourself

if you have any gears level model of AI stuff then it's better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you're essentially contributing to the "picking what to donate to" effort by thinking about it yourself

if you have no gears level model of AI then it's hard to judge which alignment orgs it's helpful to donate to (or, if gi

... (read more)

I agree that there's no substitute for thinking about this for yourself, but I think that morally or socially counting "spending thousands of dollars on yourself, an AI researcher" as a donation would be an appalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it's-good funding arrangements in this space for me, and I think it leads to poor epistemic norms as well as social and organizational dysfunction. I think it's very easy for donating to people or organizations in your social circle to have substantial negative ... (read more)
