LESSWRONG
LW

All of yams's Comments + Replies

Preliminary thoughts from Ryan Greenblatt on this here.

[errant thought pointing a direction, low-confidence musing, likely retreading old ground]

There’s a disagreement that crops up in conversations about changing people’s minds. Sides are roughly:

You should explain things by walking someone through your entire thought process, as it actually unfolded. Changing minds is best done by offering an account of how your own mind was changed.
You should explain things by back-chaining the most viable (valid) argument, from your conclusions, with respect to your specific audience.

This first strategy invites framing you... (read more)

yams's Shortform

yams1mo10

Do you think of rationality as a similar sort of 'object' or 'discipline' to philosophy? If not, what kind of object do you think of it as being?

(I am no great advocate for academic philosophy; I left that shit way behind ~a decade ago after going quite a ways down the path. I just want to better understand whether folks consider Rationality as a replacement for philosophy, a replacement for some of philosophy, a subset of philosophical commitments, a series of cognitive practices, or something else entirely. I can model it, internally, as aiming to be any... (read more)

The Failed Strategy of Artificial Intelligence Doomers

yams1mo50

Question for Ben:

Are you inviting us to engage with the object level argument, or are you drawing attention to the existence of this argument from a not-obviously-unreasonable-source as a phenomenon we are responsible for (and asking us to update on that basis)?

On my read, he’s not saying anything new (concerns around military application are why ‘we’ mostly didn’t start going to the government until ~2-3 years ago), but that he’s saying it, while knowing enough to paint a reasonable-even-to-me picture of How This Thing Is Going, is the real tragedy.

2Ben Pace1mo

The former, but the latter is a valid response too. Someone doing a good job of painting an overall picture is a good opportunity to reflect on the overall picture and what changes to make, or what counter-arguments to present to this account.

The Field of AI Alignment: A Postmortem, and What To Do About It

yams1mo70

I think the reason nobody will do anything useful-to-John as a result of the control critique post is that control is explicitly not aiming at the hard parts of the problem, and knows this about itself. In that way, control is an especially poorly selected target if the goal is getting people to do anything useful-to-John. I'd be interested in a similar post on the Alignment Faking paper (or model organisms more broadly), on RAT, on debate, on faithful CoT, on specific interpretability paradigms (circuits v SAEs, vs some coherentist approach vs shards vs..... (read more)

The Field of AI Alignment: A Postmortem, and What To Do About It

yams1mo50

there are plenty of cases where we can look at what people are doing and see pretty clearly that it is not progress toward the hard problem

There are plenty of cases where John can glance at what people are doing and see pretty clearly that it is not progress toward the hard problem.

Importantly, people with the agent foundations class of anxieties (which I embrace; I think John is worried about the right things!) do not spend time engaging on a gears level with prominent prosaic paradigms and connecting the high level objection ("it ignores the hard part of... (read more)

5johnswentworth1mo

Big crux here: I don't actually expect useful research to occur as a result of my control-critique post. Even having updated on the discussion remaining more civil than I expected, I still expect basically-zero people to do anything useful as a result. As a comparison: I wrote a couple posts on my AI model delta with Yudkowsky and with Christiano. For each of them, I can imagine changing ~one big piece in my model, and end up with a model which looks basically like theirs. By contrast, when I read the stuff written on the control agenda... it feels like there is no model there at all. (Directionally-correct but probably not quite accurate description:) it feels like whoever's writing, or whoever would buy the control agenda, is just kinda pattern-matching natural language strings without tracking the underlying concepts those strings are supposed to represent. (Joe's recent post on "fake vs real thinking" feels like it's pointing at the right thing here; the posts on control feel strongly like "fake" thinking.) And that's not a problem which gets fixed by engaging at the object level; that type of cognition will mostly not produce useful work, so getting useful work out of such people would require getting them to think in entirely different ways. ... so mostly I've tried to argue at a different level, like e.g. in the Why Not Just... posts. The goal there isn't really to engage the sort of people who would otherwise buy the control agenda, but rather to communicate the underlying problems to the sort of people who would already instinctively feel something is off about the control agenda, and give them more useful frames to work with. Because those are the people who might have any hope of doing something useful, without the whole structure of their cognition needing to change first.

The Case Against AI Control Research

yams1mo20

If you wrote this exact post, it would have been upvoted enough for the Redwood team to see it, and they would have engaged with you similarly to how they engaged with John here (modulo some familiarity, because theyse people all know each other at least somewhat, and in some pairs very well actually).

If you wrote several posts like this, that were of some quality, you would lose the ability to appeal to your own standing as a reason not to write a post.

This is all I'm trying to transmit.

[edit: I see you already made the update I was encouraging, an hour after leaving the above comment to me. Yay!]

The Case Against AI Control Research

yams1mo10

Writing (good) critiques is, in fact, a way many people gain standing. I’d push back on the part of you that thinks all of your good ideas will be ignored (some of them probably will be, but not all of them; don’t know until you try, etc).

2Seth Herd1mo

I'm not worried about my ideas being ignored so much as actively doing harm to the group epistemics by making people irritated with my pushback, and by association, irritated with the questions I raise and therefore resistant to thinking about them. I am pretty sure that motivated reasoning does that, and it's a huge problem for progress in existing fields. More here: Motivated reasoning/confirmation bias as the most important cognitive bias LessWrong does seem way less prone to motivated reasoning. I think this is because rationalism demands actually being proud of changing your mind. This value provides resistance but not immunity to motivated reasoning. I want to write a post about this.

Grading my 2024 AI predictions

yams2mo10

More partial credit on the second to last point:

https://home.treasury.gov/news/press-releases/jy2766

Aside: I don’t think it’s just that real world impacts take time to unfold. Lately I’ve felt that evals are only very weakly predictive of impact (because making great ones is extremely difficult). Could be that models available now don’t have substantially more mundane utility (economic potential stemming from first order effects), outside of the domains the labs are explicitly targeting (like math and code), than models available 1 year ago.

Buck's Shortform

yams2mo10

Is the context on “reliable prediction and ELK via empirical route” just “read the existing ELK literature and actually follow it” or is it stuff that’s not written down? I assume you’ve omitted it to save time, and so no worries if the latter.

EDIT: I was slightly tempted to think of this also as ‘Ryan’s ranking of live agendas that aren’t control’, but I’m not sure if ‘what you expect to work conditional on delegating to AIs’ is similar to ‘what you expect to work if humans are doing most of it?’ (my guess is the lists would look similar, but with notable exceptions, eg humans pursuing GOFAI feels less viable than ML agents pursuing GOFAI)

2ryan_greenblatt2mo

Stuff that's not written down, sorry. This isn't really my ranking of live agendas that aren't control. For instance, various things focused on demonstrating/measuring risks don't appear on this list but seem pretty good. And, a bunch of these look much more compelling when you have a ton of AI labor. Also, these aren't really agendas more like targets. It is related to my ranking on non-control agendas though.

Buck's Shortform

yams2mo50

My understanding is that ~6 months ago y’all were looking for an account of the tasks an automated AI safety researcher would hopefully perform, as part of answering the strategic question ‘what’s the next step after building [controlled] AGI?’ (with ‘actually stop there indefinitely’ being a live possibility)

This comment makes me think you’ve got that account of safety tasks to be automated, and are feeling optimistic about automated safety research.

Is that right and can you share a decently mechanistic account of how automated safety research might work?... (read more)

ryan_greenblatt2mo*410

We're probably more skeptical than AI companies and less skeptical than the general lesswrong crowd. We have made such a list and thought about it in a a bit of detail (but still pretty undercooked).

My current sense is that automated AI safety/alignment research to reach a reasonable exit condition seems hard, but substantially via the mechanism that achieving the desirable exit condition is an objectively very difficult research tasks for which our best ideas aren't amazing, and less so via the mechanism that its impossible to get the relevant AIs to help... (read more)

evhub's Shortform

yams2mo30

Thanks for the clarification — this is in fact very different from what I thought you were saying, which was something more like "FATE-esque concerns fundamentally increase x-risk in ways that aren't just about (1) resource tradeoffs or (2) side-effects of poorly considered implementation details."

6habryka2mo

I mean, it's related. FATE stuff tends to center around misuse. I think it makes sense for organizations like Anthropic to commit to heavily prioritize accident risk over misuse risk, since most forms of misuse risk mitigation involve getting involved in various more zero-sum-ish conflicts, and it makes sense for there to be safety-focused institutions that are committed to prioritizing the things that really all stakeholders can agree on are definitely bad, like human extinction or permanent disempowerment.

evhub's Shortform

yams2mo10

Anthropic should take a humanist/cosmopolitan stance on risks from AGI in which risks related to different people having different values are very clearly deprioritized compared to risks related to complete human disempowerment or extinction, as worry about the former seems likely to cause much of the latter

Can you say more about the section I've bolded or link me to a canonical text on this tradeoff?

habryka2mo188

OpenAI, Anthropic, and xAI were all founded substantially because their founders were worried that other people would get to AGI first, and then use that to impose their values on the world.

In-general, if you view developing AGI as a path to godlike-power (as opposed to a doomsday device that will destroy most value independently of who gets their first), it makes a lot of sense to rush towards it. As such, the concern that people will "do bad things with the AI that they will endorse, but I won't" is the cause of a substantial fraction of worlds where we recklessly race past the precipice.

The Field of AI Alignment: A Postmortem, and What To Do About It

yams2mo92

[was a manager at MATS until recently and want to flesh out the thing Buck said a bit more]

It’s common for researchers to switch subfields, and extremely common for MATS scholars to get work doing something different from what they did at MATS. (Kosoy has had scholars go on to ARC, Neel scholars have ended up in scalable oversight, Evan’s scholars have a massive spread in their trajectories; there are many more examples but it’s 3 AM.)

Also I wouldn’t advise applying to something that seems interesting; I’d advise applying for literally everything (unless y... (read more)

Communications in Hard Mode (My new job at MIRI)

yams3mo30

Your version of events requires a change of heart (for 'them to get a whole lot more serious'). I'm just looking at the default outcome. Whether alignment is hard or easy (although not if it's totally trivial), it appears to be progressing substantially more slowly than capabilities (and the parts of it that are advancing are the most capabilities-synergizing, so it's unclear what the oft-lauded 'differential advancement of safety' really looks like).

3Seth Herd3mo

I consider at least a modest change of heart to be the default. And I think it's really hard to say how fast alignment is progressing relative to capabilities. If by "alignment" you mean formal proofs of safety then definitely we're not on track. But there's a real chance that we don't need those. We are training networks to follow instructions, and it's possible that weak type of tool "alignment" can be leveraged into true agent alignment for instruction-following or corrigibility. If so, we have solved AGI alignment. That would give us superhuman help solving ASI alignment, and the "societal alignment" problem of surviving intent-aligned AGIs with different masters. This seems like the default for how we'll try to align AGI. We don't know if it will work. When I get MIRI-style thinkers to fully engage with this set of ideas, they tend to say "hm maybe". But I haven't gotten enough engagement to have any confidence. Prosaic alignment, LLM thinkers usually aren't engaging with the hard problems of alignment that crop up when we hit fully autonomous AGI entities, like strong optimization's effects on goal misgeneralization, reflection and learning-based alignment shifts. And almost nobody is thinking that far ahead in societal coordination dynamics. So I'd really like to see agent foundations and prosaic alignment thinking converge on the types of LLM-based AGI agents we seem likely to get in the near future. We just really don't know if we can align them or not, because we just really haven't thought about it deeply yet. Links to all of those ideas in depth can be found in a couple link hops from my recent, brief Intent alignment as a stepping-stone to value alignment.

Communications in Hard Mode (My new job at MIRI)

yams3mo10

By bad I mean dishonest, and by 'we' I mean the speaker (in this case, MIRI).

I take myself to have two central claims across this thread:

Your initial comment was straw manning the 'if we build [ASI], we all die' position.
MIRI is likely not a natural fit to consign itself to service as the neutral mouthpiece of scientific consensus.

I do not see where your most recent comment has any surface area with either of these claims.

I do want to offer some reassurance, though:

I do not take "One guy who's thought about this for a long time and some other people he recruited think it's definitely going to fail" to be descriptive of the MIRI comms strategy.

1Seth Herd3mo

I think we're talking past each other, so we'd better park it and come back to the topic later and more carefully. I do feel like you're misrepresenting my position, so I am going to respond and then quit there. You're welcome to respond; I'll try to resist carrying on, and move on to more productive things. I apologize for my somewhat argumentative tone. These are things I feel strongly about, since I think MIRIs communication might matter quite a lot, but that's not a good reason to get argumentative. 1. Strawmanning: I'm afraid you'r right that I'm probably exaggerating MIRI's claims. I don't think it's quite a strawman; "if we build it we all die" is very much the tone I get from MIRI comms on LW and X (mostly EY), but I do note that I haven't seen him use 99.9%+ in some time, so maybe he's already doing some of what I suggest. And I haven't surveyed all of MIRIs official comms. But what we're discussing is a change in comms strategy. I have gotten more strident in repeated attempts to make my central point clearer. That's my fault; you weren't addressing my actual concern so I kept trying to highlight it. I still am not sure if you're understanding my main point, but that's fine; I can try to say it better in future iterations. This is the first place I can see you suggesting that I'm exaggerating MIRIs tone, so if it's your central concern that's weird. But again, it's a valid complaint; I won't make that characterization in more public places, lest it hurt MIRI's credibility. 1. MIRI claiming to accurately represent scientific consensus was never my suggestion, I don't know where you got that. I clarified that I expect zero additional effort or strong claims, just "different experts believe a lot of different things". Honesty: I tried to specify from the first that I'm not suggesting dishonesty by any normal standard. Accurately reporting a (vague) range of others' opinions is just as honest as reporting your own opinion. Not saying the least convinci

Communications in Hard Mode (My new job at MIRI)

yams3mo10

Oh, I feel fine about saying ‘draft artifacts currently under production by the comms team ever cite someone who is not Eliezer, including experts with a lower p(doom)’ which, based on this comment, is what I take to be the goalpost. This is just regular coalition signaling though and not positioning yourself as, terminally, a neutral observer of consensus.

“You haven’t really disagreed that [claiming to speak for scientific consensus] would be more effective.”

That’s right! I’m really not sure about this. My experience has been that ~every take someone offe... (read more)

3Seth Herd3mo

MIRI leadership is famously very wrong about how sure they think they are. That's my concern. It's obvious to any rationalist that it's not rational to believe >99% in something that's highly theoretical. It's almost certainly epistemic hubris if not outright foolishness. I have immense respect for EYs intellect. He seems to be the smartest human I've engaged with enough to judge their intellect. On this point he is either obviously or seemingly wrong. I have personally spent at least a hundred hours following his specific logic, (and lots more on the background knowledge it's based on), and I'm personally quite sure he's overestimating his certainty. His discussions with other experts always end up falling back on differing intuitions.He got there first, but a bunch of us have now put real time into following and extending his logic. I have a whole theory on how he wound up so wrong, involving massive frustration and underappreciating how biased people are to short-term thinking and motivated reasoning, but that's beside the point. Whether he's right doesn't really matter; what matters is that >99.9% doom sounds crazy, and it's really complex to argue it even could be right, let alone that it actually is. Since it sounds crazy, leaning on that point is the very best way to harm MIRIs credibility. And because they are one of the most publicly visible advocates of AGI x-risk caution (and planning to become even higher profile it seems), it may make the whole thing sound less credible - maybe by a lot. Please, please don't do it or encourage others to do it. I'm actually starting to worry that MIRI could make us worse off if they insist on shouting loudly and leaning on our least credible point. Public discourse isn't rational, so focusing on the worst point could make the vibes-based public discussion go against what is otherwise a very simple and sane viewpoint: don't make a smarter species unless you're pretty sure it won't turn on you. Hopefully I needn't w

Communications in Hard Mode (My new job at MIRI)

yams3mo36

I do mean ASI, not AGI. I know Pope + Belrose also mean to include ASI in their analysis, but it’s still helpful to me if we just use ASI here, so I’m not constantly wondering if you’ve switched to thinking about AGI.

Obligatory ‘no really, I am not speaking for MIRI here.’

My impression is that MIRI is not trying to speak for anyone else. Representing the complete scientific consensus is an undue burden to place on an org that has not made that claim about itself. MIRI represents MIRI, and is one component voice of the ‘broad view guiding public policy’, no... (read more)

2Seth Herd3mo

I thought the point of this post was that MIRI is still developing its comms strategy, and one criteria is preserving credibility. I really hope they'll do that. It's not violating rationalist principles to talk about beliefs beyond your own. You're half right about what I think. I want to live, so I want MIRI to do a good job of comms. Lots of people are shouting their own opinion. I assumed MIRI wanted to be effective, not just shout along with the chorus. MIRI wouldn't have to do a bit of extra work to do what I'm suggesting. They'd just have to note their existing knowledge of the (lack of) expert consensus, instead of just giving their own opinion. You haven't really disagreed that that would be more effective. To put it this way: people (largely correctly) believe that MIRI's beliefs are a product of one guy, EY. Citing more than one guy's opinions is way more credible, no matter how expert that guy - and it avoids arguing about who's more expert.

Communications in Hard Mode (My new job at MIRI)

yams3mo10

Good point - what I said isn’t true in the case of alignment by default.

Edited my initial comment to reflect this

5Seth Herd3mo

It's not true if alignment is easy, too, right? My timelines are short, but we do still have a little time to do alignment work. And the orgs are going to do a little alignment work. I wonder if there's an assumption here that OpenAI and co don't even believe that alignment is a concern? I don't think that's true, although I do think they probably dramatically underrate x-risk dangers based on incentive-driven biases, but they do seem to appreciate the basic arguments. And I expect them to get a whole lot more serious about it once they're staring a capable agent in the face. It's one thing to dismiss the dangers of tigers from a distance, another when there's just a fence between you and it. I think proximity is going to sharpen everyone's thinking a good bit by inspiring them to spend more time thinking about the dangers.

Communications in Hard Mode (My new job at MIRI)

yams3mo63

(I work at MIRI but views are my own)

I don't think 'if we build it we all die' requires that alignment be hard [edit: although it is incompatible with alignment by default]. It just requires that our default trajectory involves building ASI before solving alignment (and, looking at our present-day resource allocation, this seems very likely to be the world we are in, conditional on building ASI at all).

[I want to note that I'm being very intentional when I say "ASI" and "solving alignment" and not "AGI" and "improving the safety situation"]

4Seth Herd3mo

The people actually building AGI very publicly disagree that we are not on track to solve alignment before building AGI. So do many genuine experts. For instance, I strongly disagree with Pope and Belrose's "AI is easy to control" but it's sitting right there in public, and it's hard to claim they're not actually experts. And I just don't see why you'd want to fight that battle. I'd say it's probably pointless to use the higher probability; an estimated 50% chance of everyone dying on the current trajectory seems like plenty to alarm people. That's vaguely what we'd get if we said "some experts think 99%, others think 1%, so we collectively just don't know". Stating MIRI's collective opinion instead of a reasonable statement of the consensus is unnecessary and costs you credibility. To put it another way: someone who uses their own estimate instead of stating the range of credible estimates is less trustworthy on average to speak for a broad population. They're demonstrating a blinkered, insular style of thinking. The public wants a broad view guiding public policy. And in this case I just don't see why you'd take that credibility hit. Edit: having thought about it a little more, I do actually think that some people would accept a 50% chance of survival and say "roll the dice!". That's largely based on the wildly exaggerated fears of civilizational collapse from global warming. And I think that, if they expressed those beliefs clearly, the majority of humanity would still say "wait what that's insane, we have to make progress on alignment before launching AGI".

ryan_greenblatt3mo1614

ASI alignment could be trivial (happens by default), easy, or hard. If it is trivial, then "if we build it, we all die" is false.

Separately, I don't buy that misaligned ASI with totally alien goals and that full takes over will certainly kill everyone due to^[1] trade arguments like this one. I also think it's plausilbe that such an AI will be at least very slightly kind such that it is willing to spend a tiny amount of resources keeping humans alive if this is cheap. Thus, "the situation is well described as 'we all die' conditional on misaligned ASI with ... (read more)

yams's Shortform

yams3mo20

What text analogizing LLMs to human brains have you found most compelling?

0Canaletto3mo

Shameless self promotion: this one https://www.lesswrong.com/posts/ASmcQYbhcyu5TuXz6/llms-could-be-as-conscious-as-human-emulations-potentially It circumvents object level question and instead looks at epistemic one. This one is about broader direction in "how the things that happened change attitudes and opinions of people" https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai This one too, about consciousness in particular https://dynomight.net/consciousness/ I think it's somewhat productive direction explored in these 3 posts, but it's not like very object level, more about epistemics of it all. I think you can look up how like LLM states overlap / predict / correspond with brain scans of people who engage in some tasks? I think there were a couple of paper on that. E.g. here https://www.neuroai.science/p/brain-scores-dont-mean-what-we-think

leogao's Shortform

yams3mo50

Does it seem likely to you that, conditional on ‘slow bumpy period soon’, a lot of the funding we see at frontier labs dries up (so there’s kind of a double slowdown effect of ‘the science got hard, and also now we don’t have nearly the money we had to push global infrastructure and attract top talent’), or do you expect that frontier labs will stay well funded (either by leveraging low hanging fruit in mundane utility, or because some subset of their funders are true believers, or a secret third thing)?

Daniel Kokotajlo's Shortform

yams3mo30

Only the first few sections of the comment were directed at you; the last bit was a broader point re other commenters in the thread, the fooming shoggoths, and various in-person conversations I’ve had with people in the bay.

That rationalists and EAs tend toward aesthetic bankruptcy is one of my chronic bones to pick, because I do think it indicates the presence of some bias that doesn’t exist in the general population, which results in various blind spots.

Sorry for not signposting and/or limiting myself to a direct reply; that was definitely confusing.

I th... (read more)

Daniel Kokotajlo's Shortform

yams3mo3-2

If this is representative of the kind of music you like, I think you’re wildly overestimating how difficult it is to make that music.

The hard parts are basically infrastructural (knowing how to record a sound, how to make different sounds play well together in a virtual space). Suno is actually pretty bad at that, though, so if you give yourself the affordance to be bad at it, too, then you can just ignore the most time-intensive part of music making.

Pasting things together (as you did here), is largely The Way Music Is Made in the digital age, anyway.

I th... (read more)

3Daniel Kokotajlo3mo

I agree it's not perfect. It has the feel of... well, the musical version of what you get with AI-generated images, where your first impression is 'wow' and then you look more closely and you notice all sorts of aberrant details that sour you on the whole thing. I think you misunderstood me if you think I prefer Suno to music made by humans. I prefer some Suno songs to many songs made by humans. Mainly because of the lyrics -- I can get Suno songs made out of whatever lyrics I like, whereas most really good human-made songs have insipid or banal lyrics about clubbing or casual sex or whatever.

Daniel Kokotajlo's Shortform

yams3mo10

A great many tools like this already exist and are contracted by the major labels.

When you post a song to streaming services, it’s checked against the entire major label catalog before actually listing on the service (the technical process is almost certainly not literally this, but it’s something like this, and they’re very secretive about what’s actually happening under the hood).

1ZY3mo

Yeah nice; I heard youtube also has something similar for checking videos as well

yams's Shortform

yams3mo10

Cool! I think we're in agreement at a high level. Thanks for taking the extra time to make sure you were understood.

In more detail, though:

I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (i.e. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think 'if we pause it will be for stupid reasons' is a very sad take.

I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on developm... (read more)

2Noosphere893mo

I generally don't think the Inconvenient truth movie mattered that much for solving climate change, compared to technological solutions like renewable energy, and made the issue a little more partisan (though environmentalism/climate change was unusually partisan by then) and I think social movements to affect AI already had less impact on AI safety than technical work (in a broad sense) for reducing doom, and I expect this trend to continue. I think warning shots could scare the public, but I worry that the level of warning shots necessary to clear AI is in a fairly narrow band, and I also expect AI control to have a reasonable probability of containing human-level scheming models that do work, so I wouldn't pick this at all. I agree it's a sad take that "if we pause it will be for stupid reasons", but I also think this is the very likely attractor, if AI does become a subject that is salient in politics, because people hate nuance, and nuance matters way more than the average person wants to deal with on AI (For example, I think the second species argument critically misses important differences that make the human-AI relationship more friendly than the human-gorilla relationship, and that's without the subject being politicized). To address this: I think the key crux is I believe that the unreliability of GPT-4 would doom any attempt to automate 30% of jobs, and I think at most 0-1% of jobs could be automated, and while in principle you could improve reliability without improving capabilities too much, I also don't think the incentives yet favor this option. I agree with this sort of argument, and in general I am not a fan of collapsing checkpoints between today's AI and God AIs, which is a big mistake I think MIRI did, but my main claim is that the checkpoints would be illegible enough to the average citizen such that they don't notice the progress until it's too late, and that the reliability improvements will in practice also be coupled with capabilities

yams's Shortform

yams3mo10

So for this argument to be worth bringing up in some general context where a pause is discussed, the person arguing it should probably believe:

We are far and away most likely to get a pause only as a response to unemployment.
An AI that precipitates pause-inducing levels of unemployment is inches from automating AI R+D.
The period between implementing the pause and massive algorithmic advancements is long enough that we're able to increase compute stock...
....but short enough that we're not able to make meaningful safety progress before algorithmic advanceme

... (read more)

2Noosphere893mo

I think 1 and 2 are actually pretty likely, but 3 and 4 is where I'm a lot less confident in actually happening. A big reason for this is that I suspect one of the reasons people aren't reacting to AI progress is they assume it won't take their job, so it will likely require massive job losses for humans to make a lot of people care about AI, and depending on how concentrated AI R&D is, there's a real possibility that AI has fully automated AI R&D before massive job losses begin in a way that matters to regular people.

yams's Shortform

yams3mo32

I've just read this post and the comments. Thank you for writing that; some elements of the decomposition feel really good, and I don't know that they've been done elsewhere.

I think discourse around this is somewhat confused, because you actually have to do some calculation on the margin, and need a concrete proposal to do that with any confidence.

The straw-Pause rhetoric is something like "Just stop until safety catches up!" The overhang argument is usually deployed (as it is in those comments) to the effect of 'there is no stopping.' And yeah, in this ca... (read more)

yams's Shortform

yams3mo10

I think it would be very helpful to me if you broke that sentence up a bit more. I took a stab at it but didn't get very far.

Sorry for my failure to parse!

2Noosphere893mo

Basically, my statement in short terms is that conditional on AI pause happening because of massive job losses from AI that is barely unable to take-over the world, then even small saving in compute via better algorithms due to algorithmic research not being banned would incentivize more algorithmic research, which then lowers the compute enough to make the AI pause untenable and the AI takes over the world.

yams's Shortform

yams3mo10

I want to say yes, but I think this might be somewhat more narrow than I mean. It might be helpful if you could list a few other ways one might read my message, that seem similarly-plausible to this one.

1Nathan Helm-Burger3mo

Overhangs, overhangs everywhere. A thousand gleaming threads stretching backwards from the fog of the Future, forwards from the static Past, and ending in a single Gordian knot before us here and now. That knot: understanding, learning, being, thinking. The key, the source, the remaining barrier between us and the infinite, the unknowable, the singularity. When will it break? What holds it steady? Each thread we examine seems so inadequate. Could this be what is holding us back, saving us from ourselves, from our Mind Children? Not this one, nor that, yet some strange mix of many compensating factors. Surely, if we had more compute, we'd be there already? Or better data? The right algorithms? Faster hardware? Neuromorphic chips? Clever scaffolding? Training on a regress of chains of thought, to better solutions, to better chains of thought, to even better solutions? All of these, and none of these. The web strains at the breaking point. How long now? Days? Months? If we had enough ways to utilize inference-time compute, couldn't we just scale that to super-genius, and ask the genius for a more efficient solution? But it doesn't seem like that has been done. Has it been tried? Who can say. Will the first AGI out the gate be so expensive it is unmaintainable for more than a few hours? Will it quickly find efficiency improvements? Or will we again be bound, hung up on novel algorithmic insights hanging just out of sight. Who knows? Surely though, surely.... surely rushing ahead into the danger cannot be the wisest course, the safest course? Can we not agree to take our time, to think through the puzzles that confront us, to enumerate possible consequences and proactively reduce risks? I hope. I fear. I stare in awestruck wonder at our brilliance and stupidity so tightly intermingled. We place the barrel of the gun to our collective head, panting, desperate, asking ourselves if this is it. Will it be? Intelligence is dead, long live intelligence.

yams's Shortform

yams3mo30

Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.

I'm often tempted to comment this in various threads, but it feels like a rabbit hole, it's not an easy one to convince someone of (because it's an argument they've accepted for years), and I've had relatively little success talking about this with people in person (there's some change I should make in how I'm talking about it, I think).

More broadly, I've started using quick takes to catalog random thoughts, because sometimes when I'm meeting... (read more)

yams's Shortform

yams3mo10

Yes this world.

yams's Shortform

yams3mo*81

Please stop appealing to compute overhang. In a world where AI progress has wildly accelerated chip manufacture, this already-tenuous argument has become ~indefensible.

2Noosphere893mo

While I'm not a general fan of compute overhang, I do think that it's at least somewhat relevant in worlds where AI pauses are very close to when a system is able to automate at least the entire AI R&D process, if not the entire AI economy itself, and I do suspect realistic pauses imposed by governments will likely only come once a massive amount of people lose their jobs, which can create incentives to go to algorithmic progress, and even small algorithmic progress might immediately blow up the pause agreement crafted in the aftermath of many people losing their jobs.

2Nathan Helm-Burger3mo

I think I get what you're saying... That the argument you dislike is, "we should rush to AGI sooner, so that there's less compute overhang when we get there." I agree that that argument is a pretty bad one. I personally think that we are already so far into a compute overhang regime that that ship has sailed. We are using very inefficient learning algorithms, and will be able to run millions of inference instances of any model we produce. Does this correspond with what you are thinking?

4RobertM3mo

I tried to make a similar argument here, and I'm not sure it landed. I think the argument has since demonstrated even more predictive validity with e.g. the various attempts to build and restart nuclear power plants, directly motivated by nearby datacenter buildouts, on top of the obvious effects on chip production.

4Eli Tyre3mo

To whom are are you talking?

0robo3mo

This world?

yams's Shortform

yams3mo10

Sometimes people express concern that AIs may replace them in the workplace. This is (mostly) silly. Not that it won't happen, but you've gotta break some eggs to make an industrial revolution. This is just 'how economies work' (whether or not they can / should work this way is a different question altogether).

The intrinsic fear of joblessness-resulting-from-automation is tantamount to worrying that curing infectious diseases would put gravediggers out of business.

There is a special case here, though: double digit unemployment (and youth unemployment... (read more)

yams's Shortform

yams4mo10

I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don't 'think differently', in a structural sense, from their forebears; they just don't believe in that God anymore.

The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.

They're no less tribal or dogmatic, or more crit... (read more)

nikola's Shortform

yams4mo30

I agree with this in the world where people are being epistemically rigorous/honest with themselves about their timelines and where there's a real consensus view on them. I've observed that it's pretty rare for people to make decisions truly grounded in their timelines, or to do so only nominally, and I think there's a lot of social signaling going on when (especially younger) people state their timelines.

I appreciate that more experienced people are willing to give advice within a particular frame ("if timelines were x", "if China did y", "if Anthro... (read more)

yams's Shortform

yams4mo10

I don't think I really understood what it meant for establishment politics to be divisive until this past election.

As good as it feels to sit on the left and say "they want you to hate immigrants" or "they want you to hate queer people", it seems similarly (although probably not equally?) true that the center left also has people they want you to hate (the religious, the rich, the slightly-more-successful-than-you, the ideologically-impure-who-once-said-a-bad-thing-on-the-internet).

But there's also a deeper, structural sense in which it's true.

Working on A... (read more)

DanielFilan's Shortform Feed

yams4mo*52

I think the key missing piece you’re pointing at (making sure that our interpretability tools etc actually tell us something alignment-relevant) is one of the big things going on in model organisms of misalignment (iirc there’s a step that’s like ‘ok, but if we do interpretability/control/etc at the model organism does that help?’). Ideally this type of work, or something close to it, could become more common // provide ‘evals for our evals’ // expand in scope and application beyond deep deception.

If that happened, it seems like it would fit the bill here.

Does that seem true to you?

4DanielFilan4mo

Oh except: I did not necessarily mean to claim that any of the things I mentioned were missing from the alignment research scene, or that they were present.

6DanielFilan4mo

Yeah, that seems right to me.

A Brief Explanation of AI Control

yams4mo10

I like this post but I think redwood has varied some on whether control is for getting alignment work out of AIs vs getting generally good-for-humanity work out of them and pushing for a pause once they reach some usefulness/danger threshold (eg well before super intelligence).

[based on my recollection of Buck seminar in MATS 6]

yams's Shortform

yams5mo10

Makes sense. Pretty sure you can remove it (and would appreciate that).

yams's Shortform

yams5mo30

Many MATS scholars go to Anthropic (source: I work there).

Redwood I’m really not sure, but that could be right.

Sam now works at Anthropic.

Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission), likely empower Anthropic as seemingly... (read more)

2niplav5mo

Apologies for the soldier mindset react, I pattern-matched to some more hostile comment. Communication is hard.

yams's Shortform

yams5mo10

updated, thanks!

yams's Shortform

yams5mo*30

The CCRU is under-discussed in this sphere as a direct influence on the thoughts and actions of key players in AI and beyond.

Land started a creative collective, alongside Mark Fisher, in the 90s. I learned this by accident, and it seems like a corner of intellectual history that’s at least as influential as ie the extropians.

If anyone knows of explicit connections between the CCRU and contemporary phenomena (beyond Land/Fisher’s immediate influence via their later work), I’d love to hear about them.

2mokmu2mo

I agree that there is more of an influence than seems to be talked about. Besides the obvious influence on e/acc and of Land's AGI opinions (the diagonality thesis, Pythia), the CCRU came up with hyperstitions (https://www.lesswrong.com/tag/hyperstitions), which seems to be a popular concept with the cyborgs. Maybe they also contributed to the popularity of "Cthulhu" and "Shoggoth" as symbols in some AI circles.

3metachirality5mo

Yarvin was not part of the CCRU. I think Land and Yarvin only became associates post-CCRU.

yams's Shortform

yams5mo10

Does anyone have examples of concrete actions taken by Open Phil that point toward their AIS plan being anything other than ‘help Anthropic win the race’?

3niplav5mo

Grants to Redwood Research, SERI MATS, NYU alignment group under Sam Bowman for scalable supervision, Palisade research, and many dozens more, most of which seem net positive wrt TAI risk.

Daniel Kokotajlo's Shortform

yams5mo95

I think a non-zero number of those disagree votes would not have appeared if the same comment were made by someone other than an Anthropic employee, based on seeing how Zac is sometimes treated IRL. My comment is aimed most directly at the people who cast those particular disagree votes.

I agree with your comment to Ryan above that those who identified "Anthropic already does most of these" as "the central part of the comment" were using the disagree button as intended.

The threshold for hitting the button will be different in different situations; I think the threshold many applied here was somewhat low, and a brief look at Zac's comment history, to me, further suggests this.

2Ben Pace5mo

FTR I upvote-disagreed with the comment, in that I was glad that this dialogue was happening and yet disagreed with the comment. I think it likely I am not the only one.

Daniel Kokotajlo's Shortform

yams5mo244

I want to double down on this:

Zac is consistently generous with his time, even when dealing with people who are openly hostile toward him. Of all lab employees, Zac is among the most available for—and eager to engage in—dialogue. He has furnished me personally with >2 dozen hours of extremely informative conversation, even though our views differ significantly (and he has ~no instrumental reason for talking to me in particular, since I am but a humble moisture farmer). I've watched him do the same with countless others at various events.

I've also ... (read more)

5Zac Hatfield-Dodds4mo

Unfortunately we're a fair way into this process, not because of downvotes[1] but rather because the comments are often dominated by uncharitable interpretations that I can't productively engage with.[2]. I've had researchers and policy people tell me that reading the discussion convinced them that engaging when their work was discussed on LessWrong wasn't worth the trouble. I'm still here, sad that I can't recommend it to many others, and wondering whether I'll regret this comment too. ---------------------------------------- 1. I also feel there's a double standard, but don't think it matters much. Span-level reacts would make it a lot easier to tell what people disagree with though. ↩︎ 2. Confidentiality makes any public writing far more effortful than you might expect. Comments which assume ill-faith are deeply unpleasant to engage with, and very rarely have any actionable takeaways. I've written and deleted a lot of other stuff here, and can't find an object-level description that I think is worth posting, but there are plenty of further reasons. ↩︎

habryka5mo3637

Ok, but, that's what we have the whole agreement/approval distinction for.

I absolutely do not want people to hesitate to disagree vote on something because they are worried that this will be taken as disapproval or social punishment, that's the whole reason we have two different dimensions! (And it doesn't look like Zac's comments are at any risk of ending up with a low approval voting score)

Struggling like a Shadowmoth

yams5mo10

This may be an example, but I don't think it's an especially central one, for a few reasons:

1. The linked essay discusses, quite narrowly, the act of making predictions about artificial intelligence/the Actual Future based on the contents of science fiction stories that make (more-or-less) concrete predictions on those topics, thus smuggling in a series of warrants that poison the reasoning process from that point onward. This post, by contrast, is about feelings.

2. The process for reasoning about one's, say, existential disposition, is independent of the ... (read more)

yams's Shortform

yams6mo*30

Sometimes people give a short description of their work. Sometimes they give a long one.

I have an imaginary friend whose work I’m excited about. I recently overheard them introduce and motivate their work to a crowd of young safety researchers, and I took notes. Here’s my best reconstruction of what he’s up to:

"I work on median-case out-with-a-whimper scenarios and automation forecasting, with special attention to the possibility of mass-disempowerment due to wealth disparity and/or centralization of labor power. I identify existing legal and technological... (read more)