This looks closer to 2 to me?
Also, from the outside, can you describe how an observer would distinguish between [any of the items on the list] and the situation you lay out in your comment / what the downsides are to treating them similarly? I think Michael’s point is that it’s not useful/worth it to distinguish.
Whether someone is dishonest, incompetent, or underweighting x-risk (by my lights) mostly doesn’t matter for how I interface with them, or how I think the field ought to regard them, since I don’t think we should browbeat people or treat them puni...
Rather than make things worse as a means of compelling others to make things better, I would rather just make things better.
Brinksmanship and accelerationism (in the Marxist sense) are high variance strategies ill-suited to the stakes of this particular game.
[one way this makes things worse is stimulating additional investment on the frontier; another is attracting public attention to the wrong problem, which will mostly just generate action on solutions to that problem, and not to the problem we care most about. Importantly, the contingent of people-mostl...
Ah, I think this just reads like you don't think of romantic relationships as having any value proposition beyond the sexual, other than those you listed (which are Things but not The Thing, where The Thing is some weird discursive milieu). Also the tone you used for describing the other Things is as though they are traps that convince one, incorrectly, to 'settle', rather than things that could actually plausibly outweigh sexual satisfaction.
Different people place different weight on sexual satisfaction (for a lot of different reasons, including age).
I'm ...
I read your comment as conflating 'talking about the culture war at all' and 'agreeing with / invoking Curtis Yarvin', which also conflates 'criticizing Yarvin' with 'silencing discussion of the culture war'.
This reinforces a false binary between totally mind-killed wokists and people (like Yarvin) who just literally believe that some folks deserve to suffer, because it's their genetic destiny.
This kind of tribalism is exactly what fuels the culture war, and not what successfully sidesteps, diffuses, or rectifies it. NRx, like the Cathedral, is a min...
I think you're saying something here but I'm going to factor it a bit to be sure.
One and three I'm just going to call 'subjective' (and I think I would just agree with you if the Wikipedia article were actually representative of the contents of the book, which it is not).
Re 4: The book itself is actually largely about his experiences as a professor, being subjected to the forces of elite coordination and bureaucracy, and reads a lot like Yar...
(I basically endorse Daniel and Habryka's comments, but wanted to expand the 'it's tricky' point about donation. Obviously, I don't know what they think, and they likely disagree on some of this stuff.)
There are a few direct-work projects that seem robustly good (METR, Redwood, some others) based on track record, but afaict they're not funding constrained.
Most incoming AI safety researchers are targeting working at the scaling labs, which doesn't feel especially counterfactual or robust against value drift, from my position. For this reason, I don't ...
[errant thought pointing a direction, low-confidence musing, likely retreading old ground]
There’s a disagreement that crops up in conversations about changing people’s minds. Sides are roughly:
This first strategy invites framing you...
Do you think of rationality as a similar sort of 'object' or 'discipline' to philosophy? If not, what kind of object do you think of it as being?
(I am no great advocate for academic philosophy; I left that shit way behind ~a decade ago after going quite a ways down the path. I just want to better understand whether folks consider Rationality as a replacement for philosophy, a replacement for some of philosophy, a subset of philosophical commitments, a series of cognitive practices, or something else entirely. I can model it, internally, as aiming to be any...
Question for Ben:
Are you inviting us to engage with the object level argument, or are you drawing attention to the existence of this argument from a not-obviously-unreasonable-source as a phenomenon we are responsible for (and asking us to update on that basis)?
On my read, he’s not saying anything new (concerns around military application are why ‘we’ mostly didn’t start going to the government until ~2-3 years ago), but that he’s saying it, while knowing enough to paint a reasonable-even-to-me picture of How This Thing Is Going, is the real tragedy.
I think the reason nobody will do anything useful-to-John as a result of the control critique post is that control is explicitly not aiming at the hard parts of the problem, and knows this about itself. In that way, control is an especially poorly selected target if the goal is getting people to do anything useful-to-John. I'd be interested in a similar post on the Alignment Faking paper (or model organisms more broadly), on RAT, on debate, on faithful CoT, on specific interpretability paradigms (circuits vs SAEs, vs some coherentist approach vs shards vs.....
there are plenty of cases where we can look at what people are doing and see pretty clearly that it is not progress toward the hard problem
There are plenty of cases where John can glance at what people are doing and see pretty clearly that it is not progress toward the hard problem.
Importantly, people with the agent foundations class of anxieties (which I embrace; I think John is worried about the right things!) do not spend time engaging on a gears level with prominent prosaic paradigms and connecting the high level objection ("it ignores the hard part of...
If you wrote this exact post, it would have been upvoted enough for the Redwood team to see it, and they would have engaged with you similarly to how they engaged with John here (modulo some familiarity, because these people all know each other at least somewhat, and in some pairs very well actually).
If you wrote several posts like this, that were of some quality, you would lose the ability to appeal to your own standing as a reason not to write a post.
This is all I'm trying to transmit.
[edit: I see you already made the update I was encouraging, an hour after leaving the above comment to me. Yay!]
Writing (good) critiques is, in fact, a way many people gain standing. I’d push back on the part of you that thinks all of your good ideas will be ignored (some of them probably will be, but not all of them; don’t know until you try, etc).
More partial credit on the second to last point:
https://home.treasury.gov/news/press-releases/jy2766
Aside: I don’t think it’s just that real world impacts take time to unfold. Lately I’ve felt that evals are only very weakly predictive of impact (because making great ones is extremely difficult). Could be that models available now don’t have substantially more mundane utility (economic potential stemming from first order effects), outside of the domains the labs are explicitly targeting (like math and code), than models available 1 year ago.
Is the context on “reliable prediction and ELK via empirical route” just “read the existing ELK literature and actually follow it” or is it stuff that’s not written down? I assume you’ve omitted it to save time, and so no worries if the latter.
EDIT: I was slightly tempted to think of this also as ‘Ryan’s ranking of live agendas that aren’t control’, but I’m not sure if ‘what you expect to work conditional on delegating to AIs’ is similar to ‘what you expect to work if humans are doing most of it?’ (my guess is the lists would look similar, but with notable exceptions, eg humans pursuing GOFAI feels less viable than ML agents pursuing GOFAI)
My understanding is that ~6 months ago y’all were looking for an account of the tasks an automated AI safety researcher would hopefully perform, as part of answering the strategic question ‘what’s the next step after building [controlled] AGI?’ (with ‘actually stop there indefinitely’ being a live possibility)
This comment makes me think you’ve got that account of safety tasks to be automated, and are feeling optimistic about automated safety research.
Is that right and can you share a decently mechanistic account of how automated safety research might work?...
We're probably more skeptical than AI companies and less skeptical than the general LessWrong crowd. We have made such a list and thought about it in a bit of detail (but still pretty undercooked).
My current sense is that automated AI safety/alignment research to reach a reasonable exit condition seems hard, but substantially via the mechanism that achieving the desirable exit condition is an objectively very difficult research task for which our best ideas aren't amazing, and less so via the mechanism that it's impossible to get the relevant AIs to help...
Thanks for the clarification — this is in fact very different from what I thought you were saying, which was something more like "FATE-esque concerns fundamentally increase x-risk in ways that aren't just about (1) resource tradeoffs or (2) side-effects of poorly considered implementation details."
Anthropic should take a humanist/cosmopolitan stance on risks from AGI in which risks related to different people having different values are very clearly deprioritized compared to risks related to complete human disempowerment or extinction, as worry about the former seems likely to cause much of the latter
Can you say more about the section I've bolded or link me to a canonical text on this tradeoff?
OpenAI, Anthropic, and xAI were all founded substantially because their founders were worried that other people would get to AGI first, and then use that to impose their values on the world.
In general, if you view developing AGI as a path to godlike power (as opposed to a doomsday device that will destroy most value independently of who gets there first), it makes a lot of sense to rush towards it. As such, the concern that people will "do bad things with the AI that they will endorse, but I won't" is the cause of a substantial fraction of worlds where we recklessly race past the precipice.
[was a manager at MATS until recently and want to flesh out the thing Buck said a bit more]
It’s common for researchers to switch subfields, and extremely common for MATS scholars to get work doing something different from what they did at MATS. (Kosoy has had scholars go on to ARC, Neel scholars have ended up in scalable oversight, Evan’s scholars have a massive spread in their trajectories; there are many more examples but it’s 3 AM.)
Also I wouldn’t advise applying only to something that seems interesting; I’d advise applying for literally everything (unless y...
Your version of events requires a change of heart (for 'them to get a whole lot more serious'). I'm just looking at the default outcome. Whether alignment is hard or easy (although not if it's totally trivial), it appears to be progressing substantially more slowly than capabilities (and the parts of it that are advancing are the most capabilities-synergizing, so it's unclear what the oft-lauded 'differential advancement of safety' really looks like).
By bad I mean dishonest, and by 'we' I mean the speaker (in this case, MIRI).
I take myself to have two central claims across this thread:
I do not see where your most recent comment has any surface area with either of these claims.
I do want to offer some reassurance, though:
I do not take "One guy who's thought about this for a long time and some other people he recruited think it's definitely going to fail" to be descriptive of the MIRI comms strategy.
Oh, I feel fine about saying ‘draft artifacts currently under production by the comms team ever cite someone who is not Eliezer, including experts with a lower p(doom)’ which, based on this comment, is what I take to be the goalpost. This is just regular coalition signaling though and not positioning yourself as, terminally, a neutral observer of consensus.
“You haven’t really disagreed that [claiming to speak for scientific consensus] would be more effective.”
That’s right! I’m really not sure about this. My experience has been that ~every take someone offe...
I do mean ASI, not AGI. I know Pope + Belrose also mean to include ASI in their analysis, but it’s still helpful to me if we just use ASI here, so I’m not constantly wondering if you’ve switched to thinking about AGI.
Obligatory ‘no really, I am not speaking for MIRI here.’
My impression is that MIRI is not trying to speak for anyone else. Representing the complete scientific consensus is an undue burden to place on an org that has not made that claim about itself. MIRI represents MIRI, and is one component voice of the ‘broad view guiding public policy’, no...
Good point - what I said isn’t true in the case of alignment by default.
Edited my initial comment to reflect this
(I work at MIRI but views are my own)
I don't think 'if we build it we all die' requires that alignment be hard [edit: although it is incompatible with alignment by default]. It just requires that our default trajectory involves building ASI before solving alignment (and, looking at our present-day resource allocation, this seems very likely to be the world we are in, conditional on building ASI at all).
[I want to note that I'm being very intentional when I say "ASI" and "solving alignment" and not "AGI" and "improving the safety situation"]
ASI alignment could be trivial (happens by default), easy, or hard. If it is trivial, then "if we build it, we all die" is false.
Separately, I don't buy that misaligned ASI with totally alien goals and that fully takes over will certainly kill everyone due to[1] trade arguments like this one. I also think it's plausible that such an AI will be at least very slightly kind such that it is willing to spend a tiny amount of resources keeping humans alive if this is cheap. Thus, "the situation is well described as 'we all die' conditional on misaligned ASI with ...
Does it seem likely to you that, conditional on ‘slow bumpy period soon’, a lot of the funding we see at frontier labs dries up (so there’s kind of a double slowdown effect of ‘the science got hard, and also now we don’t have nearly the money we had to push global infrastructure and attract top talent’), or do you expect that frontier labs will stay well funded (either by leveraging low hanging fruit in mundane utility, or because some subset of their funders are true believers, or a secret third thing)?
Only the first few sections of the comment were directed at you; the last bit was a broader point re other commenters in the thread, the fooming shoggoths, and various in-person conversations I’ve had with people in the bay.
That rationalists and EAs tend toward aesthetic bankruptcy is one of my chronic bones to pick, because I do think it indicates the presence of some bias that doesn’t exist in the general population, which results in various blind spots.
Sorry for not signposting and/or limiting myself to a direct reply; that was definitely confusing.
I th...
If this is representative of the kind of music you like, I think you’re wildly overestimating how difficult it is to make that music.
The hard parts are basically infrastructural (knowing how to record a sound, how to make different sounds play well together in a virtual space). Suno is actually pretty bad at that, though, so if you give yourself the affordance to be bad at it, too, then you can just ignore the most time-intensive part of music making.
Pasting things together (as you did here), is largely The Way Music Is Made in the digital age, anyway.
I th...
A great many tools like this already exist and are contracted by the major labels.
When you post a song to streaming services, it’s checked against the entire major label catalog before actually listing on the service (the technical process is almost certainly not literally this, but it’s something like this, and they’re very secretive about what’s actually happening under the hood).
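To gesture at what "something like this" might mean, here's a minimal toy sketch of Shazam-style acoustic fingerprinting (spectral-peak hashing). Everything here is illustrative: the names `fingerprint`, `match_score`, `load`, and `reject_upload` are made up, and whatever the labels actually run is proprietary and presumably far more robust (peak-pair hashing with time offsets, noise tolerance, etc.).

```python
# Toy sketch of acoustic fingerprint matching (spectral peak hashing).
# Purely illustrative; not a description of any real content-ID system.
import numpy as np
from scipy.signal import spectrogram


def fingerprint(audio: np.ndarray, sample_rate: int = 44100) -> set[tuple[int, int]]:
    """Reduce a waveform to a set of (frequency-bin, time-bin) spectral peaks."""
    _freqs, _times, spec = spectrogram(audio, fs=sample_rate, nperseg=4096)
    peaks = set()
    for t_idx in range(spec.shape[1]):
        # Keep only the loudest frequency bin per time slice; real systems keep
        # many peaks and hash pairs of them with their relative time offsets.
        f_idx = int(np.argmax(spec[:, t_idx]))
        peaks.add((f_idx, t_idx))
    return peaks


def match_score(upload_fp: set, catalog_fp: set) -> float:
    """Fraction of the upload's peaks that also appear in a catalog track."""
    return len(upload_fp & catalog_fp) / len(upload_fp) if upload_fp else 0.0


# Hypothetical gatekeeping step at upload time:
# catalog = {"label_track_123": fingerprint(load("track_123.wav"))}
# new_fp = fingerprint(load("my_upload.wav"))
# if any(match_score(new_fp, fp) > 0.8 for fp in catalog.values()):
#     reject_upload("too similar to an existing catalog track")
```

The real systems are surely doing something much more resistant to time shifts, pitch shifts, and noise, but the basic shape (reduce the audio to a compact signature, look it up against a catalog before listing) is probably similar.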
Cool! I think we're in agreement at a high level. Thanks for taking the extra time to make sure you were understood.
In more detail, though:
I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think 'if we pause it will be for stupid reasons' is a very sad take.
I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on developm...
So for this argument to be worth bringing up in some general context where a pause is discussed, the person arguing it should probably believe:
I've just read this post and the comments. Thank you for writing that; some elements of the decomposition feel really good, and I don't know that they've been done elsewhere.
I think discourse around this is somewhat confused, because you actually have to do some calculation on the margin, and need a concrete proposal to do that with any confidence.
The straw-Pause rhetoric is something like "Just stop until safety catches up!" The overhang argument is usually deployed (as it is in those comments) to the effect of 'there is no stopping.' And yeah, in this ca...
I think it would be very helpful to me if you broke that sentence up a bit more. I took a stab at it but didn't get very far.
Sorry for my failure to parse!
I want to say yes, but I think this might be somewhat more narrow than I mean. It might be helpful if you could list a few other ways one might read my message, that seem similarly-plausible to this one.
Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.
I'm often tempted to comment this in various threads, but it feels like a rabbit hole, it's not an easy one to convince someone of (because it's an argument they've accepted for years), and I've had relatively little success talking about this with people in person (there's some change I should make in how I'm talking about it, I think).
More broadly, I've started using quick takes to catalog random thoughts, because sometimes when I'm meeting...
Yes this world.
Sometimes people express concern that AIs may replace them in the workplace. This is (mostly) silly. Not that it won't happen, but you've gotta break some eggs to make an industrial revolution. This is just 'how economies work' (whether or not they can / should work this way is a different question altogether).
The intrinsic fear of joblessness-resulting-from-automation is tantamount to worrying that curing infectious diseases would put gravediggers out of business.
There is a special case here, though: double digit unemployment (and youth unemployment...
I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don't 'think differently', in a structural sense, from their forebears; they just don't believe in that God anymore.
The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.
They're no less tribal or dogmatic, or more crit...
I agree with this in the world where people are being epistemically rigorous/honest with themselves about their timelines and where there's a real consensus view on them. I've observed that people pretty rarely make decisions truly grounded in their timelines (or do so only nominally), and I think there's a lot of social signaling going on when (especially younger) people state their timelines.
I appreciate that more experienced people are willing to give advice within a particular frame ("if timelines were x", "if China did y", "if Anthro...
I don't think I really understood what it meant for establishment politics to be divisive until this past election.
As good as it feels to sit on the left and say "they want you to hate immigrants" or "they want you to hate queer people", it seems similarly (although probably not equally?) true that the center left also has people they want you to hate (the religious, the rich, the slightly-more-successful-than-you, the ideologically-impure-who-once-said-a-bad-thing-on-the-internet).
But there's also a deeper, structural sense in which it's true.
Working on A...
I think the key missing piece you’re pointing at (making sure that our interpretability tools etc actually tell us something alignment-relevant) is one of the big things going on in model organisms of misalignment (iirc there’s a step that’s like ‘ok, but if we do interpretability/control/etc at the model organism does that help?’). Ideally this type of work, or something close to it, could become more common // provide ‘evals for our evals’ // expand in scope and application beyond deep deception.
If that happened, it seems like it would fit the bill here.
Does that seem true to you?
I like this post, but I think Redwood has varied some on whether control is for getting alignment work out of AIs vs getting generally good-for-humanity work out of them and pushing for a pause once they reach some usefulness/danger threshold (eg well before superintelligence).
[based on my recollection of Buck's seminar in MATS 6]
Makes sense. Pretty sure you can remove it (and would appreciate that).
Many MATS scholars go to Anthropic (source: I work there).
Redwood I’m really not sure, but that could be right.
Sam now works at Anthropic.
Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission), likely empower Anthropic as seemingly...
updated, thanks!
Oh man — I sure hope making 'defectors' and lab safety staff walk the metaphorical plank isn't on the table. Then we're really in trouble.