I'd like to strongly assert that you'd want your design spec to be multiplayer from the start so that you can have virtually any arbitrary mix of LLMs and people. You'll probably want this later, and there are likely to be some design decisions that you'll make wrong if you assume there's never more than one person.
I would strongly, strongly argue that essentially "take all your vacation" is a strategy that would lead to more impact for you on your goals, almost regardless of what they are.
Humans need rest, and humans like the folks on LW tend not to take enough.
"We don't want it to be the case that models can be convinced to blackmail people just by putting them in a situation that the predictor thinks is fictional!"
This is interesting! I guess that, in some sense, means that you see certain ways in which even a future Claude N+1 won't be a truly general intelligence?
I would note that this is, indeed, a very common move in DC. I would also note that many of these copies end up in, e.g., Little Free Libraries and at the Goodwill. (For example, I currently have a copy downstairs of the President of Microsoft's Board's book, with the letter literally still inside saying "Dear Congressman XYZ, I hope you enjoy my book...")
I am not opposed to MIRI doing this, but just want to flag that this is a regular move in DC. (Which might mean you should absolutely do it since it has survivorship bias as a good lindy idea! Just saying it ain't, like, a brand new strat.)
We're hiring at ControlAI for folks who want to work on UK and US policy advocacy. Come talk to Congress and Parliament and stop risks from unsafe superintelligences! controlai.com/careers
(Admins: I don't tend to see many folks posting this sort of thing here, so feel free to nuke this post if not the sort of content you're going for. But given audience here, figured might be of interest)
I think I am too much inside the DC policy world to understand why this is seen as a gaffe, really. Can you unpack why it's seen as a gaffe to them? In the DC world, by contrast, "yes, of course, this is a major national security threat, and no you of course could never use military capabilities to address it," would be a gaffe.
I particularly appreciated that it explicitly covers conventional ballistic escalation as part of a sabotage strategy for datacenters.
One thing I find very confusing about existing gaps between the AI policy community and the national security community is that natsec policymakers have already explicitly said that kinetic (i.e., blowing things up) responses are acceptable for cyberattacks under some circumstances, while the AI policy community seems to somehow unconsciously rule those sorts of responses out of the policy window. (To be clear: any day that American servicemembers go into combat is a bad day, and I don't think we should choose such approaches lightly.)
I think a lot of this boils down to the fact that Sam Vimes is a copper, and sees poverty lead to precarity, and precarity lead to Bad Things Happening In Bad Neighborhoods. The most salient fact about Lady Sybil is that she never has to worry, never is on the rattling edge; she's always got more stuff, new stuff, old stuff, good stuff. Vimes (at that point in the Discworld series) isn't especially financially sophisticated, so he narrows it down to the piece he understands best, and builds a theory off of that.
I mean, two points:
1. We all work too many hours. Working 70 hours a week persistently is definitely too many to maximize output. You get dumb fast after hour 40 and dive into negative productivity. There's a robust organizational psych literature on this, I'm given to understand, that we all choose to ignore, because for the first ~12 weeks or so you can push beyond and get more done, but then it backfires.
2. You're literally saying statements that I used to say before burning out, and that the average consultant or banker says as part of their path to burnout. And we cannot afford to lose either of you to burnout, especially not right now.
If you're taking a full 4 weeks, great. 2 weeks a year is definitely not enough at a 70 hours a week pace, based on the observed long term health patterns of everyone I've known who works that pace for a long time. I'm willing to assert that you working 48/50ths of the hours a year you'd work otherwise is worth it, assuming fairly trivial speedups in productivity of literally just over 4% from being more refreshed, getting new perspectives from downing tools, etc.
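For anyone who wants to check the "just over 4%" figure, here is a rough sketch of the break-even arithmetic. It assumes a 50-week working year at 2 weeks of vacation versus a 48-week year at 4 weeks, and it assumes output scales linearly with weeks worked times per-week productivity. Those are simplifications of mine, and the function name is just illustrative.

```python
def break_even_speedup(weeks_before: float, weeks_after: float) -> float:
    """Fractional productivity gain needed so that working fewer weeks
    yields the same total output (assumes output = weeks * productivity)."""
    return weeks_before / weeks_after - 1


# 2 weeks of vacation -> 50 working weeks; 4 weeks of vacation -> 48 working weeks.
speedup = break_even_speedup(50, 48)
print(f"{speedup:.2%}")  # 4.17%
```

So under those assumptions, anything above roughly a 4.2% boost from being rested makes the extra two weeks off a net gain in output.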