Technologies that allow workers to be more isolated from each other gain you both convenience (because your coworkers no longer accidentally mess up what you’re doing) and also security (because you can remove your coworker’s permission to affect the code you’re running), but generally reduce efficiency. When we try to buy efficiency at the cost of convenience, we might lose security too.
Hmm, this feels less likely to me. Isolation can often be an efficiency benefit, because one employee's mistake doesn't propagate and screw up everyone else's work, and this benefit scales as the number of employees increases. This assumes that labor is at least somewhat unreliable, and I do expect that AI labor will be overall unreliable, just because reliable AI labor will likely be much more expensive and won't look cost-effective to deploy in most situations.
I'd expect that you buy efficiency by building more fine-grained configuration mechanisms and requiring that employees use them. This will also become easier with AI, since you can have specialized AI assistants that understand the configuration mechanisms well and can set them for all experiments (unlike the current status quo, where every employee has to be separately trained on the configuration mechanisms, and many of them forget the details and just copy configurations from elsewhere).
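To make "fine-grained configuration mechanisms" a bit more concrete, here's a toy sketch (all of the names below are hypothetical, not any real system): each experiment declares up front what it may read and write, and anything outside that declaration is rejected, so one job's mistake can't clobber another job's outputs.

```python
# Hypothetical sketch of a per-experiment isolation config; none of these
# names refer to a real system.
from dataclasses import dataclass, field

@dataclass
class ExperimentConfig:
    name: str
    writable_paths: list[str] = field(default_factory=list)     # where this job may write
    readable_datasets: list[str] = field(default_factory=list)  # datasets it may read

def check_write(config: ExperimentConfig, path: str) -> bool:
    """Allow writes only inside the experiment's declared sandbox."""
    return any(path.startswith(prefix) for prefix in config.writable_paths)

# An AI assistant (or a human) fills this in per experiment.
cfg = ExperimentConfig(
    name="lm-finetune-v3",
    writable_paths=["/experiments/lm-finetune-v3/"],
    readable_datasets=["webtext-v2"],
)
assert check_write(cfg, "/experiments/lm-finetune-v3/checkpoints/step100")
assert not check_write(cfg, "/experiments/other-team/results")
```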
In general it seems like this should improve security by creating more info and legibility around what is happening (which in turn gives you more affordances to enforce security). Some examples (not necessarily about AI):
That said, I did think of one example that worsens security -- switching from a memory-safe language like Python to one that requires manual memory management like C. Though in this case, it's that the "security infra" (automatic memory management) itself is too compute-expensive to run, which doesn't seem like it will apply in the AI case.
Overall my very weak guess is that the net effect will be to increase security. Of course, the part where you build more fine-grained configuration mechanisms is "novel (potentially) security-critical infrastructure", so this does support your broader point. (When I say it will increase security, this is under the assumption that the novel infra isn't sabotaged by scheming AIs.)
I think the evidence is roughly at "this should be a weakly held prior easily overturned by personal experience": https://www.lesswrong.com/posts/c8EeJtqnsKyXdLtc5/how-long-can-people-usefully-work
That said, I do think there's enough evidence that I would bet (not at extreme odds) that it is bad for productivity to have organizational cultures that emphasize working very long hours (say > 60 hours / week), unless you are putting in special care to hire people compatible with that culture. Partly this is because I expect organizations to often be unable to overcome weak priors even when faced with blatant evidence.
Andrew Gelman: "Bring on the Stupid: When does it make sense to judge a person, a group, or an organization by its worst?" (Not quite as clearcut, since it doesn't name the person in the title, but still)
(If this also doesn't count as "intellectual writing circles", consider renaming your category, since I clearly do not understand what you mean, except inasmuch as it is "rationalist or rationalist-adjacent circles".)
Hmm, interesting. I was surprised by the claim so I did look back through ACX and posts from the LW review, and it does seem to back up your claim (the closest I saw was "Sorry, I Still Think MR Is Wrong About USAID", note I didn't look very hard). EDIT: Actually I agree with sunwillrise that "Moldbug sold out" meets the bar (and in general my felt sense is that ACX does do this).
I'd dispute the characterization of this norm as operating "within intellectual online writing circles". I think it's a rationalist norm if anything. For example I went to Slow Boring and the sixth post title is "Tema Okun's "White Supremacy Culture" work is bad".
This norm seems like it both (1) creates incentives against outside critique and (2) lessens the extremes of a bad thing (e.g. like a norm that even if you have fistfights you won't use knives). I think on balance I support it but still feel pretty meh about its application in this case. Still, this did change my mind somewhat, thanks.
While I disagree with Nate on a wide variety of topics (including implicit claims in this post), I do want to explicitly highlight strong agreement with this:
I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concerns as if they’re obvious and sensible, because humans are very good at picking up on your social cues. If you act as if it’s shameful to believe AI will kill us all, people are more prone to treat you that way. If you act as if it’s an obvious serious threat, they’re more likely to take it seriously too.
The position that is "obvious and sensible" doesn't have to be "if anyone builds it, everyone dies". I don't believe that position. It could instead be "there is a real threat model for existential risk, and it is important that society does more to address it than it is currently doing". If you're going to share concerns at all, figure out the position you do have courage in, and then discuss that as if it is obvious and sensible, not as if you are ashamed of it.
(Note that I am not convinced that you should always be sharing your concerns. This is a claim about how you should share concerns, conditional on having decided that you are going to share them.)
I don’t see any inconsistency in being unhappy with what titotal is doing and happy about what AI 2027 is doing.
I agree with this. I was responding pretty specifically to Zvi's critique in particular, which focuses on things like the use of the word "bad" and the notion that there could be a goal to lower the status and prestige of AI 2027. If instead the critique were about, e.g., norms of intellectual discourse, I'd be on board.
That said, your defense doesn't feel all that strong to me? I'm happy to take your word for it that there was lots of review of AI 2027, but my understanding is that titotal also engaged quite a lot with the authors of AI 2027 before publishing the post? (I definitely expect it was much lower engagement / review in an absolute sense, but everything about it is going to be much lower in an absolute sense, since it is not as big a project.)
If I had to guess at the difference between us, it would be that I primarily see emotionally gripping storytelling as a symmetric weapon to be regarded with suspicion by default, whereas you primarily view it as an important and valuable way to get people to really engage with a topic. (Though admittedly on this view I can't quite see why you'd object to describing a model as "bad", since that also seems like a way to get people to better engage with a topic.) Or possibly it's more salient to me how the storytelling in the finished AI 2027 product comes across since I wasn't involved in its creation, whereas to you the research and analysis is more salient.
Anyway it doesn't seem super worth digging to the bottom of this, seems reasonable to leave it here (though I would be interested in any reactions you have if you felt like writing them).
EDIT: Actually, looking at the other comments here, I think it's plausible that a lot of the difference is that the creators thought the point of AI 2027 was the scenario, whereas the public reception was much more about timelines. I feel like it was very predictable that public reception would focus a lot on the timeline, but perhaps this would have been less clear in advance. Though looking at Scott's post, the timeline is really quite central to the presentation, so I don't feel like this can really be a surprise.
But it isn't trend extrapolation?
If the current doubling time is T, and each subsequent doubling takes 10% less time, then you have infinite doublings (i.e. singularity) by time 10T. So with T = 4.5 months you get singularity by 45 months. This is completely insensitive to the initial conditions or to the trend in changes-in-doubling-time (unless the number "10%" was chosen based on trend extrapolation, but that doesn't seem to be the case).
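Spelled out, that's just a geometric series (nothing here beyond the numbers already in the comment):

```latex
% Total time for infinitely many doublings, each taking 10% less time than the last:
\sum_{k=0}^{\infty} (0.9)^k \, T \;=\; \frac{T}{1 - 0.9} \;=\; 10T
% With T = 4.5 months: 10 * 4.5 = 45 months.
```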
(In practice the superexponential model predicts singularity even sooner than 45 months, because of the additional effect from automated AI R&D.)
I don't see how this is responding to anything I've said? What in my comment are you disagreeing with or adding color to?
Again, my position is not "AI 2027 did something bad". My position is "stop critiquing people for having goals around status and prestige rather than epistemics, or at least do so consistently".
(Incidentally, I suspect bio anchors did better on the axis of getting good reviews / feedback, but that isn't particularly central to anything I'm claiming.)
Things I agree with:
I disagree that titotal's critique is far away from AI 2027 on the relevant spectrum. For example, titotal's critique was posted on the EA Forum / LessWrong rather than going through a huge amplification / social media push, and focused on technical disagreements rather than storytelling.
(I'd agree that AI 2027 put in more effort / are more obviously "trying" relative to titotal, so they're far away as judged by intent, but I mostly care about outcomes rather than intent.)
You might say that obviously AI 2027 needed to do the amplification / social media push + storytelling in order to achieve its goals of influencing the discourse, and I would agree with you. But "influence the discourse" is ultimately going to be about status and prestige (given how discourse works in practice). If you're taking a stance against goals around status and prestige that trade off against epistemic commons, I think you also need to take a stance against AI 2027. (To be clear, I don't take that stance! I'm just arguing for consistency.)
You mention permission systems, which are certainly a big deal, but I didn't see anything about broader configuration mechanisms, many of which can be motivated solely by efficiency and incidentally help with security. (I was disputing your efficiency -> less security claim; permissions mechanisms aren't a valid counterargument since they aren't motivated by efficiency.)
Hmm, I think this assumes the AIs will be more reliable than I expect them to be. You would want this to be reliable enough that your (AI or human) software engineers ~never have to consider non-isolation as a possible cause of bugs (unless explicitly flagged otherwise); I'd guess that corresponds to about 6 nines of reliability at following the naming conventions? (Which might get you to 8 nines of reliability at avoiding non-isolation bugs, since most naming convention errors won't lead to bugs.)
Some complete guesses: human software engineers might get 0.5-2 nines of reliability at following the naming conventions; a superhuman coder model might hit ~4 nines if you made this a medium priority; and if you add a review step that explicitly focuses just on checking adherence to naming conventions, you might hit 6 nines, provided the critique uses a model somewhat more powerful than a current frontier model. But this is now a non-trivial expense; it's not clear to me it would be worth it.
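To make the "nines" arithmetic concrete, here's a toy calculation; the one-million-edits figure is a made-up illustration, not an estimate of anything:

```python
# Toy arithmetic for "nines of reliability" at following naming conventions.
# The edit count below is a made-up illustration, not a real estimate.

def failure_rate(nines: float) -> float:
    """Convert 'nines of reliability' into a per-attempt failure probability."""
    return 10 ** (-nines)

edits = 1_000_000  # hypothetical number of AI-written changes

for label, nines in [
    ("human engineer (low end)", 0.5),
    ("human engineer (high end)", 2.0),
    ("superhuman coder, medium priority", 4.0),
    ("superhuman coder + dedicated review step", 6.0),
]:
    expected_violations = edits * failure_rate(nines)
    print(f"{label}: ~{expected_violations:,.0f} naming-convention violations per {edits:,} edits")
```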