alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race
Off the top of my head, this post. More generally, this is an obvious feature of AI arms races in the presence of an alignment tax. Here's a 2011 writeup that lays it out:
Given abundant time and centralized careful efforts to ensure safety, it seems very probable that these risks could be avoided: development paths that seemed to pose a high risk of catastrophe could be relinquished in favor of safer ones. However, the context of an arms race might not permit such caution. A risk of accidental AI disaster would threaten all of humanity, while the benefits of being first to develop AI would be concentrated, creating a collective action problem insofar as tradeoffs between speed and safety existed.
I assure you the AI Safety/Alignment field has been widely aware of this for at least that long.
Also,
alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race
Any (human) system that is optimizing as hard as possible also won't survive the race. Which hints at what the actual problem is: it's not even that we're in an AI arms race, it's that we're in an AI suicide race which the people racing incorrectly believe to be an AI arms race. Convincing people of the true nature of what's happening is therefore a way to dissolve the race dynamic. Arms races are correct strategies to pursue under certain conditions; suicide races aren't.
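To make the "arms race vs. suicide race" distinction concrete, here's a toy two-player payoff model (mine, not from the original comment; the payoff numbers are purely illustrative assumptions) showing that racing is only the dominant strategy under the arms-race belief, not under the suicide-race one:

```python
# Toy two-lab "race" game, illustrative numbers only.
# Each lab chooses RACE (cut safety spending) or HOLD (pay the alignment tax).
from itertools import product

def payoff(my_move, their_move, race_is_suicide):
    """Payoff to 'me' under the stated belief about what racing leads to."""
    if race_is_suicide and (my_move == "RACE" or their_move == "RACE"):
        return -100  # someone builds misaligned AGI: everyone loses
    if my_move == "RACE" and their_move == "HOLD":
        return 10    # arms-race belief: first mover captures the benefits
    if my_move == "HOLD" and their_move == "RACE":
        return -10   # arms-race belief: you lose the race
    if my_move == "RACE" and their_move == "RACE":
        return 1     # both cut corners, split the spoils
    return 5         # both hold: slower, but safe and shared

for belief in (False, True):
    print(f"\nrace_is_suicide = {belief}")
    for me, them in product(("RACE", "HOLD"), repeat=2):
        print(f"  me={me:4s} them={them:4s} -> my payoff {payoff(me, them, belief):4d}")
```

Under the arms-race belief, RACE strictly dominates HOLD (10 > 5 and 1 > -10); under the suicide-race belief it doesn't (-100 < 5). Which is the point: correcting the belief changes the game, and with it the rational strategy.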
I've skimmed™ what I assume is your "main essay". Thoughtless Kneejerk Reaction™ follows:
I'm getting the impression that you did not familiarize yourself with LW's culture and stances prior to posting. If so, this is at the root of the problems you ran into.
Edit:
Imagine for a moment that an amateur astronomer spots an asteroid on a trajectory to wipe out humanity. He doesn’t have a PhD. He’s not affiliated with NASA. But the evidence is there. And when he contacts the people whose job it is to monitor the skies, they say: “Who are you to discover this?” And then refuse to even look in the direction he’s pointing.
A more accurate analogy would involve the amateur astronomer joining a conference for people discussing how to divert that asteroid, giving a presentation where he argues for the asteroid's existence using low-resolution photos and hand-made calculations (to a room full of people who've observed the asteroid through the largest international telescopes or programmed supercomputer simulations of its trajectory), and then being confused as to why it's not very well-received.
It's been more than three months since o3 and still no o4, despite OpenAI researchers' promises.
Deep Learning has officially hit a wall. Schedule the funeral.
[/taunting_god]
Counterargument: Doing it manually teaches you the skills and the strategies for autonomously attaining high levels of understanding quickly and data-efficiently. Those skills would then generalize to cases in which you can't consult anyone, such as cases where the authors are incommunicado, dead, or simply don't exist because the "author" is raw reality itself. That last case is particularly important for doing frontier research: if you've generated a bunch of experimental results and derivations, the skills for making sense of what it all means have a fair amount of overlap with the skills for independently integrating a new paper into your world-models.
Of course, this is primarily applicable if you expect research to be a core part of your career, and it's important to keep in mind that "ask an expert for help" is an option. Still, I think independent self-study can serve as good "training wheels".
Which is weird, if you are overwhelmed shouldn’t you also be excited or impressed? I guess not, which seems like a mistake, exciting things are happening.
"Impressed" or "excited" implies a positive/approving emotion towards the overwhelming news coming from the AI sphere. As an on-the-nose comparison, you would not be "impressed" or "excited" by a constant stream of reports covering how quickly an invading army is managing to occupy your cities, even if the new military hardware they deploy is "impressive" in a strictly technical sense.
When reading LLM outputs, I tend to skim them. They're light on relevant, non-obvious content. You can usually just kind of glance diagonally through their text and get the gist, because they tend to spend a lot of words saying nothing/repeating themselves/saying obvious inanities or extensions of what they've already said.
When I first saw Deep Research outputs, it didn't read to me like this. Every sentence seemed to be insightful, dense with pertinent information.
Now I've adjusted to the way Deep Research phrases itself, and it reads the same as any other LLM output. Too many words conveying too few ideas.
Not to say plenty of human writing isn't a similar kind of slop, and not to say some LLM outputs aren't actually information-dense. But well-written human material is usually information-dense, and can contain surprising twists of thought or rhetoric that demand you actually read it properly. And LLM outputs – including, as it turns out, Deep Research's – are usually very watery.
Altman’s model of how AGI will impact the world is super weird if you take it seriously as a physical model of a future reality
My instinctive guess is that these sorts of statements from OpenAI are Blatant Lies intended to lower the AGI labs' profile and ensure there's no widespread social/political panic. There's a narrow balance to maintain, between generating enough hype targeting certain demographics to get billions of dollars in investments from them ("we are going to build and enslave digital gods and take over the world, do you want to invest in us and get a slice of the pie, or miss out and end up part of the pie getting sliced up?") and not generating so much hype of the wrong type that the governments notice and nationalize you ("it's all totally going to be business-as-usual, basically just a souped-up ChatGPT, no paradigm shifts, no redistribution of power, Everything will be Okay").
Sending contradictory messages such that each demographic hears only what they want to hear is a basic tactic for this. The tech investors buy the hype/get the FOMO and invest; the politicians and the laymen dismiss it and do nothing.
They seem to be succeeding at striking the right balance, I think. Hundreds of billions of dollars going into it from the private sector while the governments herp-derp.
certainly possible that the first AGI-level product will come out – maybe it’s a new form of Deep Research, let’s say – and initially most people don’t notice or care all that much
My current baseline expectation is that it won't look like this (unless the AGI labs/the AGI itself want to artificially make it look like this). Attaining actual AGI, instead of the current shallow facsimiles, will feel qualitatively different.
For me, with LLMs, there's a palpable sense that they need to be babied and managed and carefully slotted into well-designed templates or everything will fall apart. It won't be like that with an actual AGI; an actual AGI would be exerting optimization pressure from its own end to make things function.
Relevant meme
There'll be a palpable feeling of "lucidity" that's currently missing with LLMs. You wouldn't confuse the two if you had their chat windows open side by side, and the transformative effects will be ~instant.
Competitive agents will choose to commit suicide, knowing it's suicide, to beat the competition? That suggests that we should observe CEOs mass-poisoning their employees, Jonestown-style, in a galaxy-brained attempt to maximize shareholder value. How come that doesn't happen?
Are you quite sure the underlying issue here is not that the competitive agents don't believe the suicide race to be a suicide race?