Unfortunately what you say sounds somewhat plausible to me; I look forward to hearing the responses.
I'll add this additional worry: If you are an early chemist exploring the properties of various metals, and you discover a metal that gets harder as it gets colder, this should increase your credence that there are other metals that share this property. Similarly, I think, for AI architectures. The GPT architecture seems to exhibit pretty awesome scaling properties. What if there are other architectures that also have awesome scaling properties, such that we'll discover this soon? How many architectures have had 1,000+ PF-days pumped into them? Seems like just two or three. And equally importantly, how many architectures have been tried with 100+ billion parameters? I don't know, please tell me if you do.
EDIT: By "architectures" I mean "Architectures + training setups (data, reward function, etc.)"
I find this interesting in the context of the recent podcast on errors in the classic arguments for AI risk - which boil down to: there is no necessary reason why instrumental convergence or orthogonality should apply to your systems, and there are actually strong a priori reasons to think increasing AI capabilities and increasing AI alignment go together to some degree... and then GPT-3 comes along and suggests that, practically speaking, you can get highly capable behaviour that scales up easily without much in the way of alignment.
On the one hand, GPT-3 is quite useful while being not robustly aligned, but on the other hand GPT-3's lack of alignment is impeding its capabilities to some degree.
Maybe if you update on both you just end up back where you started.
I think the errors in the classic arguments have been greatly exaggerated. So for me the update is just in one direction.
I don't have an elevator pitch summary of my views yet, and it's possible that my interpretation of the classic arguments is wrong, I haven't reread them recently. But here's an attempt:
--The orthogonality thesis and convergent instrumental goals arguments, respectively, attacked and destroyed two views which were surprisingly popular at the time: 1. that smarter AI would necessarily be good (unless we deliberately programmed it not to be) because it would be smart enough to figure out what's right, what we intended, etc. and 2. that smarter AI wouldn't lie to us, hurt us, manipulate us, take resources from us, etc. unless it wanted to (e.g. because it hates us, or because it has been programmed to kill, etc) which it probably wouldn't. I am old enough to remember talking to people who were otherwise smart and thoughtful who had views 1 and 2.
Speaking from personal experience, both of those views felt obvious to me before I came across the orthogonality thesis or instrumental convergence.
--As for whether the default outcome is doom, the original argument makes clear that default outcome means absent any special effort to make AI good, i.e. assuming everyone just tries to make it intelligent, but no effort is spent on making it good, the outcome is likely to be doom. This is, I think, true.
It depends on what you mean by 'special effort...
I think that the criticism sees it the second way, and so sees the arguments as not establishing what they are supposed to establish; I see it the first way - there might be a further fact that says why OT and IC don't apply to AGI like they theoretically should, but the burden is on you to prove it, rather than on us to provide evidence that OT and IC will apply to AGI.
I agree with that burden of proof. However, we do have evidence that IC will apply, if you think we might get AGI through RL.
I think that hypothesized AI catastrophe is usually due to power-seeking behavior and instrumental drives. I proved that optimal policies are generally power-seeking in MDPs. This is a measure-based argument, and it is formally correct under broad classes of situations, like "optimal farsighted agents tend to preserve their access to terminal states" (Optimal Farsighted Agents Tend to Seek Power, §6.2 Theorem 19) and "optimal agents generally choose paths through the future that afford strictly more options" (Generalizing the Power-Seeking Theorems, Theorem 2).
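As a toy numerical illustration of the flavor of these results (a simplified sketch, not the formal setup of the cited theorems): in a tiny deterministic MDP where one branch reaches a single terminal state and the other reaches three, the option-preserving branch is optimal for most randomly drawn reward functions.

```python
# Toy illustration (not the cited theorems' formal construction): "left" reaches
# 1 terminal state and "right" reaches 3. For reward functions sampled uniformly
# at random over terminal states, the branch keeping more options open wins most
# of the time.
import random

def fraction_right_optimal(trials: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        left_value = rng.random()                          # best (only) outcome on the left
        right_value = max(rng.random() for _ in range(3))  # best of 3 outcomes on the right
        wins += right_value > left_value
    return wins / trials

# Analytically this probability is 3/4; the estimate should land near 0.75.
print(fraction_right_optimal())
```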
The theorems aren't conclusive evidence:
I think the purpose of the OT and ICT is to establish that lots of AI safety needs to be done. I think they are successful in this. Then you come along and give your analogy to other cases (rockets, vaccines) and argue that lots of AI safety will in fact be done, enough that we don't need to worry about it. I interpret that as an attempt to meet the burden, rather than as an argument that the burden doesn't need to be met.
But maybe this is a merely verbal dispute now. I do agree that OT and ICT by themselves, without further premises like "AI safety is hard," "the people building AI don't seem to take safety seriously, as evidenced by their public statements and their research allocation," and "we won't actually get many chances to fail and learn from our mistakes," do not establish more than, say, 1% credence in "AI will kill us all," if even that. But I think it would be a misreading of the classic texts to say that they were wrong or misleading because of this; probably if you went back in time and asked Bostrom right before he published the book whether he agrees with you re the implications of OT and ICT on their own, he would have completely agreed. And the text itself seems to agree.
I don't want to take away from MIRI's work (I still support them, and I think that if the GPTs peter out, we'll be glad they've been continuing their work), but I think it's an essential time to support projects that can work for a GPT-style near-term AGI
I'd love to know of a non-zero integer number of plans that could possibly, possibly, possibly work for not dying to a GPT-style near-term AGI.
Here are 11. I wouldn't personally assign greater than 50/50 odds to any of them working, but I do think they all pass the threshold of “could possibly, possibly, possibly work.” It is worth noting that only some of them are language modeling approaches—though they are all prosaic ML approaches—so it does sort of also depend on your definition of “GPT-style” how many of them count or not.
Pretty sure OpenPhil and OpenAI currently try to fund plans that claim to look like this (e.g. all the ones orthonormal linked in the OP), though I agree that they could try increasing the financial reward by 100x (e.g. a prize) and see what that inspires.
If you want to understand why Eliezer doesn't find the current proposals feasible, his best writeups critiquing them specifically are this long comment containing high level disagreements with Alex Zhu's FAQ on iterated amplification and this response post to the details of Iterated Amplification.
As I understand it, the high level summary (naturally Eliezer can correct me) is that (a) corrigible behaviour is very unnatural and hard to find (most nearby things in mindspace are not in equilibrium and will move away from corrigibility as they reflect / improve), and (b) using complicated recursive setups with gradient descent to do supervised learning is incredibly chaotic and hard to manage, and shouldn't be counted on working without major testing and delays (i.e. could not be competitive).
There's also some more subtle and implicit disagreement that's not been quite worked out but feeds into the above, where a lot of the ML-focused...
Perhaps Eliezer can interject here, but it seems to me like these are not knockdown criticisms that such an approach can't “possibly, possibly, possibly work”—just reasons that it's unlikely to and that we shouldn't rely on it working.
My model is that those two are the well-operationalised disagreements and thus productive to focus on, but that most of the despair is coming from the third and currently more implicit point.
Stepping back, the baseline is that most plans are crossing over dozens of kill-switches without realising it (e.g. Yann LeCun's "objectives can be changed quickly when issues surface").
Then there are more interesting proposals that require being able to fully inspect the cognition of an ML system, have it be fully introspectively clear, and then use it as a building block to build stronger, competitive, corrigible and aligned ML systems. I think this is an accurate description of Iterated Amplification + Debate, as Zhu says in section 1.1.4 of his FAQ, and I think something very similar is what Chris Olah is excited about re: microscopes, i.e. reverse engineering the entire "codebase"/cognition of an ML system.
I don't deny that there are a lot of substantive and fascinating details to many of these proposals, and that if this is possible we might indeed solve the alignment problem, but I think that is a large step which, from some initial perspectives, sounds kind of magical. And don't...
I feel like it's one reasonable position to call such proposals non-starters until a possibility proof is shown, and instead work on basic theory that will eventually be able to give more plausible basic building blocks for designing an intelligent system.
I agree that deciding to work on basic theory is a pretty reasonable research direction—but that doesn't imply that other proposals can't possibly work. Thinking that a research direction is less likely to mitigate existential risk than another is different than thinking that a research direction is entirely a non-starter. The second requires significantly more evidence than the first and it doesn't seem to me like the points that you referenced cross that bar, though of course that's a subjective distinction.
As for planning, we've seen the GPTs ascend from planning out the next few words, to planning out the sentence or line, to planning out the paragraph or stanza. Planning out a whole text interaction is well within the scope I could imagine for the next few iterations, and from there you have the capability of manipulation without external malicious use.
Perhaps a nitpick, but is what it does planning?
Is it actually thinking several words ahead (a la AlphaZero evaluating moves) when it decides what word to say next, or is it just doing free-writing, and it just happens to be so good at coming up with words that fit with what's come before that it ends up looking like a planned out text?
You might argue that if it ends up as-good-as-planned, then it doesn't make a difference if it was actually planned or not. But it seems to me like it does make a difference. If it has actually learned some internal planning behavior, then that seems more likely to be dangerous and to generalize to other kinds of planning.
That's not a nitpick at all!
Upon reflection, the structured sentences, thematically resolved paragraphs, and even JSX code can be done without a lot of real lookahead. And there's some evidence it's not doing lookahead - its difficulty completing rhymes when writing poetry, for instance.
(Hmm, what's the simplest game that requires lookahead that we could try to teach to GPT-3, such that it couldn't just memorize moves?)
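One hypothetical probe along these lines (a sketch, not something actually tried here): generate random two-ply game trees whose optimal first move requires a step of minimax lookahead, rendered as text prompts. Because the payoffs are freshly sampled, the answers cannot simply be memorized from training data.

```python
# Sketch of a lookahead probe: random two-ply games where the opponent then picks
# the worse outcome for you, so the right first move requires one step of minimax.
import random

def make_instance(rng: random.Random):
    # For each of our moves, the two outcomes the opponent can choose between.
    payoffs = {"A": [rng.randint(1, 9) for _ in range(2)],
               "B": [rng.randint(1, 9) for _ in range(2)]}
    # The opponent minimizes our score, so each move is worth its worst outcome
    # (ties default to A).
    best = max(payoffs, key=lambda move: min(payoffs[move]))
    prompt = (
        "You pick move A or B, then your opponent picks whichever outcome is worse for you.\n"
        f"Move A leads to a score of {payoffs['A'][0]} or {payoffs['A'][1]}.\n"
        f"Move B leads to a score of {payoffs['B'][0]} or {payoffs['B'][1]}.\n"
        "The best first move is"
    )
    return prompt, best

rng = random.Random(0)
prompt, answer = make_instance(rng)
print(prompt)
print(f"(expected completion: {answer})")
```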
Thinking about this more, I think that since planning depends on causal modeling, I'd expect the latter to get good before the former. But I probably overstated the case for its current planning capabilities, and I'll edit accordingly. Thanks!
Yes! I was thinking about this yesterday; it occurred to me that GPT-3's difficulty with rhyming consistently might not just be a byte-pair problem: any highly structured text with extremely specific, restrictive forward and backward dependencies is going to be a challenge if you're just linearly appending one token at a time onto a sequence without the ability to revise it (maybe we should try a 175-billion parameter BERT?). That explains and predicts a broad spectrum of issues and potential solutions (here I'm calling the solutions Strategies A, B and C): performance should correlate with (1) the allowable margin of error per token-group (coding syntax is harsh, solving math equations is harsh, trying to come up with a rhyme for 'orange' after you've written it is harsh), and (2) the extent to which each token-group depends on future token-groups. Human poets and writers always go through several iterations, but we're asking it to do what we do in just one pass.
So in playing around with GPT-3 (AID), I've found two (three?) meta approaches for dealing with this issue. I'll call them Strategies A, B and C.
A is the more general one. You just give it multiple ...
I'm confused about the "because I could not stop for death" example. You cite it as an example of GPT-3 developing "the sense of going somewhere, at least on the topic level," but it seems to have just memorized the Dickinson poem word for word; the completion looks identical to the original poem except for some punctuation.
(To be fair to GPT-3, I also never remember where Dickinson puts her em dashes.)
I think GPT-N is definitely not aligned, for mesa-optimizer reasons. It'll be some unholy being with a superhuman understanding of all the different types of humans, all the different parts of the internet, all the different kinds of content and style... but it won't itself be human, or anything close.
Of course, it's also not outer-aligned in Evan's sense, because of the universal prior being malign etc.
Suppose that GPT-6 does turn out to be a highly transformative AI capable of human-level language understanding and causal reasoning. What would the remaining gap be between that and an Agentive AGI? Possibly it would not be much of a further leap.
There is this list of remaining capabilities needed for AGI in an older post I wrote, with the capabilities of 'GPT-6' as I see them underlined:
Stuart Russell’s List
human-like language comprehension
cumulative learning
discovering new action sets
managing its own mental activity
For reference, I’ve included two capabilities we already have that I imagine would have been on a similar list in 1960:
perception and object recognition
efficient search over known facts
So we'd have discovering new action sets and managing mental activity remaining - effectively, the things that facilitate long-range complex planning. Unless you think those could also arise with GPT-N?
Suppose GPT-8 gives you all of those, just spontaneously, but it's nothing but a really efficient text-predictor. Supposing that no dangerous mesa-optimisers arise, what then? Would it be relatively easy to turn it into something agentive, or would agent-like behavi...
You're careful here to talk about transformative AI rather than AGI, and I think that's right. GPT-N does seem like it stands to have transformative effects without necessarily being AGI, and that is quite worrisome. I think many of us expected to find ourselves in a world where AGI was primarily what we had to worry about, and instead we're in a world where "lesser" AI is on track to be powerful enough to dramatically change society. Or at least, so it seems from where we stand, extrapolating out the trends.
I have to further compliment my past self: this section aged extremely well, prefiguring the Shoggoth-with-a-smiley-face analogies several years in advance.
There are some posts with perennial value, and some which depend heavily on their surrounding context. This post is of the latter type. I think it was pretty worthwhile in its day (and in particular, the analogy between GPT upgrades and developmental stages is one I still find interesting), but I leave it to you whether the book should include time capsules like this.
It's also worth noting that, in the recent discussions, Eliezer has pointed to the GPT architecture as an example that scaling up has worked better than expected, but he diverges from the thes...
I think it's an essential time to support projects that can work for a GPT-style near-term AGI, for instance by incorporating specific alignment pressures during training. Intuitively, it seems as if Cooperative Inverse Reinforcement Learning or AI Safety via Debate or Iterated Amplification are in this class.
As I argued here, I think GPT-3 is more likely to be aligned than whatever we might do with CIRL/IDA/Debate ATM, since it is trained with (self)-supervised learning and gradient descent.
The main reason such a system could pose an x-risk by its...
Next, we might imagine GPT-N to just be an Oracle AI, which we would have better hopes of using well. But I don't expect that an approximate Oracle AI could be used safely with anything like the precautions that might work for a genuine Oracle AI. I don't know what internal optimizers GPT-N ends up building along the way, but I'm not going to count on there being none of them.
Is the distinguishing feature between Oracle AI and approximate Oracle AI, as you use the terms here, just about whether there are inner optimizers or not?
(When I started the paragrap...
And the really worrisome capability comes when it models its own interactions with the world, and makes plans with that taken into account.
Someone who's been playing with GPT-3 as a writing assistant gives an example which looks very much like GPT-3 describing this process:
..."One could write a program to generate a story that would create an intelligence. One could program the story to edit and refine itself, and to make its own changes in an attempt to improve itself over time. One could write a story to not only change the reader, but also to change
GPT-3 can generate a plan and then a way to implement it: bold is prompt.
"Below is a plan of preparing a dinner.
1) Chop up some vegetables and put them in the fridge for later use.
2) Cook some meat, then you can eat it tonight!
3) Wash your hands, because this is going to be messy!
4) And lastly...
5) Eat!
Now you start doing it:
You chop up some carrots, onions and potatoes. You cook some beef, then you can have dinner tonight!
After eating, you wash your hands and get ready for bed. You don't know how long it will take before you feel sleepy again s...
I don't believe rhymes are an example of a failure to plan. They are a clearcut case of BPE problems.
They follow the same patterns as other BPE problems: it works on the most common (memorized) instances, rapidly degrading with rarity; the relevant information cannot be correctly represented by BPEs; the tasks are inherently simple, yet GPT-3 performs really badly despite human-like performance on almost identical tasks (like non-rhyming poetry, or non-pun-based humor); and they have improved minimally over GPT-2. With rhymes, it's even more clearly not a planning problem: Peter Vessenes, I think, set up a demo problem on the Slack where the task was merely to select the rhyming word for a target word out of a prespecified list of possible rhymes; in line with the BPE explanation, GPT-3 could correctly select short common rhyme pairs, and then fell apart as soon as you used rarer words. Similarly, I found little gain from prespecified rhymes. The problem is not that GPT-3 can't plan good rhymes; the problem is that GPT-3 doesn't know what words rhyme, period.
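A concrete way to see the BPE point (a sketch assuming the `tiktoken` package, which exposes the GPT-2/GPT-3 byte-pair encoding; the word choices are illustrative):

```python
# Sketch: rhyming words share no visible structure once byte-pair encoded.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # the BPE vocabulary used by GPT-2/GPT-3

for word in [" day", " weigh", " ballet", " banana"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r:>10} -> ids {ids} pieces {pieces}")

# " day", " weigh", and " ballet" rhyme, but nothing in their token ids encodes the
# shared sound, or even their letters, so the model has to memorize rhyme pairs
# rather than read them off the input.
```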
As far as planning goes, next-token prediction is entirely consistent with implicit planning. During each forward pass, GPT-3 probably has plenty of...
Funny thing about BPEs: GPT-3 has to know about the individual letters, because I taught it how to spell both real words and nonsense words. (Prompts in bold, including two where I edited GPT-3's responses.)
The students were ready to test their spelling.
The teacher stood at the front of the class. "Adam, please spell PARTY."
Adam replied, "Party. P-A-R-T-Y."
The teacher said, "Correct. Beatrice, please spell THROUGH."
Beatrice replied, "Through. T-H-O-R-O-U-G-H."
The teacher said, "Incorrect. Through is spelled T-H-R-O-U-G-H. Carlos, please spell SPELL."
Carlos replied, "Spell. S-P-E-L-L."
The teacher said, "Correct. Daphne, please spell NUCLEAR."
Daphne replied, "Nuclear. N-U-C-L-E-A-R."
The teacher said, "Correct. Adam, please spell INFINITE."
Adam replied, "Infinite. I-N-F-I-N-A-T-E."
The teacher replied, "Incorrect. Infinite is spelled I-N-F-I-N-I-T-E. Beatrice, please spell BALLOON."
Beatrice replied, "Balloon. B-A-L-L-O-O-N."
The teacher replied, "Correct. Carlos, please spell ENCLOSURE."
Carlos replied, "Enclosure. I-N-C-L-O-S-U-R-E."
The teacher replied, "Incorrect. Enclosure is spelled E-N-C-L-O-S-U-R-E. Daphne, please spell ELECTRON."
Daphne replied, "Electron. E-L-E-C-T-R-O-N."
Th...
It would also be very useful to build some GPT feature "visualization" tools ASAP.
Do you have anything more specific in mind? I see the Image Feature Visualization tool, but in my mind it's basically doing exactly what you're already doing by comparing GPT-2 and GPT-3 snippets.
Epistemic Status: I only know as much as anyone else in my reference class (I build ML models, I can grok the GPT papers, and I don't work for OpenAI or a similar lab). But I think my thesis is original.
Related: Gwern on GPT-3
For the last several years, I've gone around saying that I'm worried about transformative AI, an AI capable of making an Industrial Revolution sized impact (the concept is agnostic on whether it has to be AGI or self-improving), because I think we might be one or two cognitive breakthroughs away from building one.
GPT-3 has made me move up my timelines, because it makes me think we might need zero more cognitive breakthroughs, just more refinement / efficiency / computing power: basically, GPT-6 or GPT-7 might do it. My reason for thinking this is comparing GPT-3 to GPT-2, and reflecting on what the differences say about the "missing pieces" for transformative AI.
My Thesis:
The difference between GPT-2 and GPT-3 has made me suspect that there's a legitimate comparison to be made between the scale of a network architecture like the GPTs, and some analogue of "developmental stages" of the resulting network. Furthermore, it's plausible to me that the functions needed to be a transformative AI are covered by a moderate number of such developmental stages, without requiring additional structure. Thus GPT-N would be a transformative AI, for some not-too-large N, and we need to redouble our efforts on ways to align such AIs.
The thesis doesn't strongly imply that we'll reach transformative AI via GPT-N especially soon; I have wide uncertainty, even given the thesis, about how large we should expect N to be, and about whether the costs of training and computation will slow progress before then. But it's also plausible to me now that the timeline is only a few years, and that no fundamentally different approach will succeed before then. And that scares me.
Architecture and Scaling
GPT, GPT-2, and GPT-3 use nearly the same architecture; each paper says as much, with a sentence or two about minor improvements to the individual transformers. Model size (and the amount of training computation) is really the only difference.
GPT took 1 petaflop/s-day to train its 117M parameters, GPT-2 took 10 petaflop/s-days to train 1.5B parameters, and the largest version of GPT-3 took about 3,640 petaflop/s-days to train 175B parameters. By contrast, AlphaStar seems to have taken about 30,000 petaflop/s-days of training in mid-2019, and the growth rate of AI training compute suggests that roughly 10x that should be feasible today. The upshot is that OpenAI may not be able to afford it, but if Google really wanted to make GPT-4 this year, they could afford to do so.
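As a sanity check on these figures (a back-of-the-envelope sketch using the common ≈6 × parameters × training-tokens estimate of training FLOPs, and the ~300B training tokens reported for GPT-3):

```python
# Back-of-the-envelope check of the GPT-3 training-compute figure, using the
# standard ~6 * N * D FLOPs estimate (N = parameters, D = training tokens).
params = 175e9          # GPT-3 parameter count
tokens = 300e9          # training tokens reported in the GPT-3 paper
flops = 6 * params * tokens

pfs_day = 1e15 * 86400  # one petaflop/s-day in FLOPs
print(f"{flops:.2e} FLOPs = {flops / pfs_day:,.0f} petaflop/s-days")
# -> roughly 3.15e+23 FLOPs, i.e. ~3,600 petaflop/s-days, matching the figure above.
```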
Analogues to Developmental Stages
There are all sorts of (more or less well-defined) developmental stages for human beings: image tracking, object permanence, vocabulary and grammar, theory of mind, size and volume, emotional awareness, executive functioning, et cetera.
I was first reminded of developmental stages a few years ago, when I saw the layers of abstraction generated in this feature visualization tool for GoogLeNet.
We don't have feature visualization for language models, but we do have generative outputs. And as you scale up an architecture like GPT, you see higher levels of abstraction. Grammar gets mastered, then content (removing absurd but grammatical responses), then tone (first rough genre, then spookily accurate authorial voice). Topic coherence is mastered first on the phrase level, then the sentence level, then the paragraph level. So too with narrative flow.
Gwern's poetry experiments (GPT-2, GPT-3) are good examples. GPT-2 could more or less continue the meter of a poem and use words that fit the existing theme, but even its best efforts tended to get stuck in topic loops.
GPT-3, though, has the sense of going somewhere, at least on the topic level. (Prompts in bold.)
[EDIT: Previously I also included its completion of a famous Emily Dickinson poem here, but as benkuhn pointed out, GPT-3 had simply memorized the poem and recited it. I'm really embarrassed, and also kind of shocked that I looked at the actual text of "Because I could not stop for Death" and thought, "yup, that looks like something GPT-3 could produce".]
(One last shocking bit is that, while GPT-2 had to be fine-tuned by taking the general model and training it some more on a poetry-only dataset, you're seeing what GPT-3's model does with no fine-tuning, with just a prompt that sounds poetic!)
Similarly, GPT-3's ability to write fiction is impressive: unlike GPT-2, it doesn't lose track of the plot and it has sensible things happen; it just can't plan its way to a satisfying resolution.
I'd be somewhat surprised if GPT-4 shared that last problem.
What's Next?
How could one of the GPTs become a transformative AI, even if it becomes a better and better imitator of human prose style? Sure, we can imagine it being used maliciously to auto-generate targeted misinformation or things of that sort, but that's not the real risk I'm worrying about here.
My real worry is that causal inference and planning are starting to look more and more like plausible developmental stages that GPT-3 is moving towards, and that these were exactly the things I previously thought were the obvious obstacles between current AI paradigms and transformative AI.
Learning causal inference from observations doesn't seem qualitatively different from learning arithmetic or coding from examples (and GPT-3 is not only accurate at adding three-digit numbers, but apparently also at writing JSX code to spec); it's just more complex in degree.
One might claim that causal inference is harder to glean from language-only data than from direct observation of the physical world, but that's a moot point, as OpenAI are using the same architecture to learn how to infer the rest of an image from one part.
Planning is more complex to assess. We've seen GPTs ascend from coherence of the next few words, to the sentence or line, to the paragraph or stanza, and we've even seen them write working code. But this can be done without planning; GPT-3 may simply have a good enough distribution over next words to prune out those that would lead to dead ends. (On the other hand, how sure are we that that's not the same as planning, if planning is just pruning on a high enough level of abstraction?)
The bigger point about planning, though, is that the GPTs are getting feedback on one word at a time in isolation. It's hard for them to learn not to paint themselves into a corner. It would make training more finicky and expensive if we expanded the time horizon of the loss function, of course. But that's a straightforward way to get the seeds of planning, and surely there are other ways.
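As a rough sketch of what "expanding the time horizon of the loss" could look like (one possible construction, not a claim about how the GPTs are actually trained; module and variable names here are illustrative): alongside the standard next-token head, add heads that predict tokens several steps ahead from the same hidden state, so gradients carry some information about where the text is going.

```python
# Minimal sketch of a multi-horizon language-modeling loss (illustrative names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHorizonHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, horizons=(1, 2, 4)):
        super().__init__()
        self.horizons = horizons
        # One linear head per prediction horizon (t+1, t+2, t+4).
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in horizons)

    def loss(self, hidden: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) from the transformer trunk
        # tokens: (batch, seq) input token ids
        total = 0.0
        for head, k in zip(self.heads, self.horizons):
            logits = head(hidden[:, :-k])   # predict the token at position t+k
            targets = tokens[:, k:]         # ground truth k steps ahead
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / len(self.horizons)

# Toy usage with random "hidden states" standing in for a transformer trunk:
batch, seq, d_model, vocab = 2, 16, 32, 100
head = MultiHorizonHead(d_model, vocab)
hidden = torch.randn(batch, seq, d_model)
tokens = torch.randint(0, vocab, (batch, seq))
print(head.loss(hidden, tokens))
```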
With causal modeling and planning, you have the capability of manipulation without external malicious use. And the really worrisome capability comes when it models its own interactions with the world, and makes plans with that taken into account.
Could GPT-N turn out aligned, or at least harmless?
GPT-3 is trained simply to predict continuations of text. So what would it actually optimize for, if it had a pretty good model of the world including itself and the ability to make plans in that world?
One might hope that because it's learning to imitate humans in an unsupervised way, that it would end up fairly human, or at least act in that way. I very much doubt this, for the following reason:
What we have with the GPTs is the first deep learning architecture we've found that scales this well in the domain (so, probably not that much like our own cognitive architecture), learning to mimic humans rather than growing up in an environment with similar pressures. Why should we expect it to be anything but very alien under the hood, or to continue acting human once its actions take us outside of the training distribution?
Moreover, there may be much more going on under the hood than we realize; it may take much more general cognitive power to learn and imitate the patterns of humans than it takes us to execute those patterns.
Next, we might imagine GPT-N to just be an Oracle AI, which we would have better hopes of using well. But I don't expect that an approximate Oracle AI could be used safely with anything like the precautions that might work for a genuine Oracle AI. I don't know what internal optimizers GPT-N ends up building along the way, but I'm not going to count on there being none of them.
I don't expect that GPT-N will be aligned or harmless by default. And if N isn't that large before it gets transformative capacity, that's simply terrifying.
What Can We Do?
While the short timeline suggested by the thesis is very bad news from an AI safety readiness perspective (less time to come up with better theoretical approaches), there is one silver lining: it at least reduces the chance of a hardware overhang. A project or coalition can feasibly wait and take a better-aligned approach that uses 10x the time and expense of an unaligned approach, as long as they have that amount of resource advantage over any competitor.
Unfortunately, the thesis also makes it less likely that a fundamentally different architecture will reach transformative status before something like GPT does.
I don't want to take away from MIRI's work (I still support them, and I think that if the GPTs peter out, we'll be glad they've been continuing their work), but I think it's an essential time to support projects that can work for a GPT-style near-term AGI, for instance by incorporating specific alignment pressures during training. Intuitively, it seems as if Cooperative Inverse Reinforcement Learning or AI Safety via Debate or Iterated Amplification are in this class.
We may also want to do a lot of work on how better to mold a GPT-in-training into the shape of an Oracle AI.
It would also be very useful to build some GPT feature "visualization" tools ASAP.
In the meantime, uh, enjoy AI Dungeon, I guess?