I think stealth hard-counters a technologically backward opponent like Iran. If one's opponent has very few or very old sensor systems, then stealth can grant the ability to just fly over someone and drop bombs on them, which is pretty great, because bombs are cheap.
But in (say) a peer US-China conflict, drones are also sensing platforms.
In a potential conflict with China, China will have sensors on numerous (non-electric) unmanned drones, on submarine drones, on high-powered manned aircraft further back, on their satellites, all along their coastline and so on. They've had decades to prepare while stealth was the obvious number one American strategy; this is a very well-considered response. "Stealth" becomes a much more relative term; the advantage almost certainly looks more like, "Well, we can get N% closer before they see us" than anything transformational.
Anyhow, in a peer conflict, due to all the above, I expect even relatively stealthy US platforms like the B-21 to be using their stealth to haul LRASMs just a little closer before dropping them off; and even with conservative deployment like that, I expect notable losses.
If the US actually tries to drop bombs with them, I expect very high losses (on a timeframe of days / weeks) that the US will be completely unable to sustain over time.
It's worth noting that Israel's strikes on Iran also involved using special forces and -- of course -- drones to disable Iranian air defenses. So Iran's already-inferior defense against Israeli bombers was notably degraded before the overflights even began.
And if anyone's striking anyone with drones in an unexpected way in a peer conflict, it's not gonna necessarily be the US doing the striking. Ukraine destroyed a few hundred million dollars of Russian bomber hardware by smuggling a handful of cheap-ass drones into Russia; is China or the US going to have an easier time doing a similar feat? Are anyone's current air bases well protected against this? Whose air bases have more hardened shelters that prevent such cheap attacks?
Good summary of some pros and cons. Neutral lists of arguments both for and against something that don't try to come to a conclusion are rationally virtuous, so kudos.
An additional point con-neuralese -- it's a mix of your points 1, 3, and 7, but you didn't mention it: I think CoT might give you a capability lever as well as an alignment lever.
Specifically, a CoT (or a stream of neuralese) could be legible to the model (or not legible). This may enable some process like this.
The above is a rather specific proposal, and so has many burdensome details that are likely to be wrong, but I think something like this is not entirely unlikely.
From another angle -- one of the links you have above has Nostalgebraist talking about how, from the user's perspective, a legible CoT is a capability enhancement:
In short: from an end user's perspective, CoT visibility is a capabilities improvement.
I ended up just switching to 3.7 Sonnet for the task discussed above – not because it was "smarter" as a model in any way I knew about, but simply because the associated API made it so much easier to construct prompts that would effectively leverage its intelligence for my purposes.
This strikes me as a very encouraging sign for the CoT-monitoring alignment story.
But from my angle, this is also a capability bonus -- because it allows the model to look at itself, etc?
I'm not sure about this of course, but I thought it worth adding.
Nice work.
I have thought for a while that if you relax some of the diegetically unmotivated parts of the alien prediction (why bear fat? why zero protein? raw?) then you end up with something like maple-glazed bacon. Which is also mass-produced, and quite popular.
But it's nice to see that even with some of the additional questionable constraints the aliens still come through.
So a thing I've been trying to do is get a better notion of "What actually is it about human intelligence that lets us be the dominant species?" Like, "intelligence" is a big box that holds which specific behaviors? What were the actual behaviors that evolution reinforced, over the course of giving us big brains? Big question, hard to know what's the case.
I'm in the middle of "Darwin's Unfinished Symphony", and finding it at least intriguing as a look at how creativity / imitation are related, and how "imitation" is a complex skill that humans are nevertheless supremely good at. ("The Secret of Our Success" is another great read here, of course.)
Both of these are kinda about the human imitation prior... in humans. And why that may be important. So I think if one is thinking about the human-imitation prior being powerful, it would make sense to read them as cases for why something like the human imitation prior is also powerful in humans :)
They don't give straight answers to any questions about AI, of course, and I'd be sympathetic to the belief that they're irrelevant or kinda a waste of time, and frankly they might be a waste of time depending on what you're funging against. I'm not saying they answer any question; I'm saying they're interesting. But I think they're good reads if one's approaching from the angle of "Intelligence is what lets humans dominate the earth" and wants a particular angle on how "intelligence" is a mixed bag of some different skills, at least some of which are probably not general search and planning. So, yeah.
You know who else is completely simple inside, with no internal degrees of freedom, and always wills the same thing eternally unceasingly?
Yeah that's right, the Medieval Catholic Scholastic God.
Do you have examples of the kind of models / kind of questions that lead to this most strongly? I've been collecting behaviors but it's slow work reading a lot of CoTs and so anything would be welcome :)
Note that Nostalgebraist and Olli's comments on the original paper argue (imo cogently) that the original paper's framing is pretty misleading / questionable.
It looks like many of their points would carry over to this.
Combine it with other options and you rapidly build up a decent channel. And those options are myriad.
So here's the thing: your word choice between "robust plan" or "reasonable plan" or "clear plan" or "smart plan" already matters. It already carries information -- I might say, "Well, that's a reasonable plan but not a robust plan," or "That seems like a smart plan, but it's not clear." There are infinite shades of meaning like this in English, and competence in using English entails command over all these shades.
So the question then is -- why would the LLM build up some channel, along angles entirely orthogonal to these existing shades, when these shades are here? You could decide on some encoding between firm / robust / sound / reliable -- but these things already have meaning, so wouldn't you just use the meaning already there? There are a bunch of papers on how RL is mostly magnifying existing patterns in the LLMs, and is teaching how to employ skills already latent in the training data, rather than teaching totally new things -- if this is so, wouldn't we expect them to keep using these terms? Or there is work about how it's genuinely hard for LLMs to reason steganographically, even when we try to teach them to -- doesn't that weigh against such a belief?
Or put alternately -- I can, within myself, connect "robust" and "reasonable" and so on with a huge internal-to-my-brain channel, containing enormous bandwidth! If I wanted to make a more information-dense private language, I could! But in fact, I find myself thinking almost exclusively in terms that make sense to others -- when I find myself using a private language, and terms that don't make sense to others, that's usually a sign my thoughts are unclear and likely wrong.
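For concreteness, here's a toy sketch of the kind of synonym-choice encoding in question. The codebook and function names are hypothetical, made up for illustration, and nothing here is something any model is known to do. Each "slot" where the writer picks among four near-synonyms carries log2(4) = 2 bits, and independent slots add up, which is the "combine it with other options" arithmetic.

```python
import math

# Hypothetical codebook: each slot is a place where the writer picks one of
# several near-synonyms. Four options per slot carry log2(4) = 2 bits.
SLOTS = [
    ["firm", "robust", "sound", "reliable"],
    ["plan", "approach", "strategy", "proposal"],
]


def capacity_bits() -> float:
    """Total hidden bits the codebook can carry across all slots."""
    return sum(math.log2(len(options)) for options in SLOTS)


def encode(bits: str) -> list[str]:
    """Map a bit string (2 bits per slot) to one word choice per slot."""
    if len(bits) != 2 * len(SLOTS):
        raise ValueError("expected exactly 2 bits per slot")
    words = []
    for i, options in enumerate(SLOTS):
        chunk = bits[2 * i : 2 * i + 2]  # the 2 bits for this slot
        words.append(options[int(chunk, 2)])
    return words


def decode(words: list[str]) -> str:
    """Recover the bit string from the chosen words."""
    return "".join(format(SLOTS[i].index(w), "02b") for i, w in enumerate(words))


if __name__ == "__main__":
    hidden = "1001"
    chosen = encode(hidden)
    print(chosen)  # ['sound', 'approach']
    assert decode(chosen) == hidden
    print(capacity_bits())  # 4.0
```

The catch, which is the point of the argument, is that "sound approach" and "firm plan" already mean different things in English, so a model using this codebook would have to keep it consistent against the ordinary meanings those words carry.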
At least, those are some of the heuristics you'd invoke when inclining the other way. Empiricism will show us which is right :)
This comment does really kinda emphasize to me how much people live in different worlds.
Like -- I don't think my models lie to me a bunch.
In part this is because I think I have a reasonable theory of mind for the models; I try to ask questions in areas where they will be able to say true things or mostly true things. This doesn't feel weird; this is part of courtesy when dealing with humans, of trying not to present adversarial inputs; so obviously I'm going to extend the same courtesy to LLMs.
And when models say things that aren't true, I very often don't perceive that as "lying," just as I don't perceive a guy who is trying to explain a thing to me and fumbles his facts as "lying." People who are "lying" are doing the kind of thing where they are cognizant of the truth and not telling me that, for some other purpose ("Yeah the car has never been in an accident"), or else they are deceiving themselves for the purposes of deceiving me ("Of course I'd always tell you before doing X"), or some similar act. Many cases of people fumbling the truth don't fall into this framework: a small nephew who mangles his recollection of the day's events is not lying; my sick grandmother was not lying when she got confused; I wasn't lying when, at the DMV, I somehow absent-mindedly said my weight was 240 rather than 140; improv is not lying; jokes are not lying; certain kinds of playful exaggeration are not lying; anosognosiacs are not lying. "Lying" is one of many ways of saying not-true things, and most not-true things that models say to me don't really seem to be lies.
So yeah. I don't know if the chief difference between me and you is that you act differently -- do you have a different theory of mind about LLMs, one that leads you to ask questions very differently than I do? Or perhaps the difference is not in actions but in interpretation -- we would see identical things, and you would describe them as lying and I would not. And of course you might say, well, I am deluding myself with a rosy interpretive framework; or perhaps I am stupidly careless, and simply believe the false things LLMs tell me; and so on and so forth.
Anyhow, yeah, people do seem to live in different worlds. Although I do persist in the belief that, in general, LW is far too ready to leap from "not true statement" to "it's a lie."