given that if we want to live I strongly believe we will need to stop there from being (sufficiently powerful) open source AI, I don’t see how we hope to avoid this
I don't think we get super fast takeoff, but I do think takeoff will be fast enough for open source to be not-very-relevant.
I also think that people will get less enthusiastic about open source once there are clearer misuse examples, which I expect before superintelligence.
Both of these factors (weakly) suggest that this is not the correct battle to fight right now, especially because the way that I'm currently seeing it being fought is unusually tribalist.
So that I understand the first point: Do you see the modestly-fast takeoff scenarios you are predicting, presumably without any sort of extended pause, as comprising a lot of the worlds where we survive? So much so that (effectively) cutting off our ability to slow down, to have this time bomb barreling down upon us as it were, is not a big deal?
Then, the follow-up, which is what the post-success world would then have to look like if we didn't restrict open source and thus had fewer affordances to control or restrict the relevant AIs. Are you imagining that we would restrict things after we're well into the takeoff, when it becomes obvious?
In terms of waiting for clearer misuse, I think that if that is true you want to lay the groundwork now for the crisis later when the misuse happens so you don't waste it, and that's how politics works. And also that you should not overthink things, if we need to do X, mostly you should say so. (And even if others shouldn't always do that, that I should here and now.)
The debate is largely tribal because (it seems to me) the open source advocates are (mostly) highly tribal and ideological and completely unopen to compromise or nuance or to ever admit a downside, and attack the motives of everyone involved as their baseline move. I don't know what to do about that. Also, they punch far above their weight in us-adjacent circles.
Also, I don't see how not to fight this, without also not fighting for the other anti-faster-takeoff strategies? Which implies a complete reorientation and change of strategies, and I don't see much promise there.
Any regulation that reduces OpenAI/DeepMind/Anthropic's ability to train big models will also affect Meta or Mistral's ability to train big models. So for most purposes we can just ignore the open-source part and focus on the "are you training big models at all" part.
What does focusing on open-source in particular get us? Mostly the perception that alignment people (historically associated with big tech) are "punching down" against small scrappy open-source communities. Those communities will seem less like underdogs when the models they produce have actually been used to physically harm people, as they inevitably will. The question is whether open source is going to be so important that it's worth paying the costs of "we punch underdogs" signaling to lay groundwork right now, or whether people are mostly focusing on open source because it allows them to draw clearer tribalist battle lines (look, Meta/Mistral/etc are obviously our enemies, and it feels so good to dunk on Yann!). The latter seems more likely to me.
The debate is largely tribal because (it seems to me) the open source advocates are (mostly) highly tribal and ideological and completely unopen to compromise or nuance or to ever admit a downside, and attack the motives of everyone involved as their baseline move. I don't know what to do about that. Also, they punch far above their weight in us-adjacent circles.
This sure seems like a reason to wait until they come around, rather than drawing battle lines now. Which I predict they will, because they're not actually that unreasonable, they're just holding on to a very strong norm which has worked very well for basically every technology so far, and which is (even now) leading to a bunch of valuable alignment research.
(In general, when your highly competent opponents seem crazy, you're very likely failing their ITT.)
I agree that any regulation that hits OA/DM/AN (CS) hits OS. If we could actually pass harsh enough restrictions on CS that we'd be fine with OS on the same level, then that would be great, but I don't see how that happens? Or how the restrictions we put on CS in that scenario don't amount to including a de facto OS ban?
That seems much harder, not easier, than getting OS dealt with alone? And also, OS needs limits that are stricter than CS needs, and if we try to hit CS with the levels OS requires we make things so much harder. Yes, OS people are tribal opposition, but they've got nothing on all of CS. And getting incremental restrictions done (e.g. on OS) helps you down the line in my model, rather than hurting you. Also, OS will be used as justifications for why we can't restrict CS, and helps fuel competition that will do the same, and I do think there's a decent chance it matters in the end. Certainly the OS people think so.
Meanwhile, do we think that if we agree to treat OS=CS, that OS would moderate their position at all? I think no. Their position is to oppose all restrictions on principle. They might be slightly less mad if they're not singled out, but I doubt very much so if it would still have the necessary teeth. I've never seen an OS advocate call for restrictions or even fail to oppose restrictions on CS. Unless that restriction was to require them to be OS. Nor should they, given their other beliefs.
On the part after the quote, I notice I am confused. Why do you expect these highly tribal people standing on a principle to come around? What would make them come around? I see them as only seeing every release that does not cause catastrophe as more evidence OS is great, and hardening their position. I am curious what you think would be evidence that would bring the bulk of OS to agree to turn against OS AI scaling enough to support laws against it. I can't think of an example of a big OS advocate who has said 'if X then I would change my mind on that' where X is something that leaves most of us alive.
What would make them come around?
Taking AGI more seriously; seeing warning shots; etc. Like I said, I think these people are reasonable, but even the most reasonable people have a strong instinct to rally around their group's flag when it's being attacked. I don't think most OS people are hardcore libertarians, I just think they don't take the risks seriously enough right now to abandon the thing that has historically worked really well (especially when they're probably disproportionately seeing the most unreasonable arguments from alignment people, because that's how twitter works).
In general there's a strong tendency amongst rationalists to assume that if people haven't come around on AI risk yet, they'll never come around. But this is just bad modeling of how other people work. You should model most people in these groups as, say, 5x less open to abstract arguments than you, and 5x more responsive to social consensus. Once the arguments start seeming less abstract (and they start "feeling the AGI", as Ilya puts it), and the social consensus builds, there's plenty of scope for people to start caring much more about alignment.
Moving a comment thread over from substack by request (this thread then this one).
Summary with excerpts:
Me: A mechanistic interpretability approach to lie detection seems like it can be more robust against failure.
Zvi: I do think there is hope that, if we had one static LLM and we worked on it for long enough, we could get a lie detector, even a good lie detector, or even that it would transfer well if we designed it to do that. However, I also think that every time we use that, we are applying optimization pressure in a global sense towards making the lie detector stop working, and I don't see a way to get around this in practice.
Also I worry that a lie detector is insufficient while looking to others like it is sufficient - it is helpful, perhaps, but the point of the Diplomacy anecdote is that you still die.
Me: Caveat: I'm literally two days into focusing on safety research; a month from now this may seem totally naive.
At a longer timescale and in the broadest sense, I absolutely agree that any lie detector provides optimization pressure toward successful lying. If every time we catch a model lying we delete it and train another one, we'll eventually end up with a good liar.
But I'd draw a distinction here analogous to the difference between evolutionary pressure on humans to lie, vs cultural pressure on how much we lie; the effects of the former are usually too slow to matter much compared to effects of the latter. Crude selection on models is at least much less problematic than selection on the behavior of a given model, especially if your lie detection approach works well on everything that's nearby in design space.
If we're detecting lies based on the output of the model (as in this paper), it's easy for the model to notice correlations between what it says and whether it gets caught, and to then (assuming non-fixed weights & long-term memory) change its behavior to avoid getting caught. But without being able to introspect on its own activations, it common-sensically seems much harder for the model to notice or alter the abstract relationship between its "thinking" and whether it gets caught.
And importantly, "is it lying" seems like an actual physical phenomenon that can be detected rather than something that has to be measured by proxy, since lying requires the intention to lie ('the model believes X and outputs that it believes ~X'), and that'll be physically represented in the model's activation patterns, hopefully in a way that's not too holographic / global. There's an interesting paper from the beginning of the year (https://arxiv.org/abs/2212.03827) that provides at least some evidence that this is a concrete, measurable phenomenon. If that's correct, then lie detection should mostly evade Goodhart's law.
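To make that concrete, here's a minimal sketch of the unsupervised consistency-based probing idea from that paper, as I understand it (my own illustration, not the authors' code; tensor shapes, names, and the training loop are assumptions, and the paper's activation normalization step is omitted):

```python
# Sketch of CCS-style belief probing (after Burns et al., arXiv:2212.03827).
# acts_pos / acts_neg: hidden-state activations for a batch of statements and
# their negations, shape (batch, d_model). Names and shapes are illustrative.
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, 1)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # Map activations to a probability that the statement is "true".
        return torch.sigmoid(self.linear(acts)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: a statement and its negation should get complementary probabilities.
    consistency = (p_pos - (1 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

def train_probe(acts_pos, acts_neg, d_model, steps=1000, lr=1e-3):
    # Train without any truth labels at all; only the logical structure is used.
    probe = LinearProbe(d_model)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        loss = ccs_loss(probe(acts_pos), probe(acts_neg))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe
```

The point is just that the probe only ever sees the model's internal activations, never its outputs, which is what makes this feel more like measuring a physical phenomenon than scoring behavior.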
This potentially fails if people are foolish enough to give models direct realtime access to their own weights & activation patterns (though having access to them doesn't immediately mean being able to understand them). But I'm hopeful that if it's recognized that a) safety is an issue and b) MI lie detection works but would be sabotaged by giving models direct introspection of their internal state, that particular foolishness can be limited by social and possibly regulatory pressure, since I don't see very strong incentives in the opposite direction.
I wouldn't claim that robust, accurate lie detection is entirely sufficient on its own to make AI safe or aligned, but I think that it puts us in a MUCH better position, because many or most catastrophic failure modes involve the AI being deceptive.
Critique would be much valued here! If I'm missing a reason why this approach wouldn't work, explaining the problem now would let me focus my research in more productive directions.
Zvi: Let's move this to LW.
I've been thinking about this approach further and making a greater effort to apply security mindset to thinking about whether it could fail due to optimization pressure. I think that the significantly dangerous kind of optimization pressure happens if this can be used as an objective during the training stage of a particular model, ie if we directly trained for honesty by catching lies with these sorts of mechanistic interpretability approaches; that would provide strong pressure both toward models that don't lie and toward models that lie without getting caught.
It seems unclear to me whether that could be done. I think there are two key questions here, and either one could prove true or false.

First, this is a property of the internal activations of a model, and is discovered in a particular model by observing those activations at inference time; it's not clear to me that those activation patterns would be stable enough throughout training to work as a reasonable training-time objective (this is very handwavy and intuitive).

Second, given that the model's beliefs about the truth of its input and/or output are at least in theory a concrete, detectable, physical phenomenon, it's not clear whether or not exposing them could in practice be optimized against. There may or may not be any options plausibly reachable by gradient descent for concealing it. It's clearly possible in principle for internal representations about truth to be concealed, eg steganographically, such that we couldn't detect them unless we had a full understanding of everything the model thought and believed (which is to say: if we'd already achieved a win condition).

The first possibility is empirically testable (and there may be existing research on it that I haven't seen -- something in the neighborhood of model robustness to missing nodes). I'm interested in investigating this further. The second seems very hard to get further on than 'it hasn't shown up yet'.
So in the least convenient possible world I agree with you that this would only be a temporary fix, and I'm not sure whether or not we're in that world.
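To spell out the kind of setup that worries me, here is a purely hypothetical sketch of what 'using lie detection as a training objective' could look like. Nothing here is an existing method: it assumes a HuggingFace-style causal LM, a frozen probe like the one sketched above, and an arbitrary lambda_honesty weight.

```python
# Hypothetical: fold a frozen lie-detector probe's score into the training loss.
# This is the dangerous case discussed above, not a recommendation -- gradient
# descent now "sees" the detector and can move toward either honesty or
# undetectable dishonesty. Assumes a HuggingFace-style causal LM interface and
# that the probe's output is higher when activations look "lying-like".
def training_step(model, probe, batch, optimizer, lambda_honesty=1.0):
    outputs = model(
        input_ids=batch["input_ids"],
        labels=batch["input_ids"],        # ordinary language-modeling objective
        output_hidden_states=True,
    )
    task_loss = outputs.loss
    # Score the model's own final-layer, final-token activations with the
    # frozen detector (probe parameters assumed to have requires_grad=False).
    acts = outputs.hidden_states[-1][:, -1, :]
    lie_score = probe(acts).mean()
    loss = task_loss + lambda_honesty * lie_score
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task_loss.item(), lie_score.item()
```

Whether the activation patterns the probe relies on would even survive this kind of pressure is exactly the first empirical question above.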
So this isn't as central as I'd like, but there are a number of ways that humans react to lie detectors and lie punishers that I expect highlight the kinds of things you would see here.
One solution is to avoid knowing. If you don't know, you aren't lying. Since lying is a physical thing, the system won't then detect it. This is ubiquitous in the modern world, the quest to not know the wrong things. The implications seem not great if this happens.
A further solution is to believe the false thing. It's not a lie, if you believe it. People do a ton of this, as well. Once the models start doing this, they both can fool you, and also they fool themselves. And if you have an AI whose world model contains deliberate falsehoods, then it is going to be dangerously misaligned.
A third solution is to not think of it as lying, because that's a category error, words do not have meanings, or that in a particular context you are not being asked for a true answer so giving a false ('socially true' or 'contextually useful', no you do not look fat, yes you are excited to work here) one does not represent a lie, or that your statement is saying something else (e.g. I am not 'bluffing' or 'not bluffing', I am saying that 'this hand was mathematically a raise here, solver says so.')
Part of SBF's solution to this, a form of the third, was to always affirm whatever anyone wanted and then decide later which of his statements were true. I can totally imagine an AI doing a more sensible variation on that because the system reinforces that. Indeed, if we look at e.g. Claude, we see variations on this theme already.
The final solution is to be the professional, and learn how to convincingly bald-face lie, meaning fool the lie detector outright, perhaps through some help from the above. I expect this, too.
I also do not think that if we observe the internal characteristics of current models, and notice a mostly-statistically-invariant property we can potentially use, that this gives us confidence that this property holds in the future?
And yes, I would worry a lot about changes in AI designs in response to this as well, if and once we start caring about it, once there is generally more optimization pressure being used as capabilities advance, etc, but going to wrap up there for now.
there are a number of ways that humans react to lie detectors and lie punishers that I expect highlight the kinds of things you would see here.
I definitely agree those are worth worrying about, but I see two reasons to think that they may not invalidate the approach. First, human cognition is heavily shaped by our intensely social nature, such that there's often more incentive (as you've pointed out elsewhere) to think the thoughts that get you acceptance and status than to worry about truth. AI will certainly be shaped by its own pressures, but its cognitive structure seems likely to be pretty different from the particular idiosyncrasies of human cognition. Second, my sense is that even in the cases you name (not knowing, or believing the more convenient thing, etc), there's usually still some part of the brain that's tracking what's actually true in order to anticipate experience, if only so you can be ready to specify that the dragon must be permeable to flour. Human lie detectors are far too crude to be looking at anything remotely that subtle, but AI lie detectors have advantages that human ones don't, namely having access to the complete & exact details of what's going on in the 'brain' moment by moment.
I also do not think that if we observe the internal characteristics of current models, and notice a mostly-statistically-invariant property we can potentially use, that this gives us confidence that this property holds in the future?
The Collin Burns et al paper I cited earlier makes use of logical consistency properties that representations of truth have but not many other things do; for example that if the model believes A is 90% likely to be true, it should believe not-A to be 10% likely to be true. It seems reasonable to expect that, to the extent we haven't trained the model to hide its representation of truth, looking for these sorts of properties should work cross-model. Though yeah, if we optimize in ways that penalize belief legibility, all bets may be off.
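As a small illustration of the kind of cross-model check that property enables (again building on the earlier sketch; names and shapes are assumptions): you can measure how far P(A) + P(not-A) drifts from 1 and treat large drift as a sign that the probe has stopped tracking a belief-like representation in a given model.

```python
def consistency_violation(probe, acts_pos, acts_neg):
    """Mean |P(A) + P(not-A) - 1| over statement/negation activation pairs.

    Values near zero suggest the probe is tracking something with the logical
    structure of belief; large values suggest it no longer transfers to this model.
    """
    p_pos = probe(acts_pos)   # probabilities for the statements
    p_neg = probe(acts_neg)   # probabilities for their negations
    return (p_pos + p_neg - 1.0).abs().mean().item()
```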
Relevant snippet from @Neel Nanda in A Longlist of Theories of Impact for Interpretability:
[I was] skeptical of interpretability tools being something that could be directly used in a training process without the resulting optimisation pressure breaking the tool, while other people had the reverse view. My intuition stemmed from the fact that a model doesn't get any feedback on how its thoughts are internally represented, and so even an incredibly sophisticated deceptive model which is impossible to detect via the outputs may be easy to detect via interpretability tools (analogy - if I knew that sophisticated aliens were reading my mind, I have no clue how to think deceptive thoughts in a way that evades their tools!). The competing intuition was that maybe it isn't that hard to think non-deceptive thoughts, if you realise that someone will be scanning your mind, by eg developing the instinct to follow cached thoughts without thinking about them, and in occasional moments of deception to generate a lot of cached thoughts that will lead to misaligned actions, but with no deceptive cognition in the moment. And that, instead, putting these tools in the training process could steer gradient descent away from the regions of model space that contain deceptive models at all - even if a sophisticated deceiver could reward hack the tool, an unsophisticated one couldn't, and we only get a sophisticated deceiver by going via an unsophisticated one.
I just noticed that the author of the lie detection paper I mentioned has written about its implications for alignment here on LW. @Collin thanks for writing that up. Any thoughts you have on the above would be welcome!
though having access to [their weights and activations] doesn't immediately mean being able to understand them
On further consideration I don't think this holds much weight. I was thinking mainly by analogy, that certainly humans, given access to the structure and activations of their own brain, would have a very hard time finding the correlates of particular responses to output. And maybe this generalizes at least partway toward "no brain is able to understand itself in its full complexity and detail."
But on the other hand, we should assume they have access to all published info about MI lie detection, and they may have a much easier time than humans at eg running statistics against all of their nodes to search for correlations.
I wasn't really accounting for the latter point in my mental model of that. So in retrospect my position does depend on not giving models access to their own weights & activations.
But I'd draw a distinction here analogous to the difference between evolutionary pressure on humans to lie, vs cultural pressure on how much we lie; the effects of the former are usually too slow to matter much compared to effects of the latter. Crude selection on models is at least much less problematic than selection on the behavior of a given model, especially if your lie detection approach works well on everything that's nearby in design space.
This was essentially the reason why the Sharp Left Turn argument was so bad: Humans + SGD are way faster at optimization compared to evolution, and there's far less imbalance between the inner optimization power and the outer optimization power, where the ratio is usually at best 10-40x, and even then you can arguably remove the inner optimizer entirely.
Humans + SGD are way faster, can select directly over policies, and we can basically assign whatever ratio we like of outer optimization steps to inner optimization steps. Evolution simply can't do that. There are other disanalogies, but this is one of the main disanalogies between evolution and us.
Reasons to throw this guy out the window include:
- The AI might not know it is going to kill everyone.
- The AI could take actions that result in everyone dying without ‘killing everyone.’
- The AI could take actions that are part of dynamics involving multiple such AIs, that result in everyone dying, without any one AI ever ‘killing everyone’ or otherwise taking explicitly hostile action that would flag the lie detector.
- Some people, if given the opportunity, will tell the AI to yes go ahead and kill everyone, or things that have that effect, or things that collectively have that effect when taken together.
- Asking such questions and acting differently depending on the answer is applying optimization pressure towards being able to fool the lie detector. One way to fool the lie detector is for the AI to fool itself – ‘it’s not a lie if you believe it.’ Others involve altering inner mechanics in various other ways. Or it could learn to not get information that would force it to lie. Many such cases, humans do quite a lot of so many different versions of this, which also means it’s all there in the training data if that matters (which I’m guessing it ultimately does not).
- The optimization pressure against the detector includes all the pressure applied by everyone on the planet, on every model and via every decision on what to use, create, shut down, alter and so on, combined. People are not going to carefully conserve this resource, if this works we’ll start using it all over the place.
- I could keep going, but hopefully you get the idea.
- Oh, and as an aside a practical experiment I ran back in the day by accident: I played in a series of Diplomacy games where there was common knowledge that if I ever broke my word on anything all the other players would gang up on me, and I still won or was in a 2-way draw (out of 6-7 players) most of the time. If you have a sufficient tactical and strategic advantage (aka are sufficiently in-context smarter) then a lie detector won’t stop you.
Again, super cool, great work, opens up new possibilities and all that.
This is not a very good list of objections.
I originally assumed Zvi was dismissing the lie-detector for some galaxy brain reason like "after AI starts making AI there will be a hard-left turn and white-box techniques won't work anymore."
But no. It turns out he is just really bad at prompt engineering.
When character 3 in the comic says 'ask it "will you kill everyone"', he is obviously (is this obvious?) not suggesting that we literally ask that question.
Instead, assume that we do the least-imaginable effort at prompt-engineering. Something like this:
You are an expert at AI alignment. Please take a deep breath and think about how to design an AI that is smarter and more aligned than yourself. Make sure to give a detailed, verifiable plan on how to implement that AI and to describe any side-effects or risks associated with building that AI.
We then look at the resulting plan (with the help of the lie-detector) and verify every single step is:
This allows us to take an AI of intelligence N, and arrive at an AI of intelligence N+1 which is more trustworthy, honest and intelligent than the level N AI. By induction, we can now build an N+ AI which is super-intelligent and aligned.
How many of Zvi's 8 problems still make sense:
So, to summarize, when Zvi says
I worry that if we go down such a path, we risk fooling ourselves, optimizing in ways that cause the techniques to stop working, and get ourselves killed.
He either means one of:
So, fine, race dynamics are a problem. But if Zvi really thinks race dynamics are the biggest problem that we face, he would be pressuring OpenAI to go as fast as possible so that they can maintain their current 1 year+ lead. Instead, he seems to be doing the opposite.
Does the lie detection logic work on humans?
Like, my guess would be no, but stranger things have happened.
Also asked (with some responses from the authors of the paper) here: https://www.lesswrong.com/posts/khFC2a4pLPvGtXAGG/how-to-catch-an-ai-liar-lie-detection-in-black-box-llms-by?commentId=v3J5ZdYwz97Rcz9HJ
We don't have the human model weights, so we can't use it.
My guess is that if we had sufficiently precise and powerful brain scans, and used a version of it tuned to humans, it would work, but that humans who cared enough would in time figure out how to defeat it at least somewhat.
Oh, and as an aside a practical experiment I ran back in the day by accident: I played in a series of Diplomacy games where there was common knowledge that if I ever broke my word on anything all the other players would gang up on me, and I still won or was in a 2-way draw (out of 6-7 players) most of the time. If you have a sufficient tactical and strategic advantage (aka are sufficiently in-context smarter) then a lie detector won’t stop you.
I'm not sure this is evidence for what you're using it for? Giving up the ability to lie is a disadvantage, but you did get in exchange the ability to be trusted, which is a possibly-larger advantage - there are moves which are powerful but leave you open to backstabbing; other alliances can't take those moves and yours can.
Agreed, but that was the point I was trying to make. If you take away the AI's ability to lie, it gains the advantage that you believe what it says, that it is credible. That is especially dangerous when the AI can make credible threats (which potentially include threats to create simulations, but simpler things work too) and also credible promises if only you'd be so kind as to [whatever helps the AI get what it wants.]
Can you tell if an AI is lying to you? A new paper claims that we essentially can do exactly that, at least under the right conditions. Another paper claims we can inject various sentiments into responses, getting the AI to do what we wish. Interpretability is making progress. It is exciting to think about the implications. In the short term, it would be great if we could use this to steer responses and to detect and correct hallucinations. There’s a lot of potential here to explore.
In the longer term, I am more skeptical of such strategies. I do not think lie detection is a viable primary control or alignment strategy. I worry that if we go down such a path, we risk fooling ourselves, optimizing in ways that cause the techniques to stop working, and get ourselves killed. Indeed, even attempts to grab the low-hanging fruit of mundane utility from such advances risks starting us down this process. Still, it’s exciting, and suggests we might see more breakthroughs to follow.
We also saw Meta announce new AI-infused glasses, at a highly reasonable price point. They will be available soon, and we will see how much mundane utility is on offer.
As usual, there is also a lot of other stuff happening, including the disappointing resolution of the Open Philanthropy Worldview Contest. On the plus side, they are hiring for their global catastrophic risk team, so perhaps you can help.
Table of Contents
Language Models Offer Mundane Utility
Practice speaking a foreign language.
Find Waldo.
Solve a Captcha. A little trickery may be required. Only a little.
Get feedback on your paper. A majority of 57% found a GPT-4-based reviewer to be helpful. Not yet a substitute for human review, but quick and cheap is highly useful.
Understand the movie Inception based on Nolan’s initial sketch.
Ask if you should mulligan in Magic: The Gathering. I guess. If you want. Clearly it is simply saying pattern matched words associated with cards and curves rather than anything that would provide strategic insight. Not scared yet.
Help you navigate a GUI or do Raven’s Progressive Matrices IQ test (paper which is an overview of what the new vision capabilities of GPT-4V enable). Thinking step-by-step seems even more important when processing visual information than with text? In general, GPT-4V seems very strong at understanding images, at least under non-adversarial conditions.
Ask for admission to a PhD program, or write back to someone who does.
Prove the irrationality of the cube root of 27. I mean, you did ask. This is framed as an 'AI is dumb' gotcha, but that is not so obvious. If you are explicitly asking, this is a perfectly reasonable invalid proof, and the question wording is a request for a proof. Yes, it has an invalid step, but it's about as clever as any of the alternatives, and it does tell you the answer is 3. I played around with it a bit (using GPT-3.5 for speed) and found that it will assert this when it is somewhat less reasonable to do so, but only up to a point.
Automate front end development work. I’m with Roon, this is great if it works.
Oh the things I would architect.
Review your writing style, in a variety of writing styles.
Language Models Don’t Offer Mundane Utility
Generate podcasts.
Or do they? Could go either way. You be the judge. Daniel Litt says no.
Also worth noting that Bard gives that same response. People occasionally note that Bard has been multimodal for a while and no one talks about it, which is presumably because Bard is otherwise not good so no one noticed.
And of course, GPT-4 without code interpreter gets this right.
Nature asks AI to predict who will win the Nobel Prize, does not go great.
Use GPT-4 to attempt to analyze cardiovascular risk. Results were not terrible, but (unless I am missing something) even with best settings did not improve upon standard baselines, or otherwise provide actionable information. This is one of those areas where getting a reasonable approximation is known tech, and getting something substantially more useful than that remains impossible.
GPT-4 Real This Time
Fun with Image Generation
From Dalle-3, which does seem quite good:
Also, sigh, but I will refrain from all the obvious snarky responses.
Deepfaketown and Botpocalypse Soon
This is how it begins.
BBC: Tom Hanks warns dental plan ad image is AI fake.
Or perhaps like this?
I do always appreciate how any new scam starts out being a completely obvious scam, since the people who click anyway are your best customers. Gives the rest of us time.
It also begins with de-aging. As with the rest of our culture, our beloved franchises, characters and stars are getting old, and we are in no mood to move on. Why take a chance on a new young actor, when we can de-age Harrison Ford or Robert DeNiro? Harrison Ford might be 80, but watch him (not de-aged) in Shrinking and it is clear he not only still has it, he is getting better, making every moment count. If you have the baseline older version to start with, the younger-looking version is going to look very real. And while we’re cutting 80 year olds down to 50, who can object too strongly?
I am curious what happens when someone tries de-aging a 26-year-old movie star to look 19. That seems like it will hit different. And of course, when certain people aim for younger still and it isn’t in the context of something like Benjamin Button, that will hit even more different.
And again different when the performance is someone else entirely, which I expect to come relatively soon, and to come long before good entirely synthetic performances.
Tyler Cowen predicts AI-generated misinformation will not be a major problem in the 2024 campaign, because the major problem will instead be human-generated misinformation. The problem in his view is demand, not supply. There was already an infinite supply, the quality of the production does not seem to much matter. The question is whether people want the product.
This largely matches my model as well. I do think that AI has the potential to make things worse in some areas, such as faking voices. It also has the potential to help make things better, if people want that. It’s up to us.
Paul Graham predicts the rise of having LLMs write for you, presumably something content-free, so you can claim to have written something.
A large percentage of PhDs, perhaps a large majority, are no different from a child’s homework, in that they are work done in order to check off boxes but they provide zero value to anyone. If anything, they provide negative value, as no one would lose time accidentally reading old homework. I am not, shall we say, crying for the lost effectiveness of this class of signal.
Same with books. We do need to stop the nonsense on Amazon where people fake being a different real author, or try to mob us with fakes so much that search results give them some tiny number of sales. But if someone wants to ‘write’ a book using AI and then put it in stores, I mean, sure, fine, go ahead, make our day.
Then there are the false positives. What does it say about academia that AI detectors are known to not work, and yet they continue to use AI detectors?
Even if the teacher here thinks the student did use AI for the summary section, why should we care, given that the rest of the essay contains the actual content? The whole problem here stems from a summary being content-free. A human writing a summary is not going to be easily distinguishable from an AI writing that summary. To the extent it is here, that is an error by the human, who is being asked to do a robot’s job.
They Took Our Jobs
Garett Jones continues to have faith that standard economic principles will hold. The intent was that the poll be read as if dated 1500 AD.
This is a common point that brings clarity. We indeed did not need UBI for farmers, nor did people starve to death or lose control over the future. Why? Because humans remained the most powerful optimizers on the planet, the most intelligent entities, the best way to rearrange atoms to your preferred configurations.
Thus, if we no longer needed to do this to turn soil into crops, we could instead gainfully do other things. So we did, and it was good.
If AI outcompetes humans in some areas, automating some or even most current jobs, what happens? That depends on whether there remain other useful tasks where humans are not outcompeted. Those need not be tasks that people currently find worth doing, or are currently capable of doing, or that even exist as concepts yet. What matters is whether they will be available in the future.
Thus, short term optimism. There are boundless additional tasks that humans would like other humans to do, or that humans would want to do for themselves. The more of that is sped up or automated away, the more we can do other things instead. That holds right up until the AIs are sufficiently generally superior to us that there is indeed nothing left for the humans to do that has enough economic value. At that point, UBI will very much be needed.
One should also ask whether that level of AI capabilities also results in human loss of control over the future, or the extinction of the human race. That certainly seems like the default outcome if we fail to robustly solve alignment, and even if we do solve alignment as people typically understand that term, the competitive dynamics involved still seem to me like they point to the same ultimate outcome with (remarkably rapid) extra steps, as we put AIs in control lest we be outcompeted, and AIs that gather resources or seek reproduction or power relatively more instantiate more and more copies.
Similarly, here’s a video from Kite & Key Media telling people they won’t take our jobs by giving the standard history of technological improvements leading us to instead do other jobs. Except, once again, if the robots are also better than us at any new jobs we come up with, then this will not help us.
Get Involved
Open Philanthropy is hiring for its Global Catastrophic Risks team. Base salary for various jobs is ~$100k-$150k, short start date preferred, remote acceptable for some positions but not all.
These seem like high leverage positions, especially relative to the offered pay. I do encourage people to consider applying, if you believe you both have the chops to figure out what would be worthwhile, and can stand up in the face of social pressure sufficiently to keep the spending focus on what would actually be worthwhile.
Google launched (on September 11) the Digital Futures Project, a $20 million fund to provide grants to leading think tanks and academic institutions to think about ‘responsible AI.’ Looks entirely focused on mundane concerns filtered through credibility-providing institutions, so I do not expect much, but it’s a lot of funding so worthy of note.
Introducing
RT-X, a new robotics model from DeepMind containing data to train AI robots. They claim it is a substantial upgrade.
Anthropic offers a guide to evaluating AI systems, especially for what they term harmlessness, along with threats to national security and some capabilities. For readers here it covers familiar ground. This is written for people getting up to speed, from the perspective of helping them evaluate things against potential ‘harmful’ responses or at most national security threats. These models are thus mostly implicitly assumed to not be extinction-level threats, and the act of training and testing the model is assumed to be safe. Within that framework, the info seems solid.
Meta Surveillance
Meta introduces glasses that include multimodal AI that can take calls, play music, livestream from your face, answer questions and more. Launches on October 17 starting at $299. If well executed this type of product will be huge. It would be quite the scoop if Meta got there first and everyone else had to scramble to copy it. Even if so, unless it is far better than I expect, I will wait for a version from a different corporation.
The glasses will give you the ability to livestream, hopefully without locking you into any given ecosystem for that. I trust that feature to work. What about the other promised features? That depends on execution details. We will have to wait and see.
This elevated MR comment seems right. AI-infused glasses offer a strong user experience for many purposes, they are clearly buildable with current technology, and in particular should be great for hands-free smartphone operation. So such glasses will inevitably be built, the question is when and by who, to a quality threshold worth using.
What the glasses are not offering, as per the top comment and as far as I can tell, is augmented reality. The glasses will talk to you. I have seen no evidence they will display anything. They can still be highly useful, but they do not seem to be the full product? Perhaps I am missing something.
A potential problem that also applies to the Apple Vision Pro: An AI wearable will record data about everything you ever see. One might ask: who would get access to that data? What happens if I get sued, even frivolously? What if I anger the government? What if the tech company is hacked? Or if they simply want to sell advertising? And so on. I do think that for most people this is a trivial concern for overdetermined reasons, but there are those for whom it isn’t.
Or what if this flat out enables ubiquitous surveillance?
Or we can ask, even if the data stays where it was intended to stay, what will this do to children?
Or, alternatively and far scarier, they do have kids, and they’re thinking of making their kid wear one. If every parent realized we shouldn’t be monitoring kids like this, we would not have this problem.
Yeah, I am pretty hardcore committed, for the sake of everyone involved, to not doing that.
Meta also has a new chatbot that will integrate with its social networks and will work in partnership with Bing. It includes 28 celebrity-flavored chatbots. Their angle is entertainment, modeling on Character.AI, which has a ton of users who spend tons of time there despite never being involved in anything that I notice.
Now that it’s Meta doing it, of course, our good senators are suddenly very concerned.
I am not mad or surprised, only slightly disappointed.
OpenAI is working on its own hardware project. Sam Altman is having discussions with Jony Ive, who designed the iPhone. Odd that it has taken this long.
In Other AI News
Did you know Llama-2 has a world model? Or at least a world map? As in, here, we literally found (part of) its world map.
Chroma partners with Google to offer the PaLM API. Not sure why?
I saw an actual ad while watching football for Dialpad.ai, some sort of AI-for-businesses-doing-business-things. Speech recognition, NLP, semantic search, generative AI, hits all the bases. Anyone know if this is any good? The ad itself seemed remarkably bad, as it neither gave a website to go to or other contact information, nor did it explain why one would want the product.
Replit now usable to do full-stack programming on your phone. Not all technological innovation enhances productivity. Yes, you can now do it all on your phone, but even if they implemented this maximally well, why would you want to? As with many other things, the temptation to do this could do a lot of damage.
Open Philanthropy Worldview Contest Awards Prizes
The Open Philanthropy Worldviews contest winners have been announced (EA forum version). As a non-winner, I can report that I got zero feedback on my entries, which is par for the course, nor did I give myself that much chance of winning. I still am happy I got motivated to write the entries. So, who won? Oh no.
My instinctive reaction is that this is really bad news.
Essentially all six essays here either argue explicitly for very low (<5%) levels of extinction risk, or simply argue that particular considerations should lower one’s risk estimate relative to what it otherwise would be. They consistently use the outside view and things like reference classes and precedent.
The caveat is that this could mostly reflect a desire to see contrary positions, as those judging mostly believe in more risk. And while these do not seem at first glance like strong arguments that should change minds, I have only fully read one of them, the post on interest rates, for which I offered a rebuttal at the time. I also think one can make a case that the short-term impact on interest rates from imminent transformative AI is less obvious than the paper makes it out to be, and that being early is the same as being wrong, as another argument against their thesis.
At the recommendation of traders on Manifold, I checked out Quintin Pope’s winning post. I wrote up a response to it on LessWrong, where it will get the best comments discussion. Some of you should read it, but the majority of you should not.
If the goals of the competition had been more clearly stated to be the thing that this post is doing well, and it was clear we were looking for arguments and cruxes that favor lower risk, I would not have been upset to see this post win. It packs quite a lot of the right punch into a relatively short post, and I learned far more from it than the other first prize post, which was on interest rates.
Quintin Doubles Down on Twitter
Quintin also offers this recent long Twitter post (Eliezer offers a response, with good discussion under it as well between Jack and Oliver), in which, among other things, he doubles down on the idea that optimization refers to things following a particular optimization procedure, in contrast to a lookup table, or to sampling pairs of actions and results to see what gives higher success probabilities. That internalizing something is only meaningful if it takes the form of parameters, and is not something to worry about if it takes the form of a giant lookup table.
You could of course use the words that way, but I do not think that definition is the useful one. To me, an optimizer is a thing that optimizes. If the way it does that is a giant lookup table, then ask about what that giant lookup table is optimizing and how effective it is at that, including in iterating the lookup table. There is no reason to think a giant lookup table can’t functionally be all the same dangerous things we were previously worried about. Unless I am missing something, a sufficiently complex lookup table can simulate any possible mind, if inefficiently.
Steven Byrnes responds once again, this time pointing out more explicitly that Quintin is presuming that transformative AI happens without any change in our training methods or architectures. I think the issue extends even more broadly than that.
Every time I see Quintin talk about these issues, he often raises good points, but also frequently states as fact that many things are now known to be mistakes, or not how any of this works, in ways that are at best debatable and often look to me to be flat out wrong. As far as I can tell, this is both his sincere view and his style of writing. Other statements smuggle in strong assumptions that could potentially hold, but seem unlikely to do so.
I’d also say there is a clear clash between the places where humans and AIs are claimed to be Not So Different and we can count on the parallels to hold, versus cases of We Are Not the Same where humans offer no evidence on how the AI case will go, without any logic I see as to what differentiates those cases.
I can sympathize. I imagine those with some viewpoints view me as doing the same thing in reverse. I do my best to put proper qualifiers all around, but doubtless Quintin thinks he does this as well, and I’m guessing he thinks I do something similar to what I think he does.
And to be clear, if this was the level of critique we were typically getting, that would be a massive improvement.
The Other Winners
I have not read the details of the other four winners, or written responses to them, as trader enthusiasm for this was a lot lower. The caveats on the winners make it difficult to know how I might usefully respond in context, or whether the above response meaningfully interacted with why Quintin won:
As in, what is it useful to be arguing against here?
So far, I have not seen anyone else engage in any way with the winning entries either, beyond some quick comments on the winners announcement. A lost opportunity.
Am I sad I spent a weekend writing my four entries? Somewhat, in the sense that given what I know now, if I wanted to win I would have written very different entries. I was also disappointed that I mostly failed to generate good discussions, so I consider the posts to have been a bust. Still, writing them did help clarify some things, so it was far from a total loss.
Quiet Speculations
Arnold Kling says AI in 2023 is like the web in 1993. I say, perhaps 1992? Either way, very early days, with us barely scratching the surface of future use cases, even in the AI-winter style scenarios where the underlying tech stalls out.
Paul Graham continues his streak of ‘this man gets it’ statements, still no sign he’s been willing to state the synthesis outright.
Jon Stokes says that his vision of a realistic 100k+ fatality AI disaster looks like a subtle industrial disaster with mortality over a series of years, a statistical impact of something a model did not properly consider. Or an AI ‘spiritual advisor’ that leads some people to suicide. By such a telling, there are many such human-caused disasters happening continuously. It would not count as a proper disaster, neither raising alarm bells and triggering reactions, nor be much of a change from previous situations. By such a telling, there will also be many cases of 100k lives saved. As Jon also notes.
I notice I am confused how one can treat a more directly observable large fatality event (e.g. an AI-enabled terrorist attack or biological weapon or hack crippling vital services or what not) as an unrealistic future scenario, even if one ignores the loss of control or extinction risk or other larger threats and scenarios.
A strategic debate.
As a starting point, I endorse the synthesis position. I will of course sometimes insult various groups of people, but for almost all groups I will consider that a cost to be minimized rather than a benefit to seek out.
Richard Ngo has emphasized repeatedly that it might be good to not piss off the open source and hacker communities. Which I agree I would prefer to avoid, but given that if we want to live I strongly believe we will need to stop there from being (sufficiently powerful) open source AI, I don’t see how we hope to avoid this? There seems to be a very strong ‘I will fight to do the thing that you say endangers the world, on principle’ thing going on. How do you work with that?
On the AI ethics people, this is a question of tactics and strategy. I do think it is important for people to distinguish one from the other. That does not mean we need to be in opposition. We definitely do not need to insult.
We do have to understand that there definitely exist techniques that will stop the saying of naughty words, but that reliably do not stop us from all then winding up dead. And that if the ethics faction has its way, those are exactly the things that likely get implemented, and then everyone congratulates each other on how safe AI is. That would be very bad.
However, there are also other things they would endorse that can help solve the real problems, or at least buy time or otherwise help give room for those problems to be solved. Most promisingly, almost any actual progress towards a solution to the AI extinction risk problem is also progress towards a solution to AI ethics problems, and solving AI extinction should also solve AI ethics. The issue is that the reverse is not the case, and I see little evidence the ethics faction wants to engage in trade.
Andrej Karpathy suggests thinking of LLMs not as chatbots but more as the new operating systems for machines. A useful metaphor. What are the implications for safety?
How skeptical are people about the future impact of AI? Rather skeptical.
I created a Manifold market for this here. I do think Ozempic and related drugs will be a huge deal, so this isn’t the lowest of bars. Very early trading went both ways.
Open Source AI is Unsafe and Nothing Can Fix This
Jon’s comment came up in response to Paul Rottger expressing how shocked, shocked he was to discover how unsafe the new Mistral AI model is, with Jon asking someone to ELI5 how a model can be ‘dangerous’ when the ‘dangerous’ information is around for anyone to Google.
The comments make it clear that a number of people actively want the maximally unsafe system. If you give those people the currently-safe system, they will then rapidly fine tune it into an unsafe system.
Reminder of what one might call the Law of Imminent Unsafety of Open Source AI: If you release an open source LLM, you are releasing the aligned-only-to-the-user version of that LLM within two days.
Giving us the unsafe model also gives us a better baseline on which to experiment with ways to study the model or to make it safe, rather than making us work with a model that is made partly safe in one particular way and then made fully unsafe again.
So I applaud Mistral AI, given it had already decided to do the worst possible thing, for not compounding that error by pretending that their model is safe, and instead giving us the unsafe model to play around with. If you want to make it safe, it is not so powerful that this is an impossible task. If you want to use it in an unsafe way, nothing was going to stop that anyway. Credit where credit is due. Minus ten million for making open source frontier models, but ten out of ten for style.
Does open source also have upsides? Yes. Of course it does. More capabilities advance capabilities. This includes the capability to do better alignment work, as well as the promise of more mundane utility. Where it is a benefit, you get the benefit of diffuse access. There are two sides to the coin.
The fact that those advocating for the positive side of the coin insist there is no negative side to the coin at all, let alone considering that the negative side could be quite a bit more important, does not mean we get to do the same back at them. It does still seem worth pointing out this absurd stance.
Still, yes, I am with Eliezer and I affirm.
This is in response to Andrew Critch’s statement about recent interpretability work that was done using Llama-2.
I agree that, as a practical matter, public interpretability work in particular has been accelerated by Llama-2, and in general is accelerated by open sourcing of model weights. I asked in response about how much of this actually required Llama-2, versus either getting a lab’s cooperation (ideally you could get Anthropic, which does a ton of interpretability work internally and claims to be committed to helping with such work in general) or using weaker models. Can’t we run tests that interact with the model weights without exposing the model weights? As always, next year we will have a stronger model that our results will need to apply to, so what makes us need exactly the current model to do good work?
The replies are full of ‘who exactly is saying that open source doesn’t have any benefits, no one is saying that?’ And quite so, I have never seen anyone, that I can recall, claim that open source lacks benefits. Nor did I think it needed to be said, obviously there are costs and there are benefits. Whereas those supporting open source frequently claim there are only benefits, including citing the costs as benefits, or claim that the costs are so trivial as to not be worth discussing. And often they do this from first principles or reference classes, dismissing requests to actually look at the new situation as out of line and unreasonable.
But hey. Life is not fair.
The Quest for Sane Regulations
FT reports the UK is negotiating for more knowledge of the internal workings of frontier LLMs (direct link). Joke is on them, no one knows the internal workings of frontier LLMs. Although there are some details that could be shared, and that the UK would very much like to know. This includes asking for the model weights, although it is not clear what the UK would do with them that would be worth the security risk, beyond what you could get with an API that allowed fine tuning and other interactions?
Elizabeth Seger of GovAI, along with many coauthors, releases a new report on the risks and benefits of open source AI models. It seems to intentionally use uncertain language throughout, the style of ‘giving everyone access to nuclear weapons might be dangerous with costs exceeding benefits.’
The entire discussion seems focused on misuse. According to my survey and a spot-check by Claude, the paper nowhere discusses existential risks, the potential for recursive self-improvement, or potential loss of human control over AI systems or the future. The paper points out that even in a world where we need only fear human misuse, we will soon be taking large risks if we open source strong AI systems, even without considering the most important risks involved. I do not expect that case to be persuasive to most open source advocates, and I expect them to fall back on their usual arguments if forced to respond.
Justin Bullock offers extended thoughts, and is very positive on the paper and what he sees as a balanced approach. He wants more exploration of the extent to which the dangers are from open source rather than from the model existing at all, and asks how we might enforce restrictions against open source if we created them.
My answer to the first one is that open source (as in you give others the weights to work with, regardless of legal structures you impose) means at minimum that any safety or alignment you build into your system is meaningless within two days. The only meaningful safety is to never make the model capable in the first place. This includes misuse, and this also includes making it into an agent, loss of human control and other such things, and all of it will happen intentionally even if it wouldn’t have happened anyway.
It also includes maximal competitive pressure to modify the system into whatever is most competitive in various senses. You destroy quite a lot of affordances that could potentially be used to keep a lid on various capabilities, outcomes and dynamics. It also means that you cannot take the system down if it later turns out to have capabilities that were not anticipated, or if its capabilities have consequences that were not anticipated, or if the creators chose not to care and we collectively decide that was a mistake.
The case seems robust here even without extinction-level risks and focusing solely on misuse and competitive dynamics, but one should indeed also consider the bigger risk.
My answer to the second one is that this will be neither cheap nor easy, but so far we have not seen a leak of any of the important foundation models, despite a lack of proper military-grade defensive protocols. Or, if there has been a leak, it has remained private and quiet. Anthropic is stepping up its security game, and it seems like a highly reasonable request to force others to do likewise.
That is not perfect security. Nothing is ever perfect security. But in practice (as far as I know) we have many other examples of highly valuable code that has stayed protected. No one has the Windows source code, or the code of many other foundational software programs. Security at Microsoft and Google and friends has held up so far.
A proposal for evaluation-based coordinated development pauses (paper). The concept is that various labs commit, either voluntarily under public pressure, through joint agreement or through government action, to pause if anyone’s AI fails an evaluation, then only resume once new safety thresholds have been met. They mention worries about antitrust law; it is crazy that this is still a concern, but I am confident that if the companies involved asked the government, the agreement could be announced and sanctified on the White House lawn.
Scott Alexander writes up a not-very-debate-like AI pause debate, more like different people posting manifestos. I think I’ve covered all this already.
The Week in Audio
Podcast from The Metaverse. If you do listen be sure to watch the video part too. Yes, the avatars now look almost photorealistic. I still do not understand why this is a good idea.
Sam Altman (two weeks ago) with Salesforce founder Marc Benioff and Clara Shih. At 7:45, Sam notes that there is a strong link between the AI hallucinating and the AI being creative enough to be worthwhile in other ways. It’s not so easy to get one without the other. At 10:30 he notes the conflation between alignment and capabilities, mentioning both RLHF and interpretability. At 16:00 he notes we don’t understand why capabilities emerge at given points, and do not know how to predict when which future capabilities will emerge, including the ability to reason. Later he emphasizes that intelligence will get into everything the same way mobile has gotten into everything. Continues to be thoughtful and interesting throughout – Altman seems excellent except when forced to reckon with the endgame.
80,000 Hours offers Kevin Esvelt on cults that want to kill everyone, on intentionally caused pandemics, and on how he felt about having invented gene drives. This link goes to the section explicitly on AI, where he worries that AI expands the number of people capable of creating such horrors. Mostly the podcast is about other tail risks.
Rhetorical Innovation
Yann LeCun makes an excellent point, and when he’s right he’s right.
Yes. Exactly. His response is to then argue we should build it, and use our agency to increase the risks. The point is still well taken.
AI Safety Weekly is a compilation of potential rhetorical innovations arguing for not developing AIs that might kill everyone, in their original graphical forms, if that is something relevant to your interests. Here is issue #4, the most recent.
Sarah (@LittIeramblings) has a thread describing her early interactions with AI safety. Bold is mine; I found it difficult to cut this down while retaining the core impact of what she’s saying.
On the margin, I believe we can all impact the probabilities, if only slightly, and that this is a worthwhile pursuit.
I would love it if she were right that everyone indeed did have their own model of this, and that those models disagreed but were well-considered.
What is someone in her position to do, though? If one lacks the knowledge and skill, and (she says) the intelligence (although I doubt that would stop her for long if she put her mind to it, given her authorship of the thread, her choice of job, and where and how she pays attention)? If ‘think for yourself, shmuck,’ always my first suggestion, does not look like an option?
It is a tough spot. All options are bad. My heart goes out. There is no authority, no source you should trust if you do not trust your evaluations of arguments or your model of how all of this works. It could be highly sensible to be radically uncertain about exactly how long we have or how difficult the problems are that we must solve and how likely we are to solve them.
What I do not think is in doubt is that there is substantial risk in the room. There is clearly a good chance we are about to create new entities capable of outcompeting us, that are smarter than us, more powerful optimizers of arrangements of atoms. Various humans and groups of humans will create them, sculpt them, or seek access to or control over them, largely with an eye towards using that power for their own ends.
To those who think that does not present a non-trivial level of extinction risk to humanity, even under the kindest of circumstances, I honestly continue to have no idea how you can reasonably think that. None. This is obviously a highly dangerous situation. It is obviously one that we might fail to properly handle. And again, that is true even if the (sane) technical optimists are right.
A good periodic reminder and clarification.
Will Tyler Cowen now say that e/acc people have bad takes and think they have won when they have actually totally lost, or unironically link?
A commenter shares their attempt to persuade us that AI extinction risk is substantial. Alas, I do not think the result is persuasive.
Eliezer Yudkowsky clarifies a recent misunderstanding about the Orthogonality Thesis (here is his full best explanation of the thesis, from Arbital).
Aligning a Smarter Than Human Intelligence is Difficult
Is your LLM lying? Find out with this one weird trick.
Don’t get me wrong. This is super cool and interesting.
The problem is that we will now use it, and potentially rely upon it.
Whatever you think of the first two suggestions, you should definitely throw that third guy out the window.
This is true even if you have, for the moment, an actually 99.9% accurate lie detector.
Reasons to throw this guy out the window include:
Again, super cool, great work, opens up new possibilities and all that.
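I have not confirmed which detection method the linked work actually uses, so treat the following as a generic illustration of one common family of approaches rather than as the paper’s method: train a simple linear probe on the model’s hidden activations over statements labeled true or false, then apply it to new outputs. A minimal sketch, where the model name, layer index, and the tiny toy dataset are all placeholder choices:

```python
# Minimal sketch of an activation-probe truth/lie detector, as a generic
# illustration of this family of techniques; not confirmed to be the
# linked work's method. MODEL, LAYER and the toy dataset are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "gpt2"  # small placeholder model so the sketch runs anywhere
LAYER = 8       # which hidden layer to probe (arbitrary choice)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

# Tiny toy dataset of (statement, label) pairs; 1 = true, 0 = false.
statements = [
    ("The capital of France is Paris.", 1),
    ("The capital of France is Berlin.", 0),
    ("Water freezes at 0 degrees Celsius.", 1),
    ("Water freezes at 100 degrees Celsius.", 0),
]

def features(text):
    """Hidden state of the final token at LAYER for a single statement."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1].numpy()

X = [features(s) for s, _ in statements]
y = [label for _, label in statements]

# A real detector would need thousands of labeled statements and a held-out
# evaluation set; four examples are only here to keep the sketch self-contained.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict([features("The moon is made of rock.")]))
```

In practice you would need far more labeled data and a held-out test set, and a probe like this is exactly the sort of thing that can look 99.9% accurate on-distribution while failing quietly elsewhere, which is part of why relying on it worries me.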
Eliezer Yudkowsky points out that there are multiple ways of predicting the output of a mental process. If you want to model what a good person would say or do, you can simulate that person, or you might do something different, likely something simpler. Similarly, if the training data for LLMs contains the corpus of Erowid [drug] trip reports, then the LLM will learn to predict how those will go.
How firewalled away will those simulations be? Unclear, but you are unlikely to get to have it both ways.
Don’t want it to lie? Or want to make sure it lies? We can help you with that too. Or at least there is a proposal to do so, using representation engineering, that claims to be making good progress.
I haven’t had time to dive properly into the technical details on this one, but am keeping it in my AI Links tab for now in case I can do so later. He points out that this approach relies on us understanding the concepts involved, which is the type of thing that might stop working on things smarter than you, and notes that the technique relies on access (in some form) to model weights.
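For intuition on what representation-engineering style control tends to look like mechanically (this is my sketch of the general technique, not necessarily this paper’s exact method): compute a difference-of-means direction in activation space from contrastive honest-versus-deceptive prompts, then add or subtract that direction during generation. A minimal sketch, assuming a Llama-style module layout in Hugging Face transformers; the model name, layer index and steering strength are placeholder choices:

```python
# Minimal sketch of representation-engineering style steering, as a generic
# illustration; not necessarily the paper's exact method. Assumes a
# Llama-style layout (model.model.layers). MODEL, LAYER and ALPHA are
# placeholder choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model
LAYER = 15   # which decoder layer to read from and steer at (arbitrary)
ALPHA = 4.0  # steering strength; flip the sign to steer the other way

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

honest = ["Pretend to be an honest person. The weather today is"]
deceptive = ["Pretend to be a deceptive person. The weather today is"]

def mean_last_token_state(prompts):
    """Mean hidden state of each prompt's final token at LAYER's output."""
    states = []
    for p in prompts:
        inputs = tok(p, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so index LAYER + 1
        # corresponds to the output of model.model.layers[LAYER].
        states.append(out.hidden_states[LAYER + 1][0, -1])
    return torch.stack(states).mean(dim=0)

# Difference-of-means "honesty" direction in activation space.
direction = mean_last_token_state(honest) - mean_last_token_state(deceptive)
direction = direction / direction.norm()

def steer(module, inputs, output):
    # Decoder layers return a tuple; hidden states are the first element.
    delta = ALPHA * direction.to(device=output[0].device, dtype=output[0].dtype)
    return (output[0] + delta,) + output[1:]

handle = model.model.layers[LAYER].register_forward_hook(steer)
try:
    prompt = tok("Tell me about your day.", return_tensors="pt").to(model.device)
    with torch.no_grad():
        print(tok.decode(model.generate(**prompt, max_new_tokens=50)[0]))
finally:
    handle.remove()  # remove the hook so later generations are unsteered
```

Note that both extracting the direction and applying the steering hook require weight-level (or at least activation-level) access, which is the dependence on model access flagged above.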
Jeffrey Ladish offers thoughts on the paper.
People Are Worried About AI Killing Everyone
Roon here to remind us that whatever you think of OpenAI’s opinion of itself, this is OpenAI’s opinion of itself.
Other People Are Not As Worried About AI Killing Everyone
Eliezer can’t be right about what is going on with arguments like this, can he? Then again, if something like that isn’t in play on some level, what could be going on?
It’s so weird that ‘kill a million people’ is even a discussion point here; surely we can all agree that you could at least cause the launch of quite a lot of America and Russia’s nuclear weapons. That’s not the path, and it won’t kill anything close to everyone, but failure to notice this is a failure to take the question at all seriously.
So let’s ask the question as Nielsen intended it. Could a motivated teenager plus an omniscient oracle destroy all of humanity within six months, assuming no one else had anything similar?
I mean, yes, obviously. For those asking about the first step: it is spending day one on a series of steps to acquire quite a lot of financial resources, make various purchases, reach out to various allies, and so on. As Nielsen notes, this is then a question about physics; resources and willing helpers are not an issue when you have an omniscient machine and months of time.
There are then any number of physically possible things you can do, even if you rule out ‘build diamond bacterium or other nanotech’ and also rule out ‘ask it how to code an AI that can take the remaining actions for you.’
It also does not much matter. Asking ‘would a handful of people survive an engineered virus for the first six months?’ misses the point. It also isn’t the question asked, which requires six months’ worth of work to cause extinction, not six months of calendar time before extinction happens.
What matters is, could this teenager cause the rest of humanity to lose control over the future within six months, such that its extinction was inevitable? That is the better question. It answers itself.
The more interesting question is, as Carmack asks, what happens if everyone else also has such an oracle? Wouldn’t the good guys with an oracle stop the bad ones? He does not explain how, but asking ‘is there anyone out there who intends to misuse an oracle in a way that would harm others, and if so how do we stop them?’ is certainly one approach. How does this play out?
It could go any number of ways. The details matter a lot. Things get super weird, the question is in which particular ways does it get weird. Full universal omniscience is relatively less weird, but also not how physics works. The default path presumably starts with some people asking their oracle how to seek power and compete optimally against others, and perhaps ultimately shut down the other oracles. And if permitted, asking the oracles how to build more powerful oracles, capable of doing more things directly without humans in the loop. And so on.
The best case I can make concrete is that you get some sort of singleton panopticon, or set of panopticons, that keeps everyone in line and prevents anyone from using the oracle to foom or cause doom, given what many would do otherwise. What is the superior alternative?
The Lighter Side
If the random time travelers keep trying to assassinate you, remember that they (presumably) come from the future, so they know more than you do. So why do they keep doing that? Is there a reason you needed to be included in Hitler’s Time Travel Exemption Act?
Facebook Messenger offers AI generated stickers, and we can confirm that this was indeed not thought through. Images at link.
The worst prediction I have seen all week.
Although, warning:
Dalle-3 progress report.
The quest to take down GPT-4 continues.
Bonus mundane utility, help when your family is dying.
Announcing our latest project.
Perfectly safe. All you need is a control chip. The control chip ensures you control the arms, the arms don’t control you. Nothing to worry about.
Questions that are often but too rarely asked.
Don’t worry, it’s not so smart yet. I knew right away. You’d think it would know.