Roon also lays down the beats
This isn't a link, so I can't verify whether the source was mentioned, but these aren't his lyrics. It's the third verse from an ERB video from 2012.
We would still have to address that worry mentioned earlier about “formally specifying complex requirements such as “don’t drive humanity extinct.”” I have not a clue. Anyone have ideas?
Sure. To start with an easier example:
Where I get confused is, what would it mean to prove that a given set of code will do even the straightforward tasks like proving cybersecurity. How does one prove that you cannot gain access without proper credentials? Doesn’t this fact rely upon physical properties, lest there be a bug or physical manipulation one can make?
No idea if the below is what Tegmark and Omohundro meant, but here's how I think it can theoretically work:
I actually do think this is more doable than it sounds, but, uh... It's definitely not "an unsolved yet easier" challenge compared to robust AGI alignment. Nay, it is in fact an AGI-alignment-complete problem.
Consider frameworks like the Bayesian probability theory or various decision theories, which (strive to) establish the formally correct algorithms for how systems embedded in a universe larger than themselves must act, even under various uncertainties. How to update on observations, what decisions to make given what information, etc. They still take on "first-person" perspective, they assume that you're operating on models of reality rather than the reality directly — but they strive to be formally correct given this setup.
Admittedly, this does have one big problem, which I'll describe below:
So the question is, why do you believe formalization is tractable at all for the AI safety problem?
It's more that I don't positively believe it's not tractable. Some of my reasoning is outlined here, some of it is based on inferences and models that I'm going to distill and post publicly aaaany day now, and mostly it's an inside-view feel for what problems remain and how hopeless-to-solve they feel.
Which is to say, I can absolutely see how a better AGI paradigm may be locked behind theoretical challenges on the difficulty level of "prove that ", and I certainly wouldn't bet the civilization on solving them in the next five or seven years. But I think it's worth keeping an eye out for whether e. g. some advanced interpretability tool we invent turns out to have a dual use as a foundation of such a paradigm or puts it within reach.
They assume either infinite computation, or in the regime of bounded Bayesian reasoning/rationality, they assume the ability to solve very difficult problems
Yeah, this is why I added "approximation of" to every "formal" in the summary in my original comment. I have some thoughts on looping in computational complexity into agency theory, but that may not even be necessary.
People turn these things into agents easily already, and they already contain goal-driven subagent processes.
Sorry, what is this referring to exactly?
TLDR: LLMs can simulate agents and so, in some sense, contain those goal-driven agents.
An LLM learns to simulate agents because this improves prediction scores. An agent is invoked by supplying a context indicating that the text would be written by an agent (e.g., specifying that the text is written by some historical figure).
Contrast with pure scaffolding-type agent conversions using a Q&A fine-tuned model. For these, you supply questions (Generate a plan to accomplish X) and then execute the resulting steps. This implicitly uses the Q&A fine-tuned "agent" that can have values which conflict with ("I'm sorry I can't do that") or augment the given goal. Here's an AutoGPT taking initiative to try and report people it found doing questionable stuff rather than just doing the original task of finding their posts (LW source).
The base model can also be used to simulate a goal-driven agent directly by supplying appropriate context so the LLM fills in its best guess for what that agent would say (or rather what internet text with that context would have that agent say). The outputs of this process can of course be fed to external systems to execute actions as with the usual scaffolded agents. The values of such agents are not uniform. You can ask for simulated Hitler who will have different values than simulated Gandhi.
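To make the two setups above concrete, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: `stub_model` is a toy function, not a real LLM or any library's API, and `scaffolded_agent` and `simulated_agent` are names invented for illustration. The only point is the difference in how the prompt is constructed.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a prompt to a completion.
Model = Callable[[str], str]


def stub_model(prompt: str) -> str:
    """Toy 'model' so the sketch runs without any real LLM or API."""
    if prompt.startswith("Generate a plan"):
        return "1. search for posts\n2. summarize findings"
    if prompt.rstrip().endswith(":"):
        # Context ends with a speaker label, so "continue" in that voice.
        return "<in-character reply>"
    return "<generic completion>"


def scaffolded_agent(model: Model, goal: str) -> List[str]:
    """Q&A-style scaffolding: ask for a plan, then 'execute' each step."""
    plan = model(f"Generate a plan to accomplish: {goal}")
    steps = [line.split(". ", 1)[1] for line in plan.splitlines() if ". " in line]
    # A real scaffold would call tools here; this sketch only records the steps.
    return [f"executed: {step}" for step in steps]


def simulated_agent(model: Model, persona: str, situation: str) -> str:
    """Base-model style: frame the context so the next text is the persona's."""
    prompt = (
        f"The following was said by {persona}.\n"
        f"Situation: {situation}\n"
        f"{persona}:"
    )
    return model(prompt)


if __name__ == "__main__":
    print(scaffolded_agent(stub_model, "find posts"))
    print(simulated_agent(stub_model, "Gandhi", "a dispute"))
```

In the scaffolded case the values in play are those of whatever answers the planning question; in the simulated case they come from the persona named in the context.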
Not sure if that's exactly what Zvi meant.
Worth noting that the scam attempt failed. We keep hearing ‘I almost fell for it’ and keep not hearing from anyone who actually lost money.
Here's a story where someone lost quite a lot of money through an AI-powered scam:
https://www.reuters.com/technology/deepfake-scam-china-fans-worries-over-ai-driven-fraud-2023-05-22/
We want to avoid ubiquitous surveillance, or minimize its impact. If there exists a sufficiently dangerous technology, that leaves you two choices.
- You can do what surveillance and enforcement is necessary to limit access.
- You can do what surveillance and enforcement is necessary to contain usage.
Which of these will violate freedom less? My strong prediction for AGI is the first one.
While I certainly agree that option 1 seems much better, I don't see how we can maintain option 1 for long. A few years perhaps, but the knowledge that powerful AI is possible and obtainable with some set of algorithms which are improvements over published algorithms and consumer-grade technology... that seems sufficient for smart selfish actors to rationalize themselves into justifying secretly trying to reverse engineer their way to these better algorithms. And that scenario requires option 2 for things to not go wrong if they succeed. This would be an easier problem if we were hardware constrained, but I don't think we are, and I think we will be even less hardware constrained in the future as effective compute prices continue to fall. There are a LOT of small privately-owned datacenters out there. If a basement-bitcoin-miner's setup is enough to be a dangerous supply of compute, then option 2 seems like the only stable one.
Edit: I think Zvi puts the problem well in this section:
The third problem is the competitive and evolutionary, the dynamics and equilibrium of a world with many ASIs (artificial superintelligences) in it.
This is a world almost no one is making any serious attempt to think about or model, and those who have (such as fiction writers) almost always end up using hand waves or absurdities and presenting worlds highly out of equilibrium.
We will be creating something smarter and more capable and better at optimization than ourselves, that many people will have strong incentives both economic and ideological to make into various agents with various goals including reproduction and resource acquisition. Why should we expect to long be in charge, or even to survive?
So what then leads to the difference in our views in the first quote? I suppose a question about whether the hardware sufficient for dangerous ASI will be broadly distributed? Please correct me if I'm confused here.
My view is that 'scale is sufficient, but not necessary. Either larger scale, OR secret sauce algorithmic improvements will lead to sufficiently capable AI that it can initiate recursive self-improvement which will be capable of scaling to ASI if run unchecked.'
Humanity has a track record of smart groups of engineers noticing that some other group has achieved a specific technology, and then figuring out how to replicate that. We really shouldn't count on that NOT happening in this case, when so much is on the line. Simply knowing that some other group solved it, and what tools they used, and the background knowledge that they went into their research project with... that's enough to solve it independently a few years later.
Alignment is solved, systems do what their owners tell the system to do.
Presumably you're assuming we haven't figured out how to implement a CEV sovereign (i.e. an autonomous agent pursuing CEV)? Because in that case, I don't get why the CEV-aligned ASI wouldn't negotiate with the other ASI to get humanity a galaxy/galaxies and keep us safe whilst we decide our future.
This all does seem like work better done than not done, who knows, usefulness could ensue in various ways and downsides seem relatively small.
I disagree about item #1, automating formal verification. From the paper:
9.1 Automate formal verification:
As described above, formal verification and automatic theorem proving more generally needs to be fully automated. The awe-inspiring potential of LLMs and other modern AI tools to help with this should be fully realized.
Training LLMs to do formal verification seems dangerous. In fact, I think I would extend that to any method of automating formal verification that would be competitive with human experts. Even if it didn't use ML at all, the publication of a superhuman theorem-proving AI, or even just the public knowledge that such a thing existed, seems likely to lead to the development of more general AIs with similar capabilities within a few years. Without a realistic plan for how to use such a system to solve the hard parts of AI alignment, I predict that it would just shorten the timeline to unaligned superintelligence, by enabling systems that are better at sustaining long chains of reasoning, which is one of the major advantages humans still have over AIs. I worry that vague talk of using formal verification for AI safety is in effect safety-washing a dangerous capabilities research program.
All that said, a superhuman formal-theorem-proving assistant would be a super-cool toy, so if anyone has a more detailed argument for why it would actually be a net win for safety in expectation, I'd be interested to hear it.
Formally proving that some X you could realistically build has property Y is way harder than building an X with property Y. I know of no exceptions (formal proof only applies to programs and other mathematical objects). Do you disagree?
I don't understand why you expect the existence of a "formal math bot" to lead to anything particularly dangerous, other than by being another advance in AI capabilities which goes along other advances (which is fair I guess).
Human-long chains of reasoning (as used for taking action in the real world) neither require nor imply the ability to write formal proofs. Formal proofs are about math, and making use of math in the real world requires modelling, which is crucial, hard and usually very informal. You make assumptions that are obviously wrong, derive something from these assumptions, and make an educated guess that the conclusions still won't be too far from the truth in the ways you care about. In the real world, this only works when your chain of reasoning is fairly short (human-length), just as arbitrarily complex and long-term planning doesn't work, while math uses very long chains of reasoning. The only practically relevant application so far seems to be cryptography, because computers are extremely reliable and thus modeling is comparatively easy. However, plausibly it's still easier to break some encryption scheme than to formally prove that your practically relevant algorithm could break it.
LLMs that can do formal proof would greatly improve cybersecurity across the board (good for delaying some scenarios of AI takeover!). I don't think they would advance AI capabilities beyond the technological advances used to build them and increasing AI hype. However, I also don't expect to see useful formal proofs about useful LLMs in my lifetime (you could call this "formal interpretability"? We would first get "informal interpretability" that says useful things about useful models.) Maybe some other AI approach will be more interpretable.
Fundamentally, the objection stands that you can't prove anything about the real world without modeling, and modeling always yields a leaky abstraction. So we would have to figure out "assumptions that allow us to prove that AI won't kill us all while being only slightly false and in the right ways". This doesn't really solve the "you only get one try" problem. Maybe it could help a bit anyway?
I expect a first step might be an AI test lab with many layers of improving cybersecurity, ending at formally verified, air-gapped, no interaction with humans. However, it doesn't look like people are currently worried enough to bother building something like this. I also don't see such an "AI lab leak" as the main path towards AI takeover. Rather, I expect we will deploy the systems ourselves and on purpose, finding ourselves at the mercy of competing intelligences that operate at faster timescales than us, and losing control.
followed by strategies humans haven’t even considered
followed by strategies humans wouldn't even understand because they do not translate well to human language, i.e. they can be translated directly but no one will understand why they work.
We are, as Tyler Cowen has noted, in a bit of a lull. Those of us ahead of the curve have gotten used to GPT-4 and Claude-2 and MidJourney. Functionality and integration are expanding, but at a relatively slow pace. Most people remain blissfully unaware, allowing me to try out new explanations on them tabula rasa, and many others say it was all hype. Which they will keep saying, until something forces them not to, most likely Gemini, although it is worth noting the skepticism I am seeing regarding Gemini in 2023 (only 25% for Google to have the best model by end of year) or even in 2024 (only 41% to happen even by end of next year.)
I see this as part of a pattern of continuing good news. While we have a long way to go and very much face impossible problems, the discourse and Overton windows and awareness and understanding of the real problems have continuously improved in the past half year. Alignment interest and funding is growing rapidly, in and out of the major labs. Mundane utility has also steadily improved, with benefits dwarfing costs, and the mundane harms so far proving much lighter than almost anyone expected from the techs available. Capabilities are advancing at a rapid and alarming pace, but less rapidly and less alarmingly than I expected.
This week’s highlights include an update on the UK taskforce and an interview with Suleyman of Inflection AI.
We’re on a roll. Let’s keep it up.
Even if this week’s mundane utility is of, shall we say, questionable utility.
Language Models Offer Mundane Utility
Do automatic chat moderation for Call of Duty. Given that the practical alternatives are that many games have zero chat and the others have chat filled with the most vile assembly of scum and villainy, I am less on the side of ‘new dystopian hellscape’ as much as ‘what exactly is the better alternative here.’
Monitor your employees and customers.
It’s not the tool, it is how you use it. Already some companies such as JPMorgan Chase use highly toxic dystopian monitoring tools, which this lets them take to the next level. It seems highly useful to keep track of how long customers have been in the store, or whether they are repeat customers and how long they wait for orders. Tracking productivity in broad terms like orders filled is a case where too much precision and attention has big problems but so does not having enough. Much better an objective answer with no work than a biased error-prone answer with lots of work.
Monitor your citizens on social media (below is the entire post).
This seems like exactly what such folks were doing before? The problem here isn’t AI.
Win a physical sport against humans for the first time, I am sure it is nothing, the sport is (checks notes) drone racing.
Get help with aspects of writing. As Parell notes, directly asking ChatGPT to help you write is useless, but it can be great as a personal librarian and thing explainer. He recommends the term ‘say more,’ asking for restatements in the styles of various authors and for summaries, talking back and forth and always being as specific as possible, and having the program check for typos.
Ethan Mollick proposes developing what he calls Grimoires, which my brain wants to autocorrect to spellbooks (a term he uses as well), prompts designed to optimize the interaction, including giving the AI a role, goal, step-by-step instructions, probably a request for examples and to have the AI gather necessary context from the user.
Play the game of Hoodwinked, similar to Mafia or Among Us. More capable models, as one would expect, outperform less capable ones, and frequently lie and deceive as per the way the game is played. The proper strategy in such games for humans is usually, if you can, some variant of saying whatever you would have said if you were innocent, which presumably is pretty easy to get an LLM to do. Note that as the LLM gets smarter, other strategies become superior, followed by other strategies that humans couldn’t pull off, followed by strategies humans haven’t even considered.
Language Models Don’t Offer Mundane Utility
Beware of AI-generated garbage articles, many say, although I still have yet to actually encounter one. Ryan is correct here, Rohit also, although neither solves the issue.
Here is one of several other claims I saw this week that Google search is getting rapidly polluted by LLM-generated garbage.
Beware even looking at AI when Steam (Valve) is involved, they remove a game permanently for once allowing a mod that lets characters use GPT-generated dialogue, even though the mod was then removed. While I do admire when one does fully commit to the bit, this is very obviously taking things way too far, and I hope Valve realizes this and reverses their decision.
Judge Roy Ferguson asks Claude who he is, gets into numerous cycles of fabricated information and Claude apologizing and admitting it fabricated information. Definitely a problem. Ferguson treats this as ‘intentional’ on the part of Claude, which I believe is a misunderstanding of how LLMs work.
Deepfaketown and Botpocalypse Soon
So far, Donald Trump has had the best uses of deepfakes in politics. Do they inevitably favor someone of his talents? Another angle to consider is, who is more vulnerable to such tactics than Trump supporters?
I checked r/scams on a whim about 40 posts deep. Almost all were old school, one of them was a report of the classic ‘your child has been arrested and you must send bail money’ scam. The replies refused to believe it was an actual deepfake, saying it was a normal fake voice. It seems even the scam experts don’t realize how easy it is to do a deepfake now, or alternatively they are so used to everything being a scam (because if you have to ask, I have some news) that they assume deepfakes must be a scam too?
Worth noting that the scam attempt failed. We keep hearing ‘I almost fell for it’ and keep not hearing from anyone who actually lost money.
The self-explanatory and for now deeply disappointing ‘smashOrPass.ai’ offers the latest in user motivation for fine tuning, whatever one might think of the ethics involved. As of this writing it is only a small set of images on a loop, so claims that it ‘learns what you like’ seem rather silly, but that is easily fixed. What is less easily fixed is all the confounders, which this should illustrate nicely. This is a perfect illustration of how much data is lost when you compress to a 0-1 scale, also either people adjust for context or don’t and either approach will be highly confusing, you really need 0-10 here. And yes, in case anyone was wondering, of course the porn version is coming soon, if not from him then from someone else. More interestingly, how long until the version where it takes this information and uses it to auto-swipe on dating app profiles for you?
Also, can I give everyone who hates this, or anything else on the internet, the obvious advice? For example, Vice’s Janus Rose? Do not highlight things no one would otherwise have heard of, in order to complain about them. This thing came to my attention entirely due to negative publicity.
Recreation of James Dean to star in new movie. No doubt more of this is coming. What is weird is that people are talking about cloning great actors, including great voice actors, whether dead or otherwise, and using this to drive living actors out of work.
The problem with using AI is that AI is not a good actor.
An AI voice actor can copy the voice of Mel Brooks, but the voice has little to do with what makes Mel Brooks great. What I presume you would actually do, at least for a while, is to have some great voice actor record the new lines. Then use AI to transform the new lines to infuse it with the vocal stylings of Mel Brooks.
If we have Tom Hanks or Susan Sarandon (both quoted in OP) doing work after they die, then we are choosing to recreate their image and voice, without the ability to copy their actual talents or skills. To the extent that we get a ‘good performance’ out of them, we could have gotten that performance using anyone with enough recorded data as a baseline, such as Your Mom, or a gorgeous fashion model who absolutely cannot act. It makes sense to use this for sequels when someone dies, and continuity takes priority, but presumably the actors of the future will be those with The Look? Thus James Dean makes sense, or Marilyn Monroe. Or someone whose presence is importantly symbolic.
They Took Our Jobs
AI detection tools do not work. We know this. Now we have data on one way they do not work, which is by flagging the work of non-native English speakers at Stanford.
The problem is that the test does not work. This is an illustration that the test does not work. That it happens to hit non-native speakers illustrates how pathetic are our current attempts at detection.
The AIs we were training on this were misaligned. They noticed that word complexity was a statistically effective proxy in their training data, so they maximized their score as best they could. Could one generate a bespoke training set without this correlation and then try again? Perhaps one could, but I would expect many cycles of this will be necessary before we get something that one can use.
If anything, this discrimination makes the AI detector more useful rather than less useful. By concentrating its errors in a particular place and with a testable explanation, you can exclude many of its errors. It can’t discriminate against non-native speakers if you never use it on their work.
It also shows an easy way AI work can be disguised, using complexity of word choice.
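As a rough illustration of why a detector that leans on a word-complexity proxy misfires, here is a toy Python sketch. It assumes average word length as the stand-in for "word complexity" and an arbitrary threshold of 4.5; both are choices made for this example, not how any real detector is known to work.

```python
def avg_word_length(text: str) -> float:
    """Crude complexity proxy (an assumption for this sketch): mean word length."""
    words = [w.strip(".,;:!?\"'()") for w in text.split()]
    words = [w for w in words if w]
    return sum(len(w) for w in words) / len(words) if words else 0.0


def toy_detector(text: str, threshold: float = 4.5) -> bool:
    """Flags text as 'AI-written' when the proxy falls below the threshold."""
    return avg_word_length(text) < threshold


if __name__ == "__main__":
    plain = "The test is not good. It does not work at all."
    ornate = ("Contemporary detection methodologies demonstrate "
              "considerable unreliability.")
    print(toy_detector(plain))   # simple vocabulary gets flagged
    print(toy_detector(ornate))  # elaborate vocabulary passes
```

The same meaning expressed with plainer words lands on the flagged side, which mirrors both failure modes described above: simple human writing is flagged, and anything rewritten with more elaborate word choice slips through.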
Get Involved
The Center for AI Policy is a new organization developing and advocating for policy to mitigate catastrophic risks from advanced AI. They’re hiring an AI Policy Analyst and a Communications Director. They recently proposed the Responsible AI Act, which needs refinement in spots but was very good to propose, as it is a concrete proposal moving things in productive directions. Learn more and apply here.
Rethink Priorities is doing incubation for AI safety efforts, including field building in universities for AI policy careers. Do be skeptical of the plan of an incubation center for a project to help incubate people for future projects. I get how the math in theory works, still most people doing something must ultimately be doing the thing directly or no thing will get done.
Introducing
Time introduced the Time 100 for AI. I’d look into it but for now I see our time is up.
Claude Pro from Anthropic, pay $20/month for higher bandwidth and priority. I have never run up against the usage limit for Claude, despite finding it highly useful – my conversations tend to be relatively short, the only thing I do that would be expensive is attaching huge PDFs, which they say shrinks the limits but I’ve yet to run into any problems. It is a good note that, when using such attachments, it is efficient to ask for a small number of extensive answers rather than a large number of small ones.
Falcon 180B, HuggingFace says it is an ever so slightly better model than Llama-2, which makes it worse relative to its scale and cost. They say it is ‘somewhere between GPT-3.5 and GPT-4’ on the evaluation benchmarks, I continue to presume that in practical usage it will remain below 3.5.
OpenAI will host a developer conference November 6 in San Francisco.
Automorphic (a YC ‘23 company) offers train-as-you-go fine tuning, including continuous RLHF, using as little as a handful of examples, and offers fine tuning of your first three models free. Definitely something that should exist, no idea if they have delivered the goods, anyone try it out?
UK Taskforce Update
Update on the UK Foundation Model Taskforce from Ian Hogarth (direct). Advisory board looks top notch, including Bengio and Christiano plus Sommeren for national security expertise. They are partnering with ARC Evals, the Center for AI Safety and others. The summit is fast approaching in November, so everything is moving quickly. They are expanding rapidly, and very much still hiring.
In Other AI News
New paper from Peter S. Park, Simon Goldstein, Aidan O’Gara, Michael Chen, Dan Hendrycks: AI Deception: A Survey of Examples, Risks, and Potential Solutions.
Here is the abstract:
They do a good job of pointing out that whatever your central case of what counts as deception, there is a good chance we already have a good example of AIs doing that. LLMs are often involved. There is no reason to think deception does not come naturally to optimizing AI systems when it would be a useful thing to do. Sometimes it is intentional or predicted, other times it was unintended and happened anyway, including sometimes with the AI’s explicit intent or plan to do so.
New paper from Owain Evans tests potential situational awareness of LLMs (paper).
I am not as surprised as Owain was, this all makes sense to me. I still find it interesting and I’m glad it was tried. I am not sure what updates to make in response.
Twitter privacy policy now warns it can use your data to train AI models. Which they were going to do anyway, if anything Elon Musk is focusing on not letting anyone else do this.
Chip restrictions expand to parts of the Middle East. How do you keep chips out of China without keeping them out of places that would allow China to buy the chips?
Good Time article from Walter Isaacson chronicling the tragedy and cautionary tale of Elon Musk, who Demis Hassabis warned about the dangers of AI, and who then completely misunderstood what would be helpful and as a result made things infinitely worse. He continues to do what feels right to him, and continues to not understand what would make it more versus less likely we all don’t die. It is not without a lot of risk, but we should continue trying to be helpful in building up his map, and try to get him to talk to Eliezer Yudkowsky or other experts in private curious mode if we can. Needless to say, Elon, call any time, my door is always open.
Brief Twitter-post 101 explainer of fine tuning.
Quiet Speculations
Cate Hall, who has practiced copyright law, predicts that MidJourney, GPT and any other models trained on copyrighted material will be found to have violated copyright.
This is a highly technical opinion, and relies on courts applying a typical set of heuristics to a highly unusual situation, so it seems far from certain. Also details potentially matter in weird ways.
Presumably we can all agree that this rule does not make a whole lot of sense. Things could also take a while. Or it might not.
Cate Hall’s position is in sharp contrast to OpenAI’s.
The authors suing in the current copyright lawsuit do seem to be a bit overreaching?
Is generative AI in violation of copyright? Perhaps it is. Is it a ‘grift’ that merely repackages existing work, as the authors claim? No.
The US Copyright Office has opened a comment period. Their emphasis is on outputs
Arnold Kling sees the big seven tech stocks as highly overvalued. My portfolio disagrees on many of them. One mistake is these are global companies, so you should compare to world GDP of 96 trillion, not US GDP of 26 trillion, which makes an overall P/E of 50 seem highly reasonable, given how much of the economy is going to shift into AI.
Tyler Cowen says we are in an ‘AI lull’ with use leveling off and obvious advances stalled for a time, but transformational change is coming. I agree. He is excited by, and it seems not worried about, what he sees as unexpectedly rapid advancements in open source models. I am skeptical that they are doing so well, they systematically underperform their benchmark scores. In practice and as far as I can tell GPT-3.5 is still superior to every open source option.
Flo Crivello shortens their timelines.
I don’t see this as consistent. If you get AGI in 2-8 years, you get ASI in a lot less than 2-8 more years after that.
The Quest for Sane Regulations
Full list of people attending Schumer’s meeting.
The Week in Audio
The main audio event this week was Inflection AI CEO and DeepMind founder Mustafa Suleyman on the 80,000 hours podcast, giving us a much better idea where his head is at, although is it even an 80,000 hours podcast if it is under an hour?
Up front, I want to say that it’s great that he went on 80,000 hours and engaged for real with the questions. A lot of Suleyman’s thinking here is very good, and his openness is refreshing. I am going to be harsh in places on the overview list below, so I want to be clear that he is overall being super helpful and I want more of this.
I also got to see notes on Suleyman’s new book, The Coming Wave. The book and podcast are broadly consistent, with the main distinction being that the book is clearly aiming to be normie-friendly and conspicuously does not discuss extinction risks, even downplaying details of the less extreme downsides he emphasizes more.
It is perhaps worth contrasting this with this CNN interview with former Google CEO Eric Schmidt, who thinks recursive self-improvement and superintelligence are indeed coming soon and a big deal we need to handle properly or else, while also echoing many of Suleyman’s concerns.
There is also the video and transcript of the talks from the San Francisco Alignment Workshop from last February. Quite the lineup was present.
Jan Leike’s talk starts out by noting that RLHF will fail when human evaluation fails, although we disagree about what counts as failure here. Then he uses the example of bugs in code and using another AI to point them out and states his principle of evaluation being easier than generation. Post contra this hopefully coming soon.
Sam Altman recommends surrounding yourself with people who will raise your ambition, warns 98% of people will pull you back. Full interview on YouTube here.
Telling is that he says that most people are too worried about catastrophic risk, and not worried enough about chronic risk – they should be concerned they will waste their life without accomplishment, instead they worry about failure. I am very glad someone with this attitude is out there running all but one of Sam Altman’s companies and efforts, and most people in most places could use far more of this energy. The problem is that he happens to also be CEO of OpenAI, working on the one problem where catastrophic (existential) risk is quite central.
Also he says (22:25) “If we [build AGI at OpenAI] that will be more important than all the innovation in all of human history.” He is right. Let that sink in.
Paige Bailey, the project manager for Google’s PaLM-2, goes on Cognitive Revolution. This felt like an alternative universe interview, from a world in which Google’s AI efforts are going well, or OpenAI and Anthropic didn’t exist, in addition to there being no risks to consider. It is a joy to see her wonder and excitement at all the things AI is learning how to do, and her passion for making things better. The elephant in the room, which is not mentioned at all, is that all of Google’s Generative AI products are terrible. To what extent this is the ‘fault’ of PaLM-2 is unclear but presumably that is a big contributing factor. It’s not that Bard is not a highly useful tool, it’s that multiple other companies with far fewer resources have done so much better and Bard is not catching up at least pre-Gemini.
Risks are not mentioned at all, although it is hard to imagine Bailey is at all worried about extinction risks. She also doesn’t see any problem with Llama-2 and open source, citing it unprompted as a great resource, which also goes against incentives. Oh how much I want her to be right and the rest of us to live in her world. Alas, I do not believe this is the case. We will see what Gemini has to offer. If Google thinks everything is going fine, that is quite the bad sign.
Rhetorical Innovation
Perhaps a good short explanation in response here?
The point I was trying to make last week, not landing as intended.
Exactly. We want to avoid ubiquitous surveillance, or minimize its impact. If there exists a sufficiently dangerous technology, that leaves you two choices.
Which of these will violate freedom less? My strong prediction for AGI is the first one.
As a reminder, this assumes we fully solved the alignment problem in the first place. This is how we deal with the threat of human misuse or misalignment of AGI in spite of alignment being robustly solved in practice. If we haven’t yet solved alignment, then failing to limit access (either to zero people, or at least to a very highly boxed system treated like the potential threat that it would be) would mean we are all very dead no matter what.
No One Would Be So Stupid As To
Make Google DeepMind’s AIs as autonomous as possible.
a16z gives out grants for open source AI work, doing their best to proliferate as much as possible with as few constraints as possible. Given Marc Andreessen’s statements, this should come as no surprise.
Aligning a Smarter Than Human Intelligence is Difficult
We have a class for that now, at Princeton, technically a graduate seminar but undergraduates welcome. Everything they will read is online, so there are lots of resources and links there, and the list seems excellent at a glance.
Max Tegmark and Steve Omohundro drop a new paper claiming provably safe systems are the only feasible path to controlling AGI. Davidad notes no substantive disagreements with his OAA plan.
Jan Leike, head of alignment at OpenAI, relies heavily on the principle that verification is in general easier than generation. I strongly think this is importantly false in general for AI contexts. You need to approach having a flawless verifier, whereas the generator need not achieve that standard.
Proofs are the exception. The whole point of a proof is that it is easy to definitively verify. Relying only on that which you can prove is a heavy alignment tax, especially where the proof is in the math sense, not merely in the courtroom sense. If you can prove your system satisfies your requirements, and you can prove that your requirements satisfy your actual needs, you are all set.
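As a toy illustration of why proofs are the special case, here is a minimal sketch using boolean satisfiability, which is my own choice of example and not something from the talk or the paper. Checking a claimed solution is one pass over the clauses, while finding one by brute force can take exponentially many tries. The function names `verify` and `generate` are hypothetical. Note what makes this work: the requirement is fully formal, so the verifier cannot be fooled.

```python
from itertools import product

# A CNF formula is a list of clauses. Each literal is a nonzero int:
# k means "variable k is true", -k means "variable k is false".

def verify(formula, assignment):
    """Check a claimed solution with a single pass over the clauses."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def generate(formula, num_vars):
    """Find a solution by brute force, trying up to 2**num_vars candidates."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(bits)}
        if verify(formula, assignment):
            return assignment
    return None  # no assignment satisfies every clause

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
solution = generate(formula, 3)
print(solution, verify(formula, solution))
```

The asymmetry holds here only because `verify` is exact. Swap in a verifier that is merely usually right, which is the situation with human or AI evaluation of real-world outputs, and a strong enough generator gets rewarded for finding the cases where the verifier is wrong.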
The question is, can it be done? Can we build the future entirely out of things where we have proofs that they will do what we want, and not do the things we do not want?
That does seem super hard. The proposal here is to use AIs to discover proof-carrying code.
Where I get confused is, what would it mean to prove that a given set of code will accomplish even straightforward tasks like ensuring cybersecurity?
How does one prove that you cannot gain access without proper credentials? Doesn’t this fact rely upon physical properties, lest there be a bug or physical manipulation one can make? Couldn’t sufficiently advanced physical analysis allow access, if only via identification of the credentials? How do we know the AI won’t be able to figure out the credentials, perhaps in a way we don’t anticipate, perhaps in a classic way as simple as engineering a wrench attack?
They then consider securing the blockchain, such as by formally verifying Ethereum. That would still leave various vulnerabilities in those using the protocol, so I would not expect it to mean you were safe from a hack. The idea of proving that you have ‘secured critical infrastructure’ seems even more confused.
These don’t seem like the types of things one can prove even under normal circumstances. They certainly don’t seem like things you can prove if you have to worry about a potential superintelligent adversary, and their plan says you need to not assume AI non-hostility, let alone AI active alignment.
They do mean to do the thing, and warn that means doing it for real:
How are we going to pull this off? They suggest that once you have an LLM learn all the things, you can then abstract its functionality to traditional code.
I worry that Emerson Pugh comes to mind: If the human brain were so simple that we could understand it, we would be so simple that we couldn’t.
Will introspection ever be easier than operation? Will it be possible for a mind to be powerful enough to fully abstract out the meaningful operations of a similarly powerful mind? If not, will there be a way to safely ‘move down the chain’ where we are able to use a dangerous unaligned model we do not control to safely abstract out the functionality of a less powerful other model, which presumably involves formally verifying the resulting code before we run it? Will we be able to generate that proof, again with the tools we dare create and use, in any sane amount of time, even if we do translate into normal computer code, presumably quite messy code and quite a lot of it?
The paper expresses great optimism about progress in mechanistic interpretability, and that we might be able to progress it to this level. I am skeptical.
Perhaps I am overestimating what we actually need here, if we can coordinate on the proof requirements? Perhaps we can give up quite a lot and still have enough with what is left. I don’t know. I do know that of the things I expect to be able to prove, I don’t know how to use them to do what needs to be done.
They suggest that Godel’s Completeness Theorem implies that, given AI systems are finite, any system you can’t prove is safe will be unsafe. In practice I don’t see how this binds. I agree with the ‘sufficiently powerful AGIs will find a way if a way exists’ part. I don’t agree with ‘you being unable to prove it in reasonable time’ implying that no proof exists, or that you can be confident the proof you think you have proves the practical property you think it proves.
I would also note that we are unlikely any time soon to prove that humans are safe in any sense, given that they clearly aren’t. Where does that leave us? They warn humans might have to operate without any guarantees of safety, but no system in human history has ever had real guarantees of safety, because it was part of human history. We have needed to find other ways to trust. They make a different case.
Similarly, if we actually do build the human-flourishing-enabling AI that will give us everything we want, it will be impossible to prove that it is safe, because it won’t be.
I get why this argument is being trotted out here. I don’t expect it to work. It never does, Arrested Development meme style.
Their argument laid out in the remainder of section 8, of why alternative approaches are unlikely to work, alas rings quite true. We have to solve an impossible problem somewhere. Pointing out an approach has impossible problems it requires you to solve is not as knock-down an argument as one would like it to be.
As calls to action, they suggest work on:
I despair at the proposed applications, which very much seem to me to be in the ‘you are still dead’ and ‘have we not been over that this will never work’ categories.
This all does seem like work better done than not done. Who knows, usefulness could ensue in various ways, and downsides seem relatively small.
We would still have to address that worry mentioned earlier about “formally specifying complex requirements such as “don’t drive humanity extinct.”” I have not a clue. Anyone have ideas?
They finish with an FAQ, which Davidad correctly labeled fire.
I worry that this is a general counterargument for any objection that something is too technically difficult, either relatively or absolutely, and thus proves far too much.
Eliezer Yudkowsky responds more concisely to the whole proposal.
Yep. The idea, as I understand it, is to use proofs to gain capabilities while avoiding having to build a friendly superintelligence. Then use those capabilities to figure out how to do it (or prevent anyone from building an unfriendly one).
Twitter Community Notes Notes
Vitalik Buterin analyzes the Twitter community notes algorithm. It has a lot of fiddly details, but the core idea is simple. Qualified Twitter users participate, rating proposed community notes on a three-point scale. If your ratings are good, you can propose new notes. Notes above about +0.4 helpfulness get shown. The key is that rather than use an average, notes are rewarded if people with a variety of perspectives vote the note highly, as measured by an organically emerging axis that corresponds very well to American left-right politics. Vitalik is especially excited because this is a very crypto-style approach, with a fully open-source algorithm determined by the participation of a large number of equal-weight participants with no central authority (beyond the ability to remove people from the pool for violations.)
This results in notes pretty much everyone likes, with a focus on hard and highly relevant facts, especially on materially false statements, and rejecting partisan statements.
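To make the ‘variety of perspectives’ mechanism concrete, here is a minimal sketch of the kind of model Vitalik describes, as I understand it: each rating is approximated as a global mean plus a user intercept plus a note intercept plus the product of a one-dimensional user factor and note factor. The factor soaks up agreement that runs along the partisan axis, so the note intercept (its ‘helpfulness’) only rises when support crosses camps. Everything specific below is an assumption for illustration: the function name `fit_notes`, the toy ratings, the regularization weights, the learning rate and the plain gradient descent. It is not the production algorithm and omits all the tweaks on top.

```python
import numpy as np

def fit_notes(ratings, lam_i=0.15, lam_f=0.03, lr=0.05, steps=5000, seed=0):
    """Fit rating ~ mu + user_i + note_i + user_f * note_f by gradient descent.

    ratings: users x notes array, 1.0 = helpful, 0.0 = not helpful,
    NaN = not rated. Hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_notes = ratings.shape
    mask = ~np.isnan(ratings)
    scale = 2.0 / mask.sum()  # gradient scale of the mean squared error

    mu = 0.0
    user_i = np.zeros(n_users)
    note_i = np.zeros(n_notes)
    # Small random factors so the model can break symmetry between camps.
    user_f = rng.normal(scale=0.1, size=n_users)
    note_f = rng.normal(scale=0.1, size=n_notes)

    for _ in range(steps):
        pred = mu + user_i[:, None] + note_i[None, :] + np.outer(user_f, note_f)
        err = np.where(mask, ratings - pred, 0.0)
        # Intercepts are regularized harder than factors, which pushes the
        # model to explain polarized agreement with the factor term.
        grad_mu = -scale * err.sum() + 2 * lam_i * mu
        grad_ui = -scale * err.sum(axis=1) + 2 * lam_i * user_i
        grad_ni = -scale * err.sum(axis=0) + 2 * lam_i * note_i
        grad_uf = -scale * (err @ note_f) + 2 * lam_f * user_f
        grad_nf = -scale * (err.T @ user_f) + 2 * lam_f * note_f
        mu -= lr * grad_mu
        user_i -= lr * grad_ui
        note_i -= lr * grad_ni
        user_f -= lr * grad_uf
        note_f -= lr * grad_nf
    return mu, user_i, note_i, user_f, note_f

# Toy data: users 0-1 form one camp, users 2-3 the other.
# Note 0 is liked by everyone; notes 1 and 2 are each liked by one camp only.
ratings = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
])
mu, user_i, note_i, user_f, note_f = fit_notes(ratings)
print("note intercepts:", np.round(note_i, 3))
print("user factors:", np.round(user_f, 3))
```

In this toy setup the bridging note ends up with a clearly higher intercept than either partisan note, and the two camps land on opposite sides of the learned axis without anyone labeling them. A real deployment would then compare each note intercept against a display threshold, which with these made-up regularization settings and so few ratings should not be read as comparable to the +0.4 figure.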
He also notes that all the little complexity tweaks on top matter.
This is a great framing for the AI alignment debate.
In this framing, the central alignment-is-hard position is that you can’t use the engineering approach to align a system, because you are facing intelligence and optimization pressure that can adapt to the flaws in your noisy approach, and that then will exploit whatever weaknesses there are and kill you before you can furiously patch all the holes in the system. And that furiously patching less capable systems won’t much help you, the patches will stop working.
And also that because you have an engineering system that you are trying to align, even if it sort of does what you want now, it will stop doing that once it is confronted with the unexpected, or its capabilities improve enough to create an effectively unexpected set of affordances.
What is funny is that it is economists who are most skeptical of the things that might then go very wrong, and who then insist on an economist-style model of what will happen with these engineering-style systems. I’m not yet sure what to make of that.
In the context of Twitter, Vitalik notes that the complexity of the algorithm can backfire in terms of its credibility, as illustrated by a note critical of China that was posted then removed due to complex factors, with no direct intervention. It’s not simple to explain, so it could look like manipulation.
He also notes that the main criticism of community notes is that they do not go far enough, demanding too much consensus. I agree with Vitalik that it is better to demand a high standard of consensus, to maintain the reliability and credibility of the system, and to keep people motivated to word carefully and neutrally and focus on the facts.
The algorithm is open source, so it would perhaps be possible to allow some users to tinker with the algorithm for themselves. The risk is that they would then favor their tribe’s interpretations, which is the opposite of what the system is trying to accomplish, but you could safely allow for lower thresholds and looser conditions generally if you wanted to see more notes on the margin.
People Are Worried About AI Killing Everyone
People’s risk levels are up a little bit month over month on some questions (direct source).
I notice that the grave dangers number was essentially unchanged, whereas the capabilities number was up and the ‘no risk of human extinction’ number was down. This could be small sample size, but I suspect instead that people are not responding in consistent fashion and never have.
Other People Are Not As Worried About AI Killing Everyone
Tyler Cowen tries a new metaphor, so let’s try again in light of it.
I think ‘notice that less intelligence (or energy, or other useful things) is not what you want’ is a very good point to raise. Notice how many people who warn about the dangers of technology are actually opposed to civilization and even to humanity. Notice when the opposition to AGI – artificial general intelligence – is opposition to the A, when it is opposition to the G, and when it is opposition to the I.
Consider those who fear it will take their jobs. This is a real social and near-term issue, and we need to mitigate potential disruptions. Yet ‘this job’s work is no longer necessary to produce and do all the things, we now get that for free’ is a good thing, not a bad thing. Jobs are a cost, not a benefit, and we can now replace them with other jobs or other uses of time, while realizing that leaving people idle or without income is harmful and dangerous and if it happens at scale requires fixing.
The question that cuts reality at the joints here, I believe, is: Do you support human intelligence augmentation? Would you rather people generally be smarter and more capable, or dumber and less capable?
I would strongly prefer humans generally be smarter across the board, in every sense. This is one of the most important things to do, and success would dramatically improve our future prospects, including for survival in the face of potential AGI. Would large human intelligence gains break or strain various things? Absolutely, and we would deal with it.
Thus, what are the relevant knobs we want to turn?
Why do I believe artificial intelligence is importantly different than human intelligence? Why do I value augmented humans, where I would not expect to (other than instrumentally) value a future smarter version of GPT? Why do I expect that augmented more intelligent humans would preserve the things and people that I care about, where I expect AGI to lead to their destruction?
This is in part a moral philosophy question. Do you care about you, your family and what other humans you care about in a way that you don’t care about a potential AGI? Robin Hanson would say that such AGI are our metaphorical children, as deserving of being considered moral patients and being assigned value as we are, and we should accept that such more fit minds will replace ours and seek to imbue them with some of our values, and accept that what is valued will dramatically change and what you think you value will likely mostly be gone. The word ‘speciesism’ has been thrown about for those who disagree with this.
I disagree with it. I believe that it is good and right to care about such distinctions, and value that which I choose to value. Whereas I expect to care about more intelligent humans the way I care about current humans.
It is then a practical question. What happens when the most powerful source of intelligence, the most capable and most powerful optimizing force available whatever you label it, is no longer humans, and is instead AIs? Would we remain in control? Would what we value be preserved and grow? Or would we face extinction?
In our timeline, I see three problems, only one of which I am optimistic about.
The first problem is the problem of social, political and economic disruption from the presence of more capable tools and new affordances – mundane utility, they took our jobs, deepfaketown and misinformation and all that. I am optimistic here.
The second problem is alignment. I am pessimistic here. Until we solve alignment, and can ensure such systems do what we want them to do, we need to not build them.
The third problem is competitive and evolutionary: the dynamics and equilibrium of a world with many ASIs (artificial superintelligences) in it.
This is a world almost no one is making any serious attempt to think about or model, and those who have (such as fiction writers) almost always end up using hand waves or absurdities and presenting worlds highly out of equilibrium.
We will be creating something smarter and more capable and better at optimization than ourselves, that many people will have strong incentives both economic and ideological to make into various agents with various goals including reproduction and resource acquisition. Why should we expect to long be in charge, or even to survive?
If there is widespread access to ASI, then ASIs given the affordance to do so will outcompete humans at every turn. Anyone, or any company or government, that does not increasingly turn its decisions and actions over to such ASIs, and increasingly take humans out of the loop, will quickly be left in the dust. Those who do not ‘turn the moral weights down’ in some form will also prove uncompetitive. Those who do not turn their ASIs into agents (if they are not agents by default) will lose. The negative externalities will multiply, as will the ASIs themselves and their share of resources. ASIs will be set free to seek to acquire resources, make copies of themselves and modify to be more successful at these tasks, because this will be the competitively smart thing to do in many cases, and also because some people will ideologically wish to do this for its own sake.
That is all the default even if:
Remember also that if you open source an AI model, you are open sourcing the fully unaligned version of that model two days later, after it is fine tuned in this way by someone who wants that to exist. We have no current plan of how to prevent this.
Thus, we will need a way out of this mess, as well. We need that solution, at minimum, before we create the second ASI, ideally before we create the first one.
If I had a solution to both of these problems, that resulted in a world with humans still firmly in charge creating things that humans value, that I value, then I would be all for that, and would even tolerate a real risk that we fail and all perish. Alas, right now I see no such solutions.
Note that I expect these future coordination problems to be vastly harder than the current coordination problems of ‘labs face commercial pressure to build AGI’ or ‘we have to compete with China.’ If you think these current issues cannot be solved and we must instead race ahead, why do you think this future will be different?
If your plan is secretly ‘the right person or corporation or government takes this unique opportunity to take over, sidestepping all these problems’ then you need to own that, and all of its implications.
We could be reminded of the parable of the augments from Star Trek. Star Trek was the dominant good future choice when I asked in a series of polls a while back.
Augments were smarter, stronger and more capable than ordinary humans.
Alas, because it was a morality tale, that timeline failed to solve the augment alignment problem. Augments systematically lacked our moral qualms and desired power via instrumental convergence, and started the Eugenics Wars.
Fortunately for humanity, this was a fictional tale and augments could not trivially copy themselves or speed themselves up, nor could they do recursive self-improvement, and their numbers and capability advantages thus remained limited. Realistically the augments would have won – human writers can simultaneously make augments on paper smarter than us, then have Kirk outsmart Khan anyway, although reality would disagree – so in the story humanity somehow triumphed.
As a result, humanity banned human augmentation and genetic engineering, and this ban holds throughout the Star Trek universe.
This is despite that universe having periodic existential wars, in which any species that uses such skills would have a decisive advantage, and it being clear that it is possible to see dramatic capability gains without automatic alignment failure (see for example Julian Bashir on Deep Space Nine). Without its handful of illegal augmented humanoids, the Federation would have perished multiple times.
Note that Star Trek also has a huge ASI problem. The Enterprise ship’s computer is an ASI, and can create other ASIs on request, and Data vastly enhances overall ship capabilities. Everyone in that universe has somehow agreed simply to ignore that possibility, an illustration of how such stories are dramatically out of equilibrium.
For now, any ASI we could build would be a strictly much worse situation for us than the augments. It would be far more alien to us, not have inherent value, and quickly have a far greater capabilities gap and be impossible in practice to contain, and we alas do not live in a fictional universe protected by narrative causality (and, if you think about it, probably Qs or travelers or time paradoxes or something) or have the ability of that world’s humans to coordinate.
Also, minus points from Tyler in expectation for misuse of the word Bayes in the post title, is nothing sacred these days?
The New York Times reports that the real danger of AI is not that it might kill us, it is that it might not kill us, which would allow it to become a tool for neoliberalism. Yes, really.
Thing is, there is actually a point here, although the authors do not realize it. ‘Neoliberalism’ or ‘capitalism’ are not always ideologies or intentional constructs. They are also simply descriptions of the dynamics of systems when they are not under human control. If AIs are, as they will become by default, smarter, better optimizers and more efficient competitors than we are, and to win competitions and for other reasons we put them in charge of things or they take charge of things, the dynamics the authors fear would be the result. Except instead of ‘increasing inequality’ or helping the bad humans, it would not help any of the humans. Instead we would all be outcompeted and then die.
The Lighter Side
Then there’s Roon?
Yes? How about yes? I like this scrap and run away plan. I am here for this plan.
Roon also lays down the beats.
Twitter thread of captions of Oppenheimer, except more explicitly about AI.
The Server Break Room, one minute, no spoilers.