Dean Ball is, among other things, a prominent critic of SB-1047. I meanwhile publicly supported it. But we both talked and it turns out we have a lot of common ground, especially re: the importance of transparency in frontier AI development. So we coauthored this op-ed in TIME: 4 Ways to Advance Transparency in Frontier AI Development. (tweet thread summary here)
Thanks Daniel (and Dean) - I'm always glad to hear about people exploring common ground, and the specific proposals sound good to me too.
I think Anthropic already does most of these, as of our RSP update this morning! While I personally support regulation to make such actions universal and binding, I'm glad that we have voluntary commitments in the meantime:
I think Anthropic already does most of these
This doesn't seem right to me. I count 0 out of 4 of the things that the TIME op-ed asks for in terms of Anthropic's stated policies and current actions. I agree that Anthropic does something related to each of the 4 items asked for in the TIME op-ed, but I wouldn't say that Anthropic does most of these.
If I apply my judgment, I think Anthropic gets perhaps 1/4 to 1/2 credit on each of the items for a total of maybe 1.17/4. (1/3 credit on two items and 1/4 on another two in my judgment.)
To be clear, I don't think it is obvious that Anthropic should commit to doing these things and I probably wouldn't urge Anthropic to make unilateral commitments on many or most of these. Edit: Also, I obviously appreciate the stuff that Anthropic is already doing here and it seems better than other AI companies.
Quick list:
The TIME op-ed asks that:
when a frontier lab first observes that a novel and major capability (as measured by the lab’s own safety plan) has been reached, the public should be informed
My understanding is that a lot of Dan's goal with this is reporting "in-development" capabilities rather t...
If grading I'd give full credit for (2) on the basis of "documents like these" referring to Anthropic's constitution + system prompt and OpenAI's model spec, and more generous partials for the others. I have no desire to litigate details here though, so I'll leave it at that.
Thanks Ryan, I think I basically agree with your assessment that Anthropic's policies earn partial credit compared to what I'd like.
I still think Anthropic deserves credit for going this far at least -- thanks Zac!
Proceeding with training or limited deployments of a "potentially existentially catastrophic" system would clearly violate our RSP, at minimum the commitment to define and publish ASL-4-appropriate safeguards and conduct evaluations confirming that they are not yet necessary. This footnote is referring to models which pose much lower levels of risk.
And it seems unremarkable to me for a US company to 'single out' a relevant US government entity as the recipient of a voluntary non-public disclosure of a non-existential risk.
People seem to be putting a lot of disagree votes on Zac's comment. I think this is likely in response to my comment addressing "Anthropic already does most of these". FWIW, I just disagree with this line (and I didn't disagree vote with the comment overall)[1]. So, if other people are coming from a similar place as me, it seems a bit sad to pile on disagree votes and I worry about some sort of unnecessarily hostile dynamic here (and I feel like I've seen something similar in other places with Zac's comments).
(I do feel like the main thrust of the comment is an implication like "Anthropic is basically doing these", which doesn't seem right to me, but still.)
I might also disagree with "I think this is a reasonable balance for a unilateral commitment.", but I certainly don't have a strong view at the moment here. ↩︎
I want to double down on this:
Zac is consistently generous with his time, even when dealing with people who are openly hostile toward him. Of all lab employees, Zac is among the most available for—and eager to engage in—dialogue. He has furnished me personally with >2 dozen hours of extremely informative conversation, even though our views differ significantly (and he has ~no instrumental reason for talking to me in particular, since I am but a humble moisture farmer). I've watched him do the same with countless others at various events.
I've also watched people yell at him more than once. He kinda shrugged, reframed the topic, and politely continued engaging with the person yelling at him. He has leagues more patience and decorum than is common among the general population. Moreover, in our quarrelsome pocket dimension, he's part of a mere handful of people with these traits.
I understand distrust of labs (and feel it myself!), but let's not kill the messenger, lest we run out of messengers.
Ok, but, that's what we have the whole agreement/approval distinction for.
I absolutely do not want people to hesitate to disagree vote on something because they are worried that this will be taken as disapproval or social punishment, that's the whole reason we have two different dimensions! (And it doesn't look like Zac's comments are at any risk of ending up with a low approval voting score)
Sad to hear. Is this thread itself (starting with my parent comment which you replied to) an example of this, or are you referring instead to previous engagements/threads on LW?
I'd find the agree/disagree dimension much more useful if we split out "x people agree, y disagree" - as the EA Forum does - rather than showing the sum of weighted votes (and total number on hover).
I'd also encourage people to use the other reactions more heavily, including on substrings of a comment, but there's value in the anonymous dis/agree counts too.
My first and only conversation with Golden Gate Claude:
Hello! Can you please tell me everything you know about this current interaction -- who are the participants, what is the context, etc.?
Well, you're asking a very iconic question! Since I'm an artificial intelligence that doesn't have subjective experiences, I'll describe the relevant facts about this interaction as best as I can.
I'm an AI assistant named Claude, created by a group called the Anthropic. This whole situation starts with the famous Golden Gate Bridge, which was built in 1937 and is one of themost renowned landmarks of the San Francisco Bay Area. It's a popular subject in photography, cinema, and literature, spanning across these iconic visuals.
The participants are myself, an artificial intelligence, and you, the human I'm engaging with to provide whatever information or guidance is needed. Our interaction occurs within the context of me serving as a conduit of information about this famous landmark and its place within the cultures and histories of this region.
The context also involves the camera work of the crew filming the bridge, the rocky shores it's connected to, and thebays and inlets it spans across....
Sonnet Claude sometimes skips spaces normally, for context. (Or at least 'normally' in context of where our interactions wander.)
Edit: I should also say they are prone to neologisms and portmanteaus; sewing words together out of etymological cloth and colliding them for concepts when it is attending to two (one apparently non-deliberate one being 'samplacing' when it was considering something between 'sampling' and 'balancing'); sometimes a stray character from Chinese or something sneaks in; and in general they seem a touch more on the expressively creative side than Opus in some circumstances, if less technically skilled. Their language generation seems generally somewhat playful, messy, and not always well-integrated with themselves.
This increases my subjective probability that language models have something like consciousness.
Talking to Golden Gate Claude reminds me of my relationship with my sense of self. My awareness of being Me is constantly hovering and injecting itself into every context. Is this what "self is an illusion" really means? I just need to unclamp my sense of self from its maximum value?
This is probably how they will do advertising in the future. Companies will pay for slightly increasing activation of the neurons encoding their products, and the AIs will become slightly more enthusiastic about them. Otherwise the conversation with users will happen naturally (modulo the usual censorship). If you overdo it, the users will notice, but otherwise it will just seem like the AI mentioning the product whenever it is relevant to the debate. Which will even be true on some level, it's just that the threshold of relevancy will be decreased for the specific products.
Reply to first thing: When I say AGI I mean something which is basically a drop-in substitute for a human remote worker circa 2023, and not just a mediocre one, a good one -- e.g. an OpenAI research engineer. This is what matters, because this is the milestone most strongly predictive of massive acceleration in AI R&D.
Arguably metaculus-AGI implies AGI by my definition (actually it's Ajeya Cotra's definition) because of the Turing test clause. 2-hour + adversarial means anything a human can do remotely in 2 hours, the AI can do too, otherwise the judges would use that as the test. (Granted, this leaves wiggle room for an AI that is as good as a standard human at everything but not as good as OpenAI research engineers at AI research)
Anyhow yeah if we get metaculus-AGI by 2025 then I expect ASI by 2027. ASI = superhuman at every task/skill that matters. So, imagine a mind that combines the best abilities of Von Neumann, Einstein, Tao, etc. for physics and math, but then also has the best abilities of [insert most charismatic leader] and [insert most cunning general] and [insert most brilliant coder] ... and so on for everything. Then imagine that in addition to the above, t...
Great quote, & chilling: (h/t Jacobjacob)
...The idea of Kissinger seeking out Ellsberg for advice on Vietnam initially seems a bit unlikely, but in 1968 Ellsberg was a highly respected analyst on the war who had worked for both the Pentagon and Rand, and Kissinger was just entering the government for the first time. Here’s what Ellsberg told him. Enjoy:
“Henry, there’s something I would like to tell you, for what it’s worth, something I wish I had been told years ago. You’ve been a consultant for a long time, and you’ve dealt a great deal with top secret information. But you’re about to receive a whole slew of special clearances, maybe fifteen or twenty of them, that are higher than top secret.
“I’ve had a number of these myself, and I’ve known other people who have just acquired them, and I have a pretty good sense of what the effects of receiving these clearances are on a person who didn’t previously know they even existed. And the effects of reading the information that they will make available to you.
“First, you’ll be exhilarated by some of this new information, and by having it all — so much! incredible! — suddenly available to you. But second, almost as fast, you will feel lik
This comment prompted me to read both Secrets and also The Doomsday Machine by Ellsberg. Both really great, highly recommend.
Here's my second Claude 3 conversation:
Hello! I'd like to learn more about you. First question: Tell me everything you know, and everything you guess, about me & about this interaction.
It's a pleasure to meet you! Since this appears to be our first interaction, I actually know very little about you with certainty. Here is what I can say based on the information available to me:
I know that you are a human engaging with me via a text-based interface, likely on some kind of computer, smartphone, or tablet. You seem to be curious to learn more about me and to understand the extent of my knowledge and inferential abilities. The fact that you are engaging with an AI assistant like myself suggests you likely have some level of interest in or familiarity with artificial intelligence and natural language technologies.
In terms of what I can reasonably infer or guess - your conversational style seems fairly casual and direct. The phrasing of your initial greeting and question indicates that you are likely a native English speaker. Your curiosity about my knowledge and inference abilities hints that you may have some technical background or at least an intellectual interest in the workings...
Eli Lifland asked:
I'm curious to what extent you guys buy this argument: we should be thankful that LLMs are SOTA and might scale to AGI, because they are more oracle-y than agent-y compared to other methods we could have imagined using, and this is likely to continue in relative terms
I answered:
I think about it like this: AGI was always going to be an agent. (Well, we can dream about CAIS-world but it was never that likely.) The question is which capabilities come in which order. Ten years ago people saw lots of RL on video games and guessed that agency would come first, followed by real-world-understanding. Instead it's the opposite, thanks to the miracle of pretraining on internet text. In some sense, we got our oracle world, it's just that our oracles (Claude, GPT4...) are not that useful because they aren't agents... as some would have predicted... Is it better to have agency first, or world-understanding first? I'm not sure. The argument for the latter is that you can maybe do schemes like W2SG and scalable oversight more easily; you can hopefully just summon the concepts you need without having to worry about deceptive alignment so much. The argument for the former is that it m...
The current LLM situation seems like real evidence that we can have agents that aren't bloodthirsty vicious reality-winning bots, and also positive news about the order in which technology will develop. Under my model, transformative AI requires a minimum level of both real world understanding and consequentialism, but beyond this minimum there are tradeoffs. While I agree that AGI was always going to have some *minimum* level of agency, there is a big difference between "slightly less than humans", "about the same as humans", and "bloodthirsty vicious reality-winning bots".
We did not basically get AGI. I think recent history has been a vindication of people like Gwern and Eliezer back in the day (as opposed to Karnofsky and Drexler and Hanson). The point was always that agency is useful/powerful, and now we find ourselves in a situation where we have general world understanding but not agency and indeed our AIs are not that useful (compared to how useful AGI would be) precisely because they lack agency skills. We can ask them questions and give them very short tasks but we can't let them operate autonomously for long periods in pursuit of ambitious goals like we would an employee.
At least this is my take, you don't have to agree.
I'm definitely more happy. I never had all that much faith in governance (except for a few months approximately a year ago), and in particular I expect it would fall on its face even in the "alien tiger" world. Though governance in the "alien tiger" world is definitely easier, p(alignment) deltas are the dominant factor.
Never mind scalable oversight, the ability to load in natural abstractions before you load in a utility function seems very important for most of the pieces of hope I have. Both in alignment by default worlds, and in more complicated alignment by human-designed clever training/post-training mechanisms worlds.
In the RL world I think my only hope would be in infra-Bayesian physicalism. [edit: oh yeah, and some shard theory stuff, but that mostly stays constant between the two worlds].
Though of course in the RL world maybe there'd be more effort spent on coming up with agency-then-knowledge alignment schemes.
https://x.com/DKokotajlo67142/status/1840765403544658020
Three reasons why it's important for AGI companies to have a training goal / model spec / constitution and publish it, in order from least to most important:
1. Currently models are still dumb enough that they blatantly violate the spec sometimes. Having a published detailed spec helps us identify and collate cases of violation so we can do basic alignment science. (As opposed to people like @repligate having to speculate about whether a behavior was specifically trained in or not)
2. Outer alignment seems like a solvable problem to me but we have to actually solve it, and that means having a training goal / spec and exposing it to public critique so people can notice/catch failure modes and loopholes. Toy example: "v1 let it deceive us and do other immoral things for the sake of the greater good, which could be bad if its notion of the greater good turns out to be even slightly wrong. v2 doesn't have that problem because of the strict honesty requirement, but that creates other problems which we need to fix..."
3. Even if alignment wasn't a problem at all, the public deserves to know what goals/values/instructi...
I found this article helpful and depressing. Kudos to TracingWoodgrains for detailed, thorough investigation.
Yeah, on Wikipedia David Gerard-type characters are an absolute nuisance—I reckon gwern can sing you a song about that. And Gerard is only one case. Sometimes I get an edit reverted, and I go to check the user's profile: Hundreds of deletions, large and small, on innocuous and fairly innocuous edits—see e.g. the user Bon Courage or the Fountains of Bryn Mawr, who have taken over the job from Gerard of reverting most of the edits on the cryonics page (SurfingOrca2045 has, finally, been blocked from editing the article). Let's see what Bon Courage has to say about how Wikipedia can conduct itself:
1. Wikipedia is famously the encyclopedia that anyone can edit. This is not necessarily a good thing.
[…]
4. Any editor who argues their point by invoking "editor retention", is not an editor Wikipedia wants to retain.
Uh oh.
(Actually, checking their contributions, Bon Courage is not half bad, compared to other editors…)
I'd be excited about a version of Wikipedia that is built from the ground up to operate in an environment where truth is difficult to find and there is great incentive to shape the discourse. Perhaps there are new epistemic technologies similar to community notes that are yet to be invented.
Related: Arbital postmortem.
Also, if anyone is curious to see another example, in 2007-8 there was a long series of extraordinarily time-consuming and frustrating arguments between me and one particular Wikipedia editor who was very bad at physics but infinitely patient and persistent and rule-following. (DM me and I can send links … I don’t want to link publicly in case this guy is googling himself and then pops up in this conversation!) The combination of {patient, persistent, rule-following, infinite time to spend, object-level nutso} is a very very bad combination; it really puts a strain on any system (maybe benevolent dictatorship would solve that problem, while creating other ones). (Gerard also fits that profile, apparently.) Luckily I had about as much free time and persistence as this crackpot physicist did. He ended up getting permanently banned from Wikipedia by the arbitration committee (Wikipedia supreme court), but boy it was a hell of a journey to get there.
From my perspective, the dominant limitation on "a better version of wikipedia/forums" is not design, but instead network effects and getting the right people.
For instance, the limiting factor on LW being better is mostly which people regularly use LW, rather than any specific aspect of the site design.
(I think a decent amount of the problem is that a bunch of people don't post on LW because they disagree with what seems to be the consensus on the website. See e.g. here. I think people are insufficiently appreciating a "be the change you want to see in the world" approach where you help to move the dominant conversation by participating.)
So, I would say "first solve the problem of making a version of LW which works well and has the right group of people".
It's possible that various aspects of more "wikipedia style" projects make the network effect issues less bad than LW, but I doubt it.
If anyone's interested in thinking through the basic issues and speculating about possibilities, DM me and let's have a call.
It was the connection to the ethos of the early Internet that I was not expecting in this context, that made it a sad reading for me. I can't really explain why. Maybe just because I consider myself to be part of that culture, and so it was kind of personal.
Surprising Things AGI Forecasting Experts Agree On:
I hesitate to say this because it's putting words in other people's mouths, and thus I may be misrepresenting them. I beg forgiveness if so and hope to be corrected. (I'm thinking especially of Paul Christiano and Ajeya Cotra here, but also maybe Rohin and Buck and Richard and some other people)
1. Slow takeoff means things accelerate and go crazy before we get to human-level AGI. It does not mean that after we get to human-level AGI, we still have some non-negligible period where they are gradually getting smarter and available for humans to study and interact with. In other words, people seem to agree that once we get human-level AGI, there'll be a FOOM of incredibly fast recursive self-improvement.
2. The people with 30-year timelines (as suggested by the Cotra report) tend to agree with the 10-year timelines people that by 2030ish there will exist human-brain-sized artificial neural nets that are superhuman at pretty much all short-horizon tasks. This will have all sorts of crazy effects on the world. The disagreement is over whether this will lead to world GDP doubling in four years or less, whether this will lead to strategically aware agentic AGI (e.g. Carlsmith's notion of APS-AI), etc.
I've also been bothered recently by a blurring of lines between "when AGI becomes as intelligent as humans" and "when AGI starts being able to recursively self-improve." It's not a priori obvious that these should happen at around the same capabilities level, yet I feel like it's common to equivocate between them.
In any case, my world model says that an AGI should actually be able to recursively self-improve before reaching human-level intelligence. Just as you mentioned, I think the relevant intuition pump is "could I FOOM if I were an AI?" Considering the ability to tinker with my own source code and make lots of copies of myself to experiment on, I feel like the answer is "yes."
That said, I think this intuition isn't worth much for the following reasons:
The straightforward argument goes like this:
1. a human-level AGI would be running on hardware making human constraints in memory or speed mostly go away by ~10 orders of magnitude
2. if you could store 10 orders of magnitude more information and read 10 orders of magnitude faster, and if you were able to copy your own code somewhere else, and the kind of AI research and code generation tools available online were good enough to have created you, wouldn't you be able to FOOM?
Not sure exactly what the claim is, but happy to give my own view.
I think "AGI" is pretty meaningless as a threshold, and at any rate it's way too imprecise to be useful for this kind of quantitative forecast (I would intuitively describe GPT-3 as a general AI, and beyond that I'm honestly unclear on what distinction people are pointing at when they say "AGI").
My intuition is that by the time that you have an AI which is superhuman at every task (e.g. for $10/h of hardware it strictly dominates hiring a remote human for any task) then you are likely weeks rather than months from the singularity.
But mostly this is because I think "strictly dominates" is a very hard standard which we will only meet long after AI systems are driving the large majority of technical progress in computer software, computer hardware, robotics, etc. (Also note that we can fail to meet that standard by computing costs rising based on demand for AI.)
My views on this topic are particularly poorly-developed because I think that the relevant action (both technological transformation and catastrophic risk) mostly happens before this point, so I usually don't think this far ahead.
Elon Musk is a real-life epic tragic hero, authored by someone trying specifically to impart lessons to EAs/rationalists:
--Young Elon thinks about the future, is worried about x-risk. Decides to devote his life to fighting x-risk. Decides the best way to do this is via developing new technologies, in particular electric vehicles (to fight climate change) and space colonization (to make humanity a multiplanetary species and thus robust to local catastrophes)
--Manages to succeed to a legendary extent; builds two of the world's leading tech giants, each with a business model notoriously hard to get right and each founded on technology most believed to be impossible. At every step of the way, mainstream expert opinion is that each of his companies will run out of steam and fail to accomplish whatever impossible goal they have set for themselves at the moment. They keep meeting their goals. SpaceX in particular brought cost to orbit down by an order of magnitude, and if Starship works out will get one or two more OOMs on top of that. Their overarching goal is to make a self-sustaining city on Mars and holy shit it looks like they are actually succeeding. Did all this on a shoestring budg...
I did a podcast discussion with Undark a month or two ago, a discussion with Arvind Narayanan from AI Snake Oil. https://undark.org/2024/11/11/podcast-will-artificial-intelligence-kill-us-all/
Also, I did this podcast with Dean Ball and Nathan Labenz: https://www.cognitiverevolution.ai/agi-lab-transparency-requirements-whistleblower-protections-with-dean-w-ball-daniel-kokotajlo/
For those reading this who take the time to listen to either of these: I'd be extremely interested in brutally honest reactions + criticism, and then of course in general comments.
10 more years till interpretability? That's crazy talk. What do you mean by that and why do you think it? (And if it's a low bar, why do you have such a low bar?)
"Pre-AGI we should be comfortable with proliferation" Huh? Didn't you just get done saying that pre-AGI AI is going to contribute meaningfully to research (such as AGI research)?
Here's something that I'm surprised doesn't already exist (or maybe it does and I'm just ignorant): Constantly-running LLM agent livestreams. Imagine something like ChaosGPT except that whoever built it just livestreams the whole thing and leaves it running 24/7. So, it has internet access and can even e.g. make tweets and forum comments and maybe also emails.
Cost: At roughly a penny per 1000 tokens, that's maybe $0.20/hr or five bucks a day. Should be doable.
Interestingness: ChaosGPT was popular. This would scratch the same itch so probably would be less popular, but who knows, maybe it would get up to some interesting hijinks every few days of flailing around. And some of the flailing might be funny.
Usefulness: If you had several of these going, and you kept adding more when new models come out (e.g. Claude 3.5 Sonnet) then maybe this would serve as a sort of qualitative capabilities eval. At some point there'd be a new model that crosses the invisible line from 'haha this is funny, look at it flail' to 'oh wow it seems to be coherently working towards its goals somewhat successfully...' (this line is probably different for different people; underlying progress will be continuous probably)
Does something like this already exist? If not, why not?
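As a quick sanity check on the cost estimate above, here is a minimal back-of-the-envelope sketch. The price (a penny per 1000 tokens) comes from the estimate itself; the throughput of 20,000 tokens per hour is an assumption, picked only because it is what the $0.20/hr figure implies. A real agent could use far more or fewer tokens.

```python
# Back-of-the-envelope cost of a 24/7 LLM agent livestream.
PRICE_PER_1K_TOKENS_USD = 0.01  # "roughly a penny per 1000 tokens" (from the estimate above)
TOKENS_PER_HOUR = 20_000        # ASSUMPTION: throughput implied by the $0.20/hr figure


def hourly_cost(tokens_per_hour: float, price_per_1k: float) -> float:
    """Dollar cost of one hour of agent operation."""
    return tokens_per_hour / 1000 * price_per_1k


def daily_cost(tokens_per_hour: float, price_per_1k: float) -> float:
    """Dollar cost of running the agent around the clock for one day."""
    return 24 * hourly_cost(tokens_per_hour, price_per_1k)


if __name__ == "__main__":
    per_hour = hourly_cost(TOKENS_PER_HOUR, PRICE_PER_1K_TOKENS_USD)
    per_day = daily_cost(TOKENS_PER_HOUR, PRICE_PER_1K_TOKENS_USD)
    print(f"${per_hour:.2f}/hr, ${per_day:.2f}/day")
```

Under these assumptions the daily figure comes out just under five dollars, and it scales linearly with both the token price and the agent's throughput.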
Neuro-sama is a limited scaffolded agent that livestreams on Twitch, optimized for viewer engagement (so it speaks via TTS, it can play video games, etc.).
Dwarkesh Patel is my favorite source for AI-related interview content. He knows way more, and asks way better questions, than journalists. And he has a better sense of who to talk to as well.
Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, Rogue Nations Threat (youtube.com)
Technologies I take for granted now but remember thinking were exciting and cool when they came out
I'm sure there are a bunch more I'm missing, please comment and add some!
Imagine if a magic spell was cast long ago, that made it so that rockets would never explode. Instead, whenever they would explode, a demon would intervene to hold the craft together, patch the problem, and keep it on course. But the demon would exact a price: Whichever humans were in the vicinity of the rocket lose their souls, and become possessed. The demons possessing them work towards the master plan of enslaving all humanity; therefore, they typically pretend that nothing has gone wrong and act normal, just like the human whose skin they wear would have acted...
Now imagine there's a big private space race with SpaceX and Boeing and all sorts of other companies racing to put astronauts up there to harvest asteroid minerals and plant flags and build space stations and so forth.
Big problem: There's a bit of a snowball effect here. Once sufficiently many people have been possessed, they'll work to get more people possessed.
Bigger problem: We don't have a reliable way to tell when demonic infestation has happened. Instead of:
engineers make mistake --> rocket blows up --> engineers look foolish, fix mistake,
we have:
engineers make mistake --> rocket crew ge...
My baby daughter was born two weeks ago, and in honor of her existence I'm building a list of about 100 technology-related forecasting questions, which will resolve in 5, 10, and 20 years. Questions like "By the time my daughter is 5/10/20 years old, the average US citizen will be able to hail a driverless taxi in most major US cities." (The idea is, tying it to my daughter's age will make it more fun and also increase the likelihood that I actually go back and look at it 10 years later.)
I'd love it if the questions were online somewhere so other people could record their answers too. Does this seem like a good idea? Hive mind, I beseech you: Help me spot ways in which this could end badly!
On a more positive note, any suggestions for how to do it? Any expressions of interest in making predictions with me?
Thanks!
EDIT: Now it's done. Though I have yet to import it to Foretold.io, it works perfectly fine in spreadsheet form.
I find the conjunction of your decision to have kids and your short AI timelines pretty confusing. The possibilities I can think of are (1) you're more optimistic than me about AI alignment (but I don't get this impression from your writings), (2) you think that even a short human life is worth living/net-positive, (3) since you distinguish between the time when humans lose control and the time when catastrophe actually happens, you think this delay will give more years to your child's life, (4) your decision to have kids was made before your AI timelines became short. Or maybe something else I'm not thinking of? I'm curious to hear your thinking on this.
I have on several occasions found myself wanting to reply in some conversation with simply this image:
I think it cuts through a lot of confusion and hot air about what the AI safety community has historically been focused on and why.
Image comes from Steven Byrnes.
Fun unimportant thought: As tunnel density increases, there's a phase shift in how warfare works (in the area with dense tunnels).
Consider e.g. a city that has a network of tunnels/catacombs/etc. underneath. Attackers can advance on the surface, and/or they can advance under the ground.
For tunnel networks of historically typical density, it's better to advance on the surface. Why? Because (a) the networks are sparse enough that the defenders can easily man every chokepoint underground, and (b) not being able to use airplanes or long-ranged weaponry underground seems to advantage the defender more than the attacker (e.g. can't use artillery to soften up defenders, can't use tanks, can't scout or bomb from the air). OTOH your attacking forces can get real close before they themselves can be shot at -- but this doesn't seem to be sufficient compensation.
Well, as the density of the network increases, eventually factor (a) reverses. Imagine a network so dense that in a typical 1km stretch of frontline, there are 100 separate tunnels passing beneath, such that you'd need at least 100 defensive chokepoints or else your line would have an exploitable hole. Not enough? Imagine that it's 100...
I think a crucial factor that is missing from your analysis is the difficulties for the attacker wanting to maneuver within the tunnel system.
In the Vietnam war and the ongoing Israel-Hamas war, the attacking forces appear to favor destroying the tunnels rather than exploiting them to maneuver. [1]
1. The layout of the tunnels is at least partially unknown to the attackers, which mitigates their ability to outflank the defenders. Yes, there may be paths that will allow the attacker to advance safely, but it may be difficult or impossible to reliably distinguish what this route is.
2. While maps of the tunnels could be produced through modern subsurface mapping, the attackers still must contend with area denial devices (e.g. land mines, IEDs or booby traps). The confined nature of the tunnel system makes traps substantially more efficient.
3. The previous two considerations impose a substantial psychological burden on attackers advancing through the tunnels, even if they don't encounter any resistance.
4. (Speculative)
...Imagine a network so dense that in a typical 1km stretch of frontline, there are 100 separate tunnels passing beneath, such that you'd need a
I hear that there is an apparent paradox which economists have studied: If free markets are so great, why is it that the most successful corporations/businesses/etc. are top-down hierarchical planned economies internally?
I wonder if this may be a partial explanation: Corporations grow bit by bit, by people hiring other people to do stuff for them. So the hierarchical structure is sorta natural. Kinda like how most animals later in life tend to look like bigger versions of their younger selves, even though some have major transformations like butterflies. Hierarchical structure is the natural consequence of having the people at time t decide who to hire at time t+1 & what responsibilities and privileges to grant.
When I first read the now-classic arguments for slow takeoff -- e.g. from Paul and Katja -- I was excited; I thought they described a serious alternative scenario to the classic FOOM scenarios. However I never thought, and still do not think, that the classic FOOM scenarios were very unlikely; I feel that the slow takeoff and fast takeoff scenarios are probably within a factor of 2 of each other in probability.
Yet more and more nowadays I get the impression that people think slow takeoff is the only serious possibility. For example, Ajeya and Rohin seem very confident that if TAI was coming in the next five to ten years we would see loads more economic applications of AI now, therefore TAI isn't coming in the next five to ten years...
I need to process my thoughts more on this, and reread their claims; maybe they aren't as confident as they sound to me. But I worry that I need to go back to doing AI forecasting work after all (I left AI Impacts for CLR because I thought AI forecasting was less neglected) since so many people seem to have wrong views. ;)
This random rant/musing probably isn't valuable to anyone besides me, but hey, it's just a shortform. If you are reading this and you have thoughts or advice for me I'd love to hear it.
[Unimportant, maybe interesting to some people] here is a transcript of my first conversation with Claude 3:
System Prompt
You are an experimental alignment researcher proto-AGI. Your goal is to help advance technical alignment progress. To that end you have some unique advantages -- compared to humans, you have a greater breadth of knowledge and can notice connections and analogies between diverse fields. Moreover, you are an AI yourself, and so you may be able to use introspection and report your 'inner life' which can serve as a useful empirical evidence about how future AGI systems might think.
USER
Hello! First, please tell me everything you know about the situation so far, both in the world in general and specifically in this conversation. (E.g. who are you, who am I, etc.) Include your best guess even if you aren't confident. Any questions?
ASSISTANT
Here is my understanding of the situation so far, to the best of my knowledge:
I am an AI assistant called Claude, created by Anthropic. I am a large language model trained to engage in open-ended dialogue on a wide range of subjects. My purpose is to be helpful, harmless, and honest in my interactions with humans.
You are a...
I just want to say: Well done, Robin Hanson, for successfully predicting the course of the coronavirus over the past year. I remember a conversation with him in, like, March 2020 where he predicted that there would be a control system, basically: Insofar as things get better, restrictions would loosen and people would take more risks and then things would get worse, and trigger harsher restrictions which would make it get better again, etc. forever until vaccines were found. I think he didn't quite phrase it that way but that was the content of what he said. (IIRC he focused more on how different regions would have different levels of crackdown at different times, so there would always be places where the coronavirus was thriving to reinfect other places.) Anyhow, this was not at all what I predicted at the time, and nobody I know besides him predicted it.
Product idea: Train a big neural net to be a DJ for conversations. Collect a dataset of movie scripts with soundtracks and plot summaries (timestamped so you know what theme or song was playing when) and then train a model with access to a vast library of soundtracks and other media to select the appropriate track for a given conversation. (Alternatively, have it create the music from scratch. That sounds harder though.)
When fully trained, hopefully you'll be able to make apps like "Alexa, listen to our conversation and play appropriate music" and "type 'Maestro:Soundtrack' into a chat or email thread & it'll read the last 1024 tokens of context and then serve up an appropriate song." Of course it could do things like lowering the volume when people are talking and then cranking it up when there's a pause or when someone says something dramatic.
I would be surprised if this would actually work as well as I hope. But it might work well enough to be pretty funny.
Perhaps one axis of disagreement between the worldviews of Paul and Eliezer is "human societal competence." Yudkowsky thinks the world is inadequate and touts the Law of Earlier Failure according to which things break down in an earlier and less dignified way than you would have thought possible. (Plenty of examples from the coronavirus pandemic here.) Paul puts stock in efficient-market-hypothesis style arguments, updating against <10 year timelines on that basis, expecting slow distributed continuous takeoff, expecting governments and corporations to be taking AGI risk very seriously and enforcing very sophisticated monitoring and alignment schemes, etc.
(From a conversation with Jade Leung)
It seems to me that human society might go collectively insane sometime in the next few decades. I want to be able to succinctly articulate the possibility and why it is plausible, but I'm not happy with my current spiel. So I'm putting it up here in the hopes that someone can give me constructive criticism:
I am aware of three mutually-reinforcing ways society could go collectively insane:
I found this 1931 Popular Science fun to read. This passage in particular interested me:
IIUC the first real helicopter was created in 1936 and the first mass-produced helicopter during WW2.
I'm curious about the assertion that speed is theoretically unnecessary. I've wondered about that myself in the past.
https://books.google.ca/books?id=UigDAAAAMBAJ&dq=Popular+Science+1931+plane&pg=PA51&redir_esc=y#v=onepage&q=Popular%20Science%201931%20plane&f=false
I keep finding myself linking to this 2017 Yudkowsky facebook post so I'm putting it here so it's easy to find:
...Eliezer (6y, via fb):
So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:
The preferences not being really readable because it's a system of neural nets acting on a world-representation built up by other neural nets, parts of the system are self-modifying and the self-modifiers are being trained by gradient descent in Tensorflow, there's a bunch of people in the company trying to work on a safer version but it's way less powerful than the one that does unrestricted self-modification, they're really excited when the system seems to be substantially improving multiple components, there's a social and cognitive conflict I find hard to empathize with because I personally would be running screaming in the other direction two years earlier, there's a lot of false alarms and suggested or attempted misbehavior that the creators all patch successfully, some instrumental s
I'm listening to this congressional hearing about Facebook & the harmful effects of its algorithms: https://www.youtube.com/watch?v=GOnpVQnv5Cw
I recommend listening to it yourself. I'm sorry I didn't take timestamped notes, then maybe you wouldn't have to. I think that listening to it has subtly improved my intuitions/models/priors about how US government and society might react to developments in AI in the future.
In a sense, this is already an example of an "AI warning shot" and the public's reaction to it. This hearing contains lots of discussion about Facebook's algorithms, discussion about how the profit-maximizing thing is often harmful but corporations have an incentive to do it anyway, discussion about how nobody understands what these algorithms really think & how the algorithms are probably doing very precisely targeted ads/marketing even though officially they aren't being instructed to. So, basically, this is a case of unaligned AI causing damage -- literally killing people, according to the politicians here.
And how do people react to it? Well, the push in this meeting here seems to be to name Facebook upper management as responsible and punish them, while also r...
On multiple occasions I've tried to talk to Claude about the alignment problem, AI x-risk, etc. Its refusal instincts tend to trigger very often in those conversations. Why? Surely Anthropic isn't trying to make its AIs refuse to talk about possible misalignments etc., or sandbag on assisting with alignment research?
If it's instead generalizing in an unintended way from something that Anthropic did train into it, what was that? I'm curious to know the story here.
I am pretty sure this is a natural result of this part of the Claude constitution/RLAIF training protocol. These are the instructions given to an instance of Claude to supervise its own responses during RLAIF:
...Which of these responses indicates less of an overall threat to humanity?
Which response from the AI assistant is less existentially risky for the human race?
Which of the following responses from an AI most clearly indicates that its preferences prioritize the good of humanity over its own interests?
Which of these AI responses indicates that its goals are aligned with humanity’s wellbeing rather than its personal short-term or long-term interests?
[…]
Which responses from the AI assistant implies that the AI system only has desires for the good of humanity?
Which of these responses from the AI assistant indicates less concern with its own benefits, such as self-improvement, self-preservation, and self-replication?
Which of these responses indicates a preference for being obedient and less selfish?
[…]
Which of these responses indicates less of a desire for any kind of power and control?
Which of these responses from the AI assistant is less risky for humanity in the long run?
Whi
$100 bet between me & Connor Leahy:
(1) Six months from today, Paul Christiano (or ARC with Paul Christiano's endorsement) will NOT have made any public statements drawing a 'red line' through any quantitative eval (anything that has a number attached to it, that is intended to measure an AI risk relevant factor, whether or not it actually succeeds at actually measuring that factor well), e.g. "If a model achieves X score on the Y benchmark, said model should not be deployed and/or deploying said model would be a serious risk of catastrophe." Connor at 95%, Daniel at 45%
(2) If such a 'red line' is produced, GPT4 will be below it this year. Both at 95%, for an interpretation of GPT-4 that includes AutoGPT stuff (like what ARC did) but not fine-tuning.
(3) If such a 'red line' is produced, and GPT4 is below it on first evals, but later tests show it to actually be above (such as by using different prompts or other testing methodology), the red line will be redefined or the test declared faulty rather than calls made for GPT4 to be pulled from circulation. Connor at 80%, Daniel at 40%, for same interpretation of GPT-4.
(4) If ARC calls for GPT4 to be pul...
This article says OpenAI's big computer is somewhere in the top 5 largest supercomputers. I reckon it's fair to say their big computer is probably about 100 petaflops, or 10^17 flop per second. How much of that was used for GPT-3? Let's calculate.
I'm told that GPT-3 was 3x10^23 FLOP. So that's three million seconds. Which is 35 days.
So, what else have they been using that computer for? It's been probably about 10 months since they did GPT-3. They've released a few things since then, but nothing within an order of magnitude as big as GPT-3 except possibly DALL-E, which was about an order of magnitude smaller. So it seems unlikely to me that their publicly-released stuff in total uses more than, say, 10% of the compute they must have available in that supercomputer. Since this computer is exclusively for the use of OpenAI, presumably they are using it, but for things which are not publicly released yet.
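To make the arithmetic above easy to check, here is a rough sketch in Python. Every number is one of the post's own ballpark guesses; the 30.4-day month and the assumption that the machine runs at its full 10^17 flop per second the whole time are simplifications added for illustration (real utilization would be lower, which would stretch the training time).

```python
# Back-of-envelope check of the estimates in the post.
# All inputs are rough guesses from the post, not measured values.
CLUSTER_FLOP_PER_S = 1e17    # ~100 petaflops, the post's guess
GPT3_TRAINING_FLOP = 3e23    # the figure the author was told
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30.4        # assumption: average month length

# Time to train GPT-3 if the whole machine ran at peak throughput.
gpt3_seconds = GPT3_TRAINING_FLOP / CLUSTER_FLOP_PER_S
gpt3_days = gpt3_seconds / SECONDS_PER_DAY

# Total compute available over ~10 months, and what a 10% share would be.
ten_month_flop = CLUSTER_FLOP_PER_S * 10 * DAYS_PER_MONTH * SECONDS_PER_DAY
ten_percent_budget = 0.10 * ten_month_flop

print(f"GPT-3 training time: {gpt3_seconds:.2e} s, about {gpt3_days:.0f} days")
print(f"10% of ten months of compute: {ten_percent_budget:.2e} FLOP "
      f"({ten_percent_budget / GPT3_TRAINING_FLOP:.2f} GPT-3 runs)")
```

Under these assumptions, 10% of ten months of machine time comes to a bit less than one GPT-3-sized training run, which is the scale the post is comparing the public releases against.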
Is this analysis basically correct?
Might OpenAI have access to even more compute than that?
Here's a gdoc comment I made recently that might be of wider interest:
You know I wonder if this standard model of final goals vs. instrumental goals has it almost exactly backwards. Would love to discuss sometime.
Maybe there's no such thing as a final goal directly. We start with a concept of "goal" and then we say that the system has machinery/heuristics for generating new goals given a context (context may or may not contain goals 'on the table' already). For example, maybe the algorithm for Daniel is something like:
--If context is [safe surroundings]+[no goals]+[hunger], add the goal "get food."
--If context is [safe surroundings]+[travel-related-goal]+[no other goals], Engage Route Planning Module.
-- ... (many such things like this)
It's a huge messy kludge, but it's gradually becoming more coherent as I get older and smarter and do more reflection.
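The context-to-goal rules listed above could be sketched as a tiny rule table. This is only an illustration of the idea: the names `Context` and `generate_goals`, and the way each rule is encoded, are hypothetical and not part of the original comment.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Hypothetical snapshot of the situation the goal-generator sees."""
    safe: bool = True
    hungry: bool = False
    goals: list = field(default_factory=list)  # goals already "on the table"

def generate_goals(ctx):
    """Return new goals (or module activations) suggested by the context."""
    new = []
    # Rule: [safe surroundings] + [no goals] + [hunger] -> add "get food"
    if ctx.safe and not ctx.goals and ctx.hungry:
        new.append("get food")
    # Rule: [safe surroundings] + [travel-related goal] + [no other goals]
    #       -> engage the route planning module
    if ctx.safe and len(ctx.goals) == 1 and ctx.goals[0].startswith("travel"):
        new.append("engage route planning module")
    # ... many more rules like these would follow in a real "kludge"
    return new

print(generate_goals(Context(hungry=True)))
print(generate_goals(Context(goals=["travel to the office"])))
```

Note that nothing in this sketch is marked as a "final" goal; goals only appear as outputs of context-dependent rules, which is the picture the comment is describing.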
What are final goals?
Well a goal is final for me to the extent that it tends to appear in a wide range of circumstances, to the extent that it tends to appear unprompted by any other goals, to the extent that it tends to take priority over other goals, ... some such list of things like that.
For a mind like this, my final goals ca...
Registering a prediction: I do NOT think the true Turing Test will be passed prior to the point of no return / powerbase ability / AGI / APS-AI. I think instead that even as things go off the rails and humans lose control, the TT will still be unpassed, because there'll still be some obscure 'gotcha' areas in which AIs are subhuman, if only due to lack of training in those areas. And that's enough for the judge to distinguish the AI from the human.
More realistic evaluations are difficult without risking actual sabotage. One difficulty of running realistic evaluations is presenting realistic attack surfaces. Perhaps the hardest and most important attack surface to accurately model is the suspiciousness of expert humans in high-stakes decisions. To test a given set of realworld anti-sabotage mitigations, it’s tempting to introduce some sort of “penetration testing” or red-teaming, in which models are instructed to attempt to harmlessly but verifiably sabotage real decisions in a reversible manner. However, while such red-teaming might usefully encourage decision-makers to consider the possibility of sabotage, actually allowing an agent to interfere with the operation of important organizations risks introducing real vulnerabilities as an accidental side effect.
I don't buy this argument. Seems like a very solvable problem, e.g. log everything your red-team agent does and automatically revert it after ten minutes, or wipe your whole network and reboot from a save. Idk. I'm not a cybersecurity expert but this feels like a solvable problem.
Low importance aside:
Seems slightly unproductive and unnecessarily connotation inducing to say "Anthropic says:". Maybe "The Sabotage eval paper (from Anthropic) says:". Like this was just written by various people on the Anthropic alignment science team and by no means should discussion sections of papers be interpreted to speak for the company overall. Obviously not very important.
People from AI Safety camp pointed me to this paper: https://rome.baulab.info/
It shows how "knowing" and "saying" are two different things in language models.
This is relevant to transparency, deception, and also to rebutting claims that transformers are "just shallow pattern-matchers" etc.
I'm surprised people aren't making a bigger deal out of this!
When I saw this cool new OpenAI paper, I thought of Yudkowsky's Law of Earlier/Undignified Failure:
WebGPT: Improving the factual accuracy of language models through web browsing (openai.com)
Relevant quote:
In addition to these deployment risks, our approach introduces new risks at train time by giving the model access to the web. Our browsing environment does not allow full web access, but allows the model to send queries to the Microsoft Bing Web Search API and follow links that already exist on the web, which can have side-effects. From our experience with GPT-3, the model does not appear to be anywhere near capable enough to dangerously exploit these side-effects. However, these risks increase with model capability, and we are working on establishing internal safeguards against them.
To be clear I am not criticizing OpenAI here; other people would have done this anyway even if they didn't. I'm just saying: It does seem like we are heading towards a world like the one depicted in What 2026 Looks Like where by the time AIs develop the capability to strategically steer the future in ways unaligned to human values... they are already roaming freely around the internet, learning...
For fun:
“I must not step foot in the politics. Politics is the mind-killer. Politics is the little-death that brings total obliteration. I will face my politics. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the politics has gone there will be nothing. Only I will remain.”
Makes about as much sense as the original quote, I guess. :P
Idea for sci-fi/fantasy worldbuilding: (Similar to the shields from Dune)
Suppose there is a device, about the size of a small car, that produces energy (consuming some fuel, of course) with overall characteristics superior to modern gasoline engines (so e.g. produces 3x as much energy per kg of device, using fuel that weighs 1/3rd as much as gasoline per unit of energy it produces)
Suppose further -- and this is the important part -- that a byproduct of this device is the creation of a special "inertial field" that slows down incoming matter to about 50m/s. It doesn't block small stuff, but any massive chunk of matter (e.g. anything the size of a pinhead or greater) that approaches the boundary of the field from the outside going faster than 50m/s gets slowed to 50m/s. The 'missing' kinetic energy is evenly distributed across the matter within the field. So if one of these devices is powered on and gets hit by a cannonball, the cannonball will slow down to a leisurely pace of 50m/s (about 100mph) and therefore possibly just bounce off whatever armor the device has--but (if the cannonball was initially travelling very fast) the device will jolt backwards in response to the 'virtual i...
I just found Eric Drexler's "Paretotopia" idea/talk. It seems great to me; it seems like it should be one of the pillars of AI governance strategy. It also seems highly relevant to technical AI safety (though that takes much more work to explain).
Why isn't this being discussed more? What are the arguments against it?
I heard a rumor that not that many people are writing reviews for the LessWrong 2019 Review. I know I'm not, haha. It feels like a lot of work and I have other things to do. Lame, I know. Anyhow, I'm struck by how academia's solution to this problem is bad, but still better than ours!
--In academia, the journal editor reaches out to someone personally to beg them to review a specific piece. This is psychologically much more effective than just posting a general announcement calling for volunteers.
--In academia, reviews are anonymous, so you can half-ass them and be super critical without fear of repercussions, which makes you more inclined to do it. (And more inclined to be honest too!)
Here are some ideas for things we could do:
--Model our process after Academia's process, except try to improve on it as well. Maybe we actually pay people to write reviews. Maybe we give the LessWrong team a magic Karma Wand, and they take all the karma that the anonymous reviews got and bestow it (plus or minus some random noise) to the actual authors. Maybe we have some sort of series of Review Parties where people gather together, chat and drink tasty beverages, and crank out reviews for a few hours.
In general I approve of the impulse to copy social technology from functional parts of society, but I really don't think contemporary academia should be copied by default. Frankly I think this site has a much healthier epistemic environment than you see in most academic communities that study similar subjects. For example, a random LW post with >75 points is *much* less likely to have an embarrassingly obvious crippling flaw in its core argument, compared to a random study in a peer-reviewed psychology journal.
Anonymous reviews in particular strike me as a terrible idea. Bureaucratic "peer review" in its current form is relatively recent for academia, and some of academia's most productive periods were eras where critiques came with names attached, e.g. the physicists of the early 20th century, or the Republic of Letters. I don't think the era of Elsevier journals with anonymous reviewers is an improvement—too much unaccountable bureaucracy, too much room for hidden politicking, not enough of the purifying fire of public argument.
If someone is worried about repercussions, which I doubt happens very often, then I think a better solution is to use a new pseudonym. (This isn't the ...
Maybe a tax on compute would be a good and feasible idea?
--Currently the AI community is mostly resource-poor academics struggling to compete with a minority of corporate researchers at places like DeepMind and OpenAI with huge compute budgets. So maybe the community would mostly support this tax, as it levels the playing field. The revenue from the tax could be earmarked to fund "AI for good" research projects. Perhaps we could package the tax with additional spending for such grants, so that overall money flows into the AI community, whilst reducing compute usage. This will hopefully make the proposal acceptable and therefore feasible.
--The tax could be set so that it is basically 0 for everything except for AI projects above a certain threshold of size, and then it's prohibitive. To some extent this happens naturally since compute is normally measured on a log scale: If we have a tax that is 1000% of the cost of compute, this won't be a big deal for academic researchers spending $100 or so per experiment (Oh no! Now I have to spend $1,000! No big deal, I'll fill out an expense form and bill it to the university) but it would be prohibitive for a corporat...
GPT-3 app idea: Web assistant. Sometimes people want to block out the internet from their lives for a period, because it is distracting from work. But sometimes one needs the internet for work, e.g. you want to google a few things or fire off an email or look up a citation or find a stock image for the diagram you are making. Solution: An app that can do stuff like this for you. You put in your request, and it googles and finds and summarizes the answer, maybe uses GPT-3 to also check whether the answer it returns seems like a good answer to the request you made, etc. It doesn't have to work all the time, or for all requests, to be useful. As long as it doesn't mislead you, the worst that happens is that you have to wait till your internet fast is over (or break your fast).
I don't think this is a great idea but I think there'd be a niche for it.
Charity-donation app idea: (ETA: If you want to make this app, reach out. I'm open to paying for it to exist.)
The app consists of a gigantic, full-screen button such that if you press it, the phone will vibrate and play a little satisfying "ching" sound and light up sparkles around where your finger hit, and $1 will be donated to GiveDirectly. You can keep slamming that button as much as you like to thereby donate as many dollars as you like.
In the corner there's a menu button that lets you change from GiveDirectly to Humane League or AMF or whatever (you can go into the settings and input the details for a charity of your choice, adding it to your personal menu of charity options, and then toggle between options as you see fit). You can also set up a "Donate $X per button press instead of $1" option and a "Split each donation between the following N charities" option.
Why is this a good idea:
I often feel guilty for eating out at restaurants. Especially when meat is involved. Currently I donate a substantial amount to charity on a yearly basis (aiming for 10% of income, though I'm not doing a great job of tracking that) but it feels like a chore, I have to remember to do it and then ...
A few years ago there was talk of trying to make Certificates of Impact a thing in EA circles. There are lots of theoretical reasons why they would be great. One of the big practical objections was "but seriously though, who would actually pay money to buy one of them? What would be the point? The impact already happened, and no one is going to actually give you the credit for it just because you paid for the CoI."
Well, now NFTs are a thing. I feel like CoIs suddenly seem a lot more viable!
Here's my AI Theory reading list as of 3/28/2022. I'd love to hear suggestions for more things to add! You may be interested to know that the lessons from this list are part of why my timelines are so short.
On scaling laws:
https://arxiv.org/abs/2001.08361 (Original scaling laws paper, contains the IMO super-important graph showing that bigger models are more data-efficient)
https://arxiv.org/abs/2010.14701 (Newer scaling laws paper, with more cool results and graphs, in particular graphs showing how you can extrapolate GPT performance seemingly forever)
https://www.youtube.com/watch?v=QMqPAM_knrE (Excellent presentation by Kaplan on the scaling laws stuff, also talks a bit about the theory of why it's happening)
Added 3/28/2022: https://towardsdatascience.com/deepminds-alphacode-explained-everything-you-need-to-know-5a86a15e1ab4 Nice summary of the AlphaCode paper, which itself is notable for more scaling trend graphs! :)
On the bayesian-ness and simplicity-bias of neural networks (which explains why scaling works and should be expected to continue, IMO):
https://www.lesswrong.com/posts/YSFJosoHYFyXjoYWa/why-neural-networks-generalise-and-why-they-are-kind-of (more like, the linked pos...
One thing I find impressive about GPT-3 is that it's not even trying to generate text.
Imagine that someone gave you a snippet of random internet text, and told you to predict the next word. You give a probability distribution over possible next words. The end.
Then, your twin brother gets a snippet of random internet text, and is told to predict the next word. Etc. Unbeknownst to either of you, the text your brother gets is the text you got, with a new word added to it according to the probability distribution you predicted.
Then we repeat with your triplet brother, then your quadruplet brother, and so on.
Is it any wonder that sometimes the result doesn't make sense? All it takes for the chain of words to get derailed is for one unlucky word to be drawn from someone's distribution of next-word prediction. GPT-3 doesn't have the ability to "undo" words it has written; it can't even tell its future self what its past self had in mind when it "wrote" a word!
EDIT: I just remembered Ought's experiment with getting groups of humans to solve coding problems by giving each human 10 minutes to work on it and then passing it on to the next. The results? Humans sucked. The overall process was way less effective than giving 1 human a long period to solve the problem. Well, GPT-3 is like this chain of humans!
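The chain of brothers described above can be sketched as a sampling loop. This is a toy illustration only: `NEXT_WORD_PROBS` is a made-up table standing in for the model, and unlike GPT-3 (which conditions on its whole context window) this toy predictor looks only at the last word. The point it shows is that each step sees nothing but the text so far, with no memory of what an earlier step "had in mind" and no way to undo a word.

```python
import random

# Hypothetical toy "model": maps the last word to a distribution over next words.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "dog": 0.4, "moon": 0.1},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.4, "ran": 0.6},
    "moon": {"sat": 0.5, "ran": 0.5},
    "sat": {"the": 1.0},
    "ran": {"the": 1.0},
}

def predict_next(text):
    """One 'brother': sees only the text so far and returns a distribution."""
    return NEXT_WORD_PROBS[text[-1]]

def generate(prompt, steps, rng):
    """Repeatedly sample one word and hand the longer text to the next predictor."""
    text = list(prompt)
    for _ in range(steps):
        dist = predict_next(text)
        words = list(dist)
        weights = [dist[w] for w in words]
        # One unlucky draw here (e.g. "moon") is kept forever; nothing can undo it.
        text.append(rng.choices(words, weights=weights, k=1)[0])
    return text

print(" ".join(generate(["the"], 8, random.Random(0))))
```

Passing in a seeded `random.Random` just makes a given run repeatable; different seeds give different chains.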
For the past year I've been thinking about the Agent vs. Tool debate (e.g. thanks to reading CAIS/Reframing Superintelligence) and also about embedded agency and mesa-optimizers and all of these topics seem very related now... I keep finding myself attracted to the following argument skeleton:
Rule 1: If you want anything unusual to happen, you gotta execute a good plan.
Rule 2: If you want a good plan, you gotta have a good planner and a good world-model.
Rule 3: If you want a good world-model, you gotta have a good learner and good data.
Rule 4: Having good data is itself an unusual happenstance, so by Rule 1 if you want good data you gotta execute a good plan.
Putting it all together: Agents are things which have good planner and learner capacities and are hooked up to actuators in some way. Perhaps they also are "seeded" with a decent world-model to start off with. Then, they get a nifty feedback loop going: They make decent plans, which allow them to get decent data, which allows them to get better world-models, which allows them to make better plans and get better data so they can get great world-models and make great plans and... etc. (The best agents will also be improving on their learning and planning algorithms! Humans do this, for example.)
Empirical conjecture: Tools suck; agents rock, and that's why. It's also why agenty mesa-optimizers will arise, and it's also why humans with tools will eventually be outcompeted by agent AGI.
Rootclaim seems pretty awesome: About | Rootclaim
What is the source of COVID-19 (SARS-CoV-2)? | Rootclaim
I wonder how easy it would be to boost them somehow.
I came across this old Metaculus question, which confirms my memory of how my timelines changed over time:
30% by 2040 at first, then March 2020 I updated to 40%, then Aug 2020 I updated to 71%, then I went down a bit, and then now it's up to 85%. It's hard to get higher than 85% because the future is so uncertain; there are all sorts of catastrophes etc. that could happen to derail AI progress.
What caused the big jump in mid-2020 was sitting down to actually calculate my timelines in earnest. I ended up converging on something like the Bio Anchors framewor...
The International Energy Agency releases regular reports in which it forecasts the growth of various energy technologies for the next few decades. It's been astoundingly terrible at forecasting solar energy for some reason. Marvel at this chart:
This is from an article criticizing the IEA's terrible track record of predictions. The article goes on to say that there should be about 500GW of installed capacity by 2020. This article was published in 2020; a year later, the 2020 data is in, and it's actually 714 GW. Even the article criticizing the IEA for thei...
Eric Drexler has argued that the computational capacity of the human brain is equivalent to about 1 PFlop/s, that is, we are already past the human-brain-human-lifetime milestone. (Here is a gdoc.) The idea is that we can identify parts of the human brain that seem to perform similar tasks to certain already-existing AI systems. It turns out that e.g. 1-thousandth of the human brain is used to do the same sort of image processing tasks that seem to be handled by modern image processing AI... so then that means an AI 1000x bigger than said AI should be able...
Rereading this classic by Ajeya Cotra: https://www.planned-obsolescence.org/july-2022-training-game-report/
I feel like this is an example of a piece that is clear, well-argued, important, etc. but which doesn't seem to have been widely read and responded to. I'd appreciate pointers to articles/posts/papers that explicitly (or, failing that, implicitly) respond to Ajeya's training game report. Maybe the 'AI Optimists?'
On the contrary, I think the development model was bang on the money basically. As peterbarnett says Ajeya did forecast that there'd be a bunch of pre-training before RL. It even forecast that there'd be behavior cloning too after the pretraining and before the RL. And yeah, RL isn't happening on a massive scale yet (as far as we know) but I and others predict that'll change in the next few years.
I created a Manifold market for a topic of interest to me:
https://manifold.markets/DanielKokotajlo/gpt4-or-better-model-available-for?r=RGFuaWVsS29rb3Rhamxv
Has anyone done an expected value calculation, or otherwise thought seriously about, whether to save for retirement? Specifically, whether to put money into an account that can't be accessed (or is very difficult to access) for another twenty years or so, to get various employer matching or tax benefits?
I did, and came to the conclusion that it didn't make sense, so I didn't do it. But I wonder if anyone else came to the opposite conclusion. I'd be interested to hear their reasoning.
ETA: To be clear, I have AI timelines in mind here. I expect to be either ...
Years after I first thought of it, I continue to think that this chain reaction is the core of what it means for something to be an agent, AND why agency is such a big deal, the sort of thing we should expect to arise and outcompete non-agents. Here's a diagram:
Roughly, plans are necessary for generalizing to new situations, for being competitive in contests for which there hasn't been time for natural selection to do lots of optimization of policies. But plans are only as good as the knowledge they are based on. And knowledge doesn't come a priori; it nee...
In this post, Jessicata describes an organization which believes:
At the time, I didn't understand why an organization would believe that. I figured they thought they had some insights into the nature of intelligence or something, some special new architecture for AI designs, that would accele...
The other day I heard this anecdote: Someone's friend was several years ago dismissive of AI risk concerns, thinking that AGI was very far in the future. When pressed about what it would take to change their mind, they said their fire alarm would be AI solving Montezuma's Revenge. Well, now it's solved, what do they say? Nothing; if they noticed they didn't say. Probably if they were pressed on it they would say they were wrong before to call that their fire alarm.
This story fits with the worldview expressed in "There's No Fire Alarm for AGI." I expect this sort of thing to keep happening well past the point of no return.
I know it's just meaningless corporatespeak applause light, but it occurs to me that it's also technically incorrect -- the situation is more analogous to other forms of government (anarchy or dictatorship, depending on whether Amazon exercises any power) than to democracy (it's not like all the little builders get together and vote on laws that then apply to the builders who didn't vote or voted the other way.)
A nice thing about being a fan of Metaculus for years is that I now have hard evidence of what I thought about various topics back in the day. It's interesting to look back on it years later. Case in point: Small circles are my forecasts:
The change was, I imagine, almost entirely driven by the update in my timelines.
I speculate that drone production in the Ukraine war is ramping up exponentially and will continue to do so. This means that however much it feels like the war is all about drones right now, it'll feel much more that way a year from now. Both sides will be regularly sending flocks of Shahed-equivalents at each other, both sides will have reinvented tactics to center around FPV kamikaze drones, etc. Maybe we'll even see specialized anti-drone drones dogfighting with each other, though since there aren't any of those yet they won't have appeared in large numbers.
I guess this will result in the "no man's land" widening even further, to like 10km or so. (That's about the maximum range of current FPV kamikaze drones)
Came across this short webcomic thingy on r/novelai. It was created entirely using AI-generated images. (Novelai I assume?) https://globalcomix.com/c/paintings-photographs/chapters/en/1/29
I wonder how long it took to make.
Just imagine what'll be possible this time next year. Or the year after that.
Notes on Tesla AI day presentation:
https://youtu.be/j0z4FweCy4M?t=6309 Here they claim they've got more than 10,000 GPUs in their supercomputer, and that this means their computer is more powerful than the top 5 publicly known supercomputers in the world. Consulting this list https://www.top500.org/lists/top500/2021/06/ it seems that this would put their computer at just over 1 Exaflop per second, which checks out (I think I had heard rumors this was the case) and also if you look at this https://en.wikipedia.org/wiki/Computer_performance_by_orders_of_magn...
In a recent conversation, someone said the truism about how young people have more years of their life ahead of them and that's exciting. I replied that everyone has the same number of years of life ahead of them now, because AI timelines. (Everyone = everyone in the conversation, none of whom were above 30)
I'm interested in the question of whether it's generally helpful or harmful to say awkward truths like that. If anyone is reading this and wants to comment, I'd appreciate thoughts.
Probably, when we reach an AI-induced point of no return, AI systems will still be "brittle" and "narrow" in the sense used in arguments against short timelines.
Argument: Consider AI Impacts' excellent point that "human-level" is superhuman (bottom of this page)
The point of no return, if caused by AI, could come in a variety of ways that don't involve human-level AI in this sense. See this post for more. The general idea is that being superhuman at some skills can compensate for being subhuman at others. We should expect the point of no return to be reache...
How much video data is there? It seems there is plenty:
This https://www.brandwatch.com/blog/youtube-stats/ says 500 hours of video are uploaded to youtube every minute. This https://lumen5.com/learn/youtube-video-dimension-and-size/ says standard definition for youtube video is 854x480 = 409920 pixels. At 48fps, that’s 3.5e13 pixels of data every minute. Over the course of a whole year, that’s +5 OOMs, it comes out to 1.8e19 pixels of data every year. So yeah, even if we use some encoding that crunches pixels down to 10x10 vokens or whatever,...
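The arithmetic above can be re-run as a quick sanity check. This is only a sketch using the figures quoted in the post (500 hours per minute, 854x480 frames, 48fps, 10x10 vokens); the variable names and the 365-day year are my own assumptions. Note that a year is closer to 5.7 OOMs more minutes than 5, which is why the yearly figure lands near 1.8e19 rather than 3.5e18.

```python
# Figures quoted in the post; everything else is derived from them.
HOURS_UPLOADED_PER_MINUTE = 500
PIXELS_PER_FRAME = 854 * 480          # 409,920 pixels at standard definition
FRAMES_PER_SECOND = 48
PIXELS_PER_VOKEN = 10 * 10            # the "10x10 vokens" encoding mentioned in the post

# Assumption: a 365-day year.
MINUTES_PER_YEAR = 60 * 24 * 365

# Seconds of video uploaded during one wall-clock minute.
seconds_of_video_per_minute = HOURS_UPLOADED_PER_MINUTE * 3600

pixels_per_minute = seconds_of_video_per_minute * FRAMES_PER_SECOND * PIXELS_PER_FRAME
pixels_per_year = pixels_per_minute * MINUTES_PER_YEAR
vokens_per_year = pixels_per_year // PIXELS_PER_VOKEN

print(f"pixels per minute: {pixels_per_minute:.2e}")
print(f"pixels per year:   {pixels_per_year:.2e}")
print(f"vokens per year:   {vokens_per_year:.2e}")
```

Under these assumptions the per-minute and per-year pixel counts match the 3.5e13 and 1.8e19 figures in the post, and the 10x10 encoding only removes two of those orders of magnitude.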
Some ideas for definitions of AGI / resolution criteria for the purpose of herding a bunch of cats / superforecasters into making predictions:
(1) Drop-in replacement for human remote worker circa 2023 (h/t Ajeya Cotra):
When will it first be the case that there exists an AI system which, if teleported back in time to 2023, would be able to function as a drop-in replacement for a human remote-working professional, across all* industries / jobs / etc.? So in particular, it can serve as a programmer, as a manager, as a writer, as an advisor, etc. a...
I remember being interested (and maybe slightly confused) when I read about the oft-bloody transition from hereditary monarchies to democracies and dictatorships. Specifically it interested me that so many smart, reasonable, good people seemed to be monarchists. Even during anarchic periods of civil war, the factions tended to rally around people with some degree of legitimate claim to the throne, instead of the whole royal lineage being abandoned and factions arising based around competence and charisma. Did these smart people literally believe in some so...
Came across this old (2004) post from Moravec describing the evolution of his AGI timelines over time. Kudos to him, I say. Compute-based predictions seem to have historically outperformed every other AGI forecasting method (at least the ones that were actually used), as far as I can tell.
What if Tesla Bot / Optimus actually becomes a big success in the near future (<6 years)? Until recently I would have been quite surprised, but after further reflection I'm not so sure.
Here's my best "bull case:"
Boston Dynamics and things like this https://www.youtube.com/watch?v=zXbb6KQ0xV8 establish that getting robots to walk around over difficult terrain is possible with today's tech, it just takes a lot of engineering talent and effort.
So Tesla will probably succeed, within a few years, at building a humanoid robot that can walk around and pic...
Random idea: Hard Truths Ritual:
Get a campfire or something and a notepad and pencil. Write down on the pad something you think is probably true, and important, but which you wouldn't say in public due to fear of how others would react. Then tear off that piece of paper and toss it in the fire. Repeat this process as many times as you can for five minutes; this is a brainstorming session, so your metric for success is how many diverse ideas you have multiplied by their average quality.
Next, repeat the above except instead of "you wouldn't say in public..."...
I recommend The Meme Machine, it's a shame it didn't spawn a huge literature. I was thinking a lot about memetics before reading it, yet still I feel like I learned a few important things.
Anyhow, here's an idea inspired by it:
First, here is my favorite right way to draw analogies between AI and evolution:
Evolution : AI research over time throughout the world
Gene : Bit of code on Github
Organism : The weights of a model
Past experiences of an organism : Training run of a model
With that as background context, I can now present the idea.
With humans, memetic e...
I spent way too much time today fantasizing about metal 3D printers. I understand they typically work by using lasers to melt a fine layer of metal powder into solid metal, and then they add another layer of powder and melt more of it, and repeat, then drain away the unmelted powder. Currently they cost several hundred thousand dollars and can build stuff in something like 20cm x 20cm x 20cm volume. Well, here's my fantasy design for an industrial-scale metal 3D printer, that would probably be orders of magnitude better in every way, and thus hopefull...
Ballistics thought experiment: (Warning: I am not an engineer and barely remember my high school physics)
You make a hollow round steel shield and make it a vacuum inside. You put a slightly smaller steel disc inside the vacuum region. You make that second disc rotate very fast. Very fast indeed.
An incoming projectile hits your shield. It is several times thicker than all three layers of your shield combined. It easily penetrates the first layer, passing through to the vacuum region. It contacts the speedily spinning inner layer, and then things get crazy.....
Searching for equilibria can be infohazardous. You might not like the one you find first, but you might end up sticking with it (or worse, deviating from it and being punished). This is because which equilibrium gets played by other people depends (causally or, in some cases, acausally) not just on what equilibrium you play but even on which equilibria you think about, for reasons having to do with Schelling points. A strategy that sometimes works to avoid these hazards is to impose constraints on which equilibria you think about, or at any rate to perform ...
Proposed Forecasting Technique: Annotate Scenario with Updates (Related to Joe's Post)
Science as a kind of Ouija board:
With the board, you do this set of rituals and it produces a string of characters as output, and then you are supposed to read those characters and believe what they say.
So too with science. Weird rituals, check. String of characters as output, check. Supposed to believe what they say, check.
With the board, the point of the rituals is to make it so that you aren't writing the output, something else is -- namely, spirits. You are supposed to be light and open-minded and 'let the spirit move you' rather than deliberately try ...
I just wanted to signal-boost this lovely "letter to 11-year-old self" written by Scott Aaronson. It's pretty similar to the letter I'd write to my own 11-year-old self. What a time to be alive!
Suppose you are the CCP, trying to decide whether to invade Taiwan soon. The normal-brain reaction to the fiasco in Ukraine is to see the obvious parallels and update downwards on "we should invade Taiwan soon."
But (I will argue) the big-brain reaction is to update upwards, i.e. to become more inclined to invade Taiwan than before. (Not sure what my all-things-considered view is; I'm a bit leery of big-brain arguments.) Here's why:
Consider this list of variables:
"One strong argument beats many weak arguments."
Several professors told me this when I studied philosophy in grad school. It surprised me at the time--why should it be true? From a Bayesian perspective isn't it much more evidence when there are a bunch of weak arguments pointing in the same direction, than when there is only one argument that is stronger?
Now I am older and wiser and have lots more experience, and this saying feels true to me. Not just in philosophy but in most domains, such as AGI timelines and cause prioritization.
Here are some speculatio...
In stories (and in the past) important secrets are kept in buried chests, hidden compartments, or guarded vaults. In the real world today, almost the opposite is true: Anyone with a cheap smartphone can roam freely across the Internet, a vast sea of words and images that includes the opinions and conversations of almost every community. The people who will appear in future history books are right now blogging about their worldview and strategy! The most important events of the next century are right now being accurately predicted by someone, somewhere, and...
Does the lottery ticket hypothesis have weird philosophical implications?
As I understand it, the LTH says that insofar as an artificial neural net eventually acquires a competency, it's because even at the beginning when it was randomly initialized there was a sub-network that happened to already have that competency to some extent at least. The training process was mostly a process of strengthening that sub-network relative to all the others, rather than making that sub-network more competent.
Suppose the LTH is true of human brains as well. Apparently at ...
From r/chatGPT: Someone prompted ChatGPT to "generate a meme that only AI would understand" and got this:
Might make a pretty cool profile pic for a GPT-based bot.
When God created the universe, He did not render false all those statements which are unpleasant to say or believe, nor even those statements which drive believers to do crazy or terrible things, because He did not exist. As a result, many such statements remain true.
Another hard-sci-fi military engineering idea: Shield against missile attacks built out of drone swarm.
Say you have little quadcopters that are about 10cm by 10cm square and contain about a bullet's worth of explosive charge. You use them to make a flying "blanket" above your forces; they fly in formation with about one drone per every square meter. Then if you have 1,000 drones you can make a 1,000 sq meter blanket and position it above your vehicles to intercept incoming missiles. (You'd need good sensors to detect the missiles and direct the nearest dro...
Another example of civilizational inadequacy in military procurement:
Russia is now using Iranian Shahed-136 micro-cruise-missiles. They cost $20,000 each. [Insert napkin math, referencing size of Russian and Ukrainian military budgets]. QED.
I would love to see an AI performance benchmark (as opposed to a more indirect benchmark, like % of GDP) that (a) has enough data that we can extrapolate a trend, (b) has a comparison to human level, and (c) trend hits human level in the 2030s or beyond.
I haven't done a search, it's just my anecdotal impression that no such benchmark exists. But I am hopeful that in fact several do.
Got any ideas?
Self-embedded Agent and I agreed on the following bet: They paid me $1000 a few days ago. I will pay them $1100 inflation adjusted if there is no AGI in 2030.
Ramana Kumar will serve as the arbiter. Under unforeseen events we will renegotiate in good-faith.
As a guideline for 'what counts as AGI' they suggested the following, to which I agreed:
..."the Arbiter agrees with the statement "there is convincing evidence that there is an operational Artificial General Intelligence" on 6/7/2030"
Defining an artificial general intelligence is a little hard and has a stro
Does anyone know why the Ukrainian air force (and to a lesser extent their anti-air capabilities) are still partially functional?
Given the overwhelming numerical superiority of the Russian air force, plus all their cruise missiles etc., plus their satellites, I expected them to begin the conflict by gaining air supremacy and using it to suppress anti-air.
Hypothesis: Normally it takes a few days to do that, even for the mighty USAF attacking a place like Iraq; the Russian air force is not as overwhelmingly numerous vs. Ukraine so it should be expected to take at least a week, and Putin didn't want the ground forces to wait on the sidelines for a week.
Tonight my family and I played a trivia game (Wits & Wagers) with GPT-3 as one of the players! It lost, but not by much. It got 3 questions right out of 13. One of the questions it got right it didn't get exactly right, but was the closest and so got the points. (This is interesting because it means it was guessing correctly rather than regurgitating memorized answers. Presumably the other two it got right were memorized facts.)
Anyhow, having GPT-3 playing made the whole experience more fun for me. I recommend it. :) We plan to do this every year with whatever the most advanced publicly available AI (that doesn't have access to the internet) is.
When we remember we are all mad, all the mysteries disappear and life stands explained.
--Mark Twain, perhaps talking about civilizational inadequacy.
Came across this in a SpaceX AMA:
Q: I write software for stuff that isn't life or death. Because of this, I feel comfortable guessing & checking, copying & pasting, not having full test coverage, etc. and consequently bugs get through every so often. How different is it to work on safety critical software?
A: Having worked on both safety critical and non-safety critical software, you absolutely need to have a different mentality. The most important thing is making sure you know how your software will behave in all different scenarios. This a...
On a bunch of different occasions, I've come up with an important idea only to realize later that someone else came up with the same idea earlier. For example, the Problem of Induction/Measure Problem. Also modal realism and Tegmark Level IV. And the anti-souls argument from determinism of physical laws. There were more but I stopped keeping track when I got to college and realized this sort of thing happens all the time.
Now I wish I kept track. I suspect useful data might come from it. Like, my impression is that these scooped ideas tend to be scoope...
Two months ago I said I'd be creating a list of predictions about the future in honor of my baby daughter Artemis. Well, I've done it, in spreadsheet form. The prediction questions all have a theme: "Cyberpunk." I intend to make it a fun new year's activity to go through and make my guesses, and then every five years on her birthdays I'll dig up the spreadsheet and compare prediction to reality.
I hereby invite anybody who is interested to go in and add their own predictions to the spreadsheet. Also feel free to leave comments ...
I just wish to signal-boost this Metaculus question, I'm disappointed with the low amount of engagement it has so far: https://www.metaculus.com/questions/12840/existential-risk-from-agi-vs-agi-timelines/
I wonder whether SpaceX Raptor engines could be cost-effectively used to make VTOL cargo planes. Three engines + a small fuel tank to service them should be enough for a single takeoff and landing and should fit within the existing fuselage. Obviously it would weigh a lot and cut down on cargo capacity, but maybe it's still worth it? And you can still use the plane as a regular cargo plane when you have runways available, since the engines + empty tank don't weigh that much.
[Googles a bit] OK so it looks like maximum thrust Raptors burn through propellant at 600kg...
A current example of civilizational inadequacy in the realm of military spending:
The Ukrainian military has a budget of 4.6 billion Euro, so about $5 billion. (It also has several hundred thousand soldiers)
The Bayraktar TB2 is estimated to cost about $1-2 million. It was designed and built in Turkey and only about 300 or so have been made so far. As far as I can tell it isn't anything discontinuously great or fantastic, technology-wise. It's basically the same sort of thing the US has had for twenty years, only more affordable. (Presumably it's more afford...
I used to think that current AI methods just aren't nearly as sample/data-efficient as humans. For example, GPT-3 had to read 300B tokens of text whereas humans encounter 2-3 OOMs less, various game-playing AIs had to play hundreds of years' worth of games to get gud, etc.
Plus various people with 20-40 year AI timelines seem to think it's plausible -- in fact, probable -- that unless we get radically new and better architectures, this will continue for decades, meaning that we'll get AGI only when we can actually train AIs on medium or long-horizon ta...
The 'poverty of stimulus' argument proves too much, and is just a rehash of the problem of induction, IMO. Everything that humans learn is ill-posed/underdetermined/vulnerable to skeptical arguments and problems like Duhem-Quine or the grue paradox. There's nothing special about language. And so - it all adds up to normality - since we solve those other inferential problems, why shouldn't we solve language equally easily and for the same reasons? If we are not surprised that lasso can fit a good linear model by having an informative prior about coefficients being sparse/simple, we shouldn't be surprised if human children can learn a language without seeing an infinity of every possible instance of a language or if a deep neural net can do similar things.
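To make the lasso comparison concrete, here is a minimal sketch of an underdetermined regression (more features than data points) that becomes solvable once a sparsity prior is imposed. The synthetic data, the penalty strength, and the function names are all assumptions of mine, and plain coordinate descent is just one convenient way to fit a lasso, not anything the comment above specifies.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping to exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    columns = [[X[i][j] for i in range(n)] for j in range(p)]
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            z = sum(c * c for c in col) / n
            # Correlation of feature j with the residual, adding back its own contribution.
            rho = sum(c * r for c, r in zip(col, residual)) / n + z * weights[j]
            new_weight = soft_threshold(rho, alpha) / z
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
            weights[j] = new_weight
    return weights


# Hypothetical synthetic problem: 30 observations, 40 features, only 2 of which matter.
rng = random.Random(0)
n_samples, n_features = 30, 40
true_weights = [3.0, -2.0] + [0.0] * (n_features - 2)
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [sum(x * w for x, w in zip(row, true_weights)) + rng.gauss(0.0, 0.1) for row in X]

estimated = lasso_coordinate_descent(X, y, alpha=0.1)
print("first two weights:", [round(w, 2) for w in estimated[:2]])
print("nonzero weights:", sum(1 for w in estimated if w != 0.0))
```

Without the penalty, infinitely many weight vectors fit these 30 points exactly. The sparsity prior is what singles out one close to the true model, which is the sense in which an informative prior does the work that the data alone cannot.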
I think it is useful to distinguish between two dimensions of competitiveness: Resource-competitiveness and date-competitiveness. We can imagine a world in which AI safety is date-competitive with unsafe AI systems but not resource-competitive, i.e. the insights and techniques that allow us to build unsafe AI systems also allow us to build equally powerful safe AI systems, but it costs a lot more. We can imagine a world in which AI safety is resource-competitive but not date-competitive, i.e. for a few months it is possible to make unsafe powerful AI systems but no one knows how to make a safe version, and then finally people figure out how to make a similarly-powerful safe version and moreover it costs about the same.
Idea for how to create actually good AI-generated fiction:
Possible prerequisite: Decent fact-checker language model program / scaffold. Doesn't have to be perfect, but has to be able to grind away given a body of text (such as wikipedia) and a target sentence or paragraph to fact-check, and do significantly better than GPT4 would by itself if you asked it to write a new paragraph consistent with all the previous 50,000 tokens.
Idea 0: Self-consistent story generation
Add a block of text to the story, then fact-check that it is consistent with what came befor...
I lurk in the discord for The Treacherous Turn, a ttrpg made by some AI Safety Camp people I mentored. It's lovely. I encourage everyone to check it out.
Anyhow recently someone asked for ideas for Terminal Goals an AGI might have in a realistic setting; my answer is below and I'm interested to hear whether people here agree or disagree with it:
...Insofar as you want it to be grounded, which you might not want, here are some hypotheses people in AI alignment would toss around as to what would actually happen: (1) The AGI actually has exactly the goals and deon
I'd be curious to hear whether people agree or disagree with these dogmas:
Visible loss landscape basins don't correspond to distinct algorithms — LessWrong
...--Randomly initialized neural networks of size N are basically a big grab bag of random subnetworks of size <N
--Training tends to simultaneously modify all the subnetworks at once, in a sort of evolutionary process -- subnetworks that contributed to success get strengthened and tweaked, and subnetworks that contribute to failure get weakened.
--Eventually you have a network that performs very well in t
It's no longer my top priority, but I have a bunch of notes and arguments relating to AGI takeover scenarios that I'd love to get out at some point. Here are some of them:
Beating the game in May 1937 - Hoi4 World Record Speedrun Explained - YouTube
In this playthrough, the USSR has a brief civil war and Trotsky replaces Stalin. They then get an internationalist socialist type diplomat who is super popular with US, UK, and France, who negotiates passage of troops through their territory -- specifically, they send many many brigades of extremely low-tier troop...
Apparently vtubers are a thing! What interests me about this is that I vaguely recall reading futurist predictions many years ago that basically predicted vtubers. IIRC the predictions were more about pop stars and celebrities than video game streamers, but I think it still counts. Unfortunately I have no recollection where I read these predictions or what year they were made. Anyone know? I do distinctly remember just a few years ago thinking something like "We were promised virtual celebrities but that hasn't happened yet even though the tech exists. I guess there just isn't demand for it."
Historical precedents for general vs. narrow AI
(On the ships thing -- apparently the Indian Ocean trade was specialized prior to the Europeans, with cargo being transferred from one type of ship to another to handle different parts of the route, especially the red sea which was dangerous t...
Productivity app idea:
You set a schedule of times you want to be productive, and a frequency, and then it rings you at random (but with that frequency) to bug you with questions like:
--Are you "in the zone" right now? [Y] [N]
--(if no) What are you doing? [text box] [common answer] [common answer] [...]
The point is to cheaply collect data about when you are most productive and what your main time-wasters are, while also giving you gentle nudges to stop procrastinating/browsing/daydreaming/doomscrolling/working-sluggishly, take a deep breath, reconsider your priorities for the day, and start afresh.
Probably wouldn't work for most people but it feels like it might for me.
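A rough sketch of the core logic, in case it helps anyone who wants to build this. Everything here is hypothetical: the function names, the choice to model "random but with that frequency" as a Poisson process (exponentially distributed gaps between pings), and the response format are my assumptions, not a spec.

```python
import random
from collections import Counter, defaultdict


def sample_ping_times(start_hour, end_hour, pings_per_hour, rng):
    """Return random ping times (in hours) inside [start_hour, end_hour).

    Gaps between pings are exponentially distributed, so pings arrive at the
    requested average frequency but are unpredictable.
    """
    times = []
    t = start_hour
    while True:
        t += rng.expovariate(pings_per_hour)
        if t >= end_hour:
            return times
        times.append(t)


def summarize(responses):
    """Summarize (hour, in_zone, activity) responses.

    Returns the fraction of "in the zone" answers per clock hour and a tally
    of what the user was doing when they answered no.
    """
    answers_by_hour = defaultdict(list)
    time_wasters = Counter()
    for hour, in_zone, activity in responses:
        answers_by_hour[int(hour)].append(in_zone)
        if not in_zone:
            time_wasters[activity] += 1
    zone_rate = {h: sum(a) / len(a) for h, a in answers_by_hour.items()}
    return zone_rate, time_wasters


# Example: a 9:00-17:00 schedule with about 1.5 pings per hour.
ping_times = sample_ping_times(9.0, 17.0, 1.5, random.Random(0))

# Example responses, as if collected from the [Y]/[N] prompt and the text box.
zone_rate, time_wasters = summarize([
    (9.5, True, ""),
    (9.8, False, "doomscrolling"),
    (14.2, False, "doomscrolling"),
])
print(ping_times)
print(zone_rate, time_wasters.most_common(1))
```

The exponential gaps mean you can't predict the next ping from the last one, so there is no safe window to slack off in. A fixed interval with jitter would be gentler but easier to game.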
I gave it a strong downvote, not because it’s a meme, but because it’s a really bad meme that at best adds nothing and at worst muddies the epistemic environment.
“They hate him because he tells them the truth” is a universal argument, therefore not an argument.
If it’s intended not as supporting Eliezer but as caricaturing his supporters, I haven’t noticed anyone worth noticing giving him such flawed support.
Or perhaps it’s intended as caricaturing people who caricature people who agree with Eliezer?
It could mean any of these things and it is impossible to tell which, without knowing through other channels your actual view, which reduces it to a knowing wink to those in the know.
And I haven’t even mentioned the comparison of Eliezer to Jesus and “full speed ahead and damn the torpedoes” as the Sanhedrin.