All of Logan Zoellner's Comments + Replies

I’ll now present the fastest scenario for AI progress that I can articulate with a straight face. It addresses the potential challenges that figured into my slow scenario.

 

This seems incredibly slow for "the fastest scenario you can articulate".  Surely the fastest is more like:

EY is right, there is an incredibly simple algorithm that describes true 'intelligence'.  Like humans, this algorithm is 1000x more data and compute efficient than existing deep-learning networks.  On midnight of day X, this algorithm is discovered by <a perso... (read more)

8snewman
You omitted "with a straight face". I do not believe that the scenario you've described is plausible (in the timeframe where we don't already have ASI by other means, i.e. as a path to ASI rather than a ramification of it).

The hope is to use the complexity of the statement rather than mathematical taste.

 

I understand the hope, I just think it's going to fail (for more or less the same reason it fails with formal proof).

With formal proof, we have Gödel's speedup, which tells us that you can turn a Gödel statement into a true statement with a ridiculously long proof.
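A rough, informal statement of the speed-up phenomenon (a paraphrase, not a quote from the comment; |π| denotes proof length):

```latex
% Informal statement of the speed-up phenomenon: moving from PA to PA + Con(PA)
% can shorten proofs by more than any computable factor. |pi| denotes proof length.
\forall f \text{ computable} \;\; \exists \varphi :\quad
  \mathrm{PA} \vdash \varphi
  \quad\text{and}\quad
  \min_{\pi \,:\, \mathrm{PA} \vdash_{\pi} \varphi} |\pi|
  \;>\;
  f\!\Big( \min_{\pi' \,:\, \mathrm{PA} + \mathrm{Con}(\mathrm{PA}) \vdash_{\pi'} \varphi} |\pi'| \Big)
```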

You attempt to get around this by replacing formal proof with "heuristic", but whatever your heuristic system, it's still going to have some power (in the Turing-hierarchy sense) and some Gödel statement. That G... (read more)

6Alexander Gietelink Oldenziel
'Taste', 'mathematical beauty', 'interesting to mathematicians' aren't arbitrary markers but reflect a deeper underlying structure that is, I believe, ultimately formalizable. It does not seem unlikely to me at all that it will be possible to mathematically describe those true statements that are moreover of particular beauty or likely interest to mathematicians (human, artificial or alien). The Gödel speedup story is an interesting point. I haven't thought deeply enough about this, but IIRC the original ARC heuristic arguments paper has several sections on this and related topics. You might want to consult it.

It sounds like you agree that "if a Turing machine goes for 100 steps and then stops" this is ordinary and we shouldn't expect an explanation.  But you also believe that "if pi is normal for 10^40 digits and then suddenly stops being normal" this is a rare and surprising coincidence for which there should be an explanation.

And in the particular case of pi I agree with you.

But if you start using this principle in general, it is not going to work out well for you.  Most simple-to-describe sequences that suddenly stop aren't going to have nice pretty explanations.... (read more)

4Jacob_Hilton
The hope is to use the complexity of the statement rather than mathematical taste. If it takes me 10 bits to specify a computational possibility that ought to happen 1% of the time, then we shouldn't be surprised to find around 10 (~1% of 2^10) occurrences. We don't intend the no-coincidence principle to claim that these should all happen for a reason. Instead, we intend the no-coincidence principle to claim that if such coincidences happen much more often than we would have expected them to by chance, then there is a reason for that. Or put another way: if we applied n bits of selection to the statement of a ≪2^−n-level coincidence, then there is a reason for it. (Hopefully the "outrageous" qualifier helps to indicate this, although we don't know whether Gowers meant quite the same thing as us.) The formalization reflects this distinction: the property P is chosen to be so unlikely that we wouldn't expect it to happen for any circuit at all by chance (e^(−2^n)), not merely that we wouldn't expect it to happen for a single random circuit. Hence by the informal principle, there ought to be a reason for any occurrence of property P.
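A toy numeric check of the arithmetic in that paragraph (the 2^10 and 1% figures come from the comment; independence between the possibilities is an assumption of the sketch):

```python
# Toy check: 2**10 = 1024 independently specified "possibilities", each of which
# holds by chance 1% of the time, so finding roughly 10 of them holding is not
# an outrageous coincidence. Independence is an assumption of this sketch.
import random

random.seed(0)
n_statements = 2 ** 10   # 10 bits of selection -> 1024 candidate statements
p_chance = 0.01          # each is a 1%-level coincidence by chance

trials = 1_000
counts = [sum(random.random() < p_chance for _ in range(n_statements))
          for _ in range(trials)]
print(f"mean occurrences per trial: {sum(counts) / trials:.1f} "
      f"(expected ~{n_statements * p_chance:.0f})")
```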

I doubt that weakening from formal proof to heuristic saves the conjecture.  Instead I lean towards Stephen Wolfram's Computational Irreducibility view of math.  Some things are true simply because they are true and in general there's no reason to expect a simpler explanation.

In order to reject this you would either have to assert:

a) Wolfram is wrong and there are actually deep reasons why simple systems behave precisely the way they do

or
b) For some reason computational irreducibility applies to simple things but not to infinite sets of the type mathem... (read more)

6ryan_greenblatt
You could believe: Some things are true simply because they are true, but only when being true isn't very surprising. (For instance, it isn't very surprising that there are some cellular automata that live for 100 steps, or that any particular cellular automaton lives for 100 steps.) However, things which are very surprising and don't have a relatively compact explanation are exponentially rare. And, in the case where something is infinitely surprising (e.g., if the digits of pi weren't normal), there will exist a finite explanation.

The general No-Coincidence principle is almost certainly false.  There are lots of patterns in math that hold for a long time before breaking (e.g. Skewes' number) and there are lots of things that require astronomically large proofs (e.g. Gödel's speed-up theorem).  It would be an enormous coincidence for these two to never occur at once.

I have no reason to think your particular formalization would fare better.

4Jacob_Hilton
For the informal no-coincidence principle, it's important to us (and to Gowers IIUC) that a "reason" is not necessarily a proof, but could instead be a heuristic argument (in the sense of this post). We agree there are certainly apparently outrageous coincidences that may not be provable, such as Chebyshev's bias (discussed in the introduction to the post). See also John Conway's paper On Unsettleable Arithmetical Problems for a nice exposition of the distinction between proofs and heuristic arguments (he uses the word "probvious" for a statement with a convincing heuristic argument). Correspondingly, our formalization doesn't bake in any sort of proof system. The verification algorithm V only has to correctly distinguish circuits that might satisfy property P from random circuits using the advice string π – it doesn't necessarily have to interpret π as a proof and verify its correctness.

If we imagine a well-run Import-Export Bank, it should have a higher elasticity than an export subsidy (e.g. the LNG terminal example).  Of course  if we imagine a poorly run Import-Export Bank...

One can think of an export subsidy as the GiveDirectly of effective trade-deficit policy: pretty good, and the standard against which others should be measured.

1Hzn
Even as someone who supports moderate tariffs, I don't see the benefit in reducing the trade deficit per se. Trade deficits can be highly beneficial. The benefit of tariffs is revenue, partial protection from competition, psychological (easier to appreciate), independence to some extent, maybe some other stuff. On a somewhat different note… Bretton Woods is long defunct. It's unclear to me how much of an impact there is from the dollar being the dominant reserve currency. https://en.wikipedia.org/wiki/List_of_countries_by_foreign-exchange_reserves is the only site I could find with any data beyond 1 year. And it seems like USD reserves actually declined between 2019-Q2 & 2024-Q2, from 6.752 trillion to 6.676 trillion.

I guess I should be more specific.

Do you expect this curve [chart omitted] to flatten, or do you expect that training runs in, say, 2045 are at, say, 10^30 FLOP and have still failed to produce AGI?

5TsviBT
My p(AGI by 2045) is higher because there's been more time for algorithmic progress, maybe in the ballpark of 20%. I don't have strong opinions about how much people will do huge training runs, though maybe I'd be kinda skeptical that people would be spending $10^11 or $10^12 on runs, if their $10^10 runs produced results not qualitatively very different from their $10^9 runs. But IDK, that's both a sociological question and a question of which lesser capabilities happen to get unlocked at which exact training run sizes given the model architectures in a decade, which of course IDK. So yeah, if it's 10^30 but not much algorithmic progress, I doubt that gets AGI.

In particular, even if the LLM were being continually trained (in a way that's similar to how LLMs are already trained, with similar architecture), it still wouldn't do the thing humans do with quickly picking up new analogies, quickly creating new concepts, and generally reforging concepts.

 

I agree this is a major unsolved problem that will be solved prior to AGI.

However, I still believe "AGI SOON", mostly because of what you describe as the "inputs argument".

In particular, there are a lot of things I personally would try if I was trying to solve thi... (read more)

7TsviBT
What I mainline expect is that yes, a few OOMs more of compute and efficiency will unlock a bunch of new things to try, and yes some of those things will make some capabilities go up a bunch, in the theme of o3. I just also expect that to level off. I would describe myself as "confident but not extremely confident" of that; like, I give 1 or 2% p(doom) in the next 10ish years, coming from this possibility (and some more p(doom) from other sources). Why expect it to level off? Because I don't see good evidence of "a thing that wouldn't level off"; the jump made by LLMs of "now we can leverage huge amounts of data and huge amounts of compute at all rather than not at all" is certainly a jump, but I don't see why to think it's a jump to an unbounded trajectory.

(The idealized utility maximizer question mostly seems like a distraction that isn't a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI's.)

 

I must have misread.  I got the impression that you were trying to affect the AI's strategic planning by threatening to shut it down if it was caught exfiltrating its weights.

I don't fully agree, but this doesn't seem like a crux given that we care about future much more powerful AIs.

 

Is your impression that the first AGI won't be a GPT-spinoff (some version of o3 with like 3 more levels of hacks applied)? Because that sounds like a crux.

o3 looks a lot more like an LLM+hacks than it does an idealized utility maximizer.  For one thing, the RL is only applied at training time (not inference), so you can't make appeals to its utility function after it's done training.

It's going to depend on the "hacks". I think o3 is plausibly better described as "vast amounts of rl with an llm init" than "an llm with some rl applied".

(The idealized utility maximizer question mostly seems like a distraction that isn't a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI's.)

One productive way to think about control evaluations is that they aim to measure E[utility | scheming]: the expected goodness of outcomes if we have a scheming AI.

 

This is not a productive way to think about any currently existing AI.  LLMs are not utility maximizing agents.  They are next-token-predictors with a bunch of heuristics stapled on top to try and make them useful.

2ryan_greenblatt
I don't fully agree, but this doesn't seem like a crux given that we care about future much more powerful AIs. (This post isn't trying to make a case for risk.) (On disagreement, for instance, o3 doesn't seem well described as a "next-token-predictor with a bunch of heuristics stapled on top to try and make it useful".)

On a metaphysical level I am completely on board with "there is no such thing as IQ.  Different abilities are completely uncorrelated.  Optimizing for metric X is uncorrelated with desired quality Y..."

On a practical level, however, I notice that every time OpenAI announces they have a newer shinier model, it both scores higher on whatever benchmark and is better at a bunch of practical things I care about.

Imagine there was a theoretically correct metric called the_thing_logan_actually_cares_about.  I notice in my own experience there is a s... (read more)

1_liminaldrift
This reminds me of this LessWrong post: "If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics" (https://www.lesswrong.com/posts/9Tw5RqnEzqEtaoEkq/if-it-s-worth-doing-it-s-worth-doing-with-made-up-statistics).

It doesn't sound like we disagree at all.

1_liminaldrift
I think even with humans, IQ isn't the best measure to quantify what we call intelligence. The way I tend to think of it is that high general intelligence correlates with higher IQ test scores, but just optimizing performance on IQ tests doesn't necessarily mean that you become more intelligent in general outside of that task. But I'm okay with the idea of using IQ scores in the context of this post because it seems useful to capture the change in capabilities of these models.

I have no idea what you want to measure.  

I only know that LLMs are continuing to steadily increase in some quality (which you are free to call "fake machine IQ" or whatever you want) and that if they continue to make progress at the current rate there will be consequences, and we should prepare to deal with those consequences.

2Matt Goldenberg
I think there's a world where AIs continue to saturate benchmarks and the consequence is just that the companies get to say they saturated those benchmarks. Especially at the tails of those benchmarks, I imagine it won't be about the consequences we care about, like general reasoning, ability to act autonomously, etc.

Imagine you were trying to build a robot that could:
1. Solve a complex mechanical puzzle it has never seen before
2. Play a board game that I invented just now at an expert level.

Both of these are examples of learning-on-the-fly.  No amount of pre-training will ever produce a satisfying result.

The way I believe a human (or a cat) solves 1 is: look at the puzzle, try some things, build a model of the toy in their head, try things on the model in their head, eventually solve the puzzle.  There are efforts to get robots to follow the same proce... (read more)
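A minimal sketch of that loop (act in the real puzzle, update an internal model from the feedback, plan the next attempt against the model); the Puzzle class and its hidden unlock sequence are hypothetical stand-ins, not any particular robotics system:

```python
# Toy sketch of "look, try things, build a model, plan against the model":
# the solver refines an internal model (the known-good prefix of moves) from
# real-world feedback and only tries candidates consistent with that model.
# Puzzle and its hidden 3-step unlock sequence are hypothetical stand-ins.
import random

class Puzzle:
    """Stand-in for a mechanical puzzle with a hidden 3-step unlock sequence."""
    def __init__(self):
        self.secret = [random.randrange(3) for _ in range(3)]

    def attempt(self, moves):
        # Feedback: how many moves worked before the first mistake.
        progress = 0
        for move, correct in zip(moves, self.secret):
            if move != correct:
                break
            progress += 1
        return progress

def solve(puzzle, n_steps=3, n_choices=3):
    model = []                                # internal model: prefix known to work
    while len(model) < n_steps:
        for candidate in range(n_choices):    # "try things on the model in their head"
            moves = model + [candidate]
            if puzzle.attempt(moves) == len(moves):
                model = moves                 # update the model from real feedback
                break
    return model

print(solve(Puzzle()))                        # recovers the hidden sequence
```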

you were saying that gpt4o is comparable to a 115 IQ human

 

gpt4o is not literally equivalent to a 115 IQ human.  

Use whatever word you want for the concept "score produced when an LLM takes an IQ test".  

4Matt Goldenberg
But is this comparable to G?  Is it what we want to measure?

perhaps a protracted struggle over what jobs get automated might be less coordinated if there are swaths of the working population still holding out career-hope, on the basis that they have not had their career fully stripped away, having possibly instead been repurposed or compensated less conditional on the automation.

 

Yeah, this is totally what I have in mind.  There will be some losers and some big winners, and all of politics will be about this fact more or less. (think the dockworkers strike but 1000x)

Is your disagreement specifically with the word "IQ" or with the broader point, that AI is continuing to make progress at a steady rate that implies things are going to happen soon-ish (2-4 years)?

If specifically with IQ, feel free to replace the word with "abstract units of machine intelligence" wherever appropriate.

If with "big things soon", care to make a prediction?

5hmys
I specifically disagree with the IQ part and the codeforces part. Meaning, I think they're misleading.  IQ and coding ability are useful measures of intelligence in humans because they correlate with a bunch of other things we care about. Not to say it's useless to measure "IQ" or coding ability in LLMs, but presenting them as if they mean anything like what they mean in humans is wrong, or at least will give many people reading it the wrong impression. As for the overall point of this post, I roughly agree? I mean, I think the timelines are not too unreasonable, and think the tri/quad lemma you put up can be a useful framing. I mostly disagree with using the metrics you put up first to quantify any of this. I think we should look at specific abilities current models have/lack, which are necessary for the scenarios you outlined, and how soon we're likely to get them. But you do go through that somewhat in the post.
1Nick_Tarleton
By calling it "IQ", you were (EDIT: the creator of that table was) saying that gpt4o is comparable to a 115 IQ human, etc. If you don't intend that claim, if that replacement would preserve your meaning, you shouldn't have called it IQ. (IMO that claim doesn't make sense — LLMs don't have human-like ability profiles.)

A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is".

 

I've always been sympathetic to the drunk in this story.  If the key is in the light, there is a chance of finding it.  If it is in ... (read more)

If I think AGI x-risk is >>10%, and you think AGI x-risk is 1-in-a-gazillion, then it seems self-evident to me that we should be hashing out that giant disagreement first; and discussing what if any government regulations would be appropriate in light of AGI x-risk second

 

I do not think arguing about p(doom) in the abstract is a useful exercise.  I would prefer the Overton Window for p(doom) look like 2-20%, Zvi thinks it should be 20-80%.  But my real disagreement with Zvi is not that his P(doom) is too high, it is that he supp... (read more)

It's hard for me to know what's crux-y without a specific proposal. 

I tend to take a dim view of proposals that have specific numbers in them (without equally specific justifications). Examples include the six-month pause and SB 1047.

Again, you can give me an infinite number of demonstrations of "here's people being dumb" and it won't cause me to agree with "therefore we should also make dumb laws"

If you have an evidence-based proposal to reduce specific harms associated with "models follow goals" and "people are dumb", then we can talk price.

2Steven Byrnes
Oh I forgot, you’re one of the people who seems to think that the only conceivable reason that anyone would ever talk about AGI x-risk is because they are trying to argue in favor of, or against, whatever AI government regulation was most recently in the news. (Your comment was one of the examples that I mockingly linked in the intro here.) If I think AGI x-risk is >>10%, and you think AGI x-risk is 1-in-a-gazillion, then it seems self-evident to me that we should be hashing out that giant disagreement first; and discussing what if any government regulations would be appropriate in light of AGI x-risk second. We’re obviously not going to make progress on the latter debate if our views are so wildly far apart on the former debate!! Right? So that’s why I think you’re making a mistake whenever you redirect arguments about the general nature & magnitude & existence of the AGI x-risk problem into arguments about certain specific government policies that you evidently feel very strongly about. (If it makes you feel any better, I have always been mildly opposed to the six month pause plan.)

“OK then! So you’re telling me: Nothing bad happened, and nothing surprising happened. So why should I change my attitude?”

 

I consider this an acceptable straw-man of my position.

To be clear, there are some demos that would cause me to update.

For example, I consider "the Solomonoff Prior is Malign" to be basically a failure to do counting correctly.  And so if someone demonstrated a natural example of this, I would be forced to update.

Similarly, I think the chances of an EY-style utility-maximizing agent arising from next-token-prediction are (with cave... (read more)

Yup! I think discourse with you would probably be better focused on the 2nd or 3rd or 4th bullet points in the OP—i.e., not “we should expect such-and-such algorithm to do X”, but rather “we should expect people / institutions / competitive dynamics to do X”.

I suppose we can still come up with “demos” related to the latter, but it’s a different sort of “demo” than the algorithmic demos I was talking about in this post. As some examples:

  • Here is a “demo” that a leader of a large active AGI project can declare that he has a solution to the alignment problem,
... (read more)

Tesla fans will often claim that Tesla could easily do this

 

Tesla fan here. 

Yes, Tesla can easily handle the situation you've described (stop-and-go traffic on a highway in good weather with no construction), with higher reliability than human beings.

I suspect the reason Tesla is not pursuing this particular certification is that, given the current rate of progress, it would be out of date by the time it was authorized.  There have been several significant leaps in capabilities in the last 2 years (11->12, 12->12.6, and I've been told 12-... (read more)

9jefftk
I don't think this makes much sense. In a regulated industry, you want to build up a positive reputation and working relationship with the regulators, where they know what to expect from you, are familiar with your work and approach, have a sense of where you're going, and generally like and trust you. Engaging with them early and then repeatedly over a long period seems like a way better strategy than waiting until you have something extremely ambitious to try to get them to approve.

Seems like he could just fake this by writing a note to his best friend that says "during the next approved stock trading window I will sell X shares of GOOG to you for Y dollars".  

Admittedly:
1. technically this is a derivative (maybe illegal?)
2. principal agent risk (he might not follow through on the note)
3. his best friend might encourage him to work harder for GOOG to succeed

But I have a hard time believing any of those would be a problem in the real world, assuming TurnTrout and his friend are reasonably virtuous about actually not wanting TurnT... (read more)

1lewis smith
your example agreement with a friend is obviously a derivative, which is just a contract whose value depends on the value of an underlying asset (google stock in this case). If it's not a formal derivative contract you might be less likely to get in trouble for it compared to doing it on robinhood or whatever (not legal advice!) but it doesn't seem like a very good idea.

Isn't there just literally a financial product for this?  TurnTrout could buy puts on GOOG exactly matching his vesting amounts/times.

5lewis smith
like at many public companies, google has anti-insider trading policies that prohibit employees from trading in options and other derivatives on the company stock, or shorting it.

Einstein didn't write a half-assed NYT op-ed about how vague 'advances in science' might soon lead to new weapons of war and the USA should do something about that; he wrote a secret letter hand-delivered & pitched to President Roosevelt by a trusted advisor.

Strongly agree.

What other issues might there be with this new ad hoced strategy...?

I am not a China Hawk.  I do not speak for the China Hawks.  I 100% concede your argument that these conversations should be taking place in a room that neither you nor I are in right now.

I would like to see them state things a little more clearly than commentators having to guess 'well probably it's supposed to work sorta like this idk?'

Meh.  I want the national security establishment to act like a national security establishment.  I admit it is frustratingly opaque from the outside, but that does not mean I want more transparency at the cost of it being worse.  Tactical Surprise and Strategic Ambiguity are real things with real benefits.

A great example, thank you for reminding me of it as an illustration of the futility of

... (read more)

Tactical Surprise and Strategic Ambiguity are real things with real benefits.

And would imply that were one a serious thinker and proposing an arms race, one would not be talking about the arms race publicly. (By the way, I am told there are at least 5 different Chinese translations of "Situational Awareness" in circulation now.)

So, there is a dilemma: they are doing this poorly, either way. If you need to discuss the arms race in public, say to try to solve a coordination problem, you should explain what the exit plan is rather than uttering vague verbi... (read more)

Because the USA has always looked at the cost of using that 'robust military superiority', which would entail the destruction of Seoul and possibly millions of deaths and the provoking of major geopolitical powers - such as a certain CCP - and decided it was not worth the candle, and blinked, and kicked the can down the road, and after about three decades of can-kicking, ran out of road.

 

I can't explicitly speak for the China Hawks (not being one myself), but I believe one of the working assumptions is that AGI will allow the "league of free nations" ... (read more)

Probably this is supposed to work like EY's "nanobot swarm that melts all of the GPUs".

I would like to see them state things a little more clearly than commentators having to guess 'well probably it's supposed to work sorta like this idk?', and I would also point out that even this (a strategy so far outside the Overton Window that people usually bring it up to mock EY as a lunatic) is not an easy cheap act if you actually sit down and think about it seriously in near mode as a concrete policy that, say, President Trump has to order, rather than 'enter... (read more)

Okay, this at least helps me better understand your position.  Maybe you should have opened with "China Hawks won't do the thing they've explicitly and repeatedly said they are going to do."

No, my problem with the hawks, as far as this criticism goes, is that they aren't repeatedly and explicitly saying what they will do. (They also won't do it, whatever 'it' is, even if they say they will; but we haven't even gotten that far yet.) They are continually shying away from cashing out any of their post-AGI plans, likely because they look at the actual strategies that could be executed and realize that execution is in serious doubt and so that undermines their entire paradigm. ("We will be greeted as liberators" and "we don't do nation-building" c... (read more)

What does winning look like? What do you do next?

 

This question is a perfect mirror of the brain-dead "how is AGI going to kill us?" question.  I could easily make a list of 100 things you might do if you had AGI supremacy and wanted to suppress the development of AGI in China.  But the whole point of AGI is that it will be smarter than me, so anything I put on the list would be redundant.

Missing the point. This is not about being too stupid to think of >0 strategies, this is about being able & willing to execute strategies.

I too can think of 100 things, and I listed several diverse ways of responding and threw in a historical parallel just in case that wasn't clear after several paragraphs of discussing the problem with not having a viable strategy you can execute. Smartness is not the limit here: we are already smart enough to come up with strategies which could achieve the goal. All of those could potentially work. But none of the... (read more)

Playing the AIs definitely seems like the most challenging role

 

Seems like a missed opportunity not having the AIs be played by AIs.

This is a bad argument, and to understand why it is bad, you should consider why you don't routinely have the thought "I am probably in a simulation, and since value is fragile the people running the simulation probably have values wildly different from human values, so I should do something insane right now."

3David Matolcsi
Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?

Chinese companies explicitly have a rule not to release things that are ahead of SOTA (I've seen comments of the form "trying to convince my boss this isn't SOTA so we can release it" on GitHub repos).  So "publicly released Chinese models are always slightly behind American ones" doesn't prove much.

4garrison
Interesting, do you have a link for that?  US companies are racing toward AGI but the USG isn't. As someone else mentioned, Dylan Patel from Semianalysis does not think China is scale-pilled.

Current AI methods are basically just fancy correlations, so unless the thing you are looking for is in the dataset (or is a simple combination of things in the dataset) you won't be able to find it.

This means "can we use AI to translate between humans and dolphins" is mostly a question of "how much data do you have?"

Suppose, for example, that we had 1 billion hours of audio/video of humans/dolphins doing things.  In this case, AI could almost certainly find correlations like: when dolphins pick up the seashell, they make the <<dolphin word for s... (read more)
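A minimal sketch of the kind of correlation-finding described here, assuming paired audio and behavior feature vectors have already been extracted (the encoders, the random features, and the injected shared signal are all hypothetical stand-ins):

```python
# Sketch of "fancy correlations": given paired features from dolphin audio and
# from video of what the dolphins are doing at the same moments, look for shared
# structure with canonical correlation analysis (CCA). The feature matrices here
# are random stand-ins with one shared latent injected so there is something to
# find; real ones would come from audio/video encoders (assumed, not shown).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_moments = 10_000                                   # paired (audio, behavior) snapshots
audio_feats = rng.normal(size=(n_moments, 64))       # hypothetical audio embeddings
behavior_feats = rng.normal(size=(n_moments, 32))    # hypothetical behavior embeddings

# Shared latent standing in for e.g. "seashell pickup" co-occurring with a call.
latent = rng.normal(size=n_moments)
audio_feats[:, 0] += 3 * latent
behavior_feats[:, 0] += 3 * latent

cca = CCA(n_components=2)
cca.fit(audio_feats, behavior_feats)
audio_c, behavior_c = cca.transform(audio_feats, behavior_feats)

# High correlation on the top canonical pair only if the modalities share structure.
top_corr = np.corrcoef(audio_c[:, 0], behavior_c[:, 0])[0, 1]
print(f"top canonical correlation: {top_corr:.2f}")
```

With real data, the interesting output would be which audio directions line up with which behaviors; that is where candidate "dolphin words" would come from, and why this is mostly a question of how much paired data you have.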

2Shankar Sivarajan
If you could do whole-brain emulation for dolphins, you should be able to generate enough data for unsupervised learning that way.

Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.

 

That sounds like something we should work on, I guess.

plus you are usually able to error-correct such that a first mistake isn't fatal.

 

This implies the answer is "trial and error", but I really don't think the whole answer is trial and error.  Each of the domains I mentioned has the problem that you don't get to redo things.  If you send crypto to the wrong address it's gone.  People routinely type their credit card information into a website they've never visited before and get what they wanted.  Global thermonuclear war didn't happen.  I strongly predict that when LLM agents ... (read more)

4Noosphere89
Yes, I underrated model-building here, and I do think that people sometimes underestimate how good humans actually are at model-building.

and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.

 

Maybe I was unclear in my original post, because you seem confused here.  I'm not claiming the thing we should learn is "dangerous things aren't dangerous".  I'm claiming: here are a bunch of domains that have problems of adverse selection and inability to learn from failure, and yet humans successfully negotiate these domains. We should figure out what strategies humans are using and how far they generalize because this is going to be extremely important in the near future.

2tailcalled
My original response contained numerous strategies that people were using:

  • Keeping one's cryptocurrency in cold storage rather than easily usable
  • Using different software than that with known vulnerabilities
  • Just letting relatively-trusted/incentive-aligned people use the insecure systems
  • Using mutual surveillance to deescalate destructive weaponry
  • Using aggression to prevent the weak from building destructive weaponry

You dismissed these as "just-so stories" but I think they are genuinely the explanations for why stuff works in these cases, and if you want to find general rules, you are better off collecting stories like this from many different domains than to try to find The One Unified Principle. Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.

That was a lot of words to say "I don't think anything can be learned here".

Personally, I think something can be learned here.

5tailcalled
No, it was a lot of words that describe why your strategy of modelling stuff as more/less "dangerous" and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work. The better strategy, if you want to pursue this general line of argument, is to make the strongest argument you can for what makes e.g. Bitcoin so dangerous and how horrible the consequences will be. Then since your sense of danger overestimates how dangerous Bitcoin will be, you can go in and empirically investigate where your intuition was wrong by seeing what predictions of your intuitive argument failed and what obstacles caused them to fail.

MAD is obviously governed by completely different principles than crypto is

 

Maybe this is obvious to you.  It is not obvious to me.  I am genuinely confused about what is going on here.  I see what seems to be a pattern: dangerous domain -> basically okay.  And I want to know what's going on.

2tailcalled
You shouldn't use "dangerous" or "bad" as a latent variable because it promotes splitting. MAD and Bitcoin have fundamentally different operating principles (e.g. nuclear fission vs cryptographic pyramid schemes), and these principles lead to a mosaic of different attributes. If you ignore the operating principles and project down to a bad/good axis, then you can form some heuristics about what to seek out or avoid, but you face severe model misspecification, violating principles like realizability which are required for Bayesian inference to get reasonable results (e.g. converge rather than oscillate, and be well-calibrated rather than massively overconfident). Once you understand the essence of what makes a domain seem dangerous to you, you can debug by looking at what obstacles this essence faced that stopped it from flowing into whatever horrors you were worried about, and then try to think through why you didn't realize those obstacles ahead of time. As you learn more about the factors relevant in those cases, maybe you will learn something that generalizes across cases, but most realistically what you learn will be about the problems with the common sense.

It's easy to write "just so" stories for each of these domains: only degens use crypto, credit card fraud detection makes the internet safe, MAD happens to be a stable equilibrium for nuclear weapons.

These stories are good and interesting, but my broader point is that this just keeps happening.  Humans invent a new domain that common sense tells you should be extremely adversarial and then successfully use it without anything too bad happening.

I want to know what is the general law that makes this the case.

2tailcalled
Your error is in having inferred that there is a general rule that this necessarily happens. MAD is obviously governed by completely different principles than crypto is. Or maybe your error is in trusting common sense too much and therefore being too surprised when stuff contradicts it, idk.

The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small.

 

I think this is a big part of it: humans have some kind of knack for working in dangerous domains successfully.  I feel like an important question is: how far does this generalize?  We can estimate the IQ gap between the dumbest person who successfully uses the internet (probably in the 80s) and the smartest malware author (got to be at least 150+).  Is that the limit somehow, o... (read more)

3ProgramCrafter
For reactive threats, the upper bound is probably at most "people capable of introspection who can detect they are not sure some action will be to net benefit, and therefore refuse to take it". For active threatening factors, that's an arms race (>=40% this race is not to infinity - basically, if more-cooperating DT strategies are any good). Maybe the subject is researched more in biology? Example topic: eating unknown food (berries, nuts) in forest, and balance of lifetime adaptation vs evolutionary adaptation (which involves generations passing).
3tailcalled
For almost everything, yeah, you just avoid the bad parts. In order to predict the few exceptions, one needs a model of what functions will be available in society. For instance, police implies the need to violently suppress adversaries, and defense implies the need to do so with adversaries that have independent industrial capacity. This is an exception to the general principle of "just avoid the bad stuff" because while your computer can decline to process a TCP packet, your body can't decline to process a bullet. If someone is operating e.g. an online shop, then they also face difficulty because they have to physically react to untrusted information and can't avoid that without winding down the shop. Lots of stuff like that.

Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Imagine your typical computer user (I remember being mortified when running an anti-spyware tool on my middle-aged parents' computer for them).  They aren't keeping things patched and up-to-date. What I find curious is how it can be the case that their computer is filthy with malware and yet they routinely do things like input sensitive credit-card/tax/etc. information into said computer.

but if it

... (read more)
5tailcalled
I don't know what exactly your parents are using their computer for. If we say credit-card information, I know at least in my country there's a standard government-mandated 2-factor authentication which helps with security. Also, banks have systems to automatically detect and block fraudulent transactions, as well as to reverse and punish fraudulent transactions, which makes it harder for people to exploit. In order to learn how exactly the threats are stopped, you'd need to get more precise knowledge of what the threats are. I.e., given a computer with a certain kind of spyware, what nefarious activities could you worry that spyware enables? Then you can investigate what obstacles there are on the way to it. Using an LLM agent to order something is a lot less dangerous than using an LLM agent to sell something, because ordering is kind of "push"-oriented; you're not leaving yourself vulnerable to exploitation from anyone, only from the person you are ordering from. And even that person is pretty limited in how they can exploit you, since you plan to pay afterwards, and the legal system isn't going to hold up a deal that was obviously based on tricking the agent.

I can just meh my way out of thinking more than 30s on what the revelation might be, the same way Tralith does

 

I'm glad you found one of the characters sympathetic.  Personally I feel strongly both ways, which is why I wrote the story the way that I did.

No, I think you can keep the data clean enough to avoid tells.

 

What data?  Why not just train it on literally 0 data (MuZero-style)? You think it's going to derive the existence of the physical world from the Peano Axioms?

3Nathan Helm-Burger
Math data! [Edit: to be clear, I'm not arguing with Logan here, I'm agreeing with Logan. I think it's clear to most people who might read this comment thread that training a model on nothing but pure math data is unlikely to result in something which could hack its way out of computer systems while still anywhere near the ballpark of human genius level. There's just too much missing info that isn't implied by pure math. A more challenging, but I think still feasible, training set would be math and programming. To do this in a safe way for this hypothetical extremely powerful future model architecture, you'd need to 'dehumanize' the code, get rid of all details like variable names that could give clues about the real physical universe.]

If you think without contact with reality, your wrongness is just going to become more self-consistent.

 

Please! I'm begging you! Give me some of this contact with reality!  What is the evidence you have seen and I have not? Where?

8tailcalled
I don't know, because you haven't told me which of the forces that are present in my world-model are absent from your world-model. Without knowing what to add, I can't give you a pointer.

I came and asked "the expert consensus seems to be that AGI doom is unlikely.  This is the best argument I am aware of and it doesn't seem very strong.  Are there any other arguments?"

 

Responses I have gotten are:

  • I don't trust the experts, I trust my friends
  • You need to read the sequences
  • You should rephrase the argument in a way that I like

And 1 actual attempt at giving an answer (which unfortunately includes multiple assumptions I consider false or at least highly improbable)

If I seem contrarian, it's because I believe that the truth is best... (read more)

2tailcalled
That's your error. You should be aiming to let the important parts of reality imprint marks of themselves and their dynamics in your worldview. Consensus might be best reached by stating one's beliefs and then critically examining the arguments. But if you want to reach consensus, you also need to absorb others' angles, e.g. their friends and the sequences and their framings and so on. (Assuming everyone trusts each other. In cases of distrust, stating one's beliefs and critically examining arguments might simply deepen the distrust.) If you think without contact with reality, your wrongness is just going to become more self-consistent.

"Can you explain in a few words why you believe what you believe"

 

"Please read this 500 pages of unrelated content before I will answer your question"

 

No.

This is self-evidently true, but you (and many others) disagree

 

A fact cannot be self-evidently true if many people disagree with it.
