Comment Permalink

tailcalled5mo2-2

You shouldn't use "dangerous" or "bad" as a latent variable because it promotes splitting. MAD and Bitcoin have fundamentally different operating principles (e.g. nuclear fission vs cryptographic pyramid schemes), and these principles lead to a mosaic of different attributes. If you ignore the operating principles and project down to a bad/good axis, then you can form some heuristics about what to seek out or avoid, but you face severe model misspecification, violating principles like realizability which are required for Bayesian inference to get reasonable results (e.g. converge rather than oscillate, and be well-calibrated rather than massively overconfident).

Once you understand the essence of what makes a domain seem dangerous to you, you can debug by looking at what obstacles this essence faced that stopped it from flowing into whatever horrors you were worried about, and then try to think through why you didn't realize those obstacles ahead of time. As you learn more about the factors relevant in those cases, maybe you will learn something that generalizes across cases, but most realistically what you learn will be about the problems with the common sense.

See in context

14 What can we learn from insecure domains?

by Logan Zoellner

1st Nov 2024

1 min read

14

Cryptocurrency is terrible. With a single click of a button, it is possible to accidentally lose all of your funds. 99.9% of all cryptocurrency projects are complete scams (conservative estimate). Crypto is also tailor-made for ransomware attacks, since it makes it possible to send money in such a way that the receiver has perfect anonymity.

Similarly, Cyber Security is terrible. Basically every computer on the internet is infected with multiple types of malware. If you have ever owned a web-server with a public IPV4 address, you undoubtedly have had the pleasure of viewing a log file that looks like this:

Every IPV4 address on earth is under constant attack by malware

In a few months, the world is about to be introduced to a brand new insecure by design platform, the LLM agent:

No one worth taking seriously believes that Microsoft Copilot (or Anthropic, or any other LLM agent) is going to be remotely secure against prompt injection attacks.

One fascinating thing (to me) about these examples is that they all basically work fine^[1]. Despite being completely broken, normal people with normal intelligence use these systems routinely without losing 100% of their funds. This happens despite the fact that people with above-average intelligence have a financial incentive to take advantage of these security flaws.

One possible conclusion is along the lines of "everything humanity has ever built is constantly on fire. We must never built something existentially dangerous or we're already dead."

However we already did:

And like everything else, the story of nuclear weapons is that they are horribly insecure and error prone.

What I want to know is why? Why is it that all of these systems, despite being hideously error prone and blatantly insecure by design somehow still work?

I consider each of these systems (and many like them) a sort of standing challenge to the fragile world hypothesis. If the world is so fragile, why does it keep not ending?

^{^}
If anyone would like to make a bet, I predict 2 years from now LLM agents:
1. will be vulnerable to nearly trivial forms of prompt-injection
2. Millions of people will use them to do things like spend money that common-sense tells you not to do on a platform this insecure by design

Site MetaWorld Optimization

Frontpage

14

New Comment

21 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:14 PM

[-]ProgramCrafter5mo72

99.9% of all cryptocurrency projects are complete scams (conservative estimate).

On first skim, I agree with the estimate as stated and would post a limit order for either side. I'd also like to note that "crypto in general is terrible" instead of "all crypto is terrible", as there have been applications developed that do not allow you to lose all funds without explicit acknowledgement.

Similarly, Cyber Security is terrible. Basically every computer on the internet is infected with multiple types of malware.

It is presumably terrible (or, 30%, result of availability bias), and I've observed bugs happen because functionality upgrade did not consider its interaction with all other code. However, I disagree that every computer is infected; probably you meant that it is under constant stream of attack attempts?

The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small. As a matter of IT, it would be really nice to have systems which don't logically fail at all, but that requires good education and pressure-resistance skills for software developers.

[-]Logan Zoellner5mo20

The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small.

I think this is a big part of it, humans have some kind of knack for working in dangerous domains successfully. I feel like an important question is: how far does this generalize? We can estimate the IQ gap between the dumbest person who successfully uses the internet (probably in the 80's) and the smartest malware author (got to be at least 150+). Is that the limit somehow, or does this knack extend across even more orders of magnitude?

If imagine a world where 100 IQ humans are using an internet that contains malware written by 1000 IQ AGI, do humans just "avoid the bad parts"? What goes wrong exactly, and where?

[-]ProgramCrafter5mo30

I feel like an important question is: how far does this generalize? We can estimate the IQ gap between the dumbest person who successfully uses the internet (probably in the 80's) and the smartest malware author (got to be at least 150+). Is that the limit somehow, or does this knack extend across even more orders of magnitude?
If imagine a world where 100 IQ humans are using an internet that contains malware written by 1000 IQ AGI, do humans just "avoid the bad parts"?

For reactive threats, the upper bound is probably at most "people capable of introspection who can detect they are not sure some action will be to net benefit, and therefore refuse to take it". For active threatening factors, that's an arms race (>=40% this race is not to infinity - basically, if more-cooperating DT strategies are any good).

Maybe the subject is researched more in biology? Example topic: eating unknown food (berries, nuts) in forest, and balance of lifetime adaptation vs evolutionary adaptation (which involves generations passing).

[-]tailcalled5mo31

For almost everything, yeah, you just avoid the bad parts.

In order to predict the few exceptions, one needs a model of what functions will be available in society. For instance, police implies the need to violently suppress adversaries, and defense implies the need to do so with adversaries that have independent industrial capacity. This is an exception to the general principle of "just avoid the bad stuff" because while your computer can decline to process a TCP packet, your body can't decline to process a bullet.

If someone is operating e.g. an online shop, then they also face difficulty because they have to physically react to untrusted information and can't avoid that without winding down the shop. Lots of stuff like that.

[-]tailcalled5mo62

In crypto, a lot of people just HODL instead of using it for stuff in practice. I'd guess the more people use it, the more likely they are to run into one of the 99.9% of projects that are scams. (Though... if we count the people who've been hit by ransomware, it is non-obvious to me that the majority of users are HODLers rather than ransomeware victims.) To prevent losing one's crypto, there have also been developed techniques like "cold storage", which are extremely secure.

The HTTP server logs you posted aren't based on insecurity of most webservers, they are based on the insecurity of particular programs (or versions of programs or setups of programs). Important systems (e.g. online banking) almost always use different systems than the ones that are currently getting attacked. Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Copilot is general instructed via the user of the program, and the user and is relatively trusted. I mean, people are still trying to "align" to be robust against the user, but 99.9% of the time that doesn't matter, and the remaining time is often stuff like internet harassment which is definitely not existentially risky, even if it is bad.

Some people are trying to introduce LLM agents into more general places, e.g. shops automatically handling emails from businesses. I'm pretty skeptical about this being secure, but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

Nuclear weapons were used twice when only the US had them. They only became existentially dangerous as multiple parties built up enormous stockpiles of them, but at the same time people understood that they were existentially dangerous and therefore avoided using them in war. More recently they've agreed that keeping such things around is bad and have been disassembling them under mutual surveillance. And they have systems set up to prevent other, less-stable countries from developing them.

[-]Logan Zoellner5mo22

Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Imagine your typical computer user (I remember being mortified when running anti-spyware tool on my middle-aged parents' computer for them). They aren't keeping things patched and up-to-date. What I find curious is how can it be the case that their computer is both: filthy with malware and they routinely do things like input sensitive credit-card/tax/etc information into said computer.

but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

My prediction is despite having glaring "security flaws" (prompt injection, etc) people will nonetheless use LLM agents for tons of stuff that common sense tells you shouldn't be doing in an insecure system.

I fully expect to live in a world where its BOTH true that: Pilny the Liberator can PWN any LLM agent in minutes AND people are using LLM agents to order 500 chocolate cupcakes on a daily basis.

I want to know WHAT IS IT that makes it so things can be both deeply flawed and basically fine simultaneously.

[-]tailcalled5mo50

Imagine your typical computer user (I remember being mortified when running anti-spyware tool on my middle-aged parents' computer for them). They aren't keeping things patched and up-to-date. What I find curious is how can it be the case that their computer is both: filthy with malware and they routinely do things like input sensitive credit-card/tax/etc information into said computer.

I don't know what exactly your parents are using their computer for.

If we say credit-card information, I know at least in my country there's a standard government-mandated 2-factor authentication which helps with security. Also, banks have systems to automatically detect and block fraudulent transactions, as well as to reverse and punish fraudulent transactions, which makes it harder for people to exploit.

In order to learn how exactly the threats are stopped, you'd need to get more precise knowledge of what the threats are. I.e., given a computer with a certain kind of spyware, what nefarious activities could you worry that spyware enables? Then you can investigate what obstacles there are on the way to it.

I fully expect to live in a world where its BOTH true that: Pilny the Liberator can PWN any LLM agent in minutes AND people are using LLM agents to order 500 chocolate cupcakes on a daily basis.

Using an LLM agent to order something is a lot less dangerous than using an LLM agent to sell something, because ordering is kind of "push"-oriented; you're not leaving yourself vulnerable to exploitation from anyone, only from the person you are ordering from. And even that person is pretty limited in how they can exploit you, since you plan to pay afterwards, and the legal system isn't going to hold up a deal that was obviously based on tricking the agent.

[-]Logan Zoellner5mo40

It's easy to write "just so" stories for each of these domains: only degens use crypto, credit card fraud detection makes the internet safe, MAD happens to be a stable equilibrium for nuclear weapons.

These stories are good and interesting, but my broader point is this just keeps happening. Humans invent an new domain that common sense tells you should be extremely adversarial and then successfully use it without anything too bad happening.

I want to know what is the general law that makes this the case.

[-]tailcalled5mo20

Your error is in having inferred that there is a general rule that this necessarily happens. MAD is obviously governed by completely different principles than crypto is. Or maybe your error is in trusting common sense too much and therefore being too surprised when stuff contradicts it, idk.

[-]Logan Zoellner5mo30

MAD is obviously governed by completely different principles than crypto is

Maybe this is obvious to you. It is not obvious to me. I am genuinely confused what is going on here. I see what seems to be a pattern: dangerous domain -> basically okay. And I want to know what's going on.

[-]tailcalled5mo2-2

[-]Logan Zoellner5mo10

That was a lot of words to say "I don't think anything can be learned here".

Personally, I think something can be learned here.

[-]tailcalled5mo52

No, it was a lot of words that describe why your strategy of modelling stuff as more/less "dangerous" and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.

The better strategy, if you want to pursue this general line of argument, is to make the strongest argument you can for what makes e.g. Bitcoin so dangerous and how horrible the consequences will be. Then since your sense of danger overestimates how dangerous Bitcoin will be, you can go in and empirically investigate where your intuition was wrong by seeing what predictions of your intuitive argument failed and what obstacles caused them to fail.

[-]Logan Zoellner5mo20

and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.

Maybe I was unclear in my original post, because you seem confused here. I'm not claiming the thing we should learn is "dangerous things aren't dangerous". I'm claiming: here are a bunch of domains that have problems of adverse selection and inability to learn from failure, and yet humans successfully negotiate these domains. We should figure out what strategies humans are using and how far they generalize because this is going to be extremely important in the near future.

[-]tailcalled5mo20

My original response contained numerous strategies that people were using:

Keeping one's cryptocurrency in cold storage rather than easily usable
Using different software than that with known vulnerabilities
Just letting relatively-trusted/incentive-aligned people use the insecure systems
Using mutual surveillance to deescalate destructive weaponry
Using aggression to prevent the weak from building destructive weaponry

You dismissed these as "just-so stories" but I think they are genuinely the explanations for why stuff works in these cases, and if you want to find general rules, you are better off collecting stories like this from many different domains than to try to find The One Unified Principle. Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.

[-]Logan Zoellner5mo20

Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.

That sounds like something we should work on, I guess.

[-]Rob Lucas5mo10

I think tailcalled's point here is an important one. You've got very different domains with very different dynamics, and it's not apriori obvious that the same general principle is involved in making all of these at first glance dangerous systems relatively safe. It's not even clear to me that they are safer than you'd expect. Of course that depends on how safe you'd expect them to be.

Many people have lost their money from crypto scams. Catastrophic nuclear war hasn't happened yet, but it seems like we may have had some close calls, and looked at on a chance/year basis it still seems we're in a bad equilibrium. It's not at all clear that nuclear weapons are safer than we'd naively assume. Cybersecurity issues haven't destroyed the global economy, but, for instance on the order of a hundred of billion dollars of pandemic relief funds were stolen by scammers.

That said, if I were looking for a general principle that might be at play in all of these cases I'd look at something like offensive/defense balance.

[-]tailcalled5mo33

Offense/defense balance can be handled just by ensuring security via offense rather than via defense.

I guess as a side-note, I think it's better to study oxidation, the habitable zone, famines, dodo extinction, etc. if one needs something beyond the basic "dangerous domains" that are mentioned in the OP.

[-]Noosphere895mo50

To answer the question posed, I'd say the answer to the question posed is "Because our security isn't nearly as on fire as people think, plus you are usually able to error-correct such that a first mistake isn't fatal."

For the computer security point, there's a pretty large bias towards treating people that predict disaster as having more credit on the evidence then they do.

lc points out the general reasons for why, and anonymousaisafety points out that a large class of security bugs do not in fact work as useful exploits in practice, contra what Rowhammer worried people say:

https://www.lesswrong.com/posts/xsB3dDg5ubqnT7nsn/poc-or-or-gtfo-culture-as-partial-antidote-to-alignment

https://www.lesswrong.com/posts/etNJcXCsKC6izQQZj/pivotal-outcomes-and-pivotal-processes#ogt6CZkMNZ6oReuTk

For the cryptocurrency example, it's fundamental basis is flawed, because it doesn't have the properties a good money should have, and the money component is in practice enables only speculation about it's value, and all of the good things crypto does have in reality would likely involve removing the money component, or using only play/fictional money that is allowed to increase.

Somewhat more generally, a big portion of the answer is that we can in fact learn from failures, and slowly based on empirical experience figure out ways for them to succeed, and a lot of proposed blockers to empirical testing turn out not to be nearly as much of a blocker as people say it is.

[-]Logan Zoellner5mo30

plus you are usually able to error-correct such that a first mistake isn't fatal."

This implies the answer is "trial and error", but I really don't think the whole answer is trial and error. Each of the domains I mentioned has the problem that you don't get to redo things. If you send crypto to the wrong address it's gone. People routinely type their credit card information into a website they've never visited before and get what they wanted. Global thermonuclear war didn't happen. I strongly predict that when LLM agents come out, most people will successfully manage to use them without first falling for a string of prompt-injection attacks and learning from trial-and-error what prompts are/aren't safe.

Humans are doing more than just trial and error, and figuring out what it is seems important.

[-]Noosphere895mo40

Yes, I underrated model-building here, and I do think that people sometimes underestimate how good humans actually are at model-building.

Moderation Log