Logan Zoellner - LessWrong

China Hawks are Manufacturing an AI Arms Race

Chinese companies explicitly have a rule not to release things that are ahead of SOTA (I've seen comments of the form "trying to convince my boss this isn't SOTA so we can release it" on github repos). So "publicly release Chinese models are always slightly behind American ones" doesn't prove much.

Could we use current AI methods to understand dolphins?

Answer by Logan ZoellnerNov 10, 20241-1

Current AI methods are basically just fancy correlations, so unless the thing you are looking for is in the dataset (or is a simple combination of things in the dataset) you won't be able to find it.

This means "can we use AI to translate between humans and dolphins" is mostly a question of "how much data do you have?"

Suppose, for example that we had 1 billion hours of audio/video of humans/dolphins doing things. In this case, AI could almost certainly find correlations like: when dolphins pick up the seashell, they make the <<dolphin word for seashell>> sound, when humans pick up the seashell they make the <<human word for seashell>> sound. You could then do something like CLIP to find a mapping between <<human word for seashell>> and <<dolphin word for seashell>>. The magic step here is because we use the same embedding model for video in both cases, <<seashell>> is located at the same position in both our dolphin and human CLIP models.

But notice that I am already simplifying here. There is no such thing as <<human word for seashell>>. Instead, humans have many different languages. For example Papua New Guinea has over 800 languages in a land area of a mere 400k square kilometers. Because dolphins are living in what is essentially a hunter-gatherer existence, none of the pressures (trade, empire building) that cause human languages to span widespread areas exist. Most likely each pod of dolphins has at a minimum its own dialect. (one pastime I noticed when visiting the UK was that people there liked to compare how towns only a few miles apart had different words for the same things)

Dolphin lives are also much simpler than human lives, so their language is presumably also much simpler. Maybe like Eskimos have 100 words for snow, dolphins have 100 words for water. But it's much more likely that without the need to coordinate resources for complex tasks like tool-making, dolphins simply don't have as complex a grammar as humans do. Less complex grammar means less patterns means less for the machine learning to pick up on (machine learning loves patterns).

So, perhaps the correct analogy is: if we had a billion hours of audio/video of a particular tribe of humans and billion hours of a particular pod of dolphins we could feed it into a model like CLIP and find sounds with similar embeddings in both languages. As pointed out in other comments, it would help if the humans and dolphins were doing similar things, so for the humans you might want to pick a group that focused on underwater activities.

In reality (assuming AGI doesn't get there first, which seems quite likely), the fastest path to human-dolphin translation will take a hybrid approach. AI will be used to identify correlations in dolphin language. For example this study that claims to have identified vowels in whale speech. Once we have a basic mapping: dolphin sounds -> symbols humans can read, some very intelligent and very persistent human being will stare at those symbols, make guesses about what they mean, and then do experiments to verify those guesses. For example, humans might try replaying the sounds they think represent words/sentences to dolphins and seeing how they respond. This closely matches how new human languages are translated: a human being lives in contact with the speakers of the language for an extended period of time until they figure out what various words mean.

What would it take for an only-AI approach to replicate the path I just talked about (AI generates a dictionary of symbols that a human then uses to craft a clever experiment that uses the least amount of data possible)? Well, it would mean overcoming the data inefficiency of current machine learning algorithms. Comparing how many "input tokens" it takes to train a human child vs GPT-3, we can estimate that humans are ~1000x more data efficient than modern AI techniques.

Overcoming this barrier will likely require inference+search techniques where the AI uses a statistical model to "guess" at an answer and then checks that answer against a source of truth. One important metric to watch is the ARC prize, which intentionally has far less data than traditional machine learning techniques require. If ARC is solved, it likely means that AI-only dolphin-to-human translation is on its way (but it also likely means that AGI is immanent).

So, to answer your original question: "Could we use current AI methods to understand dolphins?" Yes, but doing so would require an unrealistically large amount of data and most likely other techniques will get there sooner.

What can we learn from insecure domains?

Logan Zoellner20d20

Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.

That sounds like something we should work on, I guess.

What can we learn from insecure domains?

Logan Zoellner20d30

plus you are usually able to error-correct such that a first mistake isn't fatal."

This implies the answer is "trial and error", but I really don't think the whole answer is trial and error. Each of the domains I mentioned has the problem that you don't get to redo things. If you send crypto to the wrong address it's gone. People routinely type their credit card information into a website they've never visited before and get what they wanted. Global thermonuclear war didn't happen. I strongly predict that when LLM agents come out, most people will successfully manage to use them without first falling for a string of prompt-injection attacks and learning from trial-and-error what prompts are/aren't safe.

Humans are doing more than just trial and error, and figuring out what it is seems important.

What can we learn from insecure domains?

Logan Zoellner20d20

and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.

Maybe I was unclear in my original post, because you seem confused here. I'm not claiming the thing we should learn is "dangerous things aren't dangerous". I'm claiming: here are a bunch of domains that have problems of adverse selection and inability to learn from failure, and yet humans successfully negotiate these domains. We should figure out what strategies humans are using and how far they generalize because this is going to be extremely important in the near future.

What can we learn from insecure domains?

Logan Zoellner20d10

That was a lot of words to say "I don't think anything can be learned here".

Personally, I think something can be learned here.

What can we learn from insecure domains?

Logan Zoellner20d30

MAD is obviously governed by completely different principles than crypto is

Maybe this is obvious to you. It is not obvious to me. I am genuinely confused what is going on here. I see what seems to be a pattern: dangerous domain -> basically okay. And I want to know what's going on.

What can we learn from insecure domains?

Logan Zoellner20d40

It's easy to write "just so" stories for each of these domains: only degens use crypto, credit card fraud detection makes the internet safe, MAD happens to be a stable equilibrium for nuclear weapons.

These stories are good and interesting, but my broader point is this just keeps happening. Humans invent an new domain that common sense tells you should be extremely adversarial and then successfully use it without anything too bad happening.

I want to know what is the general law that makes this the case.

What can we learn from insecure domains?

Logan Zoellner20d20

The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small.

I think this is a big part of it, humans have some kind of knack for working in dangerous domains successfully. I feel like an important question is: how far does this generalize? We can estimate the IQ gap between the dumbest person who successfully uses the internet (probably in the 80's) and the smartest malware author (got to be at least 150+). Is that the limit somehow, or does this knack extend across even more orders of magnitude?

If imagine a world where 100 IQ humans are using an internet that contains malware written by 1000 IQ AGI, do humans just "avoid the bad parts"? What goes wrong exactly, and where?

What can we learn from insecure domains?

Logan Zoellner20d22

Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Imagine your typical computer user (I remember being mortified when running anti-spyware tool on my middle-aged parents' computer for them). They aren't keeping things patched and up-to-date. What I find curious is how can it be the case that their computer is both: filthy with malware and they routinely do things like input sensitive credit-card/tax/etc information into said computer.

but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

My prediction is despite having glaring "security flaws" (prompt injection, etc) people will nonetheless use LLM agents for tons of stuff that common sense tells you shouldn't be doing in an insecure system.

I fully expect to live in a world where its BOTH true that: Pilny the Liberator can PWN any LLM agent in minutes AND people are using LLM agents to order 500 chocolate cupcakes on a daily basis.

I want to know WHAT IS IT that makes it so things can be both deeply flawed and basically fine simultaneously.

LESSWRONG
LW

Posts

Wiki Contributions

Comments