I am told that unsupervised machine translation is a thing. This is amazing. I ask: Could we use it to understand dolphin language? (Or whales, perhaps?)

I don't currently see a convincing reason why not. Maybe dolphins aren't actually that smart or communicative and their clicks are mostly just very simple commands or requests, but that should just make it really easy to do this. Maybe the blocker is that dolphins have such a different set of concepts than English that it would be too hard?


2 Answers

DanArmak

130

The approach of the linked article tries to match words meaning the same thing across languages by separately building a vector embedding of each language corpus and then looking for structural (neighborhood) similarity between the embeddings, with an extra global 'rotation' step mapping the two vector spaces onto one another.

So if both languages have a word for "cat", and many other words related to cats, and the relationship between these words is the same in both languages (e.g. 'cat' is close to 'dog' in a different way than it is close to 'food'), then these words can be successfully translated.
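To make the 'rotation' step concrete, here is a minimal sketch in Python. It is a toy, not the linked article's method: the 2-D vectors, the word lists, and the 40-degree offset are all invented for illustration, and it cheats by using two known word pairs to estimate the rotation, whereas the unsupervised approach has to find the alignment from structural similarity alone. It only shows that once a good rotation is found, a held-out word can be translated by nearest-neighbor lookup.

```python
import math

def rotate(point, theta):
    """Rotate a 2-D point counter-clockwise by theta radians."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def best_rotation_angle(pairs):
    """Least-squares rotation angle mapping each source point onto its target.

    Closed form for 2-D: atan2(sum of cross products, sum of dot products).
    """
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in pairs)
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in pairs)
    return math.atan2(cross, dot)

def translate(word, src_emb, tgt_emb, theta):
    """Rotate the source vector, then return the nearest target word."""
    mapped = rotate(src_emb[word], theta)
    return min(tgt_emb, key=lambda w: math.dist(mapped, tgt_emb[w]))

# Hypothetical toy embeddings: 'cat' is near 'dog' and far from 'food'.
english = {"cat": (1.0, 0.0), "dog": (0.9, 0.3), "food": (0.0, 1.0)}
names = {"cat": "chat", "dog": "chien", "food": "nourriture"}

# Assumption: the second language has the same structure, just rotated.
true_theta = math.radians(40)
french = {names[w]: rotate(v, true_theta) for w, v in english.items()}

# Assumption: two seed pairs are known (the unsupervised method has none).
seed = [(english["cat"], french["chat"]), (english["food"], french["nourriture"])]
theta = best_rotation_angle(seed)

# 'dog' was not in the seed set, yet it lands on its counterpart.
print(translate("dog", english, french, theta))
```

The sketch works only because the two toy spaces really do share the same shape. That is the property in doubt for English versus dolphin.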

But if one language has a tiny vocabulary compared to the other one, and the vocabulary isn't even a subset of the other language's (dolphins don't talk about cats), then you can't get far. Unless you have an English training dataset that only uses words that do have translations in Dolphin. But we don't know what dolphins talk about, so we can't build this dataset.

Also, this is machine learning on text with distinct words; do we even have a 'separate words' parser for dolphin signals?

If Language A and Language B have word embeddings that partially overlap and partially don't, that doesn't necessarily mean it's impossible to match the part that does overlap. After all, that always happens to some extent, even between English and French (i.e. not every English word corresponds to a single French word or vice-versa), but the matching is still possible. It would obviously be a much much more extreme non-overlap for English vs dolphin, and that certainly makes it less likely to work, but doesn't prove it impossible. (It might require changi

... (read more)
7DanArmak
I think the disparity in number of words is proportionally so large that this method won't work. The (small) hypothetical set of dolphin words wouldn't match to a small subset of English words, because what's being matched is really the (embedded) structure of the relationship between the words, and any sufficiently small subset of English words loses most of its interesting structure because its 'real' structure relates it to many words outside that subset. Suppose that dolphins (hypothetically! counterfactually! not realistically!) use only 10 words to talk about fish, but humans use 100 words to do the same. I expect you can't match the relationship structure of the 10 dolphin words to the much more complex structure of the 100 human words. But no subset of ~10 English words out of the 100 is a meaningful subset that humans could use to talk about fish.
3Daniel Kokotajlo
Thanks, I found that explanation very helpful.

So it's the blocker I mentioned. OK, thanks. Well, maybe we could make a translator between whales and dolphins then.

Or we could make a translator between a corpus of scuba diver conversations and dolphins.

We might be able to parse dolphin signals into separate words using ordinary unsupervised learning, no?

Why does the relative size of the vocabularies matter? I'd guess it would be irrelevant, the main factor would be how much overlap the two languages have. Maybe the absolute (as opposed to relative) sizes would matter.

Logan Zoellner

1-1

Current AI methods are basically just fancy correlations, so unless the thing you are looking for is in the dataset (or is a simple combination of things in the dataset) you won't be able to find it.

This means "can we use AI to translate between humans and dolphins" is mostly a question of "how much data do you have?"

Suppose, for example, that we had 1 billion hours of audio/video of humans/dolphins doing things. In this case, AI could almost certainly find correlations like: when dolphins pick up the seashell, they make the <<dolphin word for seashell>> sound, when humans pick up the seashell they make the <<human word for seashell>> sound. You could then do something like CLIP to find a mapping between <<human word for seashell>> and <<dolphin word for seashell>>. The magic step here is that, because we use the same embedding model for video in both cases, <<seashell>> is located at the same position in both our dolphin and human CLIP models.
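A toy sketch of that matching idea, in Python. This is not CLIP, which learns its embeddings by contrastive training. As a stand-in, each sound is simply placed at the average of the shared video embeddings it co-occurs with, and sounds are then matched across species by cosine similarity. The video vectors, sound labels, and co-occurrence logs are all hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical shared video embedding: one vector per thing on screen.
video = {"seashell": (1.0, 0.0, 0.0), "fish": (0.0, 1.0, 0.0), "boat": (0.0, 0.0, 1.0)}

# Hypothetical logs of (sound heard, thing on screen), noise included.
human_log = [("shell", "seashell"), ("shell", "seashell"), ("shell", "fish"),
             ("fish", "fish"), ("boat", "boat")]
dolphin_log = [("click-A", "fish"), ("click-A", "fish"),
               ("click-B", "seashell"), ("click-B", "seashell"), ("click-B", "boat")]

def sound_embeddings(log):
    """Place each sound at the mean video embedding of its co-occurring scenes."""
    grouped = {}
    for sound, scene in log:
        grouped.setdefault(sound, []).append(video[scene])
    return {sound: mean_vector(vectors) for sound, vectors in grouped.items()}

def match(src, tgt):
    """For each source sound, pick the target sound with the highest cosine similarity."""
    return {s: max(tgt, key=lambda t: cosine(src[s], tgt[t])) for s in src}

human = sound_embeddings(human_log)
dolphin = sound_embeddings(dolphin_log)
print(match(dolphin, human))
```

The matching never compares dolphin audio to human audio directly. Both are anchored to the same video space, which is why it depends on having footage of both species doing comparable things.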

But notice that I am already simplifying here.  There is no such thing as <<human word for seashell>>.  Instead, humans have many different languages.  For example Papua New Guinea has over 800 languages in a land area of a mere 400k square kilometers.  Because dolphins are living in what is essentially a hunter-gatherer existence, none of the pressures (trade, empire building) that cause human languages to span widespread areas exist.  Most likely each pod of dolphins has at a minimum its own dialect. (one pastime I noticed when visiting the UK was that people there liked to compare how towns only a few miles apart had different words for the same things)

Dolphin lives are also much simpler than human lives, so their language is presumably also much simpler. Maybe like Eskimos have 100 words for snow, dolphins have 100 words for water. But it's much more likely that without the need to coordinate resources for complex tasks like tool-making, dolphins simply don't have as complex a grammar as humans do. Less complex grammar means fewer patterns, which means less for the machine learning to pick up on (machine learning loves patterns).

So, perhaps the correct analogy is: if we had a billion hours of audio/video of a particular tribe of humans and a billion hours of a particular pod of dolphins, we could feed it into a model like CLIP and find sounds with similar embeddings in both languages. As pointed out in other comments, it would help if the humans and dolphins were doing similar things, so for the humans you might want to pick a group that focused on underwater activities.

In reality (assuming AGI doesn't get there first, which seems quite likely), the fastest path to human-dolphin translation will take a hybrid approach. AI will be used to identify correlations in dolphin language. For example, this study claims to have identified vowels in whale speech. Once we have a basic mapping: dolphin sounds -> symbols humans can read, some very intelligent and very persistent human being will stare at those symbols, make guesses about what they mean, and then do experiments to verify those guesses. For example, humans might try replaying the sounds they think represent words/sentences to dolphins and seeing how they respond. This closely matches how new human languages are translated: a human being lives in contact with the speakers of the language for an extended period of time until they figure out what various words mean.

What would it take for an only-AI approach to replicate the path I just talked about (AI generates a dictionary of symbols that a human then uses to craft a clever experiment that uses the least amount of data possible)?  Well, it would mean overcoming the data inefficiency of current machine learning algorithms.  Comparing how many "input tokens" it takes to train a human child vs GPT-3, we can estimate that humans are ~1000x more data efficient than modern AI techniques.  

Overcoming this barrier will likely require inference+search techniques where the AI uses a statistical model to "guess" at an answer and then checks that answer against a source of truth. One important metric to watch is the ARC prize, which intentionally has far less data than traditional machine learning techniques require. If ARC is solved, it likely means that AI-only dolphin-to-human translation is on its way (but it also likely means that AGI is imminent).
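The guess-then-check loop can be sketched in a few lines of Python. Everything here is hypothetical: the prior over meanings stands in for the statistical model, and `playback_experiment` stands in for the source of truth (for instance, replaying a sound and watching the response). The point is only that a good prior cuts the number of expensive checks.

```python
def guess_and_check(prior, check):
    """Test hypotheses from most to least probable.

    Returns (first hypothesis that passes the check, number of checks used),
    or (None, number of checks used) if none passes.
    """
    ranked = sorted(prior, key=prior.get, reverse=True)
    for attempts, hypothesis in enumerate(ranked, start=1):
        if check(hypothesis):
            return hypothesis, attempts
    return None, len(ranked)

# Hypothetical model output: how likely each meaning is for one dolphin sound.
prior = {"fish": 0.5, "seashell": 0.3, "danger": 0.2}

def playback_experiment(hypothesis):
    """Stand-in for a real experiment; here the true meaning is 'seashell'."""
    return hypothesis == "seashell"

result = guess_and_check(prior, playback_experiment)
print(result)
```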

So, to answer your original question: "Could we use current AI methods to understand dolphins?"  Yes, but doing so would require an unrealistically large amount of data and most likely other techniques will get there sooner.

If you could do whole-brain emulation for dolphins, you should be able to generate enough data for unsupervised learning that way.

3 comments

Now it's been four years of fast AI progress; I wonder if there are any updates? Has anyone tried to use machine learning to translate nonhuman communications?

This project seems to be trying to translate whale language.

I've learned a weird amount about whales from here this week.

If unsupervised translation is possible for creatures with a language as different from ours as whales, that would be amazing. Especially if it could be done without monitoring their behaviors (although that might be asking for too much).