I believe this has been done in Google's Multilingual Neural Machine Translation (GNMT) system that enables zero-shot translations (translating between language pairs without direct training examples). This system leverages shared representations across languages, allowing the model to infer translations for unseen language pairs.
I made basically the same proposal here, but phrased as a task of translating between a long alien message and human languages: https://www.lesswrong.com/posts/J3zA3T9RTLkKYNgjw/is-llm-translation-without-rosetta-stone-possible See also the comments, which contain a reference to a paper with a related approach on unsupervised machine translation. Also this comment echoes your post:
I think this is a really interesting question since it seems like it should neatly split the "LLMs are just next token predictors" crows from the "LLMs actually display understanding" crowd.
The Core Idea
What if we could test whether language models truly understand meaning, rather than just matching patterns? Here's a simple thought experiment:
If successful, this would suggest the model has learned to understand the underlying meanings, not just statistical patterns between languages. Theoretically, if Language A and Language B each form true mappings (MA and MB) to the same concept space R', then the model should be able to perform translation through the composition MA·MB^(-1), effectively going from Language A to concepts and then to Language B, without ever seeing parallel examples. This emergent translation capability would be a strong indicator of genuine semantic understanding, as it requires the model to have internalized the relationship between symbols and meanings in each language independently.
Why This Matters
This approach could help distinguish between:
It's like testing if someone really understands two languages versus just memorizing a translation dictionary.
Some Initial Thoughts
Potential Setup
Example (Very Simplified)
Concept: "red circle"
Without ever showing the model that "zix-kol" = "nare-tup", can it figure out the translation by understanding that both phrases refer to the same concept?
Open Questions
Limitations
As an undergraduate student outside the AI research community, I acknowledge:
Call for Discussion
I'm sharing this idea in hopes that:
About Me
I'm an engineering student interested in AI understanding and alignment. While I may not have the resources to develop this idea fully, I hope sharing it might spark useful discussions or inspire more developed approaches.
Feedback Welcome
If you have thoughts, suggestions, or see potential in this idea, I'd love to hear from you. Please feel free to comment or reach out.