Testing "True" Language Understanding in LLMs: A Simple Proposal

MtryaSam

The Core Idea

What if we could test whether language models truly understand meaning, rather than just matching patterns? Here's a simple thought experiment:

Create two artificial languages (A and B) that bijectively map to the same set of basic concepts R'
Ensure these languages are designed independently (no parallel texts)
Test if an LLM can translate between them without ever seeing translations

If successful, this would suggest the model has learned the underlying meanings, not just statistical correspondences between the two languages. More formally, suppose Language A and Language B each map bijectively onto the same concept space R', via MA and MB respectively. Translation from A to B is then the composition MB^(-1) ∘ MA: apply MA to go from Language A to concepts, then MB^(-1) to go from concepts to Language B. A model that can perform this composition without ever seeing parallel examples would be showing a strong indicator of genuine semantic understanding, because it must have internalized the relationship between symbols and meanings in each language independently.
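To make the "go through the concepts" step concrete, here is a minimal sketch of the reference translation a model would have to reproduce. It treats each language as a plain word-to-concept dictionary. The words are borrowed from the simplified example later in this post, and the concept labels and function names (`invert`, `compose_translator`) are my own placeholders, not part of any existing library.

```python
# Toy lexicons (assumption: word-level mappings only, no grammar).
M_A = {"zix": "RED", "kol": "CIRCLE"}    # Language A word -> concept
M_B = {"nare": "RED", "tup": "CIRCLE"}   # Language B word -> concept


def invert(mapping):
    """Invert a dict; fail loudly if it is not one-to-one."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not injective, so it has no inverse")
    return inverse


def compose_translator(m_src, m_dst):
    """Build a word translator that routes through the shared concept space."""
    if set(m_src.values()) != set(m_dst.values()):
        raise ValueError("the two languages do not cover the same concepts")
    dst_inverse = invert(m_dst)
    # First map the source word to its concept, then the concept to the target word.
    return lambda word: dst_inverse[m_src[word]]


a_to_b = compose_translator(M_A, M_B)
print(a_to_b("zix"))
print("-".join(a_to_b(word) for word in "zix-kol".split("-")))
```

This is only the ground truth that the experimenter can compute because they designed both languages. The open question in the proposal is whether a model trained on each language separately ends up behaving like `a_to_b` without being given either dictionary.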

Why This Matters

This approach could help distinguish between:

Surface-level pattern matching
Genuine semantic understanding
Internal concept representation

It's like testing whether someone really understands two languages, rather than having just memorized a translation dictionary.

Some Initial Thoughts

Potential Setup

Start with a small, controlled set of basic concepts (colors, numbers, simple actions)
Design Language A with one set of rules/structure
Design Language B with completely different rules/structure
Both languages should map clearly to the same concepts without ambiguity
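One possible way to set this up, sketched under my own assumptions: generate each vocabulary from its own seeded random generator and its own word-shape rule, so that neither language is derived from the other. The concept list, the two syllable rules, and the helper names below are all hypothetical choices for illustration.

```python
import random

# Assumed toy concept space: a few colors, numbers, shapes, and actions.
CONCEPTS = ["RED", "BLUE", "ONE", "TWO", "CIRCLE", "SQUARE", "JUMP", "RUN"]


def make_lexicon(concepts, seed, make_word):
    """Assign each concept a unique invented word using its own seeded RNG."""
    rng = random.Random(seed)
    lexicon, used = {}, set()
    for concept in concepts:
        word = make_word(rng)
        while word in used:  # redraw on collision so the mapping stays one-to-one
            word = make_word(rng)
        used.add(word)
        lexicon[word] = concept
    return lexicon


def cvc_word(rng):
    """Language A rule (assumed): one consonant-vowel-consonant syllable."""
    return rng.choice("zkltm") + rng.choice("io") + rng.choice("xlnp")


def cvcv_word(rng):
    """Language B rule (assumed): two open consonant-vowel syllables."""
    return "".join(rng.choice("nrtps") + rng.choice("aeu") for _ in range(2))


lang_a = make_lexicon(CONCEPTS, seed=1, make_word=cvc_word)
lang_b = make_lexicon(CONCEPTS, seed=2, make_word=cvcv_word)
print(lang_a)
print(lang_b)
```

This only covers vocabulary. Giving the two languages different grammar (word order, agreement, and so on) would need more design work, and whether separate seeds and rules are enough to count as "independent" is one of the open questions below.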

Example (Very Simplified)

Concept: "red circle"

Language A: "zix-kol" (where "zix" = red, "kol" = circle)
Language B: "nare-tup" (where "nare" = red, "tup" = circle)

Without ever showing the model that "zix-kol" = "nare-tup", can it figure out the translation by understanding that both phrases refer to the same concept?

Open Questions

How do we ensure the languages are truly independent?
What's the minimum concept space needed for a meaningful test?
How do we efficiently validate successful translations?
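On the validation question, one simple option (an assumption on my part, not a tested protocol) is exact-match accuracy against translations computed from the known mappings. In the sketch below, `stub_model` is a stand-in for a real model call and `held_out` is a hypothetical test set built from the simplified example above.

```python
def exact_match_accuracy(translate, test_pairs):
    """Fraction of (source, expected) pairs the translator reproduces exactly."""
    if not test_pairs:
        raise ValueError("need at least one test pair")
    correct = sum(1 for source, expected in test_pairs if translate(source) == expected)
    return correct / len(test_pairs)


# Hypothetical held-out pairs, derived from the known mappings and never shown to the model.
held_out = [("zix-kol", "nare-tup"), ("kol", "tup"), ("zix", "nare")]


def stub_model(phrase):
    """Stand-in for an LLM call; it only knows one word, so it should score poorly."""
    known = {"zix": "nare"}
    return "-".join(known.get(word, "?") for word in phrase.split("-"))


accuracy = exact_match_accuracy(stub_model, held_out)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is cheap because the languages are bijective, so every phrase has exactly one correct translation. A real test would also need a chance-level baseline, since with a tiny vocabulary a model could score well by guessing.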

Limitations

As an undergraduate student outside the AI research community, I acknowledge:

This is an initial thought experiment
Implementation would require significant resources and expertise
Many practical challenges would need to be addressed

Call for Discussion

I'm sharing this idea in hopes that:

Researchers with relevant expertise might find it interesting
It could contribute to discussions about AI understanding
Others might develop or improve upon the concept

About Me

I'm an engineering student interested in AI understanding and alignment. While I may not have the resources to develop this idea fully, I hope sharing it might spark useful discussions or inspire more developed approaches.

Feedback Welcome

If you have thoughts, suggestions, or see potential in this idea, I'd love to hear from you. Please feel free to comment or reach out.
