Hello. I am a lurker, but I checked the search and didn't see anyone discussing Wittgenstein's ideas concerning the "essence" of language and his talk of "Language Games" so I thought I'd ask.

Wittgenstein is a linguistic philosopher who, in very brief terms, clarified our usage of language. While many people conceived of language as clear, distinct, and obvious, Wittgenstein used the example of the word "game" to show that plenty of words we regularly use have no consistent, all-encompassing definition. Among other things, he observed that language instead exists as a web of connotations that depend on and change with context, and that these connotations can only truly be understood by observing how language is used, rather than through some detached definition.

"Wittgenstein In Philosophical Context:

Essential Definition - Socrates, boil it down to its essence

Extensive Definition - Wittgenstein, use it in a sentence"

The above dichotomy frames the use of "words" by philosophers in a contradictory manner. And perhaps in qualia we do conceive them as different sorts of definitions, but imo it's just a matter of framing, and we can readily say that "How you use a word in a sentence is itself the essence of a word". We can therefore intend the Ai to conceive of the "essence" of words accordingly.

Descriptively speaking, Wittgenstein has always appeared unambiguously correct on this matter to me.


 

This all being said, I am wondering something relating to Wittgenstein:

Do AI safety researchers, and AI engineers in general, have a similar conception of language? When ChatGPT reads a sentence, does it intentionally treat each word's essence as some rigid, unchanging thing derived from a dictionary definition, or as a web of connotations to other words? This might seem rather trivial, but when interpreting a prompt like "save my life" it seems clear why truly understanding each word's meaning is so important for potential AGI. So then, is Wittgenstein, or rather this conception of language, taken seriously and intentionally, consciously implemented? Is there even an intention of ensuring that Ai truly consciously understands language? It seems like this is a prerequisite to actually ensuring any AGI we build is 100% aligned. If the language we use to communicate with the AGI is up to interpretation, it seems alignment is simply, obviously impossible.


I think the connection between language modeling and Wittgenstein is a pretty clear one to make for philosophers, dating back before LLMs to more simplistic models like Word2Vec. But the people implementing them were generally not thinking about Wittgenstein - although many of them were thinking about connectionist models of the mind.
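To make the "meaning as use" connection concrete, here is a minimal sketch of the distributional idea behind models like Word2Vec. It is not Word2Vec itself (which learns dense vectors with a neural network); it is a simpler count-based stand-in. The toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words that appear near it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: no definitions anywhere, only usage.
corpus = [
    "we played a game of chess",
    "we played a game of poker",
    "we played a match of chess",
    "she ate a bowl of soup",
]

vecs = cooccurrence_vectors(corpus)
sim_game_match = cosine(vecs["game"], vecs["match"])
sim_game_soup = cosine(vecs["game"], vecs["soup"])
print(sim_game_match > sim_game_soup)  # "game" is used more like "match" than like "soup"
```

Nothing in this sketch ever states what a "game" is; the word ends up close to "match" only because the two are used in similar contexts.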

You say "AI", though I'm assuming you're specifically asking about LLMs (large language models) like GPT, Llama, Claude, etc.

LLMs aren't programmed, they're trained. None of the code written by the developers of LLMs has anything to do with concepts, sentences, dictionary definitions, or different languages (e.g. English vs. Spanish). The code only deals with general machine learning, and streams of tokens (which are roughly word fragments, encoded as numbers).

The LLM is trained on huge corpuses of text. The LLM learns concepts, and what a sentence is, and the difference between English and Spanish, purely from the text. None of that is explicitly programmed into it; the programmers have no say in the matter.
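As a rough illustration of the point above, here is a toy next-token predictor. It is a sketch under heavy simplifying assumptions: a real LLM is a large neural network over subword tokens, not a bigram count table, and the names `train_bigram` and `predict_next` and the tiny corpora are made up for this example. What it shares with the real thing is that the code only ever sees a stream of tokens.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which token follows which. The code knows nothing about
    words, grammar, or which language a token belongs to."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text mixing two languages in one stream.
english = "the cat sat on the mat . the cat ate the fish . the cat sat ."
spanish = "el gato come el pescado . el gato duerme . el gato come ."
model = train_bigram(english + " " + spanish)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "el"))   # gato
```

The same few lines continue English after "the" and Spanish after "el" without any line of code mentioning either language; whatever separation exists comes entirely from the training text.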

As for how it comes to understand language, and how that relates to Wittgenstein's thoughts on language, we don't know much at all. You can ask it. And we've done some experiments like that recent one with the LLM that was made to think it was the Golden Gate Bridge, which you probably heard about. But that's about it; we don't really know how LLMs "think" internally. (We know what's going on at a low level, but not at a high level.)

[-][anonymous]20

These are all interpretations I failed to contradict and so I can't really blame you for voicing them. 

 

That being said, I do understand all that you're saying, and I do understand how modern Ai works, but I was under the impression that a large amount of "fine-tuning" by individual humans has been done for each of these "word predictors" (that we call LLM or GPT).

Such that, sure, they are still primarily word predictors, but what words they will predict--thus what outputs the end user receives--has and will be refined and constrained to not contain "undesirable" things.

Undesirable things such as slurs or how to build a bomb--but in this case I'm asking about whether the LLM output will imply, use, or propagate incorrect understandings of language. 

 

The point being that because we are under the impression that Optimality will determine the ontology of the Ai (if it ever became an Agent or otherwise) intractably, you should ensure the Ai is Optimized for using and conceiving of language correctly, even if it won't """consciously""" do so for a while.

As you're probably aware, the fine tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria like that it should be helpful and friendly and definitely not use slurs, and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated "incorrect understanding of language"?

I have (tried to) read Wittgenstein, but don't know what outputs would or would not constitute an "incorrect understanding of language". Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during fine-tuning.

[-][anonymous]10

who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated "incorrect understanding of language"?

This is one of the bigger reasons why I really don't like RLHF--because inevitably you're going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.

But, if it is the method used, I would have hoped that some minimum discussion of Linguistic Philosophy would've been had among those who are aligning this Ai. It's impossible for the Utility function of the Ai to be amenable to humans if it doesn't use language the same way, ESPECIALLY if Language is its way of conceiving the world (LLM). Unfortunately, it looks like all this linguistic philosophy isn't even discussed.

Hmm the more I learn about this whole Ai Alignment situation the more worried I get. Maybe I'll have to stop doing moral philosophy and get involved. 

 

I have (tried to) read Wittgenstein, but don't know what outputs would or would not constitute an "incorrect understanding of language". Could you give some examples?

Wittgenstein, especially his earlier work, is nearly illegible to me. Of course it isn't truly illegible; it just takes a great many rereads of the same paragraphs to understand.

Luckily, Philosophical Investigations is much more approachable and sensible. That being said, it can still be difficult for people not immersed in the field to readily digest. For that I'd recommend https://plato.stanford.edu/entries/wittgenstein/

and my favorite lecturer who did a fantastic accessible 45 min lesson on Wittgenstein:

This is one of the bigger reasons why I really don’t like RLHF—because inevitably you’re going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.

What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of "Here's a completion that should be positively reinforced because it demonstrates correct understanding of language, and here's a completion of the same text that should be negatively reinforced because it demonstrates incorrect understanding of language"? (Bear in mind that the prompts shouldn't be about language, as that would probably just teach the model what to say when it's discussing language in particular.)

It’s impossible for the Utility function of the Ai to be amenable to humans if it doesn’t use language the same way

What makes you think that humans all use language the same way, if there's more than one plausible option? People are extremely diverse in their perspectives.