Emile comments on Open Thread: March 4 - 10 - Less Wrong

3 Post author: Coscott 04 March 2014 03:55AM




Comment author: lucidian 05 March 2014 03:47:38AM 8 points

Cog sci question about how words are organized in our minds.

So, I'm a native English speaker, and for the last ~1.5 years, I've been studying Finnish as a second language. I was making very slow progress on vocabulary, though, so a couple days ago I downloaded Anki and moved all my vocab lists over to there. These vocab lists basically just contained random words I had encountered on the internet and felt like writing down; a lot of them were for abstract concepts and random things that probably won't come up in conversation, like "archipelago" (the Finnish word is "saaristo", if anyone cares). Anyway, the point is that I am not trying to learn the vocabulary in any sensible order, I'm just shoving random words into my brain.

While studying today, I noticed that I was having a lot more trouble with certain words than with others, and I started to wonder why, and what implications this has for how words are organized in our minds, and whether anyone has done studies on this.

For instance, there seemed to be a lot of "hash collisions": vocabulary words that I kept confusing with one another. Some of these were clearly phonetic: hai (shark) and kai (probably). Another phonetic pair: toivottaa (to wish) and taivuttaa (to inflect a word). Some were a combination of phonetic and semantic: virhe (error), vihje (hint), vaihe (phase, stage), and vika (fault). Some of them I have no idea why I kept confusing: kertautua (to recur) and kuvastaa (to mirror, to reflect).
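One rough way to put a number on the "phonetic" collisions above is edit distance between spellings. The sketch below is only an illustration: it assumes spelling is a usable proxy for sound (more reasonable for Finnish than for English, but still crude), and the helper name `levenshtein` is made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Word pairs taken from the comment above.
pairs = [
    ("hai", "kai"),
    ("toivottaa", "taivuttaa"),
    ("virhe", "vihje"),
    ("kertautua", "kuvastaa"),
]
for a, b in pairs:
    print(f"{a} / {b}: {levenshtein(a, b)} edits")
```

The clearly phonetic pairs come out only one or two edits apart. A small distance does not by itself explain a confusion, and the semantic overlap in the virhe/vihje/vaihe/vika group is not captured at all.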

There were also a few words that I just had inordinate amounts of trouble remembering, and I don't know why: eksyä (to get lost), ehtiä (to arrive in time), löytää (to find), kyllästys (saturation), sisältää (to include), arvata (to guess). Aside from the last one, all of these have the letter ä in them, so maybe that has something to do with it. Also, the first two words don't have a single English verb as an equivalent.

There were also some words that were easier than I expected: vankkuri (wagon), saaristo (archipelago), and some more that I don't remember now because they quickly vanished from my deck. Both of these words name unusual but concrete concepts.

Do different people struggle with the same words when learning a language? Are some Finnish words just inherently "easy" or "hard" for English speakers to learn? If it's different for each person, how does the ease of learning certain words relate to a person's life experiences, interests, common thoughts, etc.?

What do hash collisions tell us about how words are organized in our minds? Can they tell us anything about the features we might be using to recognize words? For instance, English speakers often seem to have trouble remembering and distinguishing Chinese names; they all seem to "sound the same". Why does this happen? Here's a hypothesis: when we hear a word, based on its features, it is mapped to a specific part of a learned phonetic space before being used to access semantic content. Presumably we would learn this phonetic space to maximize the distance between words in a language, since the farther apart words are, the less chance they have of accessing the wrong semantic content. Maybe certain Finnish words sound the same to me because they map to nearby regions of my phonetic space, but a speaker of some other language wouldn't confuse these particular words because they'd have a different phonetic space? I'm just speculating wildly here.

I'd be interested to hear everyone else's vocab-learning experiences and crazy hypotheses for what's going on. Also, does anyone know any actual research that's been done on this stuff?

Comment author: Emile 05 March 2014 08:52:48AM 2 points

I tend to think of this in terms of compression: you can use various compression schemes to store English words in fewer bits, but that will make you store foreign words in more bits. For example, you could order letters by frequency and represent frequent letters with fewer bits. You can do the same with groups of letters (e.g. "thing" = "th" + "ing", both very frequent combinations in English), or take advantage of conditional probabilities ('t' is much more likely to be followed by 'h' than by 'n') to squeeze out a few more bits of compression. Similarly, if a westerner wanted to describe the Chinese character 語 without any prior knowledge of Chinese, the description would be very long, but a Chinese speaker would describe it as "the radical for speech, and a five above a mouth".

This is just another way of describing what you call phonetic space.
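The conditional-probability version of this idea can be sketched in a few lines. Everything here is an assumption made for illustration: the tiny English sample text, the add-one smoothing, the 28-letter alphabet, and the names `train_bigram` and `bits`. It is a toy model of "code length under an English-trained scheme", not a real compressor.

```python
import math
from collections import Counter

# Tiny illustrative English sample (an assumption; a real corpus would be far larger).
SAMPLE = (
    "the thing is that this thought and the other things they think "
    "are nothing without the theory that they think of"
)
# Letters the model can encode, including the Finnish letters ä and ö.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäö"


def train_bigram(text):
    """Count letter -> next-letter transitions inside each word; '^' marks word start."""
    counts = {}
    for word in text.split():
        prev = "^"
        for ch in word:
            counts.setdefault(prev, Counter())[ch] += 1
            prev = ch
    return counts


def bits(word, counts):
    """Ideal code length in bits: sum of -log2 P(letter | previous letter),
    with add-one smoothing so unseen transitions stay encodable."""
    total = 0.0
    prev = "^"
    for ch in word:
        ctx = counts.get(prev, Counter())
        p = (ctx[ch] + 1) / (sum(ctx.values()) + len(ALPHABET))
        total += -math.log2(p)
        prev = ch
    return total


model = train_bigram(SAMPLE)
for word in ["thing", "think", "löytää", "saaristo"]:
    cost = bits(word, model)
    print(f"{word}: {cost:.1f} bits total, {cost / len(word):.2f} bits per letter")
```

Under a model trained only on English text, words built from frequent English transitions such as "th" and "in" get short codes, while a word like "löytää" pays the full price for nearly every letter. That is the sense in which a scheme tuned for one language stores another language's words in more bits.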

Simple issues of frequency make learners see words as "closer" than native speakers do. Another problem arises when the "phonetic space" of one language has more (or different) dimensions than that of another; e.g. many people find it hard to learn words when the distinction between voiced and unvoiced "th" is important, or when the tone of a syllable also carries meaning (as in Chinese). To non-Chinese speakers, the Chinese words for "mother", "insult" and "horse" all sound like exactly the same word, "ma".