
Adele_L comments on The rational way to name rivers - Less Wrong Discussion

2 Post author: PhilGoetz 06 August 2014 03:41PM


Comments (42)


Comment author: Adele_L 07 August 2014 03:21:20AM 9 points

Another really cool writing-system design is Korean hangul. The shapes of the basic consonant letters represent how you position your mouth and tongue to pronounce them, among many other nice features.

Comment author: [deleted] 08 August 2014 12:05:25AM 3 points

English has Shavian.

There's also Deseret, which I've made some tools for, but it's not featural (beyond some isolated cases, like ligatures for some-but-not-all diphthongs) and is somewhat confusing to learn.

Neither of these will be generally usable in the immediate future, since they're both in Unicode's astral planes (outside the Basic Multilingual Plane), and some common piece of the web stack (old versions of MySQL, IIRC) silently fails on encountering astral-plane characters. Font support is another issue, but Deseret is slightly better supported than Shavian: my Windows 8 install came with a font for the former, but not the latter.
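A quick way to see the astral-plane problem: Deseret letters sit above U+FFFF, so each one takes four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16, while Latin letters and hangul syllables stay in the Basic Multilingual Plane. This sketch doesn't model MySQL itself; `fits_in_three_bytes` is a hypothetical stand-in for any storage layer that only accepts UTF-8 sequences of up to three bytes.

```python
def utf8_lengths(text: str) -> list[int]:
    """Number of UTF-8 bytes needed for each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units; astral characters need a surrogate pair."""
    return len(text.encode("utf-16-le")) // 2


def fits_in_three_bytes(text: str) -> bool:
    """Hypothetical check for a store limited to 3-byte UTF-8 sequences,
    i.e. one that can only hold Basic Multilingual Plane characters."""
    return all(ord(ch) <= 0xFFFF for ch in text)


latin = "test"
deseret = "\U0001043B\U0001042F\U00010445\U0001043B"  # "test" in Deseret
hangul = "\uD55C\uAE00"                                # the word "hangul" in hangul

print(utf8_lengths(latin))    # [1, 1, 1, 1]
print(utf8_lengths(deseret))  # [4, 4, 4, 4]
print(utf8_lengths(hangul))   # [3, 3]
print(utf16_units(deseret))   # 8
print(fits_in_three_bytes(hangul), fits_in_three_bytes(deseret))  # True False
```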

(If there's anything after the following colon, LW doesn't have this bug: 𐑄𐐮𐑅 𐐮𐑆 𐐩 𐐻𐐯𐑅𐐻)

Comment author: ChristianKl 08 August 2014 01:41:33PM 0 points

If you wanted to use them, you could build a Chrome and Firefox plugin that automatically converts all English text into Deseret. You could also write a WordPress plugin that automatically offers users a Deseret version of the website under des.domain.name.

Comment author: [deleted] 09 August 2014 09:56:27PM 0 points

That would be difficult. Deseret script is phonetic, so you'd have to either look up the pronunciation for each word or eat the imperfection from the ~40% of words that can't be easily predicted.
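Since Deseret is (nearly) one letter per phoneme, the transliteration step itself is a table lookup; the hard part is getting the pronunciation. A minimal sketch, assuming a hypothetical pronouncing dictionary `PRONUNCIATIONS` of IPA-style phoneme lists and a phoneme table that only covers the letters used here. Words missing from the dictionary come back as `None`, which is exactly where you either guess from the spelling or eat the error.

```python
from typing import Optional

# Hypothetical pronouncing dictionary: word -> IPA-style phoneme list.
PRONUNCIATIONS = {
    "this": ["ð", "ɪ", "s"],
    "is": ["ɪ", "z"],
    "test": ["t", "ɛ", "s", "t"],
}

# Partial phoneme -> Deseret small-letter table (one letter per phoneme).
PHONEME_TO_DESERET = {
    "ð": "\U00010444",  # thee
    "ɪ": "\U0001042E",  # short i
    "s": "\U00010445",  # es
    "z": "\U00010446",  # zee
    "t": "\U0001043B",  # tee
    "ɛ": "\U0001042F",  # short e
}


def word_to_deseret(word: str) -> Optional[str]:
    """Transliterate a word whose pronunciation is known; None if it isn't."""
    phonemes = PRONUNCIATIONS.get(word.lower())
    if phonemes is None:
        return None  # unknown word: guess from spelling, or flag it
    return "".join(PHONEME_TO_DESERET[p] for p in phonemes)


print(word_to_deseret("This"))    # 𐑄𐐮𐑅
print(word_to_deseret("rivers"))  # None
```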

Deseret as it's supposed to be used is even harder to convert into automatically than that alone would suggest: you're supposed to write the stressed equivalent of unstressed vowels. So the words "photograph" and "photography", for example, should be 𐑁𐐬𐐻𐐬𐑀𐑉𐐰𐑁 and 𐑁𐐬𐐻𐐪𐑀𐑉𐐰𐑁𐐮 (IPA: foʊ̯toʊ̯græf and foʊ̯tɑgræfɪ; my keyboard transliteration: fo;to;graf and fo;tografi). I don't think this is very common in practice, however, which is a problem for back-converting Deseret to Latin, since people who don't distinguish them may write the unstressed schwa as either 𐐲 or 𐐮.

Also, textspeak is built into it: the name of the letter 𐐒 is 'bee', so the word 'bee' can be written '𐐺'. This can even hold within a word: the Wikipedia page has an example of a coin with the text "𐐐𐐄𐐢𐐆𐐤𐐝 𐐓𐐅 𐐜 𐐢𐐃𐐡𐐔" ("Holiness to the Lord"). The first word there is 'holiness', but it's written /hoʊ̯lɪns/ (ho;lins), since the name of the letter 𐐝 is pronounced 'ess'. Usually you see this in the definite article, which is just written 𐑄, but you could also write 'entry', 'zebra', and 'jeep' as '𐑌𐐻𐑉𐐮', '𐑆𐐺𐑉𐐲', and '𐐾𐐹' (ntrɪ zbrə dʒp / ntri zbru jp; 'entry' could also be written with a final -𐐨 instead of -𐐮).
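The textspeak rule can be sketched as a greedy pass over a word's phonemes: wherever the next two phonemes spell a letter's name, emit that single letter instead. This assumes IPA-style phoneme lists as input, and both tables are small hypothetical subsets covering only the examples above; a real converter would need every letter, plus a decision about which substitutions are actually conventional (applied greedily, the rule would also turn "test" into tee + es + tee).

```python
# Hypothetical subset: two-phoneme letter names -> the letter that carries them.
LETTER_NAMES = {
    ("b", "iː"): "\U0001043A",   # bee
    ("z", "iː"): "\U00010446",   # zee
    ("dʒ", "iː"): "\U0001043E",  # jee
    ("ð", "iː"): "\U00010444",   # thee
    ("ɛ", "n"): "\U0001044C",    # en
    ("ɛ", "s"): "\U00010445",    # es
}

# Hypothetical subset of ordinary one-letter-per-phoneme mappings.
SINGLE = {
    "b": "\U0001043A", "z": "\U00010446", "dʒ": "\U0001043E",
    "ð": "\U00010444", "p": "\U00010439", "t": "\U0001043B",
    "r": "\U00010449", "n": "\U0001044C", "s": "\U00010445",
    "iː": "\U00010428", "ɪ": "\U0001042E", "ɛ": "\U0001042F",
    "ə": "\U00010432",
}


def with_textspeak(phonemes: list[str]) -> str:
    """Greedy left-to-right conversion that prefers letter-name matches."""
    out = []
    i = 0
    while i < len(phonemes):
        pair = tuple(phonemes[i:i + 2])
        if pair in LETTER_NAMES:  # the letter's name covers both phonemes
            out.append(LETTER_NAMES[pair])
            i += 2
        else:                     # ordinary case: one letter for one phoneme
            out.append(SINGLE[phonemes[i]])
            i += 1
    return "".join(out)


print(with_textspeak(["dʒ", "iː", "p"]))           # 𐐾𐐹
print(with_textspeak(["z", "iː", "b", "r", "ə"]))  # 𐑆𐐺𐑉𐐲
print(with_textspeak(["ɛ", "n", "t", "r", "ɪ"]))   # 𐑌𐐻𐑉𐐮
```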

It would be possible to automatically convert Latin to Deseret (or Shavian) and back, but it wouldn't be easy, and it probably couldn't be done quickly enough to have a browser plugin do it.

edit: a Latin -> Deseret converter already exists, but it's crap: can't take more than a few words at a time, returns allcaps, adds semicolons for no reason after some letters, can't handle textspeak even for the definite article, and makes vowel choices that I wouldn't make. (Looks like it writes all unstressed vowels with 𐐆.)

Comment author: ChristianKl 09 August 2014 10:18:48PM 0 points

That would be difficult. Deseret script is phonetic, so you'd have to either look up the pronunciation for each word or eat the imperfection from the ~40% of words that can't be easily predicted.

Yes, you need a phonetic dictionary. eSpeak is a project where people have already dealt with the problem of predicting pronunciation. You could start with the values that eSpeak produces and let users edit them in some sort of wiki to improve on eSpeak's IPA values.
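A minimal sketch of that layering, assuming the machine-generated pronunciations (for example, exported in bulk from eSpeak) are already sitting in a dict; the value below is a made-up placeholder, not actual eSpeak output, and `PronunciationStore` is a hypothetical name. User edits simply shadow the generated entries, so the generated data never has to be modified and can be regenerated later.

```python
from typing import Optional


class PronunciationStore:
    """Generated pronunciations, shadowed by wiki-style user corrections."""

    def __init__(self, generated: dict[str, str]) -> None:
        self.generated = dict(generated)     # bulk machine output
        self.overrides: dict[str, str] = {}  # community edits win

    def edit(self, word: str, ipa: str) -> None:
        self.overrides[word.lower()] = ipa

    def lookup(self, word: str) -> Optional[str]:
        key = word.lower()
        if key in self.overrides:
            return self.overrides[key]
        return self.generated.get(key)  # None if neither layer knows the word


store = PronunciationStore({"photograph": "foʊtəgræf"})  # placeholder value
print(store.lookup("Photograph"))        # foʊtəgræf
store.edit("photograph", "foʊtoʊgræf")   # a user applies the stressed-vowel spelling
print(store.lookup("Photograph"))        # foʊtoʊgræf
print(store.lookup("rivers"))            # None
```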

It would be possible to automatically convert Latin to Deseret (or Shavian) and back, but it wouldn't be easy, and it probably couldn't be done quickly enough to have a browser plugin do it.

Local database lookups are very fast; I don't see how speed would be an issue for a client-side browser plugin.

Comment author: [deleted] 09 August 2014 10:54:40PM 0 points

Local database lookups are very fast; I don't see how speed would be an issue for a client-side browser plugin.

Fast enough that you can do a few hundred of them per page? (Not rhetorical; I don't know.)

Textspeak substitution wouldn't actually be a problem; I don't know why I thought otherwise. And back-conversion to Latin would just require brute-forcing words that don't show up in the dictionary.
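For words that are in the dictionary, back-conversion is just the forward table inverted. One wrinkle a sketch makes visible: homophones collapse onto the same Deseret spelling, so the reverse lookup has to return a list of candidates. The tiny forward table here is hypothetical; a word that isn't in it returns an empty list, which is where the brute-force step would come in (not implemented here).

```python
from collections import defaultdict

# Hypothetical forward table: Latin spelling -> Deseret spelling.
FORWARD = {
    "to": "\U0001043B\U0001042D",   # tee + long oo
    "two": "\U0001043B\U0001042D",  # same sounds, same spelling
    "test": "\U0001043B\U0001042F\U00010445\U0001043B",
}


def build_reverse_index(forward: dict[str, str]) -> dict[str, list[str]]:
    """Map each Deseret spelling to every Latin word that produces it."""
    reverse = defaultdict(list)
    for latin, deseret in forward.items():
        reverse[deseret].append(latin)
    return dict(reverse)


REVERSE = build_reverse_index(FORWARD)


def back_convert(deseret_word: str) -> list[str]:
    """Candidate Latin spellings; empty if the word isn't in the table."""
    return REVERSE.get(deseret_word, [])


print(back_convert("\U0001043B\U0001042D"))  # ['to', 'two']
```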

Comment author: ChristianKl 10 August 2014 12:05:42AM 0 points

Fast enough that you can do a few hundred of them per page? (Not rhetorical; I don't know.)

Yes, select queries don't take much time when you have an index. Thank Moore's law.
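A rough way to check this locally, using SQLite from Python's standard library: declaring `word` as a TEXT PRIMARY KEY gives the table an index on it, so each lookup is an index search rather than a full scan. The table contents are synthetic placeholders (not a real pronouncing dictionary), and the timing printed depends entirely on the machine; the sketch measures it rather than assuming a number.

```python
import sqlite3
import time
from typing import Optional

# In-memory table of synthetic entries; the PRIMARY KEY on `word` is indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pron (word TEXT PRIMARY KEY, deseret TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO pron (word, deseret) VALUES (?, ?)",
    ((f"word{i}", f"spelling{i}") for i in range(100_000)),
)
conn.commit()


def lookup_all(words: list[str]) -> dict[str, Optional[str]]:
    """One indexed SELECT per word; None for words not in the table."""
    result = {}
    for w in words:
        row = conn.execute("SELECT deseret FROM pron WHERE word = ?", (w,)).fetchone()
        result[w] = row[0] if row else None
    return result


# Roughly a page's worth of distinct words, plus one that isn't in the table.
page_words = [f"word{i * 137}" for i in range(300)] + ["notaword"]
start = time.perf_counter()
found = lookup_all(page_words)
elapsed = time.perf_counter() - start
print(f"{len(page_words)} lookups in {elapsed * 1000:.2f} ms")
```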

Comment author: IlyaShpitser 07 August 2014 07:47:06PM 0 points

Yes, Hangul is our Marain.

Comment author: [deleted] 07 August 2014 07:49:58PM 0 points

I'd back this if it included hanja on the side.

Comment author: Creutzer 07 August 2014 07:32:02PM 0 points

Tolkien's Tengwar share at least the encoding of phonological features with hangul, by the way.