ChristianKl comments on The rational way to name rivers - Less Wrong
Comments (42)
English has Shavian.
There's also Deseret, which I've made some tools for, but it's not featural (beyond some isolated cases, like ligatures for some-but-not-all diphthongs) and is somewhat confusing to learn.
Neither of these will be generally usable in the immediate future, since they're both in Unicode's astral planes, and some common piece of web infrastructure (old versions of MySQL, IIRC) silently fails on encountering astral-plane characters. Font support is another issue, but Deseret is slightly better supported than Shavian -- my Win8 install came with a font for the former, but not the latter.
(If there's anything after the following colon, LW doesn't have this bug: 𐑄𐐮𐑅 𐐮𐑆 𐐩 𐐻𐐯𐑅𐐻)
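To make the astral-plane point concrete, here is a minimal Python sketch. It only shows that Deseret and Shavian code points sit above U+FFFF and need four bytes in UTF-8, which is the case a store limited to three bytes per character (as I understand old MySQL `utf8` columns to be) cannot hold. The helper names `is_astral` and `astral_chars` are made up for illustration.

```python
def is_astral(ch: str) -> bool:
    """True if the character lies outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


def astral_chars(text: str) -> list[str]:
    """Return the characters of `text` that a BMP-only store could not hold."""
    return [ch for ch in text if is_astral(ch)]


if __name__ == "__main__":
    # One Deseret letter (U+10414), one Shavian letter (U+10450), one Latin letter.
    sample = "\U00010414\U00010450a"
    for ch in sample:
        n_bytes = len(ch.encode("utf-8"))
        print(f"U+{ord(ch):05X}: astral={is_astral(ch)}, {n_bytes} byte(s) in UTF-8")
```

A site could run something like `astral_chars` on incoming comments to detect text that its storage layer would mangle, rather than failing silently.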
If you wanted to use them, you could build Chrome and Firefox plugins that automatically convert all English text into Deseret. At the same time you could write a WordPress plugin that automatically offers users a Deseret version of the website under des.domain.name.
That would be difficult. Deseret script is phonetic, so you'd have to either look up the pronunciation for each word or eat the imperfection from the ~40% of words that can't be easily predicted.
Deseret script as it's supposed to be used is even harder to automate conversion into than that alone would suggest: you're supposed to write the stressed equivalent of unstressed vowels. So the words "photograph" and "photography", for example, should be 𐑁𐐬𐐻𐐬𐑀𐑉𐐰𐑁 and 𐑁𐐬𐐻𐐪𐑀𐑉𐐰𐑁𐐮 (IPA: foʊ̯toʊ̯græf and foʊ̯tɑgræfɪ, my keyboard transliteration: fo;to;graf and fo;tografi). I don't think this is very common in practice, however -- which is a problem for back-converting Deseret to Latin, since the unstressed schwa can be written either 𐐲 or 𐐮 by people who don't distinguish them.
Also, textspeak is built into it: the name of the letter 𐐒 is 'bee', so the word 'bee' can be written '𐐺'. This can even hold within a word: the Wikipedia page has an example of a coin with the text "𐐐𐐄𐐢𐐆𐐤𐐝 𐐓𐐅 𐐜 𐐢𐐃𐐡𐐔". The first word there is 'holiness', but it's written /hoʊ̯lɪns/ (ho;lins), since the name of the letter 𐐝 is pronounced 'ess'. Usually you see this in the definite article, which is just written 𐑄, but you could also write 'entry', 'zebra', and 'jeep' as '𐑌𐐻𐑉𐐮', '𐑆𐐺𐑉𐐲', and '𐐾𐐹'. (ntrɪ zbrə dʒp / ntri zbru jp -- and 'entry' could also be written with a final -𐐨 instead of -𐐮)
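The letter-name shortcut described above can be sketched as a single left-to-right pass over a fully spelled-out Deseret word. This is only an illustration: the `LETTER_NAMES` table is a partial, assumed list of how a few letter names would be spelled out, and the function applies the shortcut everywhere, which is more aggressive than what people usually do in practice.

```python
# Deseret small letters as escapes, so the code survives ASCII-only tooling.
LONG_I, SHORT_I, SHORT_E = "\U00010428", "\U0001042E", "\U0001042F"
LONG_O, SHORT_O = "\U0001042C", "\U00010432"
H, P, B, T, D, J = ("\U00010438", "\U00010439", "\U0001043A",
                    "\U0001043B", "\U0001043C", "\U0001043E")
TH, S, Z, R, L, N = ("\U00010444", "\U00010445", "\U00010446",
                     "\U00010449", "\U0001044A", "\U0001044C")

# Assumed (partial) table: spelled-out letter name -> the letter itself.
LETTER_NAMES = {
    B + LONG_I: B, P + LONG_I: P, T + LONG_I: T, D + LONG_I: D,
    J + LONG_I: J, Z + LONG_I: Z, TH + LONG_I: TH,
    SHORT_E + S: S, SHORT_E + N: N,
}


def abbreviate(word: str) -> str:
    """Collapse any two-letter run that spells a letter's name into that letter."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in LETTER_NAMES:
            out.append(LETTER_NAMES[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    jeep_full = J + LONG_I + P
    print(abbreviate(jeep_full) == J + P)
```

Because Python strings are indexed by code point, slicing two characters at a time works even though every letter here is outside the BMP.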
It would be possible to automatically convert Latin to Deseret (or Shavian) and back, but it wouldn't be easy, and it probably couldn't be done quickly enough to have a browser plugin do it.
edit: a Latin -> Deseret converter already exists, but it's crap: it can't take more than a few words at a time, returns allcaps, adds semicolons for no reason after some letters, can't handle textspeak even for the definite article, and makes vowel choices that I wouldn't make. (Looks like it writes all unstressed vowels with a single letter.)
Yes, you need a phonetic dictionary. eSpeak is a project whose developers have already dealt with the problem of predicting pronunciation. You could start with the values that eSpeak produces and let users edit them in some sort of wiki to improve on the eSpeak IPA values.
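A dictionary-driven converter along these lines might look roughly like the sketch below. Everything in it is an assumption for illustration: the tiny `PRONUNCIATIONS` table stands in for whatever eSpeak (or a user-edited wiki) would supply, the phoneme labels are an ad hoc ARPAbet-like notation, and words missing from the table are simply left in Latin script.

```python
import re

# Assumed phoneme labels -> Deseret small letters (escapes keep the file ASCII).
PHONEME_TO_DESERET = {
    "DH": "\U00010444",  # thee
    "IH": "\U0001042E",  # short i
    "S": "\U00010445",   # es
    "Z": "\U00010446",   # zee
    "EY": "\U00010429",  # long e
    "T": "\U0001043B",   # tee
    "EH": "\U0001042F",  # short e
}

# Hypothetical stand-in for a real pronunciation dictionary.
PRONUNCIATIONS = {
    "this": ["DH", "IH", "S"],
    "is": ["IH", "Z"],
    "a": ["EY"],
    "test": ["T", "EH", "S", "T"],
}


def convert_word(word: str) -> str:
    """Convert one word, or return it unchanged if its pronunciation is unknown."""
    phonemes = PRONUNCIATIONS.get(word.lower())
    if phonemes is None:
        return word
    return "".join(PHONEME_TO_DESERET[p] for p in phonemes)


def convert_text(text: str) -> str:
    """Convert every alphabetic run in `text`, leaving spacing and punctuation alone."""
    return re.sub(r"[A-Za-z']+", lambda m: convert_word(m.group(0)), text)


if __name__ == "__main__":
    print(convert_text("This is a test, mostly."))
```

Leaving unknown words untouched makes the gaps visible, which is the kind of thing a wiki of user corrections could then fill in.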
Local database lookups are very fast; I don't see how speed would be an issue for a client-side browser plugin.
Fast enough that you can do a few hundred of them per page? (Not rhetorical; I don't know.)
Textspeak substitution wouldn't actually be a problem; I don't know why I thought otherwise. And back-conversion to Latin would just require brute-forcing words that don't show up in the dictionary.
Yes, SELECT queries don't take much time when you have an index. Thank Moore's law.
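If you want to check the "few hundred lookups per page" question yourself, a rough sketch is below. It uses Python's `sqlite3` with an in-memory table as a stand-in for whatever local store a browser plugin would really use, fills it with synthetic rows (the `word0`, `deseret0` values are placeholders, not real data), and times 300 indexed lookups. The numbers will depend on your machine, so none are quoted here.

```python
import sqlite3
import time


def build_dictionary(n_words: int = 100_000) -> sqlite3.Connection:
    """Create an in-memory table of synthetic entries; PRIMARY KEY gives an index on word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (word TEXT PRIMARY KEY, deseret TEXT)")
    rows = ((f"word{i}", f"deseret{i}") for i in range(n_words))
    conn.executemany("INSERT INTO dict VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, word: str):
    """Return the stored conversion for `word`, or None if it is missing."""
    row = conn.execute(
        "SELECT deseret FROM dict WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None


def time_lookups(conn: sqlite3.Connection, words) -> float:
    """Return the wall-clock seconds spent looking up every word in `words`."""
    start = time.perf_counter()
    for w in words:
        lookup(conn, w)
    return time.perf_counter() - start


if __name__ == "__main__":
    conn = build_dictionary()
    page_words = [f"word{i * 37}" for i in range(300)]  # roughly one page of text
    elapsed_ms = time_lookups(conn, page_words) * 1000
    print(f"{len(page_words)} lookups took {elapsed_ms:.2f} ms")
```

Running `EXPLAIN QUERY PLAN` on the SELECT is a quick way to confirm that the lookup is an index search rather than a full table scan.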