John_Maxwell_IV comments on PSA: Learn to code - Less Wrong

34 Post author: John_Maxwell_IV 25 May 2012 06:50PM

Comment author: John_Maxwell_IV 25 May 2012 09:46:22PM 3 points [-]

Just for the benefit of bystanders, most computer programs that do what I described are far easier to understand than the one wmorgan wrote.

Comment author: fubarobfusco 26 May 2012 04:46:43AM *  2 points [-]

It's actually quite straightforward. It's just written in a language that most coders don't use, and moreover it uses data types that most "coderly" languages don't have. It would be pretty obvious to many experienced Unix sysadmins, though; there's nothing here that a sysadmin wouldn't use in doing log analysis or the like.

The most accessible data types in Unix shell scripting are strings, semi-lazy streams of strings, and processes. A shell pipeline, such as the above, is a sequence of processes connected by streams; each process's output is the next one's input.

  1. cat /usr/share/dict/words | \
     Create a stream of strings from a single file, namely a standard list of English words.
  2. sed -e 's/.*\(.\)/\1/' | \
     For each word, extract the last letter.
  3. tr A-Z a-z | \
     Change any uppercase letters to lowercase.
  4. sort | \
     Sort the stream, so that all identical letters are adjacent to one another.
  5. uniq -c | \
     Count identical adjacent letters.
  6. sort -rn
     Sort numerically so that the letters with the highest counts come first.

It is not really clear to me that this is particularly less expressive than the straightforward way to do an equivalent operation in modern Python, as follows:

import collections
c = collections.Counter()
for line in file("/usr/share/dict/words"):
    line = line.strip().lower()
    if not line:
        continue
    c[line[-1]] += 1
for (ltr, count) in c.most_common():
    print ltr, count
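
To make the correspondence with the pipeline explicit, here is a rough Python 3 sketch that mirrors the six stages one at a time, with generators standing in for the streams between processes. The function names (last_letters, count_runs, last_letter_counts) are made up for illustration and are not from the thread. Like the Counter version above, it skips blank lines, which the sed stage would instead pass through unchanged, and ties between equal counts may come out in a different order than sort -rn gives.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def last_letters(lines: Iterable[str]) -> Iterator[str]:
    """Stages 2 and 3 (sed, tr): yield the lowercased last character of each line."""
    for line in lines:
        word = line.strip()
        if word:  # assumption: blank lines are skipped rather than passed through
            yield word[-1].lower()


def count_runs(sorted_items: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Stage 5 (like uniq -c): count each run of identical adjacent items."""
    for item, run in itertools.groupby(sorted_items):
        yield (sum(1 for _ in run), item)


def last_letter_counts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Chain the stages; `lines` plays the role of the output of cat (stage 1)."""
    letters = last_letters(lines)         # sed + tr
    ordered = sorted(letters)             # sort
    counted = count_runs(ordered)         # uniq -c
    return sorted(counted, reverse=True)  # sort -rn (highest counts first)
```

An open file object is an iterable of lines, so passing open("/usr/share/dict/words") to last_letter_counts plays the part of stage 1. Only last_letters is truly lazy here; sorted has to read its whole input before it can emit anything, which is the same reason the streams are only "semi-lazy" in the shell.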

Comment author: shokwave 28 May 2012 01:10:57PM 1 point

Ways in which it might be less expressive: it takes a while before using the small, efficient pieces of Unix feels conceptually similar to using the different functions of a programming language. It also uses a regex.
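
For anyone stuck on the regex in the sed step, here is a small Python sketch of the same substitution. The helper name last_char_via_regex is hypothetical. The one syntax difference is that sed's basic regular expressions write a capture group as \( \), while Python's re module writes it as ( ); count=1 is there because sed without the g flag replaces only the first match.

```python
import re


def last_char_via_regex(word: str) -> str:
    """Python counterpart of: sed -e 's/.*\\(.\\)/\\1/'

    `.*` greedily swallows as much as it can, then gives back exactly one
    character so that `(.)` can capture it; the whole match is replaced by
    that captured character.
    """
    return re.sub(r".*(.)", r"\1", word, count=1)


# For a non-empty word with no newline, this is the same as plain indexing.
for sample in ["hello", "Zebra", "a"]:
    assert last_char_via_regex(sample) == sample[-1]
```

An empty string contains nothing for (.) to capture, so the pattern does not match and the input comes back unchanged.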

(Inferential distance is hard to estimate; I know I like to hear where I'm incorrectly assuming short distances; I hope you do too).