Happy new year! This is the supposedly-bimonthly-but-we-keep-skipping 'What are you working on?' thread. Previous threads are here. So here's the question:
What are you working on?
Here are some guidelines:
- Focus on projects that you have recently made progress on, not projects that you're thinking about doing but haven't started.
- Why this project and not others? Mention reasons why you're doing the project and/or why others should contribute to your project (if applicable).
- Talk about your goals for the project.
- Any kind of project is fair game: personal improvement, research project, art project, whatever.
- Link to your work if it's linkable.
I'm currently a post-doc doing language technology/NLP-type stuff. I'm considering quitting soon to work full time on a start-up. I'm working on three things at the moment.
The start-up is a language-learning web app: http://www.cloze.it . What sets it apart from other language-learning software is my knowledge of linguistics, proficiency with text processing, and willingness to code detailed language-specific features. Most tools try to be as language-neutral as possible, which limits their scope a lot. As a result, they tend to have the same set of features, centred around learning basic vocab.
Something that's always bugged me about being an academic is that we're terrible at communicating with people outside our field. This means that whenever I see a post that uses an NLP tool, it's using a crap tool. So I wrote a blog post explaining a simple part-of-speech (POS) tagger that was better than the stuff in e.g. nltk (nltk is crap): http://honnibal.wordpress.com/2013/09/11/a-good-part-of-speechpos-tagger-in-about-200-lines-of-python/ The POS tagger post has gotten over 15k views (mostly from reddit), so I'm writing a follow-up about a concise parser implementation. The parser is 500 lines, including the tagger, and is faster and more accurate than the Stanford parser (the Stanford parser is also crap).
I'm doing minor revisions on a journal article about parsing conversational speech transcripts and detecting disfluent words. The system gets good results when run on text transcripts. The goal is to allow speech recognition systems to produce better transcripts, with punctuation added and stutters etc. removed. I'm also working on a follow-up paper to that one, with further experiments.
Overall the research is going well, and I find it very engaging. But I'm at the point where I have to start writing grant applications, and selling software seems like a much better expected-value bet.
TIL: NLP can mean Natural Language Processing as well as Neuro-Linguistic Programming. I was confused for a while there.