
Punoxysm comments on Request for suggestions: ageing and data-mining - Less Wrong Discussion

Post author: bokov 24 November 2014 11:38PM


Comment author: Punoxysm 26 November 2014 03:26:41PM 2 points

Would anyone want to literally do this on something as complex as patient data?

If not, why not just say "come up with the best models you can"?

Pick a couple of quantities of interest and try to model them as accurately as you can.

Comment author: Daniel_Burfoot 26 November 2014 04:55:57PM 3 points

One problem is that some data may fundamentally be a distraction, so modeling it is just a waste of time.

But it is very hard to tell ahead of time whether a piece of data will be relevant to a downstream analysis. For example, in my work on text analysis, capitalization takes a lot of effort relative to how interesting it seems. It is tempting to just throw away case information by lowercasing everything. But capitalization actually carries clues that are relevant to parsing and other analysis; in particular, it lets you identify acronyms, which usually stand for proper nouns.
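A minimal sketch of the compromise described above, assuming simple regex tokenization: lowercase each token for matching, but keep a coarse "case shape" feature instead of discarding it, so all-caps tokens can still be flagged as likely acronyms. The function names and the all-caps heuristic are hypothetical illustrations, not the commenter's actual pipeline.

```python
import re


def tokenize_with_case(text):
    """Lowercase each token but keep a coarse case-shape label alongside it."""
    # Hypothetical tokenizer: runs of ASCII letters only.
    tokens = re.findall(r"[A-Za-z]+", text)
    result = []
    for tok in tokens:
        if len(tok) > 1 and tok.isupper():
            shape = "ALLCAPS"  # e.g. "NIH": a likely acronym
        elif tok[0].isupper():
            shape = "Title"    # sentence-initial words, names, and "I"
        else:
            shape = "lower"
        result.append((tok.lower(), shape))
    return result


def likely_acronyms(text):
    """Return the (lowercased) tokens whose original form was all caps."""
    return [tok for tok, shape in tokenize_with_case(text) if shape == "ALLCAPS"]


if __name__ == "__main__":
    sentence = "The NIH funds research at UCLA."
    print(tokenize_with_case(sentence))
    print(likely_acronyms(sentence))
```

Plain `sentence.lower()` would make "nih" indistinguishable from an ordinary word; keeping the shape label costs one extra field per token but preserves the acronym clue for later stages.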