bokov comments on Request for suggestions: ageing and data-mining - Less Wrong

14 Post author: bokov 24 November 2014 11:38PM




Comment author: Daniel_Burfoot 25 November 2014 04:09:50AM 4 points

Research access to large amounts of anonymized patient data.

Take all the data you have, come up with some theory to describe it, build that theory into a lossless data compressor, and run it on the data set. Write down the compression rate you achieve, and then try to do better. And better. And better. This goal will force you to systematically improve your understanding of the data.
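As a rough illustration of the "write down the compression rate, then try to do better" loop, the sketch below scores two models of the same sequence by their ideal code length, the sum of -log2 p(symbol), which is the size a good entropy coder would approach. The toy data, the function names, and the choice of an adaptive count-based model are all assumptions made for the example, not anything from the original suggestion.

```python
import math

def uniform_code_length_bits(symbols, alphabet):
    """Baseline model: every symbol in the alphabet is equally likely."""
    return len(symbols) * math.log2(len(alphabet))

def adaptive_code_length_bits(symbols, alphabet):
    """Adaptive model: predict each symbol from the counts seen so far.

    Counts start at 1 (Laplace smoothing) so no symbol has zero probability.
    A decoder can rebuild the same counts as it goes, so no side information
    about the model needs to be transmitted.
    """
    counts = {s: 1 for s in alphabet}
    total = len(alphabet)
    bits = 0.0
    for s in symbols:
        bits += -math.log2(counts[s] / total)  # cost of this symbol
        counts[s] += 1                          # then learn from it
        total += 1
    return bits

# Hypothetical toy data: a skewed three-letter sequence.
data = list("AAAAAAABBC" * 10)
alphabet = sorted(set(data))

baseline_bits = uniform_code_length_bits(data, alphabet)
improved_bits = adaptive_code_length_bits(data, alphabet)

print(f"uniform model:  {baseline_bits:.1f} bits")
print(f"adaptive model: {improved_bits:.1f} bits")
```

The second model is only a small step up from the first. The point is that each modelling idea gets a single number to beat.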

(Note that transforming a sufficiently well specified statistical model into a lossless data compressor is a solved problem, and the solution is called arithmetic encoding - I can give you my implementation, or you can find one on the web. So what I'm really suggesting is just that you build statistical models of the raw data, and try systematically to improve those models).
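For readers who have not seen arithmetic coding, here is a minimal sketch, not Daniel_Burfoot's implementation, that uses exact fractions instead of the fixed-precision integer arithmetic a practical coder would use. The fixed symbol probabilities, the function names, and the convention of passing the message length to the decoder are assumptions made to keep the example short.

```python
import math
from fractions import Fraction

def build_intervals(probs):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(symbols, probs):
    """Return (k, n): the message is coded as the n-bit integer k."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = intervals[s]
        # Narrow the current interval to the slice belonging to s.
        low, width = low + width * s_low, width * (s_high - s_low)
    # Find the shortest binary fraction k / 2^n inside [low, low + width).
    n = 0
    while True:
        k = math.ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            return k, n
        n += 1

def decode(k, n, length, probs):
    """Invert encode(); the message length is assumed to be known."""
    intervals = build_intervals(probs)
    x = Fraction(k, 2**n)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= x < s_high:
                out.append(sym)
                # Rescale x back to [0, 1) for the next symbol.
                x = (x - s_low) / (s_high - s_low)
                break
    return out

# Hypothetical model and message for illustration.
probs = {"A": Fraction(7, 10), "B": Fraction(2, 10), "C": Fraction(1, 10)}
message = list("AABACABAAA")

k, n = encode(message, probs)
decoded = decode(k, n, len(message), probs)
print(f"{len(message)} symbols coded in {n} bits; round trip ok: {decoded == message}")
```

The final interval width equals the probability the model assigns to the whole message, and n is at most ceil(-log2 of that probability), so a model that assigns the data higher probability yields a shorter code.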

Comment author: bokov 03 December 2014 04:24:50PM 1 point

(Note that transforming a sufficiently well specified statistical model into a lossless data compressor is a solved problem, and the solution is called arithmetic encoding - I can give you my implementation, or you can find one on the web.

The unsolved problems are the ones hiding behind the token "sufficiently well specified statistical model".

That said, thanks for the pointer to arithmetic encoding; it may be useful in the future.