philh comments on May Monthly Bragging Thread - Less Wrong Discussion

8 Post author: philh 04 May 2014 08:21AM

Comments (28)

Comment author: philh 04 May 2014 12:44:13PM 21 points

In the film industry, when we want to guess how much money a film is going to make, we think of similar films to compare it to and see how much money those made. Six months ago, I started a project to think of similar films for you. We rolled it out to our users a couple of weeks ago, and they love it. Quite apart from the films it recommends, my boss says that the way it displays the comparisons is by far the best that he's ever seen. The database it draws from only goes back a few years, so we managed to get budgeting approval to hire a team of interns for a few weeks to fill in the history. My company is probably paying about £10,000 to collect data for a system that I wrote, by myself, in six months while working on other projects. (That's not actually a lot of money, but my System 1 still thinks it is.)

Relatedly, I solved this bug while working on it, by delving deeper into the network stack than I ever have before.

Comment author: Joshua_Blaine 04 May 2014 10:25:51PM *  12 points

You made a thing that's being used by other people. People who are paying you to use it. That's pretty great!

Comment author: William_Quixote 05 May 2014 05:46:17PM *  0 points

To get data to feed your model, consider buying a data set from an industry provider like Box Office Mojo. Depending on what fields you need, they have a very solid data set with a long history that could probably be purchased for less than 10,000 euro, depending on the confidentiality and on-sell terms your company could agree to.

Comment author: philh 05 May 2014 10:20:25PM 0 points

"Import" might have been more accurate than "collect". We have access to the history from various sources, but we can't yet import it automatically. I'm pretty sure the data needs to be hunted down, collated and cleaned up in difficult-to-automate ways, but I haven't actually turned my attention to the problem yet.