An interesting article. I would have really liked the author to complete the circle. A discussion of solutions to the problems would be highly complementary.
we’d assumed that every user’s click-through was statistically independent, when in fact they were highly correlated, so many of the results which we thought were significant were in fact basically noise.
a) Is there an approach for modeling the non-independent click behavior of users?
b) Much of the progress in CTR prediction assumes that clicks are independent. What is the likely benefit of modeling clicks as non-independent, besides being closer to reality?
c) How can one use Pearl's techniques to model the non-independent behavior of user clicks?
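To make the concern in the quoted passage concrete, here is a minimal sketch (not the article author's method) of how treating correlated clicks as independent can understate uncertainty. Everything in it is an assumption for illustration: the data is simulated, each hypothetical user draws their own click probability from a Beta(0.5, 4.5) distribution, and the function names `simulate_clicks`, `naive_se`, and `clustered_se` are made up for this example. It compares the standard error computed per impression with one computed per user, which is one simple way to respect the within-user correlation.

```python
import random
import statistics

def simulate_clicks(n_users, impressions_per_user, seed=0):
    """Simulated data: each hypothetical user has their own click
    probability, so clicks from the same user are correlated."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_users):
        # Assumed distribution: mean CTR 0.1 with wide user-to-user spread.
        p = rng.betavariate(0.5, 4.5)
        data.append([1 if rng.random() < p else 0
                     for _ in range(impressions_per_user)])
    return data

def naive_se(data):
    """Standard error of the CTR if every impression were independent."""
    clicks = [c for user in data for c in user]
    n = len(clicks)
    ctr = sum(clicks) / n
    return (ctr * (1 - ctr) / n) ** 0.5

def clustered_se(data):
    """Standard error of the CTR treating the user as the unit of analysis.
    With equal impressions per user, the overall CTR equals the mean of
    the per-user CTRs, so its standard error is stdev / sqrt(n_users)."""
    per_user = [sum(user) / len(user) for user in data]
    return statistics.stdev(per_user) / len(per_user) ** 0.5

data = simulate_clicks(n_users=500, impressions_per_user=50)
print("naive SE:    ", naive_se(data))
print("clustered SE:", clustered_se(data))
```

Under these assumptions the per-user standard error comes out noticeably larger than the per-impression one, which is the sense in which results that looked significant could turn out to be mostly noise. Mixed-effects models or a bootstrap that resamples whole users are other options along the same lines.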
That's a good point. If a "skewed distribution" exists, one might use it as an explanation. I would be really interested in knowing whether such a "skewed distribution" exists. However, how would one go about constructing such a distribution? Which characteristics of drivers should one consider? Extrapolating to other fields like teaching, construction work, sales, etc., can one define variables to measure such a "distribution" of competence?
This is an awesome document. Thanks.
Understanding things in depth is a significant part of the process. You will also want to retain things in your memory. For this, you may like to use Anki (spaced repetition, interleaving, etc.).