TLDR: Honesty is the best policy, and don't be a try-hard.
I understand that data collection is difficult and empathize with the people responsible for doing the work.
The thing is, SF used to publish everything as soon as they could! We accepted that numbers could be revised up or down as data was fully coded. This 5-day lag is IMO far on the wrong side of the timeliness vs. correctness tradeoff.
Obvious next step: if there's a lot of low-hanging fruit like this, let's find it. Have you considered using your LW/Twitter/blog to publicly solicit obvious, simple, and high-leverage solutions to other big problems?
In dath ilan, it is virtuous to write more stories about dath ilan.
I expect agave to be generally preferred over table sugar and HFCS due to having a significantly lower glycemic index. I'm unfamiliar with Karo.
Something I've been wondering for a while: are organizations/journalists/individuals filing FOIA requests to get emails and other relevant documents about how the CDC and FDA made their COVID decisions?
Potentially interested!
Big picture, if your friend wants a different ratio of upside to work, perhaps they should consider hiring someone to work 15-20 hrs/wk, leaving them with <5 hrs/wk of supervision?
This post is a bit hard to parse - please consider replacing "a.test" with something like "test.com/a" or "a.test.com/page" to clarify whether the issue is per-page caching or per-domain caching.
I posted my answer a bit late but this was a ton of fun!
I did some initial exploration of the dataset and came to similar conclusions as others on the thread.
I then decided this was a good excuse to finally learn how to use LightGBM, one of the best-in-class tools for creating decision trees, and widely used in the data science industry. In other words, let's make the computer do the fun part!
The goal was to output something like:
What I actually got:
I used default settings, transformed color/fangs/nostrils into 0-N categorical variables and marked them accordingly, then basically did "give me a regression with a single tree and 15 leaves".
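For anyone trying to reproduce the setup, here is a minimal sketch of the preprocessing and configuration described above. The rows are made up for illustration and are not from the actual dataset, `encode_categories` is a hypothetical helper, and the parameter dict is my reading of "a single tree and 15 leaves" rather than the exact settings used.

```python
def encode_categories(values):
    """Map each distinct value to an integer code 0..N-1 (sorted by string form)."""
    codes = {v: i for i, v in enumerate(sorted(set(values), key=str))}
    return [codes[v] for v in values], codes


# Made-up example rows; the real dataset has more columns and many more rows.
rows = [
    {"color": "gray", "fangs": "yes", "nostrils": 2},
    {"color": "green", "fangs": "no", "nostrils": 1},
    {"color": "gray", "fangs": "yes", "nostrils": 3},
]

# The three columns the comment says were treated as categorical.
categorical_columns = ["color", "fangs", "nostrils"]

encoded = {}   # column name -> list of integer codes, one per row
mappings = {}  # column name -> {original value: integer code}
for col in categorical_columns:
    encoded[col], mappings[col] = encode_categories([r[col] for r in rows])

# LightGBM parameters for one regression tree with at most 15 leaves.
# Everything not listed here is left at its default (an assumption).
params = {
    "objective": "regression",
    "num_leaves": 15,
    "num_iterations": 1,
}
```

In an actual run, the encoded columns would go into a `lightgbm.Dataset` with the three columns listed in its `categorical_feature` argument, and `params` would be passed to `lightgbm.train`.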
As others have mentioned, all gray turtles have fangs and weigh noticeably less (4-7 pounds), so this is obvious nonsense.
This tool is supposedly the non-AI state-of-the-art. It confidently fails with out-of-the-box settings. I remain baffled as to how anyone in tech ever gets anything done, myself included.