An experience I had recently that I found interesting:
So, you may have noticed that I'm interested in causality. Part of my upcoming research uses pcalg (which you may have heard of) to identify the relationships between sensors on semiconductor manufacturing equipment, so that we can apply earlier work from my lab on identifying which subsystem of a complex dynamic system is the root cause of an error. That work has previously been applied in automotive engineering, where we have strong first-principles models of how the systems interact. Now we want to apply it to semiconductor manufacturing, where we don't have such models and need to learn them from data.
Time to get R installed and pcalg downloaded correctly on Ubuntu: ~2 hours. (One of pcalg's dependencies requires R 3.0; to get that instead of R 2.14 you need to modify the Ubuntu update files, and a handful of other things went wrong.)
Time to figure out how to get my data into R with labels: ~2 minutes.
Time to run the algorithms to discover the causal network for the subsystem I have data for now: ~2 seconds.
I'm not sure I should also count the time spent learning about causality in the first place (which I would probably estimate at ~2 weeks), but it's striking how much of the investment in generating the results is capital, and how little of it is labor. That is, now that I have the package downloaded, I can do this easily for other datasets. Time to start picking some low-hanging fruit.
(Living in the future is awesome: as much as I complained about all the various rabbit holes I had to go down while installing pcalg, it took way less time than it would have taken me to code the algorithms myself, and I doubt I would have done anywhere near as good a job at it.)
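For a rough sense of what those ~2 seconds of runtime are doing: constraint-based causal discovery (the family that pcalg's PC algorithm belongs to) repeatedly asks whether two variables become independent once you condition on some others, and removes the edge between them if so. Below is a minimal Python sketch of one such test, using partial correlation and a Fisher z statistic. It is not pcalg's API (pcalg is an R package), and everything in it is an assumption made for illustration: the variable names, the made-up chain x -> y -> z, and the 0.01 significance level.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, z, y):
    """Correlation of x and z after conditioning on the single variable y."""
    r_xz, r_xy, r_yz = corr(x, z), corr(x, y), corr(y, z)
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


def fisher_z_stat(r, n, k):
    """Fisher z statistic for a correlation r from n samples,
    with k variables in the conditioning set."""
    return math.sqrt(n - k - 3) * abs(0.5 * math.log((1 + r) / (1 - r)))


# Hypothetical data: a causal chain x -> y -> z with Gaussian noise.
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]
z = [yi + random.gauss(0, 1) for yi in y]

CRITICAL = 2.576  # two-sided standard normal cutoff for alpha = 0.01 (assumed)

marginal_stat = fisher_z_stat(corr(x, z), n, 0)
conditional_stat = fisher_z_stat(partial_corr(x, z, y), n, 1)

# x and z are strongly correlated on their own, so an edge x - z survives
# the unconditional test.
x_z_dependent = marginal_stat > CRITICAL

# Conditioning on y should usually make the dependence vanish, in which case
# the direct x - z edge is dropped. The test has a 1% false-positive rate
# by construction, so this is not guaranteed for every sample.
x_z_dependent_given_y = conditional_stat > CRITICAL

print(marginal_stat, conditional_stat)
print(x_z_dependent, x_z_dependent_given_y)
```

A real run over many sensors does this for every pair of variables and for conditioning sets of growing size, which is exactly the bookkeeping it's nice not to have to write yourself.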
I'm not sure I should also count the time spent learning about causality in the first place (which I would probably estimate at ~2 weeks), but it's striking how much of the investment in generating the results is capital, and how little of it is labor. That is, now that I have the package downloaded, I can do this easily for other datasets. Time to start picking some low-hanging fruit.
Absolutely. When I look at my own projects, they go like 'gathering and cleaning data: 2 months. Figuring out the right analysis the first time: 2 days. Runtime of analysi...
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.