More thoughts here, but TL;DR I’ve decided to revert the dashboard to its original state & have republished the stale data. (Just flagging for readers who wanted to dig into the metrics.)
Hey! Sorry for the silence, I was feeling a bit stressed by this whole thread, and so I wanted to step away and think about this before responding. I’ve decided to revert the dashboard to its original state & have republished the stale data. I did some quick/light data checks but prioritized getting this out fast. For transparency: I’ve also added stronger context warnings, and I took down the form to access our raw data in sheet form but intend to add it back once we’ve fixed the data. It’s still on our stack to Actually Fix this at some point, but we’re still figuring out the timing on that.
On reflection, I think I probably made the wrong call here (although I still feel a bit sad / misunderstood but 🤷🏻‍♀️). It was a unilateral + lightly held call I made in the middle of my work day — like truly I spent 5 min deciding this & maybe another ~15 updating the thing / leaving a comment. I think if I had a better model for what people wanted from the data, I would have made a different call. I’ve updated on “huh, people really care about not deleting data from the internet!” — although I get that the reaction here might be especially strong because it’s about CEA (vs the general case). Sorry, I made a mistake.
Future-facing thoughts: I generally hold myself to a higher standard for accuracy when putting data on the internet, but I also value not bottlenecking people in investigating questions that feel important to me (e.g. qs about EA growth rates), so to be clear I’m prioritizing the latter goal right now. I still in general stand by, “what even is the point of my job if I don’t stand by the data I communicate to others?” :) I want people to be able to trust that the work they see me put out in the world has been red-teamed & critiqued before publication.
Although I’m sad this caused an unintended kerfuffle, it’s a positive update for me that “huh wow, people actually care a lot that this project is kept alive!”. This honestly wasn’t obvious to me — this is a low traffic website that I worked on a while ago, and don’t hear about much. Oli says somewhere that he’s seen it linked to “many other times” in the past year, but TBH no one has flagged that to me (I’ve been busy with other projects). I’m still glad that we made this thing in the first place and am glad people find the data interesting / valuable (for general CEA transparency reasons, as an input to these broader questions about EA, etc.). I’ll probably prioritize maintenance on this higher in the future.
Now that the data is back up I’m going to go back to ignoring this thread!
Hey! I just saw your edited text and wanted to jot down a response:
Edit: I'll be honest, after thinking about it for longer, the only reason I can think of why you would take down the data is because it makes CEA and EA look less on an upwards trajectory. But this seems so crazy. How can I trust data coming out of CEA if you have a policy of retracting data that doesn't align with the story you want to tell about CEA and EA? The whole point of sharing raw data is to allow other people to come to their own conclusions. This really seems like such a dumb move from a trust perspective.
I'm sorry this feels bad to you. I care about being truth-seeking and care about the empirical question of "what's happening with EA growth?". Part of my motivation in getting this dashboard published in the first place was to contribute to the epistemic commons on this question.
I also disagree that CEA retracts data that doesn't align with “the right story on growth”. E.g. here's a post I wrote in mid-2023 where the bottom line conclusion was that growth in meta EA projects was down in 2023 vs 2022. That post also publishes data on several cases where CEA programs grew slower in 2023 or shrank. TBH I also think of this as CEA contributing to the epistemic commons here — it took us a long time to coordinate and then get permission from people to publish this. And I’m glad we did it!
On the specific call here, I'm not really sure what else to tell you re: my motivations other than what I've already said. I'm going to commit to not responding further to protect my attention, but I thought I'd respond at least once :)
Quick thoughts on this:
I’m probably going to drop responding to “was this a bad call” and prioritize “just get the dashboard back up soon”.
Hi! A quick note: I created the CEA Dashboard which is the 2nd link you reference. The data here hadn’t been updated since August 2024, and so was quite out of date at the time of your comment. I've now taken this dashboard down, since I think it's overall more confusing than helpful for grokking the state of CEA's work. We still intend to come back and update it within a few months.
Just to be clear on why / what’s going on:
Thanks!
Thanks, I found this interesting! I remember reading that piece by Froolow but I didn't realize the refactoring was such a big part of it (and that the GiveWell CEA was formatted in such a dense way, wow).
This resonates a lot with my experience auditing sprawling, messy Excel models back in my last job (my god are there so many shitty Excel models in the world writ large).
FWIW if I were building a model this complex, I'd personally pop it into Squiggle / Squigglehub — if only because at that point, properly multiplying probabilities together and keeping track of my confidence interval starts to really matter to me :)
I also spent a cursed day looking into the literature for NONS. I was going to try and brush this up into a post, but I'm probably not going to do that after all. Here are my scrappy notes if anyone cares to read them.
You're citing the same two main studies on Enovid that I found (the Phase 3 Lancet trial, or "Paper 1", and the Phase 2 UK trial, or "Paper 2"), so in case it's helpful, here are my notes under "Some concerns you might have" re: the Lancet paper:
Note that the evidence base on explicitly prophylactic use of NONS is not very good. Here's the only study I could find (after maybe an hour of searching), and it's a retrospective epidemiological case study (i.e. not randomly assigned), again by the manufacturers.
They're running a Phase 3 prophylactic RCT right now, which in theory is supposed to wrap up this month, but who knows when we'll see the results.
For example: let’s say you want to know the impact of daily jogs on happiness. You randomly instruct 80 people to either jog daily or to simply continue their regular routine. As a per-protocol analyst, you drop the many treated people who did not go jogging. You keep the whole control group because it wasn’t as hard for them to follow instructions.
I didn't realize this was a common practice, that does seem pretty bad!
Do you have a sense of how commonplace this is?
What’s depressing is that there is a known fix for this: intent-to-treat analysis. It looks at effects based on the original assignment, regardless of whether someone complied or not.
In my econometrics classes, we would have been instructed to take an instrumental variables approach, where "assignment to treatment group" is an instrument for "actually receives the treatment", and then you can use a two-stage least squares regression to estimate the effect of treatment on the outcome. (My mind is blurry on the details.)
IIUC this sounds similar to intent-to-treat analysis, except it lets you back out the effect of actually receiving the treatment (at least for the people who comply), which is presumably what you care about in most cases.
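If it helps, here's a toy simulation of the jogging example above (all numbers invented), comparing the three estimators. The confounder is a "diligence" trait that drives both compliance and happiness, so per-protocol overshoots, ITT dilutes toward the assignment effect, and the IV/Wald estimate (the one-instrument case of 2SLS) recovers the true effect for compliers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
tau = 1.0  # true effect of jogging on happiness (assumed for this sketch)

z = rng.integers(0, 2, n)        # random assignment: 1 = told to jog
diligence = rng.normal(0, 1, n)  # confounder: diligent people comply AND are happier
d = (z == 1) & (diligence > 0)   # only diligent assignees actually jog
y = tau * d + diligence + rng.normal(0, 1, n)  # happiness

# Per-protocol: drop non-compliers in the treatment arm, keep all controls.
# Biased upward, because the remaining joggers are the diligent (happier) ones.
pp = y[d].mean() - y[z == 0].mean()

# Intent-to-treat: compare by original assignment. Unbiased, but it's the
# effect of *being told* to jog, diluted by non-compliance.
itt = y[z == 1].mean() - y[z == 0].mean()

# IV / Wald estimator: ITT rescaled by how much assignment moved actual jogging.
compliance_gap = d[z == 1].mean() - d[z == 0].mean()
iv = itt / compliance_gap

print(f"per-protocol: {pp:.2f}  ITT: {itt:.2f}  IV: {iv:.2f}")
```

With these made-up numbers, per-protocol comes out around 1.8 (badly inflated), ITT around 0.5 (the true effect times the ~50% compliance rate), and IV close to the true 1.0.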
I have built three or four traditional-style lumenators, as described in Eliezer and Raemon’s posts. There’s a significant startup cost — my last one cost $500 for the materials (with $300 of that being the bulbs), and the assembly always takes me several hours and is rife with frustration — but given that they last for years, it’s worth it to me.
Reading this post inspired me to figure out how to set up a lumenator in my room, so thank you for writing it! :)
I just set mine up and FWIW I got 62,400 lumens for $87 ($3.35/bulb if you buy 26; 2,600 lumens, 85 CRI, 5000K). These aren't dimmable, but they're less than half the price of the 83 CRI Great Eagle bulbs you mentioned (which are $6.98/bulb right now).
My full setup cost $212.
[musing] Actually, another mistake here which I wish I'd just said in the first comment: I didn't have a strong enough TAP for "if someone says a negative thing about your org (or something that could be interpreted negatively), you should have a high bar for not taking away data (meaning data broadly, not just numbers) that they were using to form that perception, even if you think the data is wrong for reasons they're not tracking". You can try to clarify the misconception (ideally, given time & energy constraints etc.), and you can try harder to avoid putting wrong things out there, but don't just take it away -- it's not on the reader to treat you charitably, and it kind of doesn't matter what your motives were.
I think I mostly agree with something like that / I do think people should hold orgs to high standards here. I didn't pay enough attention to this and regret it. Sorry! (I'm back to ignoring this thread lol but just felt like sharing a reflection 🤷🏻‍♀️)