Suppose it were discovered with a high degree of confidence that insects could suffer a significant amount, and almost all insect lives are worse than not having lived. What (if anything) would/should the response of the EA community be?
There's a lot of uncertainty in this field. I would hope to see a lot of people very quickly shift a lot of effort into researching:
- Effective interventions for reducing the number of insects in the environment (without, e.g., crashing the climate)
- Comparative effects of different kinds of land use (e.g. crop farming, pasture, land left wild, whatever) on insect populations
- Ability of various other invertebrates to suffer (how about plankton, or nematodes? The same high-confidence evidence showing insects suffer might also show the same for their smaller, more numerous cousins)
- Shifting public perceptions of gene drives
- Research into which pesticides cause the least suffering
Currently it seems like Brian Tomasik & the Foundational Research Institute, and Sentience Politics, are paying some attention to considerations like this.
You seem to mix up calibration and Brier scores.
Your first paragraph is correct. That is calibration. That is why 50/50 items are not useful for calibration. If you get less than 90% of your 90% items correct, you are a normal overconfident person. If your 50/50 items are not 50% correct, something odd is going on, like you are abnormally biased by the way questions are phrased.
Brier scores allow any input. 50% is a useful prediction for Brier scores. If you say that the French incumbent has a 50% chance of winning the election, that doesn't affect your calibration, but it is bad for your Brier score.
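To make the distinction concrete, here's a minimal sketch in Python (my own illustration, not anything from the original discussion): a Brier score is the mean squared error between your stated probabilities and the outcomes, so a 50% prediction always contributes 0.25 whichever way it resolves, while on a calibration graph the 50% bin tells you nothing, since flipping the phrasing gives an equally valid 50% prediction.

```python
# Brier score: mean squared error between stated probabilities and
# outcomes (1 = happened, 0 = didn't). Lower is better.
def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

print(brier_score([0.5], [1]))  # 0.25 - a 50% call costs the same...
print(brier_score([0.5], [0]))  # 0.25 - ...whichever way it resolves
print(brier_score([0.9], [1]))  # 0.01 - confident and right: cheap
print(brier_score([0.9], [0]))  # 0.81 - confident and wrong: expensive
```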
Yes, I see - it seems like there are two ways to do this exercise.
1) Everybody writes their own predictions and arranges them into probability bins (either artificially after coming up with them, or just writing 5 at 60%, 5 at 70%, etc.) You then check your calibration with a graph like Scott Alexander's.
2) Everybody writes their estimations for the same set of predictions - maybe you generate 50 as a group, and everyone writes down their most likely outcome and how confident they are in it. You then check your Brier score.
Both of these seem useful for different things - in 2), you get a raw measure of how good you are at making accurate guesses; lower confidence in correct predictions makes your score worse. In 1), you're looking at calibration across probabilities - there are always going to be things you're only 50% or 70% sure about, and making those intervals reflect reality is as important as the things you're 95% certain of.
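For 1), the bookkeeping might look something like this (a sketch in Python with made-up data; for a well-calibrated predictor, the 90% bin should come out near 90% true):

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: (confidence, came_true) pairs, e.g. (0.7, True).
    Groups predictions by confidence level and compares each stated
    level against the observed frequency."""
    bins = defaultdict(list)
    for confidence, came_true in predictions:
        bins[confidence].append(came_true)
    for level in sorted(bins):
        outcomes = bins[level]
        observed = sum(outcomes) / len(outcomes)
        print(f"{level:.0%} bin: {len(outcomes)} predictions, "
              f"{observed:.0%} came true")

calibration_table([(0.6, True), (0.6, False), (0.6, True), (0.6, True),
                   (0.9, True), (0.9, True), (0.9, True), (0.9, False)])
# 60% bin: 4 predictions, 75% came true
# 90% bin: 4 predictions, 75% came true
```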
I will edit the original post (in a bit) to reflect this.
50% predictions can be useful if you are systematic about which option you count as "yes". e.g., "I estimate a 50% chance that I will finish writing my book this year" is a meaningful prediction. If I am subject to standard biases, then we would expect this to have less than a 50% chance of happening, so the outcomes of predictions like this provide a meaningful test of my prediction ability.
2 conventions you could use for 50% predictions: 1) pose the question such that "yes" means an event happened and "no" is the default, or 2) pose the question such that "yes" is your preferred outcome and "no" is the less desirable outcome.
Actually, it is probably better to pick one of these conventions and use it for all predictions (so you'd use the whole range from 0-100, rather than just the top half of 50-100). "70% chance I will finish my book" is meaningfully different from "70% chance I will not finish my book"; we are throwing away information about possible miscalibration by treating them both merely as 70% predictions.
Even better, you could pose the question however you like and also note when you make your prediction 1) which outcome (if either) is an event rather than the default and 2) which outcome (if either) you prefer. Then at the end of the year you could look at 3 graphs, one which looks at whether the outcome that you considered more likely occurred, one that looks at whether the (non-default) event occurred, and one which looks at whether your preferred outcome occurred.
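A sketch of that bookkeeping (the field names and data here are my own invention, not a standard format): store each prediction once, as phrased, plus the two annotations, then re-pose it for each graph.

```python
# p is the stated probability of "yes", as the question was phrased.
# The two flags say which side (if either) is the non-default event
# and which side (if either) you prefer; None means "neither".
predictions = [
    dict(p=0.70, came_true=True,  yes_is_event=True,  yes_is_preferred=True),
    dict(p=0.40, came_true=False, yes_is_event=True,  yes_is_preferred=False),
    dict(p=0.80, came_true=True,  yes_is_event=None,  yes_is_preferred=True),
]

def reorient(pred, flag):
    """Re-pose a prediction so 'yes' means the outcome picked out by
    `flag`. Returns (stated probability, whether it occurred), or None
    if neither side fits the flag."""
    if pred[flag] is None:
        return None
    if pred[flag]:
        return pred["p"], pred["came_true"]
    return 1 - pred["p"], not pred["came_true"]

# Graph 1: did the outcome you called more likely occur?
# (Ties at exactly 50% count the "yes" side as the more likely one.)
graph1 = [(max(pr["p"], 1 - pr["p"]), pr["came_true"] == (pr["p"] >= 0.5))
          for pr in predictions]
# Graph 2: did the (non-default) event occur?
graph2 = [r for pr in predictions
          if (r := reorient(pr, "yes_is_event")) is not None]
# Graph 3: did your preferred outcome occur?
graph3 = [r for pr in predictions
          if (r := reorient(pr, "yes_is_preferred")) is not None]
```

Each of the three lists is then just (probability, occurred) pairs, so all three graphs can be drawn with the same calibration-plotting code.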
I would imagine that at the 50% level, you can put down a prediction in either the positive or negative phrasing, and since it'll be fixed at the beginning of the year (i.e., you won't be rephrasing it six months in), you should expect 50% of them to end up happening either way. Right?
(50% predictions don't tell you much on a calibration graph, since either phrasing is equally valid, but they still count toward Brier scores. I suppose forcing them to 45/55% so that you can incorporate them in the calibration graph isn't a bad idea. I'm not much of a statistician. Is that what you were saying, Douglas_Knight?)
The 99%/97% thing is true in that you're tripling your chance of being wrong (1% → 3%), but the distinction seems less necessary in practice in that A) if you're making fewer than ~30 predictions at that level, you shouldn't expect any of them to be wrong, and B) I have a hard time mentally distinguishing 97% and 99% chances, and would expect other people to be similarly bad at it (unless they practiced or did some rigorous evaluation of the evidence). I'm not sure how much credence I should lend to this.
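One way to make the 97%-vs-99% difference concrete is to look at expected misses rather than raw probabilities (simple arithmetic, sketched below): 97% means roughly one miss per 33 predictions, 99% roughly one per 100.

```python
# Expected misses among n predictions at confidence p is n * (1 - p).
for p in (0.97, 0.99):
    for n in (30, 100, 300):
        print(f"{n} predictions at {p:.0%}: ~{n * (1 - p):.1f} expected misses")
# At n = 30, both levels predict under one miss, so with a small batch
# you'll likely see zero misses either way and can't tell them apart.
```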
Throw a prediction party with your EA/rationality group
TL;DR: Prediction & calibration parties are an exciting way for your EA/rationality/LessWrong group to practice rationality skills and celebrate the new year.
On December 30th, Seattle Rationality had a prediction party. Around 15 people showed up, brought snacks, brewed coffee, and spent several hours making predictions for 2017, and generating confidence levels for those predictions.
This was heavily inspired by Scott Alexander’s yearly predictions. (2014 results, 2015 results, 2016 predictions.) Our move was to turn this into a communal activity, with a few alterations to meet our needs and make it work better in a group.
Procedure:
- Each person individually writes a bunch of predictions for the upcoming year. They can be about global events, people’s personal lives, etc.
- If you use Scott Alexander’s system, create 5+ predictions each for fixed confidence levels (50%, 60%, 70%, 80%, 90%, 95%, etc.)
- If you want to generate Brier scores or logarithmic scores, just do 30+ predictions at whatever confidence levels you believe. (A scoring sketch follows this list.)
- Write down confidence levels for each prediction.
- Save your predictions and put them aside for 12 months.
- Open up your predictions and see how everyone did.
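If you went the Brier/log-score route, the scoring a year later is only a few lines. A minimal sketch in Python, assuming each saved prediction is the stated probability that it comes true, paired with what actually happened:

```python
import math

# (stated probability it comes true, whether it came true)
saved = [(0.60, True), (0.90, True), (0.75, False), (0.95, True)]

# Brier score: mean squared error between probability and outcome.
# Lower is better; 0.25 is what constant 50% guessing earns.
brier = sum((p - o) ** 2 for p, o in saved) / len(saved)

# Log score: average log-probability assigned to what actually happened.
# Closer to zero is better; a 50% prediction scores log(0.5) ~= -0.69.
log_score = sum(math.log(p if o else 1 - p) for p, o in saved) / len(saved)

print(f"Brier score: {brier:.3f}, log score: {log_score:.3f}")
```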
To make this work in a group, we recommend the following:
- Don’t share your confidence levels. Avoid anchoring by simply not naming how likely or unlikely you think any prediction is.
- Do share predictions. Generating 30+ predictions is difficult, and sharing ideas (without confidence levels) makes it much easier to come up with a bunch. We made a shared Google Doc, and everyone pasted some of their predictions into it.
- Make predictions that, in a year, will verifiably have happened or not. (I.e., not “the academic year will go well”, which is debatable, but “I will finish the year with a 3.5 GPA or above”.)
- It’s convenient to assume that, unless stated otherwise, predictions resolve by the end of the year (i.e., "I will go to the Bay Area" means "I will go to the Bay Area at least once in 2017"). It’s also fine to make predictions with other end dates (“I will go to EA Global this summer”).
- Make a bunch of predictions first without thinking too hard about how likely they are, then assign confidence levels. This post details why. You could also generate a group list of predictions, and everyone individually lists their own confidence levels.
This makes a good activity for rationality/EA groups for the following reasons:
- Practicing rationality skills:
  - Making accurate predictions
  - Using confidence levels
- Accessibility:
  - It’s open to many different knowledge levels. Even if you don’t know a thing about geopolitics, you can still give predictions and confidence levels about media, sports, or your own life.
  - It’s more free-form and less intimidating than using a prediction market. You do not have to know the details of forecasting to try this.
- A natural time and recurring activity:
  - You could do this at any point during the year, but doing it at the start of the year seems appropriate for ringing in the new year.
  - In twelve months, you have an automatic new activity: coming back together, checking everybody’s predictions from last year, and making a new set of predictions for the next year. (If this falls through for some reason, everyone can, of course, still check their predictions on their own.)
- Fostering a friendly sense of competitiveness:
  - Everyone wants to have the best calibration or the lowest Brier score. Everyone wants to have the most accurate predictions!
Some examples of the predictions people used:
- Any open challenges from the Good Judgment Project.
- I will switch jobs.
- I will make more than $1000 in a way that is different from my primary job or stocks.
- I will exercise 3 or more times per week in October, November, December.
- I’ll get another tattoo.
- Gay marriage will continue to be legal in Washington state.
- Gay marriage will continue to be legal in all 50 states.
- I will try Focusing at least once.
- I will go to another continent.
- CRISPR clinical trials will happen on humans in the US.
- A country that didn’t previously have nuclear weapons will acquire them.
- I will read Thinking Fast and Slow.
- I will go on at least 3 dates.
Also relevant:
- 16 types of useful predictions
- Brier scores and graphs of ‘perfect’ vs. actual calibration give you different information; Yvain writes about the differences between these. Several of us made predictions last year using the Scott Alexander method (bins at fixed probabilities), although this year everybody seems to have used continuous probabilities. The exact method by which we’ll determine how well-calibrated we were will be left to the Seattle Rationality of 2018, but will probably include Brier scores AND something to determine calibration.
Hello friends! I have been orbiting around effective altruism and rationality ever since a friend sent me a weird Harry Potter fanfiction back in high school. I started going to Seattle EA meetings on and off a couple years ago, and have since read a bunch of blogs, made friends who were into existential risk, started my own blog, graduated college, and moved to Seattle.
I went to EA Global this summer, attend and occasionally help organize Seattle EA/rationality events, and work in a bacteriophage lab. I plan on studying international security and biodefense. I recently got back from a trip to the Bay Area, that gaping void in our coastline that all local EA group leaders are eventually sucked into, and was lucky to escape with my life.
I'm gray on the LessWrong Slack, and I also have a real name. I had an LW account early in college that I used for a couple months, but then I got significantly more entangled in the community, heard about the LW revitalization, and wanted a clean break - so here we are. In very recent news: in celebration of finding the welcome thread, I'm pleased to announce that I'm making a welcome post.
I wasn't sure if it would be tacky to directly link my blog here, so I put it in my profile instead. :)
Areas of expertise, or at least interest: Microbiology, existential risk, animal ethics and welfare, group social norms, EA in general.
Some things I've been thinking about lately include:
- How to give my System 1 a visceral sense of what "humanity winning" looks like
- What mental effects hormonal birth control might have
- Which invertebrates might be able to feel pain
- What an alternate system of taxonomy based on convergent evolution, rather than phylogeny, would look like
- How to start a useful career in biorisk/biodefense
I'd like to know if anyone knows good research (or just good estimates) of the following:
- Mental effects of hormonal birth control, especially long-term or subtle (think personality changes, maybe cognition, etc, not just increased risk of diagnosed mental illness)
- If anyone's estimated QALYs lost by menstruating
If not, I'm planning on researching it, but I love when people have already done the thing.
I am skeptical that group conversations tend to fall apart when they get interesting because people have social reasons for breaking them up.
Rather, it feels like there's an expectation that group conversations are "supposed" to be lighter, and that one-on-one / small-group discussions are where intimacy belongs.
So it might not be so much that people deliberately leave to sabotage interesting conversations, but that they see the shift as a signal to start a conversation of their own in a small group, or to politely leave and thereby increase the perceived value of the discussion for those involved.
This resonates. When a group conversation became unexpectedly intimate, I've definitely felt that urge to bail - or interfere and bring the conversation back to a normal level of engagement. It feels like an intense discomfort, maybe a sense of "I shouldn't be here" or "they shouldn't have to answer that question."
I think that's often a good instinct to have. (In this context, where 'interesting' seems to mean not just a topic you think is neat, but something like 'substantive and highly relevant to someone' or 'involving querying a person's deeply held beliefs', etc. Correct me if I'm wrong.) Where "diplomat mode" might be coming from:
- The person starting an intensive conversation might be 'inflicting' it on the other person, who can't gracefully duck out
- Both people are well-acquainted and clearly interested in having the conversation, but haven't considered that they're in public, and in retrospect would prefer not to have everyone else there
- Even if they seem to be fine with me being there, my role is unclear if I'm not well-versed on the issue - am I supposed to ask questions, chime in with uneducated opinions, just listen to them talk?
- Relatedly, conversations specific to people's deeply held interests are likely to require more knowledge to engage with, and thus exclude people from the conversation
- If other people are sharing personal stories or details, I might feel pressure to do that too
- Conversations that run closer to what people really care about are more likely to be upsetting, and I don't want to be upset (or, depending, expect them to want to be upset in front of me)
- I expect other people are uncomfortable, for whatever (any of the above) reasons
Most of these seem to apply less in small groups, or in groups where everybody knows each other quite well. Attempting diplomat --> engineering shifts in a large group seems interesting, but risky if near-strangers are present; managing or participating in that would also take a whole different set of group-based social skills (i.e., reducing the risks above, assessing how comfortable everybody is with taking them on, etc.).
Comparative solsticeology: I helped organize the Seattle Solstice, and also attended the Bay Solstice. Both were really nice. A couple major observations:
The Seattle Solstice (also, I think, the New York one) had a really clear light-dark-light progression throughout the presentations, the Bay one didn't - it seemed like each speech or song was its own small narrative arc, and there wasn't an over-arching one.
Seattle's was also in a small venue where there were chairs, but most people sat on cushions of various sizes on the floor, quite close to the performers and speakers. The Bay's was on a stage. While the cushion version probably wouldn't work for a much larger solstice, it felt intimate and communal. (Despite, I think, ~100 attendees in Seattle. Not sure how many people came to the Bay one; ~150 marked themselves as having gone on Facebook, but it seemed larger.)
Is there something that lets you search all the rationality/EA blogs at once? I could have sworn I've seen something - maybe a web app made by chaining a bunch of terms together in Google - but I can't remember where or how to find it.