There isn't enough sharing of positive and negative results within the rationality community. I suspect this results in a fair amount of wasted effort as people explore the same dead ends, and a fair amount of lost potential when more effective tools don't get shared.

So, here are some things Boston has tried (not everything though):

Successes

Bureaucracy Day

Everyone shows up for a few hours with the intention of taking care of whatever bureaucratic tasks they've been putting off (doctor's appointments, getting a passport, taking care of personal finance tasks, etc.). Various supplies (printer, staplers, envelopes, etc.) are available.

I record how long each attempted task had been put off for, whether or not it was completed, and (optionally) what the task was. I use "old tasks get accomplished" as a proxy for the impact of the intervention, since I assume that if someone has been putting something off for six months they weren't going to do it in the counterfactual world where Bureaucracy Day doesn't exist.
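A minimal sketch of how such a log might be summarized, assuming each attempted task is stored as a small record. The `TaskRecord` and `summarize` names, the field names, and the sample rows are all hypothetical; only the six-month cutoff comes from the reasoning above.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class TaskRecord:
    months_put_off: float               # how long the task had been put off
    completed: bool                     # whether it was finished at the event
    description: Optional[str] = None   # optional, as in the post


def summarize(records, old_threshold_months=6.0):
    """Count completed tasks and how many of those count as 'old'."""
    completed = [r for r in records if r.completed]
    old_completed = [
        r for r in completed if r.months_put_off >= old_threshold_months
    ]
    ages = [r.months_put_off for r in completed]
    return {
        "attempted": len(records),
        "completed": len(completed),
        "old_completed": len(old_completed),
        "median_age_completed": median(ages) if ages else None,
    }


# Made-up rows purely for illustration; not real Bureaucracy Day data.
example = [
    TaskRecord(24, True, "passport"),
    TaskRecord(1, True),
    TaskRecord(8, False),
    TaskRecord(7, True),
]
print(summarize(example))
```

The "old_completed" count is the proxy described above: tasks past the cutoff that plausibly would not have been done otherwise.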

Overall it's been way more successful than I expected. It's not uncommon for tasks that are years old to get finished.

I expect efficacy to drop over time as all the oldest tasks get accomplished, but so far that's been counteracted by new people participating.

Sprint Day

My attempt to use a hackathon mindset for something actually productive. People show up and work for ten hours. No social media, no non-essential conversations, and only one project is allowed. Food is ordered or cooked the night before.

There aren't any common metrics, because I don't want to disrupt workflow by imposing recording procedures. Using my own metrics (number of GitHub issues resolved and time spent working), I'm about an order of magnitude more productive on sprint days than on normal Saturdays. There isn't enough data to tell if I'm just redistributing my productivity to sprint days, but the data I have do not suggest this. Other participants report excellent results (that sprint days are at least 90th percentile for productivity).

However, participation has been very low (and I suspect this hurts individual efficacy). I'm not sure if this is because people don't want to give up their Saturdays, don't have a project to work on, or something else.

Backthumb

A conversational norm that anyone can say "backthumb" and the conversation will return to the previous topic. We use it to cull tangents. It's useful enough to have reached fixation within the local community, and we use it regularly.

I do not have any data on how focused our conversations are with / without the backthumb norm, but I'm quite confident it's a significant improvement.

Boiling point of nitrogen

Another conversational norm. Anyone can say "boiling point of nitrogen" to indicate that the current disagreement/question can be easily resolved with google, and then everyone has to shut up and google it. (Not sure if this is original to Boston or if we imported it).

Works well for culling pointless debates. Adoption has steadily increased.

Failures

Order of the Sphex

We made a weekly review worksheet, and attempted to iterate on it. Questions were things like: "What goals are you working on", "What trivial inconveniences are in your way", "Is there something you need to get off your plate". (I can share the full list if anyone is interested).

Was initially successful, but eventually became useless, and attempts to save it failed. There was also a meta failure where we didn't notice how badly it was failing, and so continued spending time on it.

Timed worksheets

There were many attempts to make use of time worksheets (a la CFAR) in our early days. As far as I can tell these were almost never useful for anyone (although one person reports them being very useful).

Group Habitica

Individual members have reported finding group Habiticas useful in the past, but our attempt to create a community-wide Habitica failed. Very few people joined, and of those very few made use of it.

Group intervention testing

We created a list of interventions (take modafinil, plan your day in the morning, etc.). Once a week an intervention was picked at random, and everyone would try it.
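The weekly draw could be sketched roughly as follows. This is an illustration only, not the procedure we actually used: the `pick_intervention` name and the idea of seeding the draw with the week number (so everyone gets the same pick) are assumptions, and the list holds just the two examples named above.

```python
import random

# Only the two interventions named in the post; the real list was longer.
INTERVENTIONS = [
    "take modafinil",
    "plan your day in the morning",
]


def pick_intervention(interventions, week_number):
    """Pick one intervention at random, reproducibly for a given week."""
    rng = random.Random(week_number)  # same week number gives the same pick
    return rng.choice(interventions)


print(pick_intervention(INTERVENTIONS, week_number=1))
```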

Project died almost immediately, since no one actually implemented the chosen interventions.

Footnote: Either people aren't trying new things (I hope not), or they're sharing in places I don't check.

I'm not claiming that all results will generalize, or that it's never worth replicating an idea. But currently those aren't even possibilities (subject to ).

26 comments

Promoted to Featured for:

  • being very concrete
  • having data on outcome, not just an idea
  • general actionability.

(Note: I had slight objections to promoting this post because I was worried it would not be that interesting to people who do not have a social stake in this community, but elizabeth convinced me otherwise. I figured I would leave this here for posterity, in case we end up going down the incentive gradient I was worried about and later want to understand whether we noticed anything.)

I might have represented these as more social than I intended? 3/8 (sprints, worksheets, and Sphex) are not inherently group-focused. We tend to do them in groups, but I've tried all three on my own and seen approximately the same results.

Still, now that you point it out, I think we (meaning the Boston community) are biasing ourselves towards generating group interventions. Will try a few rounds of generating purely individual interventions/techniques.

In most of my social groups, "pop" is equivalent to what you describe using "backthumb" for. This started with the programmers, who also use "push" when deliberately introducing a new temporary topic, and reached fixation.

In my experience "pop" is connotationally very different from how the Boston rationalists use "backthumb". "Backthumb" contains a value judgement that the nascent conversation branch would be a poor use of time even if there is much that could be said about it, while "pop" is primarily used to return to a previous topic after a conversation branch has exhausted itself naturally.

Maybe worth saying: I think of all of these as instrumental rationality outputs. They aren't meant to make people more rational, they're ideas a rational person might come up with in order to accomplish other goals.

I think these also work as rationality-by-example training. I think I've learned a mindset of looking for interventions like this largely by seeing people do this.

Regarding bureaucracy day:

1) Is there a list of the sorts of tasks that have been accomplished? I often find myself questioning whether there are bureaucratic things I should be engaging with but am not remembering.

2) I really want someone near me (south bay) to host a bureaucracy day :|

(If you want the full list I'd need an email so I can share the relevant spreadsheet)

  • Schedule passport photo appointment
  • Order glasses
  • Set up account with Vanguard
  • Call insurance company
  • Set up password management system
  • Schedule dentist appointment

I'm in the southbay (Mountain View) and have some interest in a bureaucracy day. I couldn't host at home (apartment is too small) but we could probably find a place if there were more people interested.

I'm in SF and might be up for hosting such a thing at some point.

Weekly reviews

"[weekly review worksheet] Was initially successful, but eventually became useless, and attempts to save it failed. There was also a meta failure where we didn't notice how badly it was failing, and so continued spending time on it."

Can you say more about how it became useless?

My experience (both personally and based on others' experience using Complice) is that weekly reviews tend to be clearly really valuable whenever I do them, but I still often feel like they're not important/urgent, and so I tend to put them off. So then the meta failure is that without doing my weekly review, I don't get into the reflective headspace where I remember how valuable weekly reviews are. It sounds like you guys experienced something different, but I'm not sure.

Btw: the default weekly review questions in Complice are:

• What went really well this week? What did you do that worked?

• What got in the way? What didn't work?

• Based on that, do you want to be approaching things differently?

• What are your priorities for the upcoming week?

These seem to work quite well, although I'm sure they could be further optimized!

Sebastian Marshall recommends these questions, which I also like:

• “What’s really going on?”

• “So what do I do about it?”

• “What matters, what doesn’t?”

Our first iteration had questions extremely similar to yours, actually. I believe we had rewordings of each of those questions.

I don't have a good idea of what made them work, because I only started participating after they'd started to decline. There was a lot of socializing that dragged down discussions, but even when we limited socializing I didn't notice any improvement.

Personally I'm skeptical of the entire endeavour. People claimed lots of positive effects, but then as soon as I tried to measure them they disappeared. I kind of suspect that people notice lots of possibilities during weekly review, and feel like they've accomplished something, but then don't actually follow through.

However I think it's pretty plausible that there exists a useful weekly review structure, so I plan to continue testing them.

I'm not sure if you have any data on your weekly reviews (maybe how often you change a behavior as a result?) but I'd be very interested.

I don't have statistical data on it, but it is generally my experience that doing weekly reviews causes me to choose new priorities for the week, that I wouldn't have chosen otherwise, and to the extent that those priorities are actually better, I then do them.

One of the advantages of doing weekly reviews as part of Complice is that the review system is integrated with a system for intentionally doing things each day, so I suspect it means that any possibilities noticed are more likely to be followed through on.

The integration isn't as good as it could be though, and we have some sketches of a UI that'll make it better. That'll be added sometime this year unless my priorities shift.

I am curious how everyone would be able to go about trying modafinil. I've been interested in trying it for a long time, but I can't bring myself to attempt a sketchy internet order.

Years ago in London, we had one person do a bulk sketchy internet order (legal at the time), and then distribute it (less legal).

The "boiling point of nitrogen" conversational norm may be original to Boston, but the descriptive phrase "boiling point of nitrogen discussion" for the sort of thing the norm aims to avoid was in use at Caltech around 2005 and I'm fairly sure it originated there.

We have at least one person who was part of Caltech culture for a while, so that is probably where we got it.

I love the backthumb idea/norm.

I find that when socializing, people start the conversation off at one branch, and then keep opening up new tangents without ever finishing a branch. Personally I almost always have a moderately strong preference to finish branches. I don't mind tangents, but I do prefer to actually finish the original branch at some point in the somewhat near future. Or at least to explicitly close it. Perhaps this is why I often prefer to talk/socialize online rather than in person.

However, I realize that most people have the opposite preference: conversations where you just keep opening up tangents.

Anyway, I think that having a norm is important. Without a norm, I get the sense that interrupting and saying, "Hey, we just went on a tangent. I'd like to get back to what we were originally talking about." would be uncomfortable. In many cases, it violates a social norm. Even in cases where the social norm leans more towards finishing branches, it could still be uncomfortable for whatever reason.

For me, when a tangent conversation starts to die out, I literally say "So... what do you think about [previous topic]?". The other person will usually laugh, probably because they didn't even realize that they went off on a tangent.

The main value we've gotten from the backthumb norm isn't to get back to the previous topic after a tangent dies out. It's to kill useless/unwanted tangents before we waste time on them.

What is a Habitica?

Habitica is an app for tracking habits and tasks, designed to mimic an RPG, including the part where players form parties.

What are time(d) worksheets?

I like this post, and would like to see more posts like this.

Did you discover why Order of the Sphex failed?

I think it's better to think about it on the question level:

  • There were a lot of questions ("What are you working on now", for instance) that received basically the same answer every time, so they were useless.
  • Some other questions ("What do you need to get off your plate") tended to receive the same answer every time, and they created an ugh field around the relevant thing by repeatedly reminding participants of their failure to do the thing.
  • Some questions were designed to catch rare-but-bad events ("Do you need to refill your medications"). I'm not sure if these were effective, because they didn't exist for long enough to expect them to be answered positively. Bundling them with other Sphex questions was a mistake, because they died along with it.
  • By far the most effective question (in terms of directly causing people to do things / change their behavior) was "Is there anything you can take care of right now? If so: do it". Most other questions never did anything; this one got something accomplished ~1/week.

(Worth noting that we were only recording data towards the end of Sphex's life, because that was when I started to organize it, and I care a lot about gathering data).

I've personally replaced Sphex with a set of check-ins spaced to occur at a reasonable rate ("When was the last time you made a git commit?", "When was the last time you read a machine learning paper?", etc.). The general idea of "create check-ins" might work for other people, but the questions are probably too specific to me to be useful.
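For anyone who wants to try the "create check-ins" idea, here is a minimal sketch, assuming each check-in has a last-done date and a maximum acceptable gap. The `overdue_checkins` name, the dates, and the gap lengths are all hypothetical; only the two check-in topics come from the comment above.

```python
from datetime import date, timedelta


def overdue_checkins(last_done, max_gap, today):
    """Return the names of check-ins whose last-done date is too old."""
    return sorted(
        name
        for name, last in last_done.items()
        if today - last > max_gap[name]
    )


# Hypothetical dates and gaps, purely for illustration.
last_done = {
    "git commit": date(2018, 1, 1),
    "read ML paper": date(2018, 1, 10),
}
max_gap = {
    "git commit": timedelta(days=3),
    "read ML paper": timedelta(days=14),
}
print(overdue_checkins(last_done, max_gap, today=date(2018, 1, 15)))
# prints ['git commit']
```

Giving each check-in its own gap is one way to keep them "spaced to occur at a reasonable rate" rather than asking every question every week.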