Retrospective on a quantitative productivity logging attempt

femtogrammar

I have a productivity scale I've used to log productivity data for several years. It's a subjective rating system from 1 to 10, and looks something like this:

1. can’t do anything, even reading. Worktime 0%.

2. can force myself to read or work, but I can barely parse anything. Worktime 5%.

...

5. I don’t want to work and am easily distracted, but I’m getting stuff done. Worktime 50%.

6. Some mild distractions, but I can stick to a pomodoro timer. Worktime 60%.

...

10. Insane art-level hyperfocus. Worktime 100%.

At the end of each workday I would record how well I thought I'd done by this scale.

I'd been dissatisfied with this for a while. There was no way my brain was accurately tracking what percentage of time I was working, and the descriptions are not well defined and don't cleanly map to a level of productivity. I also can't prevent internal mapping drift, where my standards slowly rise or fall, such that a day I mark as productivity=4 this week is actually much more productive than a day I marked as 4 several years ago.

I'm invested in having good measurements, because I've been iterating on antidepressants and ADD meds for years, and having data on which ones are working on what metrics (I also track mood, exercise, sleep, social time) would be very useful for having a better life.

So I wrote a small Python script that tracked how much time I spent working at my job. Every time I took a break or went back to work, I'd mark it. If I noticed I'd started working during breaktime or zoning out during worktime, I'd 'revert' however many minutes I thought it had been to be of the other type. I also had 'dead' time where I wasn't getting anything done for reasons unrelated to my productivity, which I used to mark meetings or lunch breaks. At the end of the session, it would spit out a summary of how much time I'd spent working vs resting. This was a much more accurate, quantitative way of measuring what I wanted, or so I thought. I used it for a month and a week before I stopped.
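A minimal sketch of what such a tracker might look like. This is not the original script: the `SessionLog` class, its method names, and the choice to pass segment lengths in minutes (rather than reading a clock) are all assumptions made so the example stays small and testable.

```python
from dataclasses import dataclass, field

# "dead" covers time that says nothing about productivity (meetings, lunch).
STATES = ("work", "break", "dead")


@dataclass
class SessionLog:
    """Hypothetical session log: an ordered list of [state, minutes] segments."""
    segments: list = field(default_factory=list)

    def add(self, state: str, minutes: float) -> None:
        """Append a finished segment of the given type."""
        if state not in STATES:
            raise ValueError(f"unknown state: {state!r}")
        if minutes < 0:
            raise ValueError("minutes must be non-negative")
        self.segments.append([state, float(minutes)])

    def revert(self, minutes: float, to_state: str) -> None:
        """Relabel the most recent `minutes` of the log as `to_state`."""
        if to_state not in STATES:
            raise ValueError(f"unknown state: {to_state!r}")
        remaining = float(minutes)
        moved = 0.0
        while remaining > 0 and self.segments:
            state, length = self.segments.pop()
            take = min(length, remaining)
            if length > take:
                # Keep the untouched front part of this segment.
                self.segments.append([state, length - take])
            moved += take
            remaining -= take
        if moved > 0:
            self.segments.append([to_state, moved])

    def summary(self) -> dict:
        """Total minutes per state, plus work / (work + break)."""
        totals = {state: 0.0 for state in STATES}
        for state, length in self.segments:
            totals[state] += length
        tracked = totals["work"] + totals["break"]
        totals["work_fraction"] = totals["work"] / tracked if tracked else 0.0
        return totals


log = SessionLog()
log.add("work", 30)
log.add("break", 10)
log.add("dead", 60)     # e.g. a meeting
log.add("work", 20)
log.revert(5, "break")  # the last 5 "work" minutes were really zoning out
print(log.summary())
```

In this example the session ends with 45 work minutes and 15 break minutes, and the dead hour is excluded from the work fraction. A real version would presumably timestamp each switch instead of taking durations as arguments.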

Here's why I quit it.

1. As is frequently the case, using part of a system to monitor that system changes that system enough that the output of the monitoring is less useful. On bad attention days, I would be switching back and forth between work and break every two minutes. The work/break context switches became costlier, and seeing how short my work periods were lowered my mood.

2. The varying difficulty of tasks threw off the measurement. Some days I'd have monotonous but easy work; other days I'd have one tricky, complicated thing that took a lot of brainpower and persistence. When I was using the subjective scale, I'd adjust my score for perceived task-easiness. But when I was using the time tracker, a day spent on "going through the codebase and adjusting every method call to do a new thing" would be logged as productive, even though I could have done that even on a bad day.

3. My subjective scale covers 0% to 100% work time, and my ratings usually fall between 2 (subjective 5% work time) and 6 (60%). But my time-tracked work time usually hovered around 50%. I have an internal sense of "I haven't done enough work, I really need to do something" that kicks in and makes me do work-like activities badly, slowly, and ineffectively. For example, I'll read some documentation, staring at one sentence at a time, forcing myself to process it before moving on to the next one. That counts as work by the time tracker (I'm certainly not resting), and I can fill half my workday with that. At the end, the tracker will say I worked 50% of the time, but my subjective scale would say it was a 2. And I think my subjective scale is more correct.

I considered the idea of rating work periods every time I ended one, so that after spending an hour laboriously shoving sentences against my eyeballs, I'd indicate to the program that "taking break now; also, that last work period was only 10% of a real work period". But that goes right back to the problem of subjectively rated work, plus adds to the already-painful overhead I described in point 1.
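To make the rejected idea concrete, here is a rough sketch of quality-weighted work time. The function name, the `(minutes, quality)` tuple format, and the 0-to-1 quality scale are all hypothetical; nothing like this existed in the original script.

```python
def work_fractions(periods, break_minutes):
    """Return (raw, weighted) work fractions for one session.

    periods: list of (minutes, quality) pairs, where quality is a
             self-rated number in [0, 1] entered when the period ends.
    break_minutes: total minutes logged as break time.
    """
    for minutes, quality in periods:
        if minutes < 0 or not 0.0 <= quality <= 1.0:
            raise ValueError("minutes must be >= 0 and quality in [0, 1]")
    raw_work = sum(minutes for minutes, _ in periods)
    effective_work = sum(minutes * quality for minutes, quality in periods)
    total = raw_work + break_minutes
    if total == 0:
        return 0.0, 0.0
    return raw_work / total, effective_work / total


# One hour of laborious reading rated at 10%, one hour of real work,
# and two hours of breaks.
raw, weighted = work_fractions([(60, 0.1), (60, 1.0)], break_minutes=120)
print(f"raw: {raw:.3f}, weighted: {weighted:.3f}")
```

The raw fraction in this example is 0.5 while the weighted one is 0.275, which shows how much the result depends on the self-rated `quality` values. That dependence is exactly the subjectivity the tracker was supposed to remove.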

That was an interesting lesson to learn. In the future, when I'm trying to measure something, I'll try to ask myself:

How will integrating monitoring into my system change my system?
Describe a day that the proposed monitoring system would give a high score but that should actually get a low score. How common do I expect such days to be?
Describe a day that would get a low score but should actually get a high score. How common do I expect such days to be?
How much overhead do I expect this to add?

Gordon Seidoh Worley, 6y

I think it's somewhat a matter of personal taste, but like you I've found such attempts to quantify my life dissatisfying, although I know others who get a lot out of such attempts. I generally fall in the direction of not bothering to measure hard-to-measure things if I don't have to, and when I'm reluctantly forced to do it I try to use very gross measurements to match the poor precision possible in such cases. Having the precision of the measurement match the level of precision you can achieve helps avoid getting confused by the numbers and thinking you have more information than you do.

Natália, 6y

Who are you and how is it that we don't know each other yet?

femtogrammar, 6y

Hmm, I looked you up on Facebook and apparently you sent me a friend request god-knows-when (which I presumably ignored because I didn't know you), which I have just accepted.

Morpheus, 4y

Thanks for your post! I also tried to track my productivity and mood in my journal (to get better at reflection and to track the impact of ADHD medication) until I got frustrated with how much these metrics seemed to shift over time when I reviewed them in retrospect. It could be my memory that tricks me, but I think at some point I slipped into the habit of rating my mood for the day as a 4 just because I didn't take enough time to reflect (my scale goes from 1 to 5). I also think that the ratings were much more influenced by how I felt in the evening. Descriptions like the ones in your post could make this problem less bad: descriptions along the lines of "4: feeling neutral for most of the day, but 2 hours that were very nice" (I'll have to think about this scale more) might make this work for me again.
