All of Tristan H's Comments + Replies

Anecdote from the NYC rationalist (OBNYC) group: Something I think we'd want, and that other groups might want too, is an easy way for people organizing meetups to post to multiple channels at once: a website, a mailing list, LessWrong, and meetup.com.

Another issue we have that others may share is that we tend to host meetups at people's apartments, and the hosts don't necessarily want their addresses posted publicly. We currently handle this by posting the address only to a Google Group that is configured so you have to "apply" via a text box, and ... (read more)

1 Sam Rossini
I'll add another voice agreeing that an easier way to crosspost events would be great. Meetup.com allows repeating events, but LessWrong does not, and I find the LessWrong calendar/time interface so frustrating I've stopped posting events on the community page here. 
1 ChristianKl
I completely agree with the first one. It's a task I've found annoying enough in the past that it limits the spaces where I post announcements.
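
Concretely, the tooling being asked for here could be pretty small. Below is a minimal sketch of a "post once, fan out" announcer that keeps the host's address off the public channels. The `post_to_*` functions are made-up stand-ins for each channel's real API (meetup.com's API, LessWrong, a mailing list); none of them are actual library calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Meetup:
    title: str
    when: str          # e.g. "2021-06-12 19:00 ET"
    public_blurb: str  # description that is safe to post anywhere
    address: str       # host's address, only for the private channel


# Hypothetical channel adapters -- each would wrap the real API for that channel.
def post_to_meetup_dot_com(title: str, when: str, body: str) -> None: ...
def post_to_lesswrong(title: str, when: str, body: str) -> None: ...
def post_to_private_google_group(title: str, when: str, body: str) -> None: ...


PUBLIC_CHANNELS: list[Callable[[str, str, str], None]] = [
    post_to_meetup_dot_com,
    post_to_lesswrong,
]


def announce(m: Meetup) -> None:
    # Public channels get everything except the address.
    public_body = f"{m.public_blurb}\n\nLocation shared with group members."
    for post in PUBLIC_CHANNELS:
        post(m.title, m.when, public_body)
    # Only the vetted, members-only channel gets the actual address.
    post_to_private_google_group(
        m.title, m.when, f"{m.public_blurb}\n\nAddress: {m.address}"
    )
```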

You mention having a second office in "the city proper": would that refer to Bellingham and Peekskill, or to Seattle and NYC? Alternatively, would working from home some days of the week be viable for many employees?

I ask this because, for me, these would make the difference in whether it's viable to live mainly in Seattle/NYC and spend 3 days a week at the campus, as opposed to the reverse: living mainly near the campus and going into the city on weekends.

This isn't a huge difference from the perspective of doing things on weekends, but it makes a d... (read more)

5 Rob Bensinger
"The city proper" meaning Bellingham / Peekskill. If we moved to Bellingham, I (speculatively) imagine MIRI organizing trips to Seattle or Vancouver once every week or two, including trips to the big universities in those cities, including big-university meetups once or twice a year. I haven't heard discussion of how much rationalists would personally want to hop back and forth between the cities, and I haven't heard a MIRI employee say they'd prefer to live in Seattle and commute. Having to regularly commute from Seattle to Bellingham sounds doable but pretty unpleasant to me. (Maybe better if you're working weird hours, so you can avoid the worst traffic.) If we moved to Peekskill, I imagine more interaction than that with NYC. (Partly because NYC has more attractions than Seattle/Vancouver; partly because Peekskill has fewer attractions than Bellingham; and partly because the regular trains make it so much more convenient to travel between Peekskill and NYC.) I can more easily imagine worlds where some MIRI staff lived and worked in NYC itself, though I think MIRI's first-pass goal would be to have as many staff as possible working in the Peekskill area. I already do MIRI work from home a lot in Berkeley. (Well, I did pre-COVID; my living arrangement is weird now.) I think MIRI is pretty pragmatic and case-specific about this, rather than having top-down rules. (Though all else equal, having people in the same place where they can readily interact face-to-face seems better to me.)

I went and checked, and as far as I can tell they used the same batch size of 1024 for the 12-hour and 6-hour times. The changes I noticed were better normalization, label smoothing, a somewhat tweaked input pipeline (not sure if it was optimization or refactoring), and updating TensorFlow a few versions (which plausibly includes a bunch of hardware optimizations like the ones you're talking about).
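
(For readers unfamiliar with label smoothing: it just softens the one-hot training targets so the model isn't pushed toward fully saturated predictions. A minimal NumPy sketch of the idea, not their code:)

```python
import numpy as np


def smooth_labels(one_hot: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Blend one-hot targets with a uniform distribution over the classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / num_classes


# e.g. for 4 classes, [0, 1, 0, 0] becomes [0.025, 0.925, 0.025, 0.025]
targets = np.eye(4)[[1]]
print(smooth_labels(targets, eps=0.1))
```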

The things they took from fast.ai for the 2x speedup were training on progressively larger image sizes and the better triangular learning rate schedule. Separately, for their later submissions,

... (read more)
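
(For context, the "triangular" schedule popularized by fast.ai is just a linear warm-up followed by a linear ramp-down. A rough sketch of the idea; the exact peak LR, warm-up fraction, and step counts in their submissions are not these values:)

```python
def triangular_lr(step: int, total_steps: int,
                  max_lr: float = 0.4, min_lr: float = 0.004,
                  warmup_frac: float = 0.3) -> float:
    """Linearly ramp the learning rate up to max_lr, then linearly back down."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        frac = step / max(warmup_steps, 1)                                # ramp up
    else:
        frac = 1.0 - (step - warmup_steps) / max(total_steps - warmup_steps, 1)  # ramp down
    return min_lr + (max_lr - min_lr) * frac


# Peak LR lands ~30% of the way through training with these settings.
lrs = [triangular_lr(s, total_steps=1000) for s in range(1000)]
```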

A relevant paper came out 3 days ago talking about how AlphaGo used Bayesian hyperparameter optimization and how that improved performance: https://arxiv.org/pdf/1812.06855v1.pdf
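
As a concrete illustration of what Bayesian hyperparameter optimization looks like in practice, here is a generic sketch using scikit-optimize's gp_minimize. This is not DeepMind's setup (the paper describes their own Gaussian-process machinery), and the objective and hyperparameters below are toy stand-ins:

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real


def evaluate_win_rate(learning_rate: float, puct_constant: float) -> float:
    """Toy stand-in for the expensive step (training + evaluation games).
    In the AlphaGo setting this would be win rate against a fixed baseline."""
    return float(np.exp(-(np.log10(learning_rate) + 3.5) ** 2
                        - (puct_constant - 2.0) ** 2 / 4.0))


def objective(params):
    learning_rate, puct_constant = params
    return -evaluate_win_rate(learning_rate, puct_constant)  # minimize the negative


search_space = [
    Real(1e-5, 1e-2, prior="log-uniform"),  # learning rate
    Real(0.5, 5.0),                         # MCTS exploration (PUCT) constant
]

# A Gaussian-process surrogate plus an acquisition function choose each next
# trial point, so far fewer expensive evaluations are needed than grid search.
result = gp_minimize(objective, search_space, n_calls=25, random_state=0)
print("best hyperparameters:", result.x, "score:", -result.fun)
```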

It's interesting to set the OpenAI compute article's graph to linear scale so you can see that the compute that went into AlphaGo utterly dwarfs everything else. It seems like DeepMind is definitely ahead of nearly everyone else on the engineering effort and money they've put into scaling.
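
(If anyone wants to reproduce the effect, it's essentially a one-line axis-scale change in matplotlib. The numbers below are rough illustrative orders of magnitude in petaflop/s-days, not the article's actual data:)

```python
import matplotlib.pyplot as plt

# Rough, illustrative orders of magnitude -- not OpenAI's actual figures.
systems = ["AlexNet", "Seq2Seq", "ResNet", "NMT", "AlphaGo Zero"]
compute = [0.01, 0.1, 1.0, 10.0, 1000.0]

fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(10, 4))
for ax, scale in [(ax_log, "log"), (ax_lin, "linear")]:
    ax.bar(systems, compute)
    ax.set_yscale(scale)  # the only difference between the two panels
    ax.set_title(f"{scale} scale")
    ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
plt.show()
```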

I just checked, and it seems it was fp32. I agree this makes it less impressive; I forgot to check that originally. I still think this somewhat counts as a software win, because getting fp16 training working required a bunch of programmer effort to take advantage of the hardware, just as optimizing to make better use of cache would.
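
To give a sense of that effort: usable fp16 training typically means keeping master weights in fp32 and scaling the loss so small gradients don't underflow in half precision. A generic sketch using PyTorch's AMP utilities (requires a GPU; this is just an illustration of the technique, not the TensorFlow code behind the benchmark):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling to avoid fp16 gradient underflow

for _ in range(100):
    x = torch.randn(256, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # run eligible ops in fp16 on tensor cores
        loss = model(x).square().mean()
    scaler.scale(loss).backward()      # backprop on the scaled loss
    scaler.step(optimizer)             # unscale grads, skip the step on inf/nan
    scaler.update()                    # adjust the loss scale
```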

However, there's also a different set of same-machine datapoints available in the benchmark, where training time on a single Cloud TPU v2 went down from 12 hours 30 minutes to 2 hours 44 minutes, which is a 4.5x speedup similar to the 5

... (read more)
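
(Checking that ratio:)

```python
old = 12 * 60 + 30  # 12 h 30 min, in minutes
new = 2 * 60 + 44   # 2 h 44 min, in minutes
print(old / new)    # ~4.57, i.e. roughly a 4.5x speedup
```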

I think that fp32 -> fp16 should give a >5x boost on a V100, so this 5x improvement still probably hides some inefficiencies when running in fp16.
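
(My rough reasoning, using NVIDIA's published peak numbers for the V100, assuming I'm remembering them correctly:)

```python
fp32_peak_tflops = 15.7           # V100 SXM2, standard FP32 cores
fp16_tensor_core_tflops = 125.0   # V100 tensor cores, FP16 with FP32 accumulate
print(fp16_tensor_core_tflops / fp32_peak_tflops)  # ~8x theoretical, so ~5x realized leaves headroom
```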

I suspect the initial 15 -> 6 hour improvement on TPUs was also mostly dealing with low-hanging fruit and cleaning up various inefficiencies from porting older code to a TPU / larger batch size / etc. It seems plausible that the last factor of 2 is more of a steady-state improvement; I don't know.

My take on this story would be: "Hardware has been changing rapidly, giving large speedups, and people... (read more)