All of specbug's Comments + Replies

I wouldn't read too much into this - the challenge was buggy and slow enough that I almost ragequit, and it took me about an hour to start submitting. I expect many people had similarly bad experiences.

I had the same experience (50 mins for first problem, as seen in the post). I agree, it is possible that the server issues biased the stats greatly.

The only correctness filters are the hidden testcases (as is standard in most competitive coding competitions). You can check the leaderboard - the positions correlate with the cumulative time taken to solve problems & Codex assists. If there are any hidden metrics, I wouldn't know.

If so, how was Codex deployed solo? Did they just sample it many times on the same prompt until it produced something that passed the tests? Or something more sophisticated?

They didn't reveal this publicly. We can only guess here.

This makes no sense to me. Do you assume solo-

... (read more)
6 tin482
There is no state saving or learning at test time. The prompts were prepended to the API calls; you could see it in the requests.
2 Vanessa Kosoy
Hmm, I suppose they might be combining the problem statement and the prompt provided by the user into a single prompt somehow, and feeding that to the network? Either that or they're cheating :)

It's hard to monitor most work in the short term, so having the engagements be longer-term makes it possible to adjust job and compensation based on years of output rather than the latest delivery.


Fair point. I agree, I am exaggerating the effectiveness of certain elements and downplaying the necessity of others.

Although, there's an inherent survivorship bias to favour a longer-term contract, because we've never experienced an efficient short-term engagement model, at scale, before. But I do believe this adjustment buffer will shorten with time, as the t... (read more)

Yes, but there's generally a long enough buffer before the messenger apps change status.

Working on something personal, reading a blog, general web surfing, etc. constitute, I feel, 80% of "alt work" sessions. These scenarios won't register on an instant messenger as "away". It is not about going out for a one-hour walk in the middle of the day without informing anyone. It is about these bursts of freedom and the ability to switch context, unmonitored.

Also, pinging someone for feedback, checking someone's status, or organizing group activities seem like a less efficient means of monitoring than constantly being in their range of vision.

2 Viliam
The personal activities available during the 8 hours monitored by instant messengers involve mouse and keyboard. Possible: reading a blog, commenting on a blog, writing a blog, watching YouTube videos, reading a book in PDF, doing an online course that does not require installing anything, etc. Not possible: taking a nap, exercising, taking a walk, cooking, etc. The thing I find sad is that all healthy activities seem to be in the latter group. For someone who wants to spend most of the day browsing Reddit and watching cat videos, work from home is a complete blessing. For someone who wants to take care of their health (maybe damaged by years of sedentary work), there are still many advantages (e.g. freedom to choose a chair or standing desk, plus all the useful things in the former group), but the 8-hour block still remains an obstacle to some activities.
3 ChristianKl
That's also what people do at the office.