Comment author: jacob_cannell 09 November 2015 07:45:02PM *  4 points [-]

The classifier isn't breaking down - it was trained to do well across the entire training set using a small amount of computation for each inference and a reasonable (larger) amount of computation for training.

Humans' fastest recognition capability still takes 100 ms or so, and operating in that mode (rapid visual presentation), human inference accuracy is considerably less capable than modern ANNs, which classify using less time and also around 1000x fewer neurons/synapses.

I would bet that humans often make similar mistakes in fast recognition. And even if humans don't make this specific mistake, it doesn't matter because they make more total mistakes in other categories.

The fact that humans can do better given considerably more time and enormously more neural resources (which allow more complex multi-step inference) is hardly surprising.

Also, the ImageNet training criterion is not really a good match for human visual intuitions. It assigns the same penalty for mistaking a dog for a cat as it does for mistaking two closely related species of dogs. Humans have a more sensible hierarchical error allocation. This may be relatively easy low-hanging fruit to improve for ANNs, not sure, but someone is probably working on that if it hasn't already been done.
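
To make the flat versus hierarchical distinction concrete, here is a minimal sketch. The taxonomy, the label names, and both loss functions are hypothetical toy constructions for illustration; this is not how ImageNet is actually scored, and tree distance is only one of several ways a hierarchical penalty could be defined.

```python
# Hypothetical toy taxonomy: each label maps to its parent; None marks the root.
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "beagle": "dog",
    "foxhound": "dog",
    "tabby": "cat",
}


def ancestors(label):
    """Return the path from a label up to the root, including the label itself."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def flat_loss(true_label, predicted):
    """Flat 0/1 penalty: every mistake costs the same."""
    return 0 if true_label == predicted else 1


def hierarchical_loss(true_label, predicted):
    """Number of tree edges between the two labels (0 if they match)."""
    true_path = ancestors(true_label)
    pred_path = ancestors(predicted)
    # First node on the true label's path that is also an ancestor of the prediction.
    common = next(node for node in true_path if node in pred_path)
    return true_path.index(common) + pred_path.index(common)


for guess in ("foxhound", "tabby"):
    print(guess, flat_loss("beagle", guess), hierarchical_loss("beagle", guess))
```

Under the flat penalty both mistakes cost 1; under the tree-distance penalty, confusing two dog breeds costs 2 while confusing a dog breed with a cat costs 4.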

Comment author: jsteinhardt 10 November 2015 03:53:59AM 1 point [-]

Humans' fastest recognition capability still takes 100 ms or so, and operating in that mode (rapid visual presentation), human inference accuracy is considerably less capable than modern ANNs.

This doesn't seem right, assuming that "considerably less capable" means "considerably worse accuracy at classifying objects not drawn from ImageNet". Do you have a study in mind that shows this? In either case, I don't think this is strong enough to support the claim that the classifier isn't breaking down --- it's pretty clearly making mistakes where humans would find the answer obvious. I don't think that saying that the ANN answers more quickly is a very strong defense.

Comment author: Panorama 05 November 2015 08:46:51PM 2 points [-]

Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say “Usually Not” by Andrew C. Chang and Phillip Li

We attempt to replicate 67 papers published in 13 well-regarded economics journals using author-provided replication files that include both data and code. Some journals in our sample require data and code replication files, and other journals do not require such files. Aside from 6 papers that use confidential data, we obtain data and code replication files for 29 of 35 papers (83%) that are required to provide such files as a condition of publication, compared to 11 of 26 papers (42%) that are not required to provide data and code replication files. We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the 6 papers that use confidential data and the 2 papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable. We conclude with recommendations on improving replication of economics research.

Comment author: jsteinhardt 06 November 2015 05:02:04PM 7 points [-]

Note that their implicit definition of "replicable" is very narrow: under their procedure, a paper can fail to be "replicable" simply because its authors did not reply to an e-mail from Chang and Li asking for code. This is something of a play on words, since "failure to replicate" typically means that one is unable to get the same results as the original authors while following the same procedure. Based on their discussion at the end of section 3, it appears that (at most) 9 of the 30 "failed replications" are due to actually running the code and getting different results.
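
For readers wondering where the 30 comes from, the arithmetic can be sketched from the figures quoted in the abstract and in this comment. The variable names are mine, and the only inputs are those quoted counts; nothing here is taken from the paper's data files.

```python
# Counts quoted in the abstract and in the comment above.
attempted = 67
confidential_data = 6
missing_software = 2
replicated_with_help = 29
different_results_at_most = 9  # upper bound stated in the comment

# Papers that could in principle be replicated, and how many were not.
eligible = attempted - confidential_data - missing_software
failed = eligible - replicated_with_help
failed_for_other_reasons_at_least = failed - different_results_at_most

print(eligible, failed, failed_for_other_reasons_at_least)  # 59 30 21
print(f"{replicated_with_help / eligible:.0%}")  # 49%
```

So of the 30 papers counted as not replicated, at least 21 fall into that bucket for some reason other than the code running and producing different results.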

Comment author: jsteinhardt 03 November 2015 05:07:33PM 3 points [-]

Thanks for writing this; a couple quick thoughts:

For example, it turns out that a learning algorithm tasked with some relatively simple tasks, such as determining whether or not English sentences are valid, will automatically build up an internal representation of the world which captures many of the regularities of the world – as a pure side effect of carrying out its task.

I think I've yet to see a paper that convincingly supports the claim that neural nets are learning natural representations of the world. For some papers that refute this claim, see e.g.

http://arxiv.org/abs/1312.6199 http://arxiv.org/abs/1412.6572

I think the Degrees of Freedom thesis is a good statement of one of the potential problems. Since it's essentially making a claim about whether a certain very complex statistical problem is identifiable, I think it's very hard to know whether it's true or not without either some serious technical analysis or some serious empirical research --- which is a reason to do that research, because if the thesis is true then that has some worrisome implications about AI safety.

Comment author: SilentCal 27 October 2015 09:37:08PM 0 points [-]

Sure. But if we know or suspect any correlation between A and Y, there's nothing strange about the common information between them being expressed in the prior, right?

Granted, H-T will have nice worst-case performance if we're not confident about A and Y being independent, but that reduces to this debate http://lesswrong.com/lw/k9c/can_noise_have_power/.

Comment author: jsteinhardt 29 October 2015 04:08:14AM 2 points [-]

I wrote up a pretty detailed reply to Luke's question: http://lesswrong.com/lw/kd4/the_power_of_noise/

Comment author: Raelifin 14 October 2015 08:04:48PM 2 points [-]

Every judge being close to 50% would be bizarre. If I flip 13 coins 53 times I would expect that many of those sets of 13 will stray from the 6.5/13 expected ratio. The big question is whether anyone scored high enough or low enough that we can say "this wasn't just pure chance".
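
A rough sketch of how much spread pure chance would produce, assuming each of the 53 judges answers 13 questions and every answer is an independent fair coin flip. The function names and the threshold of 10 are illustrative choices, not part of the original analysis.

```python
from math import comb

N_QUESTIONS = 13  # coins per set, from the comment
N_JUDGES = 53     # number of sets, from the comment


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_prob(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def expected_judges_at_least(k):
    """Expected number of judges scoring k or more if everyone is guessing."""
    return N_JUDGES * tail_prob(k, N_QUESTIONS)


print(f"P(10 or more of 13) = {tail_prob(10, N_QUESTIONS):.4f}")        # 0.0461
print(f"expected judges at 10+ = {expected_judges_at_least(10):.2f}")  # 2.45
```

Under these assumptions, two or three judges scoring 10 or more out of 13 would be unremarkable, which is why the interesting question is whether anyone lands well beyond that.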

Comment author: jsteinhardt 15 October 2015 03:34:53AM 0 points [-]

Yes, I agree; I meant the (unobserved) probability that each judge gets a given question correct (which will of course differ from the observed fraction of the time each judge is correct). But it appears that at least one judge may have done quite well (as gjm points out). I don't think that the analysis done so far provides much evidence about how many judges are doing better than chance. It's possible that there just isn't enough data to make such an inference, but one possible thing you could do is to plot the p-values in ascending order and see how close they come to a straight line.
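
The sorted p-value check could look roughly like the sketch below. The scores are hypothetical placeholders rather than the actual judge data, the one-sided binomial test against 50% is my assumption about the null hypothesis, and the output is printed as pairs instead of plotted to keep the example dependency-free. Because scores out of 13 are discrete, the p-values would only approximately follow a straight line even under pure guessing.

```python
from math import comb


def p_value_at_least(k, n, p=0.5):
    """One-sided p-value: chance of k or more correct out of n by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sorted_p_values_vs_uniform(scores, n_questions):
    """Pair each sorted p-value with the quantile expected under uniformity."""
    p_values = sorted(p_value_at_least(k, n_questions) for k in scores)
    m = len(p_values)
    expected = [(i + 1) / (m + 1) for i in range(m)]
    return list(zip(p_values, expected))


# Hypothetical scores out of 13, for illustration only (not the real data).
example_scores = [4, 6, 6, 7, 7, 8, 9, 12]

for observed, expected in sorted_p_values_vs_uniform(example_scores, 13):
    print(f"observed {observed:.4f}   expected if guessing {expected:.4f}")
```

If the smallest observed p-values sit far below the expected column, that suggests some judges are doing better than chance; if the two columns track each other, the data are consistent with guessing.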

Comment author: jsteinhardt 14 October 2015 03:26:22PM 2 points [-]

I think you should distinguish between "average score across judges is close to 50%" and "every single judge is close to 50%". I suspect the latter is not true, as pointed out in one of the other comments.

Comment author: cousin_it 13 October 2015 12:16:22PM *  0 points [-]

The only way I think you could see the Superhappies' solution as acceptable is if you don't think jokes or fiction (or other sorts of art involving "deception") are something humans would value as part of their utility function.

Um, that's the opposite of how utility functions work. They don't have sacred components. You can and should trade off one component for a larger gain in another component. That's exactly what the Superhappies were offering.

Comment author: jsteinhardt 13 October 2015 01:25:50PM 1 point [-]

What? Why would this be true? Utility functions don't have to be linear; it could even be the case that I place no additional utility on happiness beyond a certain level.
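
A toy sketch of the point about nonlinearity. Both utility functions, the cap of 10, and the numbers in the trade are hypothetical; they only show that a trade which looks like a clear gain under a linear utility can be a loss under a saturating one.

```python
def linear_utility(happiness, art):
    """Every unit of either component counts equally, without limit."""
    return happiness + art


def saturating_utility(happiness, art, cap=10.0):
    """No additional utility from happiness beyond the (hypothetical) cap."""
    return min(happiness, cap) + art


before = (10.0, 5.0)   # already at the happiness cap, 5 units of art
after = (110.0, 4.0)   # offered trade: +100 happiness for -1 art

for name, utility in (("linear", linear_utility), ("saturating", saturating_utility)):
    change = utility(*after) - utility(*before)
    print(f"{name}: change in utility = {change:+.1f}")
```

The linear agent gains 99 from the trade, while the saturating agent loses 1, because the extra happiness is worth nothing to it once the cap is reached.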

Comment author: [deleted] 12 October 2015 02:25:44AM 4 points [-]

I've only gotten up to doing an MSc (currently volunteering for Vikash Mansinghka in my Copious Free Time), but I do know a hell of a lot of academics.

From my (second-hand) knowledge, easy quals are an artifact of something very like economic privilege: your school is very prestigious and doesn't need to cull its grad-student herds as much as others, so quals are allowed to be easy. In other places, quals are used to evict many grad-students from their PhD program because resources are more scarce.

but it seemed like MIT grad students also often started doing research fairly early on, from my perspective as an undergrad there.

I don't know of anywhere where grad-students don't start doing research as early as possible. Do some programs really involve whole years of just classes?

In response to comment by [deleted] on Deliberate Grad School
Comment author: jsteinhardt 12 October 2015 02:42:23AM 4 points [-]

In Berkeley CS there are enough course requirements that I don't think people do serious research until their second year (although I'm sure they do some preliminary reading / thinking in year one).

Comment author: 27chaos 04 October 2015 05:44:55PM 2 points [-]

Is there any way to do these things without paying a large pricetag? Could you just lurk around campus or something? Only half-joking here.

be sure to first consider the most useful version of grad school that you could reliably make for yourself... and then decide whether or not to do it.

Planning fallacy is going to eat you alive if you use this technique.

Comment author: jsteinhardt 12 October 2015 02:09:02AM 1 point [-]

I don't think lurking around campus is going to lead to the same results as being immersed in a research environment full-time (especially if you're not doing research yourself). I generally think that a large amount of useful knowledge is tacit and that it's hard to absorb without being pretty directly involved.

Also, as others have noted, a PhD is free / paid for, so (economic) cost isn't that much of a consideration.

Comment author: [deleted] 06 October 2015 11:56:56PM 6 points [-]

PhD programs in mathematics, statistics, philosophy, and theoretical computer science tend to give you a great deal of free time and flexibility, provided you can pass the various qualifying exams without too much studying.

Bolding the parts to which I object.

I have never seen anyone in a rigorous postgraduate program who had a lot of free time and could pass their quals without large amounts of studying.

Of course, I could just be, like magic, on the lower part of the intelligence curve for graduate school, but given that my actual measured IQ numbers are pretty in-the-middle for scientific academia (I won't tell what they are, though), and given that almost everyone else says they have little free time and have to study hard in graduate school, I'm inclined to believe the bolded phrases only accurately describe a narrow slice of lucky individuals.

In response to comment by [deleted] on Deliberate Grad School
Comment author: jsteinhardt 12 October 2015 02:05:12AM 4 points [-]

Are you talking about free time pre- or post-quals? And do you include work that goes towards your thesis but that you "have" to do (e.g. for a conference or internal deadline) as free time or non-free time?

My experience (and, I would guess, that of many of my labmates, though I don't know for sure) is that quals are really easy to pass: you spend at most 2 weeks of your life studying for them, and otherwise you're just doing research plus a few classes. Stanford is an outlier in that it has particularly few class requirements compared to other top CS departments, but it seemed like MIT grad students also often started doing research fairly early on, from my perspective as an undergrad there.

Depending on your funding situation, your actual time spent doing research may be more or less beholden to the grants your advisor has to do work towards. I'm on a fellowship and so can do whatever I want; the only consequence is that if my research after 5 years is uninteresting then I'll have trouble getting academic jobs.
