In response to Chicago Meetup
Comment author: marks 22 May 2010 05:02:03AM 1 point [-]

I'd meet on June 6 (tentatively). South side is preferable if there are other people down here.

Comment author: [deleted] 18 January 2010 09:45:43AM 3 points [-]

Your Gian-Carlo Rota link talks about tricks used in proofs, which is different from sub-fields of mathematics. It's certainly true that modern mathematicians and scientists are hyperspecialized, but that's not the same thing.

There is a trick for your parenthesized link: hex-escape the parentheses. Here's the link itself: http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/DeliberatePractice%28PsychologicalReview%29.pdf Here are cute fluffy kittens.
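
A minimal sketch of the hex-escaping trick, assuming only the two parenthesis characters need replacing. The helper name `hex_escape_parens` is made up for illustration and is not part of the site software.

```python
def hex_escape_parens(url):
    """Replace literal parentheses with their percent-encoded (hex) forms."""
    # "(" is byte 0x28 and ")" is byte 0x29 in ASCII.
    return url.replace("(", "%28").replace(")", "%29")


raw = ("http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/"
       "DeliberatePractice(PsychologicalReview).pdf")
print(hex_escape_parens(raw))
```

The escaped form keeps the link markup from ending at the first closing parenthesis inside the URL.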

Comment author: marks 18 January 2010 07:33:49PM 2 points [-]

Thanks for the link assistance.

I agree that my mathematics example is insufficient to prove the general claim: "One will master only a small number of skills". I suppose a proper argument would require an in-depth study of people who solve hard problems.

I think the essential point of my claim is that there is high variance with respect to the subset of the population that can solve a given difficult problem. This seems to be true in most of the sciences and engineering to the best of my knowledge (though I know mathematics best). The theory I believe explains why this variation occurs is that the subset of people who can solve a given problem use unconscious heuristics born out of the hard work they put into previous problems over many years.

Admittedly, the problems I am thinking about are kind of like NP problems: it seems difficult to find a solution, but once a solution is found we can know it when we see it. There tends to be a large number of such problems that can be solved by only a small number of people. And the group of people that can solve them varies a lot from problem to problem.

There are also many hard problems for which it is hard to say what a good solution is (e.g. it seems difficult to evaluate different economic policies), or the "goodness" of a solution varies a lot with different value systems (e.g. abortion policy). It does seem that in these instances politicians claim they can give good answers to all the problems, as do management consulting companies. Public intellectuals and pundits also seem to think they can give good answers to lots of questions. I suppose that if they are right then my claim is wrong. I argue that such individuals and organizations claim to be able to solve many problems, but since it's hard to verify the quality of the solutions we should take the claim with a grain of salt. We know that individuals who can solve lots of problems would have a lot of status, so there is a clear incentive to claim to be able to solve problems that one cannot actually solve if verifying the solution is sufficiently costly.

I also think there is a good reason to think that even for those problems whose solutions are difficult to evaluate we should expect only a small number of people to actually give a good solution. The reason relates to a point made by Robin Hanson (and myself in another comment) which is that in solving a problem you should try to solve many at once. A good solution to a problem should give insight to many problems. Conversely, to understand and recognize a good solution to a given hard problem one should understand what it says about many other problems. The space of problems is too vast for any human being to know but a small portion, so I expect that people who are able to solve a given problem should only be those aware of many related problems and that most people will not be aware of the related problems. Given that in our civilization different people are exposed to different problems (no matter in which field they are employed) we should expect high variance of who can solve which hard problems.

Comment author: JRMayne 18 January 2010 01:30:09AM 11 points [-]

Great topic.

Ask someone who has been in a similar situation or solved a similar question or tried to solve a similar question. This may seem obvious, but is often ignored.

Be ready to recognize a bad answer when you see it. Sometimes, what looks like a good answer at the outset develops fatal problems. Don't vest too deeply in your answer. But don't be afraid to keep your answer just because others view it as different or scary.

Comment author: marks 18 January 2010 03:03:53AM *  5 points [-]

Asking other people who have solved a similar problem to evaluate your answer is a very powerful and simple strategy to follow.

Also, most evidence I have seen is that you can only learn how to do a small number of things well. So if you are solving something outside of your area of expertise (which probably includes most problems you'll encounter during your life) then there is probably somebody out there who can give a much better answer than you (although the cost to find such a person may be too great).

Post Note: The fact that you can only learn a few things really well seems to be true with mathematics: as in here. More generally, mastering a topic seems to take ten years or so [PDF] (see Edit below).

Edit: The software does not seem to allow for links that have parentheses, so you would need to copy the whole link--including the ".pdf" at the end--in order to actually pull up the document.

Edit Jan 18: Hex-escaped the parentheses so it should work better.

Comment author: marks 18 January 2010 02:29:51AM *  6 points [-]

Expanding on the go meta point:

Solve many hard problems at once

Whatever solution you give to a hard problem should give insight into, or be consistent with, answers given to other hard problems. This is similar in spirit to: "http://lesswrong.com/lw/1kn/two_truths_and_a_lie/" and a point made by Robin Hanson (Youtube link: the point is at 3:31) "...the first thing to do with puzzles is [to] try to resist the temptation to explain them one at a time. I think the right, disciplined way to deal [with] puzzles is to collect a bunch of them: lay them all out on the table and find a small number of hypotheses that can explain a large number of puzzles at once."

His point, as I understand it, was that people often narrowly focus on a limited number of health-related puzzles and that we could produce better policy if we attempted to attack many puzzles at once (consider things such as fear of death, the need to show we care, status-regulation, human social dynamics: particularly signaling loyalty).

Edit: I had originally meant to point out that solving several problems is a meta-thought about solutions to problems: i.e. they should relate to solutions to other problems.

Comment author: marks 12 January 2010 05:42:45PM 11 points [-]

From: You and Your Research

When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you.

Here is another mechanism by which status could make you "stupid", although I'm interpreting stupid in a different sense: as in making one less productive than one otherwise might. Although, I think the critique could be more general.

It's generally only worth talking about things that we can make progress in understanding, so if you have an inflated sense of what you can accomplish then you might try to think about and discuss things that you cannot advance. So you end up wasting more of your mental effort and falling behind in other areas that would have been a better use of your talents.

In response to Two Truths and a Lie
Comment author: bgrah449 23 December 2009 08:12:52PM 1 point [-]

No, you shouldn't make broad inferences about human behaviour without any data because they are consistent with evolution, unless your application of the theory of evolution is so precise and well-informed that you can consistently pass the Two-Truths-and-a-Lie Test.

This sentence could have beneficially ended after "behavior," "data," or "evolution." The last clause seems to be begging the question - why am I assuming the Two-Truths-and-a-Lie Test is so valuable? Shouldn't the test itself be put to some kind of test to prove its worth?

Comment author: marks 28 December 2009 08:02:31PM 0 points [-]

I think that it should be tested on our currently known theories, but I do think it will probably perform quite well. This is on the basis that it's analogous to cross-validation in the way that Occam's Razor is analogous to the information criteria (Akaike, Bayes, Minimum Description Length, etc.) used in statistics.

I think that, in some sense, it's the porting over of a statistical idea to the evaluation of general hypotheses.

In response to Two Truths and a Lie
Comment author: marks 28 December 2009 07:45:12AM 0 points [-]

I think this is cross-validation for tests. There have been several posts on Occam's Razor as a way to find correct theories, but this is the first I have seen on cross-validation.

In machine learning and statistics, a researcher is often trying to find a good predictor for some data, and they often have some "training data" which they can use to select the predictor from a class of potential predictors. Often more than one predictor performs well on the training data, so the question is how else one can choose an appropriate predictor.

One way to handle the problem is to use only a class of "simple predictors" (I'm fudging details!) and then use the best one: that's Occam's razor. Theorists like this approach and usually attach the word "information" to it. The other, "practitioner" approach is to use a bigger class of predictors where you tune some of the parameters on one part of the data and tune other parameters (often hyper-parameters if you know the jargon) on a separate part of the data. That's the cross-validation approach.

There are some results on the asymptotic equivalence of the two approaches. But what's cool about this post is that I think it offers a way to apply cross-validation to an area where I have never heard it discussed (I think, in part, because it's the method of the practitioner and not so much the theorist--there are exceptions of course!)
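
To make the cross-validation idea above concrete, here is a minimal sketch. Everything in it is an assumption chosen for illustration: the synthetic data, the nearest-neighbour predictor, and the candidate values of `k`. The stored training points play the role of the parameters fit on one part of the data, while the hyper-parameter `k` is chosen by its error on the held-out part.

```python
import random


def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(train, held_out, k):
    """Error of the k-nearest-neighbour predictor on points it has not seen."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in held_out) / len(held_out)


def cross_validate(data, k, folds=5):
    """Average held-out error over `folds` different train/validation splits."""
    errors = []
    for i in range(folds):
        held_out = data[i::folds]  # every folds-th point, starting at index i
        train = [p for j, p in enumerate(data) if j % folds != i]
        errors.append(mean_squared_error(train, held_out, k))
    return sum(errors) / folds


# Hypothetical data: a noisy quadratic, seeded so the run is repeatable.
rng = random.Random(0)
data = [(x, x * x + rng.gauss(0, 0.1)) for x in (rng.uniform(-1, 1) for _ in range(100))]

candidates = [1, 3, 5, 10, 25, 50]
scores = {k: cross_validate(data, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```

The Occam-style alternative would instead fix a small class of predictors up front and pick the best fit on all of the data, with no held-out split.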

Comment author: Eliezer_Yudkowsky 16 August 2009 08:46:59PM 5 points [-]

Sorry about that. Our first diavlog was better, IMHO, and included some material about whether rationality benefits a rationalist - but that diavlog was lost due to audio problems. Maybe we should do another for topics that would interest our respective readers. What would you want me to talk about with Scott?

Comment author: marks 18 August 2009 03:27:55PM 1 point [-]

I would like to see more discussion on the timing of artificial superintelligence (or human-level intelligence). I really want to understand the mechanics of your disagreement.

In response to comment by marks on Bayesian Flame
Comment author: janos 29 July 2009 06:04:34AM *  1 point [-]

Each element of the set is characterized by a bunch of probabilities; for example there is p_01101, which is the probability that elements x_{i+1} through x_{i+5} are 01101, for any i. I was thinking of using the topology induced by these maps (i.e. generated by preimages of open sets under them).
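
As a rough illustration of a quantity like p_01101, the sketch below estimates a pattern probability from a finite stretch of bits by counting sliding windows. This is only an empirical stand-in, since the probabilities in the comment belong to the infinite sequence model itself. The function name `pattern_frequency` and the sample string are hypothetical.

```python
def pattern_frequency(bits, pattern):
    """Fraction of length-len(pattern) windows of `bits` that equal `pattern`."""
    windows = len(bits) - len(pattern) + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than the pattern")
    # Each window starts at position i and covers bits[i : i + len(pattern)].
    hits = sum(bits[i:i + len(pattern)] == pattern for i in range(windows))
    return hits / windows


# Two of the six windows in this sample string are "01101".
print(pattern_frequency("0110101101", "01101"))
```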

How is putting a noninformative prior on the reals hard? With the usual required invariance, the uniform (improper) prior does the job. I don't mind having the prior be improper here either, and as I said I don't know what invariance I should want; I can't think of many interesting group actions that apply. Though of course 0 and 1 should be treated symmetrically; but that's trivial to arrange.

I guess you're right that regularities can be described more generally with computational models; but I expect them to be harder to deal with than this (relatively) simple, noncomputational (though stochastic) model. I'm not looking for regularities among the models, so I'm not sure how a computational model would help me.

In response to comment by janos on Bayesian Flame
Comment author: marks 05 August 2009 06:00:25AM 0 points [-]

One issue with, say, taking a normal distribution and letting the variance go to infinity (which is the improper prior I normally use) is that the posterior distribution is going to have a finite mean, which may not be a desired property of the resulting distribution.
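
A small sketch of that limit, under assumptions the comment does not state: a normal likelihood with known variance `sigma2` and a conjugate normal prior with mean `mu0` and variance `tau2`. The data values are invented. As `tau2` grows, the posterior mean approaches the sample mean, which is finite.

```python
def normal_posterior(data, sigma2, mu0, tau2):
    """Posterior mean and variance for a normal mean with a normal prior.

    Assumes observations are N(theta, sigma2) with sigma2 known,
    and the prior on theta is N(mu0, tau2).
    """
    n = len(data)
    precision = 1.0 / tau2 + n / sigma2  # prior precision + data precision
    mean = (mu0 / tau2 + sum(data) / sigma2) / precision
    return mean, 1.0 / precision


data = [2.1, 1.9, 2.4, 2.0]  # hypothetical observations
for tau2 in (1.0, 100.0, 1e6):
    print(tau2, normal_posterior(data, sigma2=1.0, mu0=0.0, tau2=tau2))
```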

You're right that there's no essential reason to relate things back to the reals, I was just using that to illustrate the difficulty.

I was thinking about this a little over the last few days and it occurred to me that one model for what you are discussing might actually be an infinite graphical model. The infinite bi-directional sequence here consists of the values of Bernoulli-distributed random variables. Probably the most interesting case for you would be a Markov random field, as the stochastic 'patterns' you were discussing may be described in terms of dependencies between random variables.

Here are three papers I read a little while back on (and related to) something called an Indian Buffet process: (http://www.cs.utah.edu/~hal/docs/daume08ihfrm.pdf) (http://cocosci.berkeley.edu/tom/papers/ibptr.pdf) (http://www.cs.man.ac.uk/~mtitsias/papers/nips07.pdf)

These may not quite be what you are looking for, since they deal with a bound on the extent of the interactions; you probably want to think about probability distributions of binary matrices with an infinite number of rows and columns (which would correspond to an adjacency matrix over an infinite graph).

In response to comment by marks on Bayesian Flame
Comment author: PhilGoetz 04 August 2009 05:22:22PM *  1 point [-]

Thanks much!

and one has chosen a prior for those parameters such that there is non-zero weight on the true values then the Bernstein-von Mises theorem guarantees that the Bayesian posterior distribution and the maximum likelihood estimate converge to the same probability distribution (although you may need to use improper priors)

What do "non-zero weight" and "improper priors" mean?

EDIT: Improper priors mean priors that don't sum to one. I would guess "non-zero weight" means "non-zero probability". But then I would wonder why anyone would introduce the term "weight". Perhaps "weight" is the term you use to express a value from a probability density function that is not itself a probability.

In response to comment by PhilGoetz on Bayesian Flame
Comment author: marks 05 August 2009 05:42:21AM *  1 point [-]

No problem.

Improper priors are generally only considered in the case of continuous distributions, so 'sum' is probably not the right term; 'integrate' is usually used.

I used the term 'weight' to signify an integral because of how I usually intuit probability measures. Say you have a random variable X that takes values in the real line; the probability that it takes a value in some subset S of the real line would be the integral over S with respect to the given probability measure.
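
As a hedged illustration of 'weight' as an integral, the sketch below takes S to be an interval [a, b] and the measure to be a standard normal, both chosen only for the example. It approximates the integral of the density over S with a midpoint rule.

```python
import math


def normal_density(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weight(a, b, density=normal_density, steps=10_000):
    """Midpoint-rule approximation of the integral of `density` over [a, b]."""
    width = (b - a) / steps
    return width * sum(density(a + (i + 0.5) * width) for i in range(steps))


print(weight(-1.0, 1.0))  # weight of the interval [-1, 1]
print(weight(0.3, 0.3))   # a single point: the density is positive, the weight is zero
```

The second call shows why a density value is not itself a probability: only the integral over a set gives one.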

There's a good discussion of this way of viewing probability distributions in the Wikipedia article. There's also a fantastic textbook on the subject that really has made a world of difference for me mathematically.
