[A text with some decent discussion on the topic](http://www.inference.phy.cam.ac.uk/mackay/itila/book.html). At least one group that has a shot at winning a major speech recognition benchmark competition uses information-theoretic ideas for the development of their speech recognizer. Another development has been the use of error-correcting codes to assist in multi-class classification problems (google "error correcting codes machine learning")[http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=error+correcting+codes+machine+learni...
I have a minor disagreement, which I think supports your general point. There is definitely a type of compression going on in the algorithm; it's just that the key insight in the compression is not to simply "minimize entropy" but rather to make the outputs of the encoder behave in a similar manner to the observed data. Indeed, one of the major insights of information theory is that one wants the encoding scheme to capture the properties of the distribution over the messages (and hence over alphabets).
Namely, in Hinton's algorithm the out...
This attacks a straw-man utilitarianism, in which you need to compute precise results and get the one correct answer. Functions can be approximated; this objection isn't even a problem.
Not every function can be approximated efficiently, though. I see the scope of morality as addressing human activity, where human activity is itself a function space. In this case the "moral gradient" that the consequentialist is computing is based on a functional defined over a function space. There are plenty of function spaces and functionals which are very...
I would like you to elaborate on the incoherence of deontology so I can test out how my optimization perspective on morality can handle the objections.
To be clear I see the deontologist optimization problem as being a pure "feasibility" problem: one has hard constraints and zero gradient (or approximately zero gradient) on the moral objective function given all decisions that one can make.
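To make that framing concrete, here is a toy sketch of a pure feasibility problem (the actions and constraints are invented for illustration, not drawn from any particular moral theory):

```python
def permissible_actions(actions, constraints):
    """Deontological choice as pure feasibility: an action is permissible
    iff it violates no hard constraint; among the permissible actions
    there is no gradient to rank one above another."""
    return [a for a in actions if all(c(a) for c in constraints)]

# Toy example: actions are labeled dicts; constraints are hard predicates.
actions = [
    {"name": "lie", "deceives": True},
    {"name": "stay silent", "deceives": False},
    {"name": "inform", "deceives": False},
]
constraints = [lambda a: not a["deceives"]]  # e.g. a Kantian ban on deception

print([a["name"] for a in permissible_actions(actions, constraints)])
```

Note that the function returns a set of permissible actions rather than a single optimum: that is the sense in which the objective is flat over everything that satisfies the constraints.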
Of the many, many critiques of utilitarianism, some argue that it's not sensible to actually talk about a "gradient" or marginal improvement in moral objective functions. Some argue this on the basis of computational constraints: there's no way that you could ever reasonably compute a moral objective ...
I would argue that deriving principles using the categorical imperative is a very difficult optimization problem and that there is a very meaningful sense in which one is a deontologist and not a utilitarian. If one is a deontologist then one needs to solve a series of constraint-satisfaction problems with hard constraints (i.e. they cannot be violated). In the Kantian approach: given a situation, one must derive, via moral reasoning, the constraints under which one must act in that situation, and then one must act in accordance with those constraints.
This is very closely ...
I agree with the beginning of your comment. I would add that the authors may believe they are attacking utilitarianism, when in fact they are commenting on the proper methods for implementing utilitarianism.
I disagree that attacking utilitarianism involves arguing for different optimization theory. If a utilitarian believed that the free market was more efficient at producing utility then the utilitarian would support it: it doesn't matter by what means that free market, say, achieved that greater utility.
Rather, attacking utilitarianism involves arguing...
Bear in mind that having more fat means that the brain gets starved of [glucose](http://www.loni.ucla.edu/~thompson/ObesityBrain2009.pdf) and blood sugar levels have [impacts on the brain generally](http://ajpregu.physiology.org/cgi/content/abstract/276/5/R1223). Some research has indicated that the amount of sugar available to the brain has a relationship with self-control. A moderately obese person may have fat cells that steal so much glucose from their brain that their brain is incapable of mustering the will in order to get them to stop eating poorl...
I think that this post has something to say about political philosophy. The problem as I see it is that we want to understand how our local decision-making affects the global picture and what constraints we should put on our local decisions. This is extremely important because, arguably, people make a lot of local decisions that make us globally worse off: such as pollution ("externalities" in econo-speak). I don't buy the author's belief that we should ignore these global constraints: they are clearly important--indeed it's the fear of the pot...
I think there is definitely potential to the idea, but I don't think you pushed the analogy quite far enough. I can see an analogy between what is presented here and human rights and to Kantian moral philosophy.
Essentially, we can think of human rights as what many people believe to be essential bare-minimum conditions on human treatment. I.e., in the class of all "good and just" worlds, everybody's human rights will be respected. Here human rights correspond to the "local rigidity" condition of the subgraph. In general, t...
All the sciences mentioned above definitely do rely on controlled experimentation. But their central empirical questions are not amenable to being directly studied by controlled experimentation. We don't have multiple earths or natural histories upon which we can draw inference about the origins of species.
There is a world of difference between saying "I have observed speciation under these laboratory conditions" and "speciation explains observed biodiversity". These are distinct types of inferences. This of course does not mean th...
I think we are talking past each other. I agree that those are experiments in a broad and colloquial use of the term. They aren't "controlled" experiments, a term I wanted to clarify (since I know a little bit about it). This means that they do not allow you to randomly assign treatments to experimental units, which generally means that the risk of bias is greater (hence the statistical analysis must be done with care and the conclusions drawn should face greater scrutiny).
Pick up any textbook on statistical design or statis...
I think it's standard in the literature: "The word experiment is used in a quite precise sense to mean an investigation where the system under study is under the control of the investigator. This means that the individuals or material investigated, the nature of the treatments or manipulations under study and the measurement procedures used are all settled, in their important features at least, by the investigator." The theory of the design of experiments
To be sure there are geological experiments where one, say, takes rock samples and subjects ...
Those sciences are based on observations. Controlled experimentation requires that you have some set of experimental units to which you randomly assign treatments. With geology, for instance, you are trying to figure out the structure of the Earth's crust (mostly). There are no real treatments that you apply, instead you observe the "treatments" that have been applied by the earth to the earth. I.e. you can't decide which area will have a volcano, or an earthquake: you can't choose to change the direction of a plate or change the configurati...
Bear in mind that the people who used steam engines to make money didn't make it by selling the engines: rather, the engines were useful in producing other goods. I don't think that the creators of a cheap substitute for human labor (GAI could be one such example) would be looking to sell it necessarily. They could simply want to develop such a tool in order to produce a wide array of goods at low cost.
I may think that I'm clever enough, for example, to keep it in a box and ask it for stock market predictions now and again. :)
As for the "no free lun...
It actually comes from Peter Norvig's definition that AI is simply good software, a comment that Robin Hanson made, and the general theme of Shane Legg's definitions, which are ways of achieving particular goals.
I would also emphasize that the foundations of statistics can (and probably should) be framed in terms of decision theory (See DeGroot, "Optimal Statistical Decisions" for what I think is the best book on the topic, as a further note the decision-theoretic perspective is neither frequentist nor Bayesian: those two approaches can be unde...
The fact that there are so many definitions and no consensus is precisely the unclarity. Shane Legg has done us all a great favor by collecting those definitions together. With that said, his definition is certainly not the standard in the field and many people still believe their separate definitions.
I think his definitions often lack an understanding of the statistical aspects of intelligence, and as such they don't give much insight into the part of AI that I and others work on.
I think there is a science of intelligence which (in my opinion) is closely related to computation, biology, and production functions (in the economic sense). The difficulty is that there is much debate as to what constitutes intelligence: there aren't any easily definable results in the field of intelligence nor are there clear definitions.
There is also the engineering side: this is to create an intelligence. The engineering is driven by a vague sense of what an AI should be, and one builds theories to construct concrete subproblems and give a framework ...
I'd meet on June 6 (tentatively). South side is preferable if there are other people down here.
Thanks for the link assistance.
I agree that my mathematics example is insufficient to prove the general claim: "One will master only a small number of skills". I suppose a proper argument would require an in-depth study of people who solve hard problems.
I think the essential point of my claim is that there is high variance with respect to the subset of the population that can solve a given difficult problem. This seems to be true in most of the sciences and engineering to the best of my knowledge (though I know mathematics best). The theory I ...
Asking other people who have solved a similar problem to evaluate your answer is a very powerful and simple strategy to follow.
Also, most evidence I have seen is that you can only learn how to do a small number of things well. So if you are solving something outside of your area of expertise (which probably includes most problems you'll encounter during your life) then there is probably somebody out there who can give a much better answer than you (although the cost to find such a person may be too great).
Post Note: The fact that you can only learn a few th...
Expanding on the "go meta" point:
Solve many hard problems at once
Whatever solution you give to a hard problem should give insight into or be consistent with answers given to other hard problems. This is similar in spirit to [Two Truths and a Lie](http://lesswrong.com/lw/1kn/two_truths_and_a_lie/) and a point made by Robin Hanson (Youtube link: the point is at 3:31): "...the first thing to do with puzzles is [to] try to resist the temptation to explain them one at a time. I think the right, disciplined way to deal [with] puzzles is to collect a bunch of them: lay them all out ...
From: You and Your Research
When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you.
Here is another mechanism by which status could make you "stupid", alth...
I think that it should be tested on our currently known theories, but I do think it will probably perform quite well. This is on the basis that it's analogically similar to cross-validation in the way that Occam's Razor is similar to the information criteria (Akaike, Bayes, Minimum Description Length, etc.) used in statistics.
I think that, in some sense, it's the porting over of a statistical idea to the evaluation of general hypotheses.
I think this is cross-validation for tests. There have been several posts on Occam's Razor as a way to find correct theories, but this is the first I have seen on cross-validation.
In machine learning and statistics, a researcher is often trying to find a good predictor for some data, and often has some "training data" which can be used to select the predictor from a class of potential predictors. Often one has more than one predictor that performs well on the training data, so the question is how else one can choose an appropriate predic...
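A minimal sketch of that selection procedure (the data and the two candidate predictors are invented for illustration): among predictors that fit the training data, prefer the one with the lowest error on held-out data.

```python
# Held-out validation: fit candidates on training data, then pick the
# predictor with the lowest mean squared error on data not used for fitting.
train = [(1, 2.1), (2, 3.9), (3, 6.2)]
holdout = [(4, 8.1), (5, 9.8)]

# Two toy candidates that both track the training data reasonably well.
candidates = {
    "linear": lambda x: 2.0 * x,
    "cubic": lambda x: 0.1 * x**3 + 1.9 * x,
}

def mse(predictor, data):
    return sum((predictor(x) - y) ** 2 for x, y in data) / len(data)

# The cubic overfits: it strays badly once x leaves the training range.
best = min(candidates, key=lambda name: mse(candidates[name], holdout))
print(best)
```

The held-out points play the role of the "validation fold" in cross-validation; repeating this over several splits and averaging gives the full cross-validation procedure.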
I would like to see more discussion on the timing of artificial super intelligence (or human level intelligence). I really want to understand the mechanics of your disagreement.
One issue with, say, taking a normal distribution and letting the variance go to infinity (which is the improper prior I normally use) is that the posterior distribution is going to have a finite mean, which may not be a desired property of the resulting distribution.
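That limit is easy to check numerically for a normal mean with known variance, using the standard conjugate-normal formulas (the numbers here are illustrative):

```python
# Posterior mean of a normal mean with known variance sigma2, under a
# N(0, tau2) prior, given n observations with sample mean xbar:
# it is the precision-weighted combination of prior mean (0) and xbar.
def posterior_mean(xbar, n, sigma2, tau2):
    precision_data = n / sigma2
    precision_prior = 1.0 / tau2
    return xbar * precision_data / (precision_data + precision_prior)

xbar, n, sigma2 = 5.0, 10, 1.0
# As the prior variance tau2 grows (the flat improper-prior limit),
# the posterior mean converges to the finite sample mean xbar.
for tau2 in (1.0, 100.0, 1e8):
    print(tau2, posterior_mean(xbar, n, sigma2, tau2))
```

So no matter how "non-informative" the prior is made, the posterior mean stays pinned near the sample mean, which is the finiteness property described above.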
You're right that there's no essential reason to relate things back to the reals, I was just using that to illustrate the difficulty.
I was thinking about this a little over the last few days and it occurred to me that one model for what you are discussing might actually be an infini...
No problem.
Improper priors are generally only considered in the case of continuous distributions, so 'sum' is probably not the right term; 'integrate' is usually used.
I used the term 'weight' to signify an integral because of how I usually intuit probability measures. Say you have a random variable X that takes values in the real line; the probability that it takes a value in some subset S of the real line would be the integral over S with respect to the given probability measure.
There's a good discussion of this way of viewing probability distributions in ...
I think you're making an important point about the uncertainty of what impact our actions will have. However, I think the right way to go about handling this issue is to put a bound on which impacts of our actions are likely to be significant.
As an extreme example, I think I have seen much evidence that clapping my hands once right now will have essentially no impact on the people living in Tripoli. Very likely clapping my hands will only affect myself (as no one is presently around) and probably in no huge way.
I have not done a formal statistical model to a...
There's another issue too, which is that it is extraordinarily complicated to assess what the ultimate outcome of a particular behavior is. I think this opens up a statistical question of what kinds of behaviors are "significant", in the sense that if you are choosing between A and B, is it possible to distinguish A and B or are they approximately the same.
In some cases they won't be, but I think that in very many they would.
What topology are you putting on this set?
I made the point about the real numbers because it shows that putting a non-informative prior on the infinite bidirectional sequences should be at least as hard as for the real numbers (which is non-trivial).
Usually a regularity is defined in terms of a particular computational model, so if you picked Turing machines (or the variant that works with bidirectional infinite tape, which is basically the same class as infinite tape in one direction), then you could instead begin constructing your prior in terms of Turing machines. I don't know if that helps any.
You can actually simulate a tremendous number of distributions (and theoretically any, to an arbitrary degree of accuracy) by applying an approximate inverse CDF to a standard uniform random variable (see here for example). So the space of distributions from which you could select to do your test is potentially infinite. We can then think of your selection of a probability distribution as being a random experiment and model your selection process using a probability distribution.
The issue is that since the outcome space is the space of all computable p...
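The inverse-CDF construction mentioned above can be sketched in a few lines; here the exponential distribution is the worked case (an illustration of the general trick, not the linked example):

```python
import math
import random

# Inverse-CDF (inverse transform) sampling: if U ~ Uniform(0,1) and F is a
# CDF, then F^{-1}(U) is distributed according to F. For Exponential(rate)
# the inverse CDF is F^{-1}(u) = -ln(1 - u) / rate.
def sample_exponential(rate, n, rng):
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

rng = random.Random(0)
draws = sample_exponential(2.0, 100_000, rng)
print(sum(draws) / len(draws))  # should be close to the true mean 1/rate = 0.5
```

Any distribution whose inverse CDF you can approximate numerically can be plugged into the same template, which is why the space of simulable distributions is so large.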
In finite-dimensional parameter spaces, sure, this makes perfect sense. But suppose that we are considering a stochastic process X1, X2, X3, .... where Xn follows a distribution Pn over the integers. Now put a prior on the distribution and suppose that, unbeknownst to you, Pn is the distribution that puts 1/2 probability weight on -n and 1/2 probability weight on n. If the prior on the stochastic process does not put increasing weight on integers with large absolute value, then in the limit the prior puts zero probability weight on the true distribution (a...
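To make the escaping-mass point concrete, take a fixed prior over the integers with geometrically decaying tails (a toy choice of mine, not from the example above): the weight it assigns to the true support {-n, n} of Pn shrinks to zero as n grows.

```python
# A fixed prior on the integers: q(k) proportional to 2^{-|k|}.
# Sum over all integers of 2^{-|k|} is 1 + 2*(1/2 + 1/4 + ...) = 3,
# so q(k) = 2^{-|k|} / 3 is a proper probability mass function.
def q(k):
    return 2.0 ** (-abs(k)) / 3.0

# The prior weight on the true support {-n, n} of Pn vanishes with n.
for n in (1, 5, 20):
    print(n, q(-n) + q(n))
```

Any fixed prior with summable tails behaves this way, which is exactly why the prior has to spread increasing weight toward large |k| to keep the true sequence of distributions in view.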
There's a difficulty with your experimental setup in that you implicitly are invoking a probability distribution over probability distributions (since you represent a random choice of a distribution). The results are going to be highly dependent upon how you construct your distribution over distributions. If your outcome space for probability distributions is infinite (which is what I would expect), and you sampled from a broad enough class of distributions then a sampling of 25 data points is not enough data to say anything substantive.
A friend of yours...
I think what Shalizi means is that a Bayesian model is never "wrong", in the sense that it is a true description of the current state of the ideal Bayesian agent's knowledge. I.e., if A says an event X has probability p, and B says X has probability q, then they aren't lying even if p!=q. And the ideal Bayesian agent updates that knowledge perfectly by Bayes' rule (where knowledge is defined as probability distributions of states of the world). In this case, if A and B talk with each other then they should probably update, of course.
In frequen...
I suppose it depends what you want to do, first I would point out that the set is in a bijection with the real numbers (think of two simple injections and then use Cantor–Bernstein–Schroeder), so you can use any prior over the real numbers. The fact that you want to look at infinite sequences of 0s and 1s seems to imply that you are considering a specific type of problem that would demand a very particular meaning of 'non-informative prior'. What I mean by that is that any 'noninformative prior' usually incorporates some kind of invariance: e.g. a uniform prior on [0,1] for a Bernoulli distribution is invariant with respect to the true value being anywhere in the interval.
This isn't always the case if the prior puts zero probability weight on the true model. This can be avoided on finite outcome spaces, but for infinite outcome spaces no matter how much evidence you have you may not overcome the prior.
I've had some training in Bayesian and frequentist statistics and I think I know enough to say that it would be difficult to give a "simple" and satisfying example. The reason is that if one is dealing with finite-dimensional statistical models (this is where the parameter space of the model is finite-dimensional) and one has chosen a prior for those parameters such that there is non-zero weight on the true values, then the Bernstein-von Mises theorem guarantees that the Bayesian posterior distribution and the maximum likelihood estimate converge to the same...
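For intuition, the simplest finite-dimensional case can be checked numerically: with a Beta(1,1) prior on a coin's bias, the posterior mean and the MLE draw together as data accumulate (a sketch of the phenomenon, not of the theorem itself):

```python
import random

# Beta-Bernoulli: with a Beta(a, b) prior and k heads in n flips, the
# posterior is Beta(a + k, b + n - k), whose mean is (a + k) / (a + b + n).
# The MLE is k / n; the gap between them is at most 1/(n + 2).
def posterior_mean(k, n, a=1.0, b=1.0):
    return (a + k) / (a + b + n)

rng = random.Random(1)
p_true = 0.3
for n in (10, 1000, 100_000):
    k = sum(rng.random() < p_true for _ in range(n))
    print(n, abs(posterior_mean(k, n) - k / n))  # shrinks roughly like 1/n
```

The interesting disagreements between the two approaches therefore live in infinite-dimensional settings like the one above, where the theorem's assumptions fail.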
I am uneasy with that sentiment although I'm having a hard time putting my finger on exactly why. But this is how I see it: there are vastly more people in the world than I could possibly ever help, and some of them are so poor and downtrodden that they spend most of their money on food, since they can't afford luxuries such as drugs. Eventually, I might give money to the drug user if I had solved all the other problems first, but I would prefer my money to be spent on something more essential for survival before I turn to subsidizing people's luxury spending.
Imposing my values on somebody seems to more aptly describe a situation where I use authority to compel the drug user to not use drugs.
Would a simple solution to this be to say plan a date each year to give away some quantity of money? You could keep a record of all the times you gave money to a beggar, or you could use a simple model to estimate how much you probably would have given, then you can send that amount to a worthwhile charity.
When I get more money that's what I plan on doing.
Also, I'd like to note that the post here included nigh-Yudkowskian levels of cross-linking to other material on LW. When we're talking about "conversation norms on LW", how is that not solid data?
The evidence presented is a number of anecdotes from LW conversation. A full analysis of LW would need to categorize different types of offending comments, and discuss their frequency and what role they play in LW discussion. Even better would be to identify who makes them, etc.
Although I do find it plausible that LW should enact a policy of altering present discussions of gender, I certainly will not say the evidence presented is "overwhelming".
This isn't precisely what Daniel_Burfoot was talking about, but it's a related idea based on "sparse coding", and it has recently obtained good results in classification:
http://www.di.ens.fr/~fbach/icml2010a.pdf
Here the "theories" are hierarchical dictionaries (so a discrete hierarchy index set plus a set of vectors) which perform a compression (by creating reconstructions of the data). Although they weren't developed with this in mind, support vector machines do this as well, since one finds a small number of "support vectors...
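To illustrate the sparse-coding idea in the simplest possible terms, here is plain matching pursuit over a toy dictionary (my own minimal sketch, not the hierarchical method of the linked paper):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_steps):
    """Greedy sparse coding: approximate `signal` as a combination of a
    few unit-norm dictionary atoms, each chosen by its correlation with
    the current reconstruction residual."""
    residual = list(signal)
    code = {}
    for _ in range(n_steps):
        best = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        coef = dot(residual, atoms[best])
        code[best] = code.get(best, 0.0) + coef
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    return code, residual

# Toy dictionary: the standard basis of R^3 (already unit-norm).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
code, residual = matching_pursuit([3.0, 0.0, 4.0], atoms, n_steps=2)
print(code)      # a sparse code: only two atoms are active
print(residual)  # what the sparse reconstruction fails to explain
```

The sparse code plus the dictionary is the "compressed theory" of the signal; the residual measures how much of the data the theory leaves unexplained.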