All of janos's Comments + Replies

janos00

Is there a reason to think this problem is less amenable to being solved by complexity priors than other learning problems? / Might we build an unaligned agent competent enough to be problematic without solving problems similar to this one?

janos20

What is Mathematics? by Courant and Robbins is a classic exploration that goes reasonably deep into most areas of math.

janos00

This makes me think of two very different things.

One is informational containment, ie how to run an AGI in a simulated environment that reveals nothing about the system it's simulated on; this is a technical challenge, and if interpreted very strictly (via algorithmic complexity arguments about how improbable our universe is likely to be in something like a Solomonoff prior), is very constraining.

The other is futurological simulation; here I think the notion of simulation is pointing at a tool, but the idea of using this tool is a very small part of the ap...

janos00

Certainly, interventions may be available, just as for anything else; but it's not fundamentally more accessible or malleable than other things.

janos20

I'm arguing that the fuzzy-ish definition that corresponds to our everyday experience/usage is better than the crisp one that doesn't.

Re IQ and "way of thinking", I'm arguing they both affect each other, but neither is entirely under conscious control, so it's a bit of a moot point.

Apropos the original point, under my usual circumstances (not malnourished, hanging out with smart people, reading and thinking about engaging, complex things that can be analyzed and have reasonable success measures, etc), my IQ is mostly not under my control. (Perhaps if I was more focused on measurements, nootropics, and getting enough sleep, I could increase my IQ a bit; but not very much, I think.) YMMV.

janos100

I think what you're saying is that if we want a coherent, nontrivial definition of "under our control" then the most natural one is "everything that depends on the neural signals from your brain". But this definition, while relatively clean from the outside, doesn't correspond to what we ordinarily mean; for example, if you have a mental illness, this would suggest that "stop having that illness!!" is reasonable advice, because your illness is "under your control".

I don't know enough neuroscience to give this a physi...

0[anonymous]
Uh.. "stop having that illness!" is reasonable advice. Seek help. Try medication. Enter into psychotherapy. I'm not sure what you are objecting to there?
0estimator
Well, you're right that in the mental illness case my definition works badly, but I can't think of a better precise definition right now (can you?); probably something like selecting a specific "sub-process" in the brain which is related to the conscious experience, but it's fuzzy and I'm not even sure that such separation is possible. I have a feeling that it is a rephrasing of "things under your control". Actually, I'm arguing that causal arrows are pointing in the opposite direction: if I were to change your IQ, I could change your way of thinking. The rest of the article is about what happens if we assume IQ fixed (that somehow resembles Bayesian inference).
janos30

March 2nd isn't a Tuesday; is it Monday night or Tuesday night?

2Sheaman3773
It clearly stipulates 12:01 am to avoid just this kind of confusion. Further, the chapter will be posted at 10:00 am on Tuesday. So the deadline is Monday night.
janos20

If you want to discuss the nature of reality using a similar lexicon to what philosophers use, I recommend consulting the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/

2[anonymous]
I have a very strong philosophical background. I've discussed many of those topics with the authors. Basically, what I'm trying to do is draw attention to something that is usually missed by people engaging in these topics. That is: absolute is not objective. There is a fundamental disconnect with the way most people organize truth and reality. They do not have clear concepts of objective and absolute. The sequence on how to use words is basically 6 parables that state words are not absolute. It's such a simple point, but most people can look right at that sentence, and not have the foggiest clue what it means. Traditionally (in the history of philosophy) the Rationalist is the lone defender of the distinction between objective and absolute. I'm curious if that tradition is held up by contemporary rationalists.
janos130

Musk has joined the advisory board of FLI and CSER, which are younger sibling orgs of FHI and MIRI. He's aware of the AI xrisk community.

janos10

Cool. Regarding bounded utility functions, I didn't mean you personally, I meant the generic you; as you can see elsewhere in the thread, some people do find it rather strange to think of modelling what you actually want as a bounded utility function.

This is where I thought you were missing the point:

Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.

Sometimes we (seem to) ...

janos10

Certainly given a utility function and a model, the best thing to do is what it is. The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.

Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don't know what kind of utility function is reasonable, and we're showing evidence that some of them give optima that aren't what we'd actually want if we were turning the ...

2gjm
No, it doesn't seem strange to me to consider representing what I want by a bounded utility function. It seems strange to consider representing what I want by a utility function that converges exponentially fast towards its bound. I'll repeat something I said in another comment: (Remark 1: the above is a comment that remarks that the optimum is the optimum but is visibly not missing the point by failing to appreciate that we might be constructing a utility function and trying to make it do good-looking things, rather than approximating a utility function we already have.) (Remark 2: I think I can imagine situations in which we might consider making the relationship between chocolate and utility converge very fast -- in fact, taking "chocolate" literally rather than metaphorically might yield such a situation. But in those situations, I also think the results you get from your exponentially-converging utility function aren't obviously unreasonable.)
0AlexMennen
You still haven't answered my question of why we don't want those properties. To me, they don't seem counter-intuitive at all.
janos30

One nonconstructive (and wildly uncomputable) approach to the problem is this one: http://www.hutter1.net/publ/problogics.pdf

janos00

I think you're making the wrong comparisons. If you buy $1 worth, you get p(win) U(jackpot) + (1-p(win)) U(-$1), which is more-or-less p(win)U(jackpot)+U(-$1); this is a good idea if p(win) U(jackpot) > -U(-$1). But under usual assumptions -U(-$2)>-2U(-$1). This adds up to normality; you shouldn't actually spend all your money. :)
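To make the comparison concrete, here is a minimal sketch. It assumes a logarithmic utility of total wealth and a starting wealth of $1000; both are illustrative choices not taken from the thread, and any strictly concave utility would show the same pattern.

```python
import math

WEALTH = 1000.0  # hypothetical starting wealth, chosen only for illustration


def utility(change, wealth=WEALTH):
    """Utility of a change in wealth, assuming log utility of total wealth."""
    return math.log(wealth + change) - math.log(wealth)


def first_dollar_is_worth_it(p_win, jackpot):
    """The comment's approximate criterion: p(win) U(jackpot) > -U(-$1)."""
    return p_win * utility(jackpot) > -utility(-1.0)


def marginal_cost(k):
    """Utility lost by spending the k-th dollar after k-1 are already spent."""
    return utility(-(k - 1)) - utility(-k)


# Under a concave utility, losing $2 hurts more than twice as much as losing $1.
loss_one = -utility(-1.0)
loss_two = -utility(-2.0)
costs = [marginal_cost(k) for k in range(1, 11)]
```

Each additional dollar costs more utility than the one before, which is why a favourable first dollar does not imply spending everything.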

0prase
Of course you are right, silly mistake. (Not really important nitpick:) The dollar is spent once the ticket is bought and doesn't return even if you win, so you shouldn't have (1-p(win)) * U(-$1) there, but just U(-$1).
janos10

One good negation is "the value/intrinsic utility of a life is the sum of the values/intrinsic utilities of all the moments/experiences in it, evaluated without reference to their place/context in the life story, except inasmuch as is actually part of that moment/experience".

The "actually" gets traction if people's lives follow narratives that they don't realize as they're happening, but such that certain narratives are more valuable than others; this seems true.

janos50

If your prior distribution for "yes" conditional on the number of papers is still uniform, i.e. if the number of papers has nothing to do with whether they're "yes" or not, then the rule still applies.

1Manfred
Add-on: You can make the analogy clearer if you imagine, instead of rummaging around in a hat, you lined up all the strips of paper in random order and read them one at a time. Then it makes sense that the total number of slips of paper shouldn't matter.
janos20

You can comfortably do Bayesian model comparison here; have priors for µcon, µamn, and µsim, and let µpat be either µamn (under hypothesis Hamn) or µsim (under hypothesis Hsim), and let Hamn and Hsim be mutually exclusive. Then integrating out µcon, µamn, and µsim, you get a marginal odds-ratio for Hamn vs Hsim, which tells you how to update.
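As a rough sketch of what that marginal odds ratio could look like: the code below assumes Gaussian observations with a known noise standard deviation and the same Normal prior on every group mean. None of these modelling choices come from the thread, and the function names are made up for illustration. The control group's term is identical under both hypotheses, so µcon drops out of the ratio.

```python
import math


def log_marginal(xs, prior_mean, prior_sd, noise_sd):
    """Log marginal likelihood of xs with x_i ~ Normal(mu, noise_sd^2)
    and mu ~ Normal(prior_mean, prior_sd^2), with mu integrated out."""
    n = len(xs)
    if n == 0:
        return 0.0
    s2, t2 = noise_sd ** 2, prior_sd ** 2
    devs = [x - prior_mean for x in xs]
    sum_d = sum(devs)
    sum_d2 = sum(d * d for d in devs)
    # Quadratic form and log-determinant of the covariance s2*I + t2*11^T.
    quad = (sum_d2 - t2 * sum_d ** 2 / (s2 + n * t2)) / s2
    log_det = n * math.log(s2) + math.log(1.0 + n * t2 / s2)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def log_bayes_factor(amn, sim, pat, prior_mean=0.0, prior_sd=10.0, noise_sd=1.0):
    """log of P(data | Hamn) / P(data | Hsim); arguments are lists of numbers."""
    kw = dict(prior_mean=prior_mean, prior_sd=prior_sd, noise_sd=noise_sd)
    # Under Hamn the patient shares a mean with the amn group; under Hsim, with sim.
    h_amn = log_marginal(amn + pat, **kw) + log_marginal(sim, **kw)
    h_sim = log_marginal(amn, **kw) + log_marginal(sim + pat, **kw)
    return h_amn - h_sim


# Made-up scores, purely to exercise the function.
amn_scores = [1.0, 1.2, 0.9]
sim_scores = [3.0, 3.1, 2.9]
pat_scores = [1.1, 1.0]
lbf = log_bayes_factor(amn_scores, sim_scores, pat_scores)
```

Posterior odds are the prior odds for Hamn vs Hsim multiplied by `exp(lbf)`.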

The standard frequentist method being discussed is nested hypothesis testing, where you want to test null hypothesis H0 with alternative hypothesis H1, and H0 is supposed to be nested inside H1. For instance you could ...

janos40

"Alice is a banker" is a simpler statement than "Alice is a feminist banker who plays the piano.". That's why the former must be assigned greater probability than the latter.

Complexity weights apply to worlds/models, not propositions. Otherwise you might as well say:

"Alice is a banker" is a simpler statement than "Alice is a feminist, a banker, or a pianist.". That's why the former must be assigned greater probability than the latter.

0ArisKatsaris
Agreed. Instead of complexity, I should have probably said "specificity". "Alice is a banker" is a less complicated statement than "Alice is a feminist, a banker, or a pianist", but a more specific one.
janos20

tl;dr : miscalibration means mentally interpreting loglikelihood of data as being more or less than its actual loglikelihood; to infer it you need to assume/infer the Bayesian calculation that's being made/approximated. Easiest with distributions over finite sets (i.e. T/F or multiple-choice questions). Also, likelihood should be called evidence.

I wonder why I didn't respond to this when it was fresh. Anyway, I was running into this same difficulty last summer when attempting to write software to give friendly outputs (like "calibration") to a bu...

janos140

The way I'd try to do this problem mentally would be:

Relative to the desired concentration of 55%, each unit of 40% is missing .15 units of alcohol, and each unit of 85% has .3 extra units of alcohol. .15:.3=1:2, so to balance these out we need (amount of 40%):(amount of 85%)=2:1, i.e. we need twice as much 40% as 85%. Since we're using 1kg of 40%, this means 0.5kg of 85%.
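The same balancing argument can be written as a small function. This is only a sketch: the function name is made up, and it assumes concentrations are fractions by mass and that the target lies strictly between the two inputs.

```python
def mass_to_add(base_mass, base_conc, add_conc, target_conc):
    """Mass of the stronger solution needed to bring the mix to target_conc."""
    deficit = target_conc - base_conc  # alcohol missing per unit of the weak solution
    surplus = add_conc - target_conc   # extra alcohol per unit of the strong solution
    if deficit <= 0 or surplus <= 0:
        raise ValueError("target must lie strictly between the two concentrations")
    # Deficits and surpluses must cancel: base_mass * deficit == added * surplus.
    return base_mass * deficit / surplus


added = mass_to_add(base_mass=1.0, base_conc=0.40, add_conc=0.85, target_conc=0.55)

# Check the answer by computing the final concentration directly.
final_conc = (1.0 * 0.40 + added * 0.85) / (1.0 + added)
```

With the numbers from the problem, `added` comes out at 0.5 kg (up to floating-point rounding) and `final_conc` at 0.55.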

3RobinZ
That's clever! Changing your frame of reference is a useful tool - there are a lot of problems which become simpler if you use measurements from a 'zero' that you pick.
janos30

Nope: the odds ratio was (.847/(1-.847))/(.906/(1-.906)), which is indeed 57.5%, which could be rounded to 60%. If the starting probability was, say, 1%, rather than 90.6%, then translating the odds ratio statement to "60% as likely" would be legitimate, and approximately correct; probably the journalist learned to interpret odds ratios via examples like that. But when the probabilities are close to 1, it's more correct to say that the women/blacks were 60% more likely to not be referred.
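A quick numerical check of the distinction, using the two referral probabilities quoted above. The variable names are mine, and assigning 0.847 to women/blacks and 0.906 to white men is my reading of the thread.

```python
def odds(p):
    return p / (1.0 - p)


def prob_from_odds(o):
    return o / (1.0 + o)


p_group = 0.847      # referral probability, women/blacks (as read from the thread)
p_reference = 0.906  # referral probability, white men

odds_ratio = odds(p_group) / odds(p_reference)              # roughly 0.574
ratio_referred = p_group / p_reference                      # roughly 0.935
ratio_not_referred = (1.0 - p_group) / (1.0 - p_reference)  # roughly 1.628

# With a rare baseline event, the same odds ratio is close to a probability ratio.
rare_baseline = 0.01
rare_group = prob_from_odds(odds_ratio * odds(rare_baseline))
rare_ratio = rare_group / rare_baseline
```

So the group was about 93% as likely to be referred, and roughly 60% more likely not to be referred; only in the rare-event case does "60% as likely" approximate the odds ratio.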

2Perplexed
Hmmm. I would have said that white men were 60% as likely to not be referred. (This is the first time I've seen the golden ratio show up in a discussion of probability!)
janos00

It's just a vanilla (MH) MCMC sampler for (some convenient family of) distributions on polytopes; hopefully like this: http://cran.r-project.org/web/packages/limSolve/vignettes/xsample.pdf , but faster. It's motivated by a model for inferring network link traffic flows from counts of in- and out-bound traffic at each node; the solution space is a polytope, and we want to take advantage of previous observations to form a better prior. But for the approach to be feasible we first need to sample.

But this is not a long-term project, I think.

janos00

Looks like good stuff ... thanks for the tip.

janos10

Currently I'm taking classes and working on a polytope sampler. I tend to be excited about Bayesian nonparametrics and consistent families of arbitrary-dimensional priors. I'm also excited about general-purpose MCMC-like approaches, but so far I haven't thought very hard about them.

0jsalvatier
What is a polytope sampler? Link to work?
0jsteinhardt
It seems like you might want to check this guy's work out.
janos80

In undergrad I feared a feeling of locked-in-ness, and ditched my intention to do a PhD in math (which I think I could have done well in) partly for this reason, though it was also easier for me because I hadn't established close ties to a particular line of research, and because I had programming background. I worked a couple of years in programming, and now I'm back in school doing a PhD in stats, because I like probability spaces and because I wanted to do something more mathematical than (most) programming. I guess I picked stats over applied math partly out of the same worry about overspecialization; I think stats has a bigger wealth of better-integrated more widely applicable concepts/insights.

0jsalvatier
I am curious: what do you plan to work on in stats? I personally think more people should be working on efficient general sampling methods for Bayesian stats, for reasons I have written about here: http://goodmorningeconomics.wordpress.com/2010/11/16/the-promise-of-bayesian-statistics-pt-2/ . Programming skills are very useful there. I am a programmer and one of my hobbies is implementing bayes stats algorithms in the literature. Do let me know if you come up with anything revolutionary.
janos20

Would you be surprised if the absolute value was bigger than 3^^^3? I'm guessing yes, very much so. So that's a reason not to use an improper prior.

If there's no better information about the problem, I sort of like using crazy things like Normal(0,1)*exp(Cauchy); that way you usually get reasonable smallish numbers, but you don't become shocked by huge or tiny numbers either. And it's proper.
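A minimal sketch of sampling from such a prior, assuming a standard Cauchy in the exponent. Because exp(Cauchy) overflows floating point fairly often, the sampler returns the sign and the log of the absolute value instead of the value itself; that representation is my choice, not part of the suggestion.

```python
import math
import random


def sample_log_magnitude(rng):
    """Draw x = Normal(0,1) * exp(Cauchy) and return (sign, log|x|)."""
    z = rng.gauss(0.0, 1.0)
    # Standard Cauchy via the inverse CDF of a uniform draw.
    cauchy = math.tan(math.pi * (rng.random() - 0.5))
    if z == 0.0:
        return 0, float("-inf")
    sign = 1 if z > 0 else -1
    return sign, math.log(abs(z)) + cauchy


rng = random.Random(0)
draws = [sample_log_magnitude(rng) for _ in range(10_000)]

# Share of draws with 0.01 < |x| < 100, and count of draws with |x| > 1e6.
moderate = sum(1 for _, la in draws if abs(la) < math.log(100.0)) / len(draws)
huge = sum(1 for _, la in draws if la > math.log(1e6))
```

Most draws land at moderate magnitudes, while a small but non-negligible share are astronomically large or small, which is the behaviour described above.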

1CronoDAS
Let's say that you know the variable is a real number in [0,1], but nothing else...
janos00

I wasn't trying to present a principled distinction, or trying to avoid bias. What I was saying isn't something I'm going to defend. The only reason I responded to your criticism of it was that I was annoyed by the nature of your objection. However, since now I know you thought I was trying to say more than I actually was, I will freely ignore your objection.

janos00

Do you have an instance of "I proactively do X" where you do not class it as reactive? Do you have an instance of "I wish to avoid Y" where you do not class it as specific? I don't like conversations about definitions. I was using these words to describe a hypothetical inner experience; I don't claim that they aren't fuzzy. You seem to be pointing at the fuzziness and saying that they're meaningless; I don't see why you'd want to do that.

4Dreaded_Anomaly
My point is that 1 and 2 above don't seem to differ fundamentally in either of the two descriptors you used. Conversations about definitions of words are not useful, but definitions of concepts are necessary. I'm pointing at the fuzziness because it indicates to me that the supposed distinction is not being made based on any principle, but simply to rationalize a preexisting bias.
janos-20

It seems to me that we mean different things by the words "reactive" (as opposed to proactive) and "specific". A weak attempt at a reductio: I proactively do X to avoid facing Y; I am thus reacting to my desire to avoid facing Y. And is Y general or specific? Y is the specific Y that I do X to avoid facing.

5Dreaded_Anomaly
1. A person doesn't want to have a baby, so she has an abortion to stop the fetus from developing into one.
2. A person doesn't want to have a fetus, so she uses contraception to stop the ovum and sperm from developing into one.
If 1 is reactive, then so is 2. For a given fetus, there is a finite possibility space of all the persons into which it could develop, taking into account different values of unknown future parameters. The same can be said of any combination of sperm and ova; it's just that the possibility space is larger. How would one derive a concept of "specific" that discriminates between the fetus space and the sperm/ova space without drawing an arbitrary line based on the size of the space?
janos30

Ah, yes indeedy true. I guess I was thinking of abstinence. So wrong distinction. More likely, then: abortion is done to a specific embryo who is thereby prevented from being, and it's done reactively; there's no question that when you have an abortion it's about deciding to kill this particular embryo. Contraceptive use on the other hand is nonspecific and proactive; it doesn't feel like "I discard these reproductive cells which would have become a person!", it feels like exerting prudent control over your life.

8Dreaded_Anomaly
Every time contraception is used, it prevents a specific multitude of "potential humans" from existing. Sure, most of them would have been prevented from existing by other factors, but contraception still actively contributes to that. It's also done reactively, in that it's a reaction to someone's desire to have sex with a lower risk of pregnancy. It may not feel the same way as abortion, but that's just because it's easier for humans to value fetuses than sperm and egg cells. Both abortion and contraception have specific and reactive components, in principle.
janos50

I agree with your main point (that this is a stumbling block for some people), but there are others who will contend that A and part of B (namely the irreversible error) do apply to unwanted babies (usually, or on average), and that the reason why abortion is more evil than contraception is because it's an error of commission rather than omission.

9PlaidX
Killing adults is less reversible in the sense that if you kill comedian Carlos Mencia, you can't get a new Carlos Mencia if you change your mind. In contrast, babies are basically fungible.
9Alicorn
I think taking birth control precautions is pretty commission-y. Abstinence would be the omission version of not having babies.
janos30

But I drink orange juice with pulp; then the fiber is no longer absent, though I guess it's reduced. The vitamins and minerals are still present, though, aren't they?

2Conuly
Are you making this juice yourself by chucking a whole orange in the blender and then drinking it? In that case, you probably - I don't know - have enough fiber that it's not that much different from just eating an orange, and fresh juices are said to be more nutritious than bought anyway. (Admittedly, the people who say this are people who own juicers, but that's probably beside the point.) But if you're buying it from the store, then... no. It's still mostly just sugar with a little bit of texture floating in it. If you're not gulping it by the gallon daily I wouldn't worry about it, but it's part of your healthy balanced breakfast - and not a huge part :)
1Vladimir_M
You still get an enormous amount of sugar, with or without the pulp. Regarding the vitamins and minerals, my understanding is that you need a certain amount of each of those to avoid various nasty and fatal diseases, and an amount over a certain limit can be poisonous, but there isn't any real evidence that anything in-between makes a difference. From what I understand, it also requires a very extreme diet (by modern developed world standards) to develop provably harmful micronutrient deficiencies. (One exception might be vitamin D if the winters are especially dark and cold where you live, but you won't get that one from fruit juice.)
janos30

Regarding the fruit juices, I agree that fruit-flavored mixtures of HFCS and other things generally aren't worth much, but aren't proper fruit juices usually nutritious? (I mean the kinds where the ingredients consist of fruit juices, perhaps water, and nothing else.)

6Conuly
One orange is one or two servings of fruit... but a serving of orange juice is four oranges. You're getting all the sugar and calories of four oranges (4 - 8 servings of fruit!) without any of the fiber. Fruit juices aren't exactly the devil, but they're not especially nutritious either.
0TobyBartels
I like real juice, but (except for orange juice with pulp) I always water it down. It tastes the same when compared to long-term memory (although not when directly compared).
4Kutta
Fruit juices are very bad. They concentrate the sugar content of a lot of fruits into a small mass and volume. For instance apple juice is usually considerably more sugary than Pepsi, with around 11-12 g/100g sugar content, and also with a worse sugar profile, with 66% fructose, compared to HFCS's 55 percent as it is commonly used in soft drinks (note: fructose is the worse sugar). Other fruit juices are usually above 8% sugar too.
4Alicorn
They're still high in sugar relative to how much you are likely to consume, and don't offer the fiber or unprocessed-ness of entire fruit. It would usually be better to either eat a piece of fruit or drink water. (I ignore this advice because I hate water, so when I thirst between meals I drink juice.)
janos140

Regarding investment, my suggestion (if you work in the US) is to open a basic (because it doesn't periodically charge you fees) E*TRADE account here. They will provide an interface for buying and selling shares of stocks and various other things (ETFs and such; I mention stocks and ETFs because those are the only things I've tried doing anything with). They will charge you $10 for every transaction you make, so unless you're going to be (or become) active/clever enough to make it worthwhile, it makes sense not to trade too frequently.

EDIT: These guys appe...

0Alexei
Scottrade is another well-known company that provides the same services. They only charge $7 per transaction (more for penny stocks). I've had a very positive experience with them. One thing to keep in mind is that doing stock trading will make your taxes more complicated and more expensive to file.

I feel like it is useful to mention that because of efficient markets (which implies assets are "fairly priced") and the benefits of diversification (lower risk), it's almost always better to buy a low fee mutual fund than any particular stocks or bonds. In particular, Index Funds merely keep a portfolio which tracks a broad market index. These often have very low operating costs, so they are a pretty good way to invest. You can buy these as ETFs, or you can buy them through something like Vanguard.

Benquo280

This is right. But to put it much more generally, and as an exercise in seriously trying to bridge information gaps:

To buy stocks you need what is called a Brokerage account. The way a brokerage account works is that you give money to the Broker to invest for you. (Generally, you will do this by transferring it from an existing bank account.) This money generally gets put into a highly liquid account in your name, such as a money market fund. You can get your money back by instructing your broker to send it back to you.

When you want to buy stocks or other...

janos30

Echoing the others:

If we suppose these are 22 iid samples from a Poisson then the max likelihood estimate for the Poisson parameter is 0.82 (the sample mean). Simulating such draws from such a Poisson and looking at sample correlation between Jan 15-Feb 4 and Jan 16-Feb 5, the p-value is 0.1. And when testing Poisson-ness vs negative binomial clustering (with the same mean), the locally most powerful test uses statistic (x-1.32)^2, and gives a simulated p-value of 0.44.
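For anyone wanting to reproduce this kind of check, here is a rough sketch of the lag-1 correlation test by simulation. The counts below are made up (22 values chosen so the mean is about 0.82) and are not the original data; the one-sided comparison is also my assumption, since the comment doesn't say which tail was used.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def lag1_correlation(xs):
    """Sample correlation between xs[:-1] and xs[1:] (0.0 if either is constant)."""
    a, b = xs[:-1], xs[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def simulated_p_value(counts, n_sim, rng):
    """Return (Poisson MLE, share of simulated correlations >= the observed one)."""
    lam = sum(counts) / len(counts)  # the MLE is the sample mean
    observed = lag1_correlation(counts)
    hits = 0
    for _ in range(n_sim):
        sim = [poisson_draw(rng, lam) for _ in counts]
        if lag1_correlation(sim) >= observed:
            hits += 1
    return lam, hits / n_sim


# Hypothetical daily counts, NOT the data discussed in the thread.
fake_counts = [0, 1, 0, 2, 1, 0, 0, 1, 3, 0, 1, 0, 0, 2, 1, 0, 1, 0, 2, 1, 1, 1]
lam_hat, p_value = simulated_p_value(fake_counts, 2000, random.Random(1))
```

The negative binomial comparison would follow the same pattern with a different test statistic.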

janos130

It's provided in the linked page; you need to scroll down to see it.

janos10

What I don't like about the example you provide is: what player 1 and player 2 know needs to be common knowledge. For instance if player 1 doesn't know whether player 2 knows whether die 1 is in 1-3, then it may not be common knowledge at all that the sum is in 2-6, even if player 1 and player 2 are given the info you said they're given.

This is what I was confused about in the grandparent comment: do we really need I and J to be common knowledge? It seems so to me. But that seems to be another assumption limiting the applicability of the result.

janos00

As far as I understand, agent 1 doesn't know that agent 2 knows A2, and agent 2 doesn't know that agent 1 knows A1. Instead, agent 1 knows that agent 2's state of knowledge is in J and agent 2 knows that agent 1's state of knowledge is in I. I'm a bit confused now about how this matches up with the meaning of Aumann's Theorem. Why are I and J common knowledge, and {P(A|I)=q} and {P(A|J)=q} common knowledge, but I(w) and J(w) are not common knowledge? Perhaps that's what the theorem requires, but currently I'm finding it hard to see how I and J being common...

0Psy-Kosh
Then agent 1 knows that agent 2 knows one of the members of J that have nonempty intersection with I(w), and similarly for agent 2. Presumably they have to tell each other which of their own partitions w is in, right? I.e., presumably SOME sort of information sharing happens about each other's conclusions. And, once that happens, it seems like the intersection of I(w) and J(w) would be their resultant common knowledge. I'm still confused, though, about what the "meet" operation is. Unless... the idea is something like this: they exchange probabilities. Then agent 1 reasons "J(w) is a member of J such that it both intersects I(w) AND would assign that particular probability. So then I can determine the subset of I(w) that intersects with those and determine a probability from there." And similarly for agent 2. Then they exchange probabilities again, and go through an equivalent reasoning process to tighten the spaces a bit more... and the theorem ensures that they'd end up converging on the same probabilities? (Each time they state unequal probabilities, they each learn more information and each one then comes up with a set that's a strict subset of the one they were previously considering, but each of their sets always contains the intersection of I(w) and J(w)?)
janos00

That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that's still not common knowledge, because agent 1 doesn't know that agent 2 knows A1 union A2.

I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual "everyone knows that everyone knows that ... " definition of common knowledge translates to I(J(I(J(I(J(...(w)...).

1Psy-Kosh
Well, how is it not the intersection then? I.e., agent 1 knows A1 and knows that agent 2 knows A2. If they trust each other's rationality, then they both know that w must be in A1 and in A2. So they both conclude it must be in the intersection of A1 and A2, and they both know that they both know this, etc. Or am I missing the point?
janos00

Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don't see what's wrong with this.

0AndrewKemendo
I suppose my post was poorly worded. Yes, in this case omega is the reference set for possible world histories. What I was referring to was the baseline of w as an accurate measure. It is a normalizing reference, though not a set.
janos00

Nope; it's the limit of I(J(I(J(I(J(I(J(...(w)...), where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.

Alternately if instead of I and J you think about the sigma-algebras they generate (let's call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I prefer this somewhat because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
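For a finite space the limit can be computed directly. The sketch below represents each partition as a list of frozensets and alternates the two expansion operators until nothing changes; the function names and the example partitions are made up for illustration.

```python
def expand(partition, s):
    """I(S): the union of the cells of `partition` that intersect the set s."""
    out = set()
    for cell in partition:
        if cell & s:
            out |= cell
    return frozenset(out)


def meet_cell(part_i, part_j, w):
    """Cell of the meet of the two partitions containing w, found by iterating
    S -> I(J(S)) from {w} until it reaches a fixed point."""
    current = frozenset({w})
    while True:
        nxt = expand(part_i, expand(part_j, current))
        if nxt == current:
            return current
        current = nxt


# Two different two-cell partitions of {1, 2, 3, 4}: the meet is the whole space.
I2 = [frozenset({1, 2}), frozenset({3, 4})]
J2 = [frozenset({1, 3}), frozenset({2, 4})]
whole = meet_cell(I2, J2, 1)

# A finer example on {1, ..., 6} where the meet is non-trivial.
I6 = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5}), frozenset({6})]
J6 = [frozenset({1}), frozenset({2, 3}), frozenset({4}), frozenset({5, 6})]
left = meet_cell(I6, J6, 1)
right = meet_cell(I6, J6, 5)
```

The loop terminates on a finite space because each step can only grow the set, and a fixed point of I(J(S)) is a union of cells of both partitions.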

1Psy-Kosh
Then... I'm having trouble seeing why I^J wouldn't very often converge on the entire space. I.e., suppose a super simplification in which both agent 1 and agent 2 partition the space into only two parts, agent 1 partitioning it into I = {A1, B1}, and agent 2 partitioning it into J = {A2, B2}. Suppose I(w) = A1 and J(w) = A2. Then, unless the two partitions are identical, wouldn't (I^J)(w) = the entire space? Or am I completely misreading? And thanks for taking the time to explain.
janos30

Right, that is a good piece. But I'm afraid I was unclear. (Sorry if I was.) I'm looking for a prior over stationary sequences of digits, not just sequences. I guess the adjective "stationary" can be interpreted in two compatible ways: either I'm talking about sequences such that for every possible string w the proportion of substrings of length |w| that are equal to w, among all substrings of length |w|, tends to a limit as you consider more and more substrings (either extending forward or backward in the sequence); this would not quite be a p...

0cousin_it
Janos, I spent some days parsing your request and it's quite complex. Cosma Shalizi's thesis and algorithm seem to address your problem in a frequentist manner, but I can't yet work out any good Bayesian solution.
janos10

Each element of the set is characterized by a bunch of probabilities; for example there is p_01101, which is the probability that elements x_{i+1} through x_{i+5} are 01101, for any i. I was thinking of using the topology induced by these maps (i.e. generated by preimages of open sets under them).

How is putting a noninformative prior on the reals hard? With the usual required invariance, the uniform (improper) prior does the job. I don't mind having the prior be improper here either, and as I said I don't know what invariance I should want; I can't think o...

0marks
One issue with, say, taking a normal distribution and letting the variance go to infinity (which is the improper prior I normally use) is that the posterior distribution is going to have a finite mean, which may not be a desired property of the resulting distribution. You're right that there's no essential reason to relate things back to the reals; I was just using that to illustrate the difficulty. I was thinking about this a little over the last few days and it occurred to me that one model for what you are discussing might actually be an infinite graphical model. The infinite bidirectional sequence here consists of the values of Bernoulli-distributed random variables. Probably the most interesting case for you would be a Markov random field, as the stochastic 'patterns' you were discussing may be described in terms of dependencies between random variables. Here are three papers I read a little while back on (and related to) something called an Indian Buffet process: (http://www.cs.utah.edu/~hal/docs/daume08ihfrm.pdf) (http://cocosci.berkeley.edu/tom/papers/ibptr.pdf) (http://www.cs.man.ac.uk/~mtitsias/papers/nips07.pdf) These may not quite be what you are looking for since they deal with a bound on the extent of the interactions; you probably want to think about probability distributions of binary matrices with an infinite number of rows and columns (which would correspond to an adjacency matrix over an infinite graph).
2cousin_it
Something about this discussion reminds me of a hilarious text: The moral of this story seems to be, Assume priors over generators, not over sequences. A noninformative prior over the reals will never learn that the digit after 0100 is more likely to be 1, no matter how much data you feed it.
janos30

The purpose would be to predict regularities in a "language", e.g. to try to achieve decent data compression in a way similar to other Markov-chain-based approaches. In terms of properties, I can't think of any nontrivial ones, except the usual important one that the prior assign nonzero probability to every open set; mainly I'm just trying to find something that I can imagine computing with.

It's true that there exists a bijection between this space and the real numbers, but it doesn't seem like a very natural one, though it does work (it's measurable, etc). I'll have to think about that one.

1marks
What topology are you putting on this set? I made the point about the real numbers because it shows that putting a non-informative prior on the infinite bidirectional sequences should be at least as hard as for the real numbers (which is non-trivial). Usually a regularity is defined in terms of a particular computational model, so if you picked Turing machines (or the variant that works with bidirectional infinite tape, which is basically the same class as infinite tape in one direction), then you could instead begin constructing your prior in terms of Turing machines. I don't know if that helps any.
janos40

Since we're discussing (among other things) noninformative priors, I'd like to ask: does anyone know of a decent (noninformative) prior for the space of stationary, bidirectionally infinite sequences of 0s and 1s?

Of course in any practical inference problem it would be pointless to consider the infinite joint distribution, and you'd only need to consider what happens for a finite chunk of bits, i.e. a higher-order Markov process, described by a bunch of parameters (probabilities) which would need to satisfy some linear inequalities. So it's easy to find a ... (read more)
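
One way to spell out the linear constraints on a finite chunk, as a sketch under my own parameterization (probabilities of all length-n binary blocks; the function name is hypothetical): the probabilities must be nonnegative, sum to one, and give the same distribution on length-(n-1) blocks whether you drop the first bit or the last bit. The last condition is what stationarity requires of the block probabilities.

```python
from itertools import product


def is_stationary_block_distribution(p, n, tol=1e-9):
    """Check the linear constraints on probabilities of length-n binary blocks.

    `p` maps each length-n string of '0'/'1' to its probability;
    missing blocks are treated as probability zero.
    """
    words = ["".join(bits) for bits in product("01", repeat=n)]
    if any(p.get(w, 0.0) < -tol for w in words):
        return False  # probabilities must be nonnegative
    if abs(sum(p.get(w, 0.0) for w in words) - 1.0) > tol:
        return False  # probabilities must sum to one
    # Shift invariance: marginalizing out the last bit or the first bit
    # must give the same distribution on length-(n-1) blocks.
    for bits in product("01", repeat=n - 1):
        v = "".join(bits)
        drop_last = sum(p.get(v + a, 0.0) for a in "01")
        drop_first = sum(p.get(a + v, 0.0) for a in "01")
        if abs(drop_last - drop_first) > tol:
            return False
    return True


sticky = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
skewed = {"00": 0.1, "01": 0.5, "10": 0.1, "11": 0.3}
print(is_stationary_block_distribution(sticky, 2))
print(is_stationary_block_distribution(skewed, 2))
```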

1marks
I suppose it depends on what you want to do. First, I would point out that the set is in bijection with the real numbers (think of two simple injections and then use Cantor–Bernstein–Schroeder), so you can use any prior over the real numbers. The fact that you want to look at infinite sequences of 0s and 1s seems to imply that you are considering a specific type of problem that would demand a very particular meaning of 'noninformative prior'. What I mean by that is that any 'noninformative prior' usually incorporates some kind of invariance: e.g. a uniform prior on [0,1] for a Bernoulli distribution is invariant with respect to the true value being anywhere in the interval.
janos80

Updated, eh? Where did your prior come from? :)

0[anonymous]
Overcoming Bias. :-)
janos00

I am trying to understand the examples on that page, but they seem strange; shouldn't there be a model with parameters, and a prior distribution for those parameters? I don't understand the inferences. Can someone explain?

0cousin_it
Well, the first example is a model with a single parameter. Roughly speaking, the Bayesian initially believes that the true model is either a Gaussian around 1, or a Gaussian around -1. The actual distribution is a mix of those two, so the Bayesian has no chance of ever arriving at the truth (the prior for the truth is zero), instead becoming over time more and more comically overconfident in one of the initial preposterous beliefs.
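
A small sketch of that first example, assuming unit variance for both candidate Gaussians and an equal-weight mixture as the true source (neither detail is stated here; the names are mine). With unit variance the log-likelihood ratio of a single observation x is exactly 2x, so the posterior log-odds is a running sum that wanders ever further from zero, and the posterior typically ends up very close to 0 or 1.

```python
import math
import random


def posterior_plus(data, prior_plus=0.5):
    """Posterior probability of 'Gaussian around +1' vs 'Gaussian around -1'.

    Assumes unit variance under both hypotheses (an assumption of this
    sketch), so each observation x adds exactly 2 * x to the log-odds.
    """
    log_odds = math.log(prior_plus / (1.0 - prior_plus)) + 2.0 * sum(data)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# True source: an equal mixture of the two Gaussians, which has prior
# probability zero under the two-hypothesis model.
rng = random.Random(0)
samples = [rng.gauss(rng.choice((-1.0, 1.0)), 1.0) for _ in range(10_000)]
for n in (10, 100, 1_000, 10_000):
    print(n, posterior_plus(samples[:n]))
```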
janos20

I think you're confusing the act of receiving information/understanding about an experience with the experience itself.

Re: the joke example, I think that one would get tired of hearing a joke too many times, and that's what the dissection is equivalent to, because you keep hearing it in your head; but if you already get the joke, the dissection is not really adding to your understanding. If you didn't get the joke, you will probably receive a twinge of enjoyment at the moment when you finally do understand. If you don't understand a joke, I don't think you... (read more)

0Mike Bishop
I think you make an important distinction, but people sometimes act like gaining understanding will result in a long-term reduction in some warm fuzzies for them. They sometimes explicitly tell me they think this will happen. While I think people may underestimate the net warm fuzzies resulting from learning (i.e. they are biased), I'm confident that they are sometimes correct. The difficult question is deciding what we should do about this. Don't get me wrong, I'm still very committed to epistemic rationality and will try to sell people on its many virtues/benefits.
4pjeby
Indeed, my wife and I have practiced for well over a decade how to get optimum endorphin release from casual contact. (For example, we've identified certain spots we can apply hand pressure to on the other person that create a sensation we call "recharging" -- a kind of relaxed energy.)
janos10

Interesting. My internal experience of programming is quite different; I don't see boxes and lines. Data structures for me are more like people who answer questions, although of course with no personality or voice; the voice is mine as I ask them a question, and they respond in a "written" form, i.e. with a silent indication. So the diagrams people like to draw for databases and such don't make direct sense to me per se; they're just a way of organizing written information.

I am finding it quite difficult to describe such things coherently and correctly; I am not certain of any part of this, except that I know I don't imagine black-and-white box diagrams.

janos30

Do you have some good examples of abuse of Bayes' theorem?

0AndySimpson
That is a good question for a statistician, and I am not a statistician. One thing that leaps to mind, however, is two-boxing on Newcomb's Problem using assumptions about the prior probability of box B containing $1,000,000. Some new work using math that I don't begin to understand suggests that either response to Newcomb's problem is defensible using Bayesian nets. There could be more trivial cases, too, where a person inputs unreasonable prior probabilities and uses cargo-cult statistics to support some assertion. Also, it's struck me that a frequentist statistician might call most Bayesian uses of the theorem "abuses." I'm not sure those are really good examples, but I hope they're satisfying.