What is Mathematics? by Courant and Robbins is a classic exploration that goes reasonably deep into most areas of math.
This makes me think of two very different things.
One is informational containment, i.e. how to run an AGI in a simulated environment that reveals nothing about the system it's simulated on; this is a technical challenge, and if interpreted very strictly (via algorithmic complexity arguments about how improbable our universe is likely to be in something like a Solomonoff prior), is very constraining.
The other is futurological simulation; here I think the notion of simulation is pointing at a tool, but the idea of using this tool is a very small part of the ap...
Certainly, interventions may be available, just as for anything else; but it's not fundamentally more accessible or malleable than other things.
I'm arguing that the fuzzy-ish definition that corresponds to our everyday experience/usage is better than the crisp one that doesn't.
Re IQ and "way of thinking", I'm arguing they both affect each other, but neither is entirely under conscious control, so it's a bit of a moot point.
Apropos the original point, under my usual circumstances (not malnourished, hanging out with smart people, reading and thinking about engaging, complex things that can be analyzed and have reasonable success measures, etc), my IQ is mostly not under my control. (Perhaps if I was more focused on measurements, nootropics, and getting enough sleep, I could increase my IQ a bit; but not very much, I think.) YMMV.
I think what you're saying is that if we want a coherent, nontrivial definition of "under our control" then the most natural one is "everything that depends on the neural signals from your brain". But this definition, while relatively clean from the outside, doesn't correspond to what we ordinarily mean; for example, if you have a mental illness, this would suggest that "stop having that illness!!" is reasonable advice, because your illness is "under your control".
I don't know enough neuroscience to give this a physi...
March 2nd isn't a Tuesday; is it Monday night or Tuesday night?
If you want to discuss the nature of reality using a similar lexicon to what philosophers use, I recommend consulting the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/
Musk has joined the advisory boards of FLI and CSER, which are younger sibling orgs of FHI and MIRI. He's aware of the AI xrisk community.
Cool. Regarding bounded utility functions, I didn't mean you personally, I meant the generic you; as you can see elsewhere in the thread, some people do find it rather strange to think of modelling what you actually want as a bounded utility function.
This is where I thought you were missing the point:
Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.
Sometimes we (seem to) ...
Certainly given a utility function and a model, the best thing to do is what it is. The point was to show that some utility functions (e.g. using the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.
Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don't know what kind of utility function is reasonable, and we're showing evidence that some of them give optima that aren't what we'd actually want if we were turning the ...
One nonconstructive (and wildly uncomputable) approach to the problem is this one: http://www.hutter1.net/publ/problogics.pdf
I think you're making the wrong comparisons. If you buy $1 worth, you get p(win) U(jackpot) + (1-p(win)) U(-$1), which is more-or-less p(win)U(jackpot)+U(-$1); this is a good idea if p(win) U(jackpot) > -U(-$1). But under usual assumptions -U(-$2)>-2U(-$1). This adds up to normality; you shouldn't actually spend all your money. :)
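A minimal numerical sketch of this argument, under assumptions that are not in the comment above: the utility of losing x dollars is taken to be the hypothetical form U(-x) = -s(exp(x/s) - 1), the product p(win)U(jackpot) is set to a made-up 1.5 utility units per $1 ticket, and the chance of winning is treated as linear in the number of tickets.

```python
import math

# Hypothetical numbers, chosen only to illustrate the shape of the argument.
P_WIN_TIMES_U_JACKPOT = 1.5  # assumed value of p(win) * U(jackpot) per $1 ticket
SCALE = 100.0                # assumed dollar scale beyond which losses hurt disproportionately


def utility_of_loss(dollars):
    """U(-dollars): zero at 0, slope about 1 near 0, steeper for larger losses."""
    return -SCALE * (math.exp(dollars / SCALE) - 1.0)


def expected_utility(tickets):
    """Approximate expected utility of buying `tickets` $1 tickets.

    Same approximation as in the text: p(win) is tiny, so the
    (1 - p(win)) factor on the loss term is dropped.
    """
    return tickets * P_WIN_TIMES_U_JACKPOT + utility_of_loss(tickets)


# The benefit grows linearly in the number of tickets, the cost faster than linearly,
# so the best purchase is an interior point, not "all your money".
best = max(range(0, 1001), key=expected_utility)
print(best, expected_utility(1), expected_utility(1000))
```

With these made-up numbers the first ticket has positive expected utility, but the optimum is a few dozen tickets and spending $1000 is strongly negative.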
One good negation is "the value/intrinsic utility of a life is the sum of the values/intrinsic utilities of all the moments/experiences in it, evaluated without reference to their place/context in the life story, except inasmuch as is actually part of that moment/experience".
The "actually" gets traction if people's lives follow narratives that they don't realize as they're happening, but such that certain narratives are more valuable than others; this seems true.
If your prior distribution for "yes" conditional on the number of papers is still uniform, i.e. if the number of papers has nothing to do with whether they're "yes" or not, then the rule still applies.
You can comfortably do Bayesian model comparison here; have priors for µcon, µamn, and µsim, and let µpat be either µamn (under hypothesis Hamn) or µsim (under hypothesis Hsim), and let Hamn and Hsim be mutually exclusive. Then integrating out µcon, µamn, and µsim, you get a marginal odds-ratio for Hamn vs Hsim, which tells you how to update.
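As a rough sketch of what "integrating out" the means can look like, assume (none of this is stated above) that every score is Normal with a known noise standard deviation, and that each µ has an independent Normal prior. Then each group's marginal likelihood has a closed form, and µcon drops out of the odds ratio because the control group contributes the same factor under both hypotheses. The function names and default prior settings are hypothetical.

```python
import math


def log_marginal(ys, prior_mean=0.0, prior_sd=10.0, noise_sd=1.0):
    """Log marginal likelihood of ys when y_i ~ Normal(mu, noise_sd^2)
    and mu ~ Normal(prior_mean, prior_sd^2), with mu integrated out."""
    n = len(ys)
    s2, t2 = noise_sd ** 2, prior_sd ** 2
    devs = [y - prior_mean for y in ys]
    sum_sq = sum(d * d for d in devs)
    sum_d = sum(devs)
    # Quadratic form and log-determinant of the covariance s2*I + t2*ones*ones^T.
    quad = (sum_sq - t2 / (s2 + n * t2) * sum_d ** 2) / s2
    log_det = n * math.log(s2) + math.log(1.0 + n * t2 / s2)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def log_bayes_factor(amn, sim, pat):
    """log [ p(data | Hamn) / p(data | Hsim) ]; the control group cancels."""
    under_amn = log_marginal(amn + pat) + log_marginal(sim)
    under_sim = log_marginal(amn) + log_marginal(sim + pat)
    return under_amn - under_sim


# Made-up scores, for illustration only.
amnesiacs = [10.0, 11.0, 9.0, 10.0]
simulators = [5.0, 6.0, 4.0, 5.0]
patient = [10.0, 10.5]
print(log_bayes_factor(amnesiacs, simulators, patient))
```

A positive log Bayes factor favours Hamn; multiplying the prior odds by its exponential gives the posterior odds.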
The standard frequentist method being discussed is nested hypothesis testing, where you want to test null hypothesis H0 with alternative hypothesis H1, and H0 is supposed to be nested inside H1. For instance you could ...
"Alice is a banker" is a simpler statement than "Alice is a feminist banker who plays the piano.". That's why the former must be assigned greater probability than the latter.
Complexity weights apply to worlds/models, not propositions. Otherwise you might as well say:
"Alice is a banker" is a simpler statement than "Alice is a feminist, a banker, or a pianist.". That's why the former must be assigned greater probability than the latter.
tl;dr: miscalibration means mentally interpreting loglikelihood of data as being more or less than its actual loglikelihood; to infer it you need to assume/infer the Bayesian calculation that's being made/approximated. Easiest with distributions over finite sets (i.e. T/F or multiple-choice questions). Also, likelihood should be called evidence.
I wonder why I didn't respond to this when it was fresh. Anyway, I was running into this same difficulty last summer when attempting to write software to give friendly outputs (like "calibration") to a bu...
The way I'd try to do this problem mentally would be:
Relative to the desired concentration of 55%, each unit of 40% is missing .15 units of alcohol, and each unit of 85% has .3 extra units of alcohol. .15:.3=1:2, so to balance these out we need (amount of 40%):(amount of 85%)=2:1, i.e. we need twice as much 40% as 85%. Since we're using 1kg of 40%, this means 0.5kg of 85%.
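The same balancing argument as a short sketch; the function name and argument names are made up for illustration.

```python
def mass_to_add(target, conc_have, mass_have, conc_add):
    """Mass of the stronger solution needed to bring the mix to `target`.

    Each unit of the weaker solution is short by (target - conc_have);
    each unit of the stronger one has (conc_add - target) to spare.
    """
    shortfall = target - conc_have
    surplus = conc_add - target
    return mass_have * shortfall / surplus


added = mass_to_add(target=0.55, conc_have=0.40, mass_have=1.0, conc_add=0.85)
# Check by computing the concentration of the resulting mixture directly.
final = (0.40 * 1.0 + 0.85 * added) / (1.0 + added)
print(added, final)
```

The direct check at the end confirms that 1kg of 40% plus the computed amount of 85% lands on 55%, up to floating-point rounding.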
I prefer your phrasing.
Nope: the odds ratio was (.847/(1-.847))/(.906/(1-.906)), which is indeed about 57.4%, which could be rounded to 60%. If the starting probability was, say, 1%, rather than 90.6%, then translating the odds ratio statement to "60% as likely" would be legitimate, and approximately correct; probably the journalist learned to interpret odds ratios via examples like that. But when the probabilities are close to 1, it's more correct to say that the women/blacks were 60% more likely to not be referred.
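A small sketch of the arithmetic, using only the two probabilities quoted above; the 1% baseline is the hypothetical from the text, and the helper names are made up.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def apply_odds_ratio(p, ratio):
    """Probability obtained by multiplying the odds of p by `ratio`."""
    new_odds = odds(p) * ratio
    return new_odds / (1.0 + new_odds)


p_base, p_group = 0.906, 0.847

ratio = odds(p_group) / odds(p_base)                     # roughly 0.57
risk_ratio_referred = p_group / p_base                   # roughly 0.93, nowhere near "60% as likely"
risk_ratio_not_referred = (1 - p_group) / (1 - p_base)   # roughly 1.6

# With a low baseline, the odds ratio and the risk ratio nearly coincide.
low_base = 0.01
low_group = apply_odds_ratio(low_base, ratio)

print(ratio, risk_ratio_referred, risk_ratio_not_referred, low_group / low_base)
```

The last printed value stays close to the odds ratio, which is why reading an odds ratio as "X% as likely" works for rare outcomes and fails for common ones.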
It's just a vanilla (MH) MCMC sampler for (some convenient family of) distributions on polytopes; hopefully like this: http://cran.r-project.org/web/packages/limSolve/vignettes/xsample.pdf , but faster. It's motivated by a model for inferring network link traffic flows from counts of in- and out-bound traffic at each node; the solution space is a polytope, and we want to take advantage of previous observations to form a better prior. But for the approach to be feasible we first need to sample.
But this is not a long-term project, I think.
Looks like good stuff ... thanks for the tip.
Currently I'm taking classes and working on a polytope sampler. I tend to be excited about Bayesian nonparametrics and consistent families of arbitrary-dimensional priors. I'm also excited about general-purpose MCMC-like approaches, but so far I haven't thought very hard about them.
In undergrad I feared a feeling of locked-in-ness, and ditched my intention to do a PhD in math (which I think I could have done well in) partly for this reason, though it was also easier for me because I hadn't established close ties to a particular line of research, and because I had a programming background. I worked a couple of years in programming, and now I'm back in school doing a PhD in stats, because I like probability spaces and because I wanted to do something more mathematical than (most) programming. I guess I picked stats over applied math partly out of the same worry about overspecialization; I think stats has a bigger wealth of better-integrated, more widely applicable concepts/insights.
Would you be surprised if the absolute value was bigger than 3^^^3? I'm guessing yes, very much so. So that's a reason not to use an improper prior.
If there's no better information about the problem, I sort of like using crazy things like Normal(0,1)*exp(Cauchy); that way you usually get reasonable smallish numbers, but you don't become shocked by huge or tiny numbers either. And it's proper.
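A quick sketch of drawing from such a prior, reading "Normal(0,1)*exp(Cauchy)" as the product of a standard normal and the exponential of an independent standard Cauchy (that reading, and the function name, are assumptions).

```python
import math
import random


def sample_prior(rng):
    """One draw of Z * exp(C), with Z ~ Normal(0, 1) and C ~ standard Cauchy."""
    z = rng.gauss(0.0, 1.0)
    # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)).
    c = math.tan(math.pi * (rng.random() - 0.5))
    try:
        return z * math.exp(c)
    except OverflowError:
        # exp(c) exceeds the float range for very large c; report a signed infinity.
        return math.copysign(math.inf, z)


rng = random.Random(0)
draws = [sample_prior(rng) for _ in range(10000)]
share_small = sum(abs(x) < 10 for x in draws) / len(draws)
share_huge = sum(abs(x) > 1000 for x in draws) / len(draws)
print(share_small, share_huge)
```

Most draws are smallish, but a noticeable fraction are enormous (and a similar fraction are minuscule), which is the behaviour described above.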
I wasn't trying to present a principled distinction, or trying to avoid bias. What I was saying isn't something I'm going to defend. The only reason I responded to your criticism of it was that I was annoyed by the nature of your objection. However, since now I know you thought I was trying to say more than I actually was, I will freely ignore your objection.
Do you have an instance of "I proactively do X" where you do not class it as reactive? Do you have an instance of "I wish to avoid Y" where you do not class it as specific? I don't like conversations about definitions. I was using these words to describe a hypothetical inner experience; I don't claim that they aren't fuzzy. You seem to be pointing at the fuzziness and saying that they're meaningless; I don't see why you'd want to do that.
It seems to me that we mean different things by the words "reactive" (as opposed to proactive) and "specific". A weak attempt at a reductio: I proactively do X to avoid facing Y; I am thus reacting to my desire to avoid facing Y. And is Y general or specific? Y is the specific Y that I do X to avoid facing.
Ah, yes indeedy true. I guess I was thinking of abstinence. So wrong distinction. More likely, then: abortion is done to a specific embryo who is thereby prevented from being, and it's done reactively; there's no question that when you have an abortion it's about deciding to kill this particular embryo. Contraceptive use on the other hand is nonspecific and proactive; it doesn't feel like "I discard these reproductive cells which would have become a person!", it feels like exerting prudent control over your life.
I agree with your main point (that this is a stumbling block for some people), but there are others who will contend that A and part of B (namely the irreversible error) do apply to unwanted babies (usually, or on average), and that the reason why abortion is more evil than contraception is that it's an error of commission rather than omission.
But I drink orange juice with pulp; then the fiber is no longer absent, though I guess it's reduced. The vitamins and minerals are still present, though, aren't they?
Regarding the fruit juices, I agree that fruit-flavored mixtures of HFCS and other things generally aren't worth much, but aren't proper fruit juices usually nutritious? (I mean the kinds where the ingredients consist of fruit juices, perhaps water, and nothing else.)
Regarding investment, my suggestion (if you work in the US) is to open a basic (because it doesn't periodically charge you fees) E*TRADE account here. They will provide an interface for buying and selling shares of stocks and various other things (ETFs and such; I mention stocks and ETFs because those are the only things I've tried doing anything with). They will charge you $10 for every transaction you make, so unless you're going to be (or become) active/clever enough to make it worthwhile, it makes sense not to trade too frequently.
EDIT: These guys appe...
I feel like it is useful to mention that because of efficient markets (which implies assets are "fairly priced") and the benefits of diversification (lower risk), it's almost always better to buy a low fee mutual fund than any particular stocks or bonds. In particular, Index Funds merely keep a portfolio which tracks a broad market index. These often have very low operating costs, so they are a pretty good way to invest. You can buy these as ETFs, or you can buy them through something like Vanguard.
This is right. But to put it much more generally, and as an exercise in seriously trying to bridge information gaps:
To buy stocks you need what is called a Brokerage account. The way a brokerage account works is that you give money to the Broker to invest for you. (Generally, you will do this by transferring it from an existing bank account.) This money generally gets put into a highly liquid account in your name, such as a money market fund. You can get your money back by instructing your broker to send it back to you.
When you want to buy stocks or other...
Echoing the others:
If we suppose these are 22 iid samples from a Poisson, then the maximum likelihood estimate for the Poisson parameter is 0.82 (the sample mean). Simulating draws from that Poisson and looking at the sample correlation between Jan 15-Feb 4 and Jan 16-Feb 5, the p-value is 0.1. And when testing Poisson-ness vs negative binomial clustering (with the same mean), the locally most powerful test uses statistic (x-1.32)^2, and gives a simulated p-value of 0.44.
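A hedged sketch of how a simulated p-value for the clustering test might be computed. Assumptions not in the comment: the shift 1.32 is read as the estimated mean plus 0.5, the statistic is summed over the 22 observations, the null draws use the estimated mean as if it were known, and the counts below are invented placeholders (the original data are not reproduced here), so the printed value is not expected to match 0.44.

```python
import math
import random


def poisson_draw(rng, lam):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def dispersion_statistic(counts, lam):
    """Sum of (x - (lam + 0.5))^2; large values suggest clustering."""
    shift = lam + 0.5
    return sum((x - shift) ** 2 for x in counts)


def simulated_p_value(counts, n_sims=5000, seed=0):
    """Fraction of simulated Poisson samples at least as extreme as the data."""
    rng = random.Random(seed)
    lam = sum(counts) / len(counts)  # maximum likelihood estimate (sample mean)
    observed = dispersion_statistic(counts, lam)
    hits = 0
    for _ in range(n_sims):
        fake = [poisson_draw(rng, lam) for _ in counts]
        if dispersion_statistic(fake, lam) >= observed:
            hits += 1
    return hits / n_sims


# Invented counts: 22 values with mean 18/22, roughly 0.82. NOT the original data.
made_up_counts = [0, 1, 0, 2, 1, 0, 0, 1, 3, 0, 1, 0, 0, 2, 1, 0, 1, 0, 2, 1, 0, 2]
print(simulated_p_value(made_up_counts))
```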
It's provided in the linked page; you need to scroll down to see it.
What I don't like about the example you provide is: what player 1 and player 2 know needs to be common knowledge. For instance if player 1 doesn't know whether player 2 knows whether die 1 is in 1-3, then it may not be common knowledge at all that the sum is in 2-6, even if player 1 and player 2 are given the info you said they're given.
This is what I was confused about in the grandparent comment: do we really need I and J to be common knowledge? It seems so to me. But that seems to be another assumption limiting the applicability of the result.
As far as I understand, agent 1 doesn't know that agent 2 knows A2, and agent 2 doesn't know that agent 1 knows A1. Instead, agent 1 knows that agent 2's state of knowledge is in J and agent 2 knows that agent 1's state of knowledge is in I. I'm a bit confused now about how this matches up with the meaning of Aumann's Theorem. Why are I and J common knowledge, and {P(A|I)=q} and {P(A|J)=q} common knowledge, but I(w) and J(w) are not common knowledge? Perhaps that's what the theorem requires, but currently I'm finding it hard to see how I and J being common...
That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that's still not common knowledge, because agent 1 doesn't know that agent 2 knows A1 union A2.
I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual "everyone knows that everyone knows that ... " definition of common knowledge translates to I(J(I(J(I(J(...(w)...).
Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don't see what's wrong with this.
Nope; it's the limit of I(J(I(J(I(J(I(J(...(w)...), where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.
Alternately if instead of I and J you think about the sigma-algebras they generate (let's call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I prefer this somewhat because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
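For finite partitions, the limit of I(J(I(J(...(w)...) described above can be computed by iterating until nothing changes. A minimal sketch, with partitions represented as lists of frozensets; the function names and the six-state example are hypothetical.

```python
def saturate(partition, subset):
    """I(S): the union of the cells of `partition` that intersect `subset`."""
    result = set()
    for cell in partition:
        if cell & subset:
            result |= cell
    return frozenset(result)


def common_knowledge_cell(part_i, part_j, w):
    """Cell of the meet of the two partitions containing world-state w.

    Repeatedly applies J and then I, starting from {w}, until a fixed point.
    On a finite state space the set can only grow, so this terminates.
    """
    current = frozenset({w})
    while True:
        expanded = saturate(part_i, saturate(part_j, current))
        if expanded == current:
            return current
        current = expanded


# Hypothetical example on the state space {1, ..., 6}.
I = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
J = [frozenset({1}), frozenset({2, 3}), frozenset({4}), frozenset({5, 6})]

print(sorted(common_knowledge_cell(I, J, 1)))
print(sorted(common_knowledge_cell(I, J, 5)))
```

In this example the cells {1,2}, {2,3}, {3,4} chain together, so from state 1 the smallest event that is common knowledge is {1,2,3,4}, while from state 5 it is {5,6}.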
Right, that is a good piece. But I'm afraid I was unclear. (Sorry if I was.) I'm looking for a prior over stationary sequences of digits, not just sequences. I guess the adjective "stationary" can be interpreted in two compatible ways: either I'm talking about sequences such that for every possible string w the proportion of substrings of length |w| that are equal to w, among all substrings of length |w|, tends to a limit as you consider more and more substrings (either extending forward or backward in the sequence); this would not quite be a p...
Each element of the set is characterized by a bunch of probabilities; for example there is p_01101, which is the probability that elements x_{i+1} through x_{i+5} are 01101, for any i. I was thinking of using the topology induced by these maps (i.e. generated by preimages of open sets under them).
How is putting a noninformative prior on the reals hard? With the usual required invariance, the uniform (improper) prior does the job. I don't mind having the prior be improper here either, and as I said I don't know what invariance I should want; I can't think o...
The purpose would be to predict regularities in a "language", e.g. to try to achieve decent data compression in a way similar to other Markov-chain-based approaches. In terms of properties, I can't think of any nontrivial ones, except the usual important one that the prior assign nonzero probability to every open set; mainly I'm just trying to find something that I can imagine computing with.
It's true that there exists a bijection between this space and the real numbers, but it doesn't seem like a very natural one, though it does work (it's measurable, etc). I'll have to think about that one.
Since we're discussing (among other things) noninformative priors, I'd like to ask: does anyone know of a decent (noninformative) prior for the space of stationary, bidirectionally infinite sequences of 0s and 1s?
Of course in any practical inference problem it would be pointless to consider the infinite joint distribution, and you'd only need to consider what happens for a finite chunk of bits, i.e. a higher-order Markov process, described by a bunch of parameters (probabilities) which would need to satisfy some linear inequalities. So it's easy to find a ...
Updated, eh? Where did your prior come from? :)
I am trying to understand the examples on that page, but they seem strange; shouldn't there be a model with parameters, and a prior distribution for those parameters? I don't understand the inferences. Can someone explain?
I think you're confusing the act of receiving information/understanding about an experience with the experience itself.
Re: the joke example, I think that one would get tired of hearing a joke too many times, and that's what the dissection is equivalent to, because you keep hearing it in your head; but if you already get the joke, the dissection is not really adding to your understanding. If you didn't get the joke, you will probably receive a twinge of enjoyment at the moment when you finally do understand. If you don't understand a joke, I don't think you...
Interesting. My internal experience of programming is quite different; I don't see boxes and lines. Data structures for me are more like people who answer questions, although of course with no personality or voice; the voice is mine as I ask them a question, and they respond in a "written" form, i.e. with a silent indication. So the diagrams people like to draw for databases and such don't make direct sense to me per se; they're just a way of organizing written information.
I am finding it quite difficult to coherently and correctly describe such things; no part of this do I have any certainty of, except that I know I don't imagine black-and-white box diagrams.
Do you have some good examples of abuse of Bayes' theorem?
Is there a reason to think this problem is less amenable to being solved by complexity priors than other learning problems? / Might we build an unaligned agent competent enough to be problematic without solving problems similar to this one?