Fascinating. I thought Tennenbaum's theorem implied non-standard models were rather impossible to visualize. The non-standard model of Peano arithmetic illustrated in the diagram only gives the successor relation; there's no definition of addition and multiplication. Tennenbaum's theorem implies there's no computable way to define them, but is there a proof that they can be defined at all for this particular model?
The chapter on Chomsky contrasts the generative grammar approach, which Lakoff used to work within, with the cognitive-science-inspired cognitive linguistics approach, which Lakoff has been working in for the last few decades. Cognitive linguistics includes cognitive semantics, which is rather different from generative semantics.
I largely agree with your critique, but more as a description of a different book that could have been written in this book's place. For example, a book on philosophy applying the results of this book's methodology, for which chapter 25 is a poor substitute. Or books drilling into one particular area in more detail with careful connections to the literature. This book serves better as an inspiring manifesto.
...While these chapters are enlightening, they depend too heavily on the earlier account of metaphor, rarely draw upon other findings in cognitive sci
If the goal of exercise is to lose weight, have you tried replacing carbohydrates with fat in your diet? Forcing yourself to exercise will work up an appetite and make you hungry, but it will not make you lose weight. There is a correlation between exercising and being thin, but the causality is generally perceived the wrong way around. There is also a correlation between exercising and (temporarily) losing weight, but that is confounded by diet changes, which typically involve reducing carbohydrate intake.
I've heard you mention Gary Taubes's work, but not that...
The von Neumann-Morgenstern axioms talk only about preferences over lotteries, which are simply probability distributions over outcomes. That is, you have an unstructured set O of outcomes, and you have a total preordering over Dist(O), the set of probability distributions over O. They do not talk about a utility function. This is quite elegant, because to make decisions you must have preferences over distributions over outcomes, but you don't need to assume that O has a certain structure, e.g. that of the reals.
The expected utility theorem says that prefe...
To be concrete, suppose you want to maximise the average utility people have, but you also care about fairness so, all things equal, you prefer the utility to be clustered about its average. Then maybe your real utility function is not
U = (U[1] + .... + U[n])/n
but
U' = U - ((U[1]-U)^2 + .... + (U[n]-U)^2)/n
which is in some sense a mean minus a variance.
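A minimal sketch of the mean-minus-variance idea, in Python. The function name `fairness_adjusted_utility` and the weight `lam` are assumptions made for illustration; the comment above corresponds to `lam = 1`.

```python
from statistics import fmean, pvariance


def fairness_adjusted_utility(utils, lam=1.0):
    """Mean utility minus lam times the population variance of the utilities."""
    mean = fmean(utils)
    return mean - lam * pvariance(utils, mu=mean)


# Two hypothetical populations with the same average utility (5.0).
equal = [5.0, 5.0, 5.0, 5.0]
unequal = [1.0, 9.0, 1.0, 9.0]

print(fairness_adjusted_utility(equal))
print(fairness_adjusted_utility(unequal))
```

Both populations have the same plain average, but the clustered one scores higher once the variance penalty is subtracted.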
Can you translate your complaint into a problem with the independence axiom in particular?
Your second example is not a problem of variance in final utility, but aggregation of utility. Utility theory doesn't force "Giving 1 util to N people" to be equivalent to "Giving N util to 1 person". That is, it doesn't force your utility U to be equal to U1 + U2 + ... + UN where Ui is the "utility for person i".
Your use of the terms parametric vs. nonparametric doesn't seem to match that of people working in nonparametric Bayesian statistics, where the distinction is more like whether your statistical model has a fixed finite number of parameters or has no such bound. Methods such as Dirichlet processes and their many variants (Hierarchical DP, HDP-HMM, etc.) go beyond simple modeling of surface similarities using similarity of neighbours.
See, for example, this list of publications coauthored by Michael Jordan:
How about buttons "High quality", "Low quality", "Accurate", "Inaccurate"? We're increasing options here, but there's probably a nice way to design the interface to reduce the cognitive load.
Using the word "vote" seems broken here more generally -- we aren't implementing some democratic process, we're aggregating judgments (read: collecting evidence) across a population.
Because quality and truth are separate judgments in practice, and forcing them to be conflated into a single scale is losing information. To the extent that truth is positively correlated with quality this will fall out automatically: highly truthy posts will tend to have high quality. Low quality and high truth are not opposites.
Z. M. Davis: Good point, I was brushing that distinction under the rug. From this perspective all people arguing about values are trying to change someone's value computation, to a greater or lesser degree, i.e. this is not the place to look if you want to discriminate between "liberal" and "conservative".
With the obvious way to implement a CEV, you start by modeling a population of actual humans (e.g. Earth's), then consider extrapolations of these models (knew more, thought faster, etc). There is no "wipe culturally-defined values" step, however that would be defined.
Where was it suggested otherwise?
Ian C: neither group is changing human values in the sense referred to here: everyone is still human, and no one is suggesting neurosurgery to change how brains compute value. See the post "Value is Fragile".
Interestingly, you can have unboundedly many children with only quadratic population growth, so long as they are exponentially spaced. For example, give each newborn sentient a resource token, which can be used after the age of maturity (say, 100 years or so) to fund a child. Additionally, in the years 2^i every living sentient is given an extra resource token. One can show there is at most quadratic growth in the number of resource tokens. By adjusting the exponent in 2^i we can get growth O(n^{1+p}) for any nonnegative real p.
Phil: Yes. CEV completely replaces and overwrites itself, by design. Before this point it does not interact with the external world to change it in a significant sense (it cannot avoid all change; e.g. its computer will add tiny vibrations to the Earth, as all computers do). It executes for a while then overwrites itself with a computer program (skipping every intermediate step here). By default, and if anything goes wrong, this program is "shut down silently, wiping the AI system clean."
(When I say "CEV" I really mean a FAI which s...
Anon: no, I mean the log probability. In your example, the calibratedness will generally be high: - \log 0.499 - H(p) ~= 0.00289 each time you see tails, and - \log 0.501 - H(p) ~= - 0.00289 each time you see heads. It's continuous.
Let's be specific. We have H(p) = - \sum_x p(x) \log p(x), where p is some probability distribution over a finite set. If we observe x0, then say the predictor's calibration is
C(x0) = \sum_x p(x) \log p(x) - \log p(x0) = - \log p(x0) - H(p)
so the expected calibration is 0 by the definition of H(p). The calibration is co...
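A small Python sketch of the calibration measure defined above, C(x0) = - \log p(x0) - H(p). The function names are assumptions for illustration, and so is the base-2 logarithm, which is what reproduces the figure of roughly 0.00289 quoted for the 0.501/0.499 coin.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits of a finite distribution given as a list."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def calibration(p, x0):
    """C(x0) = -log2 p(x0) - H(p) for the observed outcome index x0."""
    return -math.log2(p[x0]) - entropy(p)


# Coin from the example: index 0 is heads (0.501), index 1 is tails (0.499).
p = [0.501, 0.499]
c_heads = calibration(p, 0)
c_tails = calibration(p, 1)

# Averaging C over outcomes drawn from p itself gives zero, up to rounding.
expected = sum(q * calibration(p, i) for i, q in enumerate(p))

print(c_heads, c_tails, expected)
```

The two values are small and of opposite sign, and their p-weighted average cancels, which is the sense in which the expected calibration is 0.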
Anon: well-calibrated means roughly that in the class of all events you think have probability p of being true, the proportion of them that turn out to be true is p.
More formally, suppose you have a probability distribution over something you are going to observe. If the negative log probability of the event which actually occurs is equal to the entropy of your distribution, you are well calibrated. If it is above, you are overconfident; if it is below, you are underconfident. By this measure, assigning every possibility equal probability will always be calibrated.
This is related to relative entropy.
Just in case it's not clear from the above: there are uncountably many degrees of freedom to an arbitrary complex function on the real line, since you can specify its value at each point independently.
A continuous function, however, has only countably many degrees of freedom: it is uniquely determined by its values on the rational numbers (or any dense set).
Tiiba:
The hypothesis is actual immortality, to which nonzero probability is being assigned. For example, suppose under some scenario your probability of dying halves at each time step. Then your total probability of dying is 2 times the probability of dying at the very first step, which we can assume is far less than 1/2.
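The factor of 2 is the geometric series q + q/2 + q/4 + ... = 2q. A rough Python sketch follows; the function names and the first-step probability q = 0.01 are assumptions for illustration. It also assumes one particular reading for the second function, namely that q / 2^k is the chance of dying in step k given survival so far, in which case 2q is an upper bound on the total rather than an exact value.

```python
def total_death_probability(q, steps):
    """Sum q + q/2 + q/4 + ... over the given number of steps."""
    return sum(q / 2**k for k in range(steps))


def survival_probability(q, steps):
    """Chance of surviving every step when q / 2**k is the per-step hazard."""
    s = 1.0
    for k in range(steps):
        s *= 1.0 - q / 2**k
    return s


q = 0.01  # hypothetical probability of dying at the very first step
print(total_death_probability(q, 200))
print(survival_probability(q, 200))
```

Under either reading the probability of never dying stays bounded away from zero whenever q is well below 1/2.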
Eliezer: "You could see someone else's engine operating materially, through material chains of cause and effect, to compute by "pure thought" that 1 + 1 = 2. How is observing this pattern in someone else's brain any different, as a way of knowing, from observing your own brain doing the same thing? When "pure thought" tells you that 1 + 1 = 2, "independently of any experience or observation", you are, in effect, observing your own brain as evidence."
Richard: "It's just fundamentally mistaken to conflate reason...
Perhaps this formulation is nice:
0 = (P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E)
The expected change in probability is zero (for if you expected change you would have already changed).
Since P(E) and P(~E) are both positive, to maintain balance if P(H|E)-P(H) < 0 then P(H|~E)-P(H) > 0. If P(E) is large then P(~E) is small, so (P(H|~E)-P(H)) must be large to counteract (P(H|E)-P(H)) and maintain balance.
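A quick numerical check of the balance equation, sketched in Python. The function names and the particular probabilities are assumptions chosen for illustration; the posteriors are computed with Bayes' rule from a prior P(H) and the likelihoods P(E|H) and P(E|~H).

```python
def posteriors(p_h, p_e_given_h, p_e_given_not_h):
    """Return P(E), P(H|E) and P(H|~E) via Bayes' rule."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    p_h_given_e = p_e_given_h * p_h / p_e
    p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e


def expected_change(p_h, p_e_given_h, p_e_given_not_h):
    """(P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E), which should be zero."""
    p_e, up, down = posteriors(p_h, p_e_given_h, p_e_given_not_h)
    return (up - p_h) * p_e + (down - p_h) * (1 - p_e)


# Hypothetical numbers where E is likely (P(E) = 0.83).
p_h = 0.3
p_e, up, down = posteriors(p_h, 0.9, 0.8)
print(p_e, up - p_h, down - p_h)
print(expected_change(p_h, 0.9, 0.8))
```

With these numbers the likely observation E nudges P(H) up only slightly, while the unlikely observation ~E moves it down by a larger amount, and the weighted shifts cancel.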
Hey, sorry if it's mad trivial, but may I ask for a derivation of this? You can start with "P(H) = P(H|E)P(E) + P(H|~E)P(~E)" if that makes it shorter.
(edit):
Never mind, I just did it. I'll post it for you in case anyone else wonders.
1} P(H) = P(H|E)P(E) + P(H|~E)P(~E) [CEE]
2} P(H)P(E) + P(H)P(~E) = P(H|E)P(E) + P(H|~E)P(~E) [because ab + (1-a)b = b]
3} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [subtract P(H) from every value to be weighted]
4} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = P(H) - P(H)...
It seems the point of the exercise is to think of non-obvious cognitive strategies, ways of thinking, for improving things. The chronophone translation is both a tool for finding these strategies by induction, and a rationality test to see if the strategies are sufficiently unbiased and meta.
But what would I say? The strategy of searching for and correcting biases in thought, failures of rationality, would improve things. But I think I generated that suggestion by thinking of "good ideas to transmit" which isn't meta enough. Perhaps if I...
Very nice. These notes say that every countable nonstandard model of Peano arithmetic is isomorphic, as an ordered set, to the natural numbers followed by lexicographically ordered pairs (r, z) for r a positive rational and z an integer. If I remember rightly, the ordering can be defined in terms of addition: x <= y iff exists z. x+z = y. So if we want to have a countable nonstandard model of Peano arithmetic with successor function and addition we need all these nonstandard numbers.
It seems that if we only care about Peano arithmetic with the s...