LESSWRONG
LW

All of JonasMoss's Comments + Replies

The number of elements in $0 N$ won't change when removing every other element from it. The cardinality of $0 N$ is countable. And when you remove every other element, it is still countable, and indistinguishable from $0 N$ . If you're unconvinced, ask yourself how many elements $0 N$ with every other element removed contains. The set is certainly not larger than $N$ , so it's at most countable. But it's certainly not finite either. Thus you're dealing with a set of countably many 0s. As there is only one such multiset, ... (read more)

1[anonymous]3y

Pareto explicitly says that you have to keep identities intact, because the definition stipulates that w1 and w2 "contain the same people." If you don't preserve identities, you can't verify that that condition is met, in which case Pareto isn't applicable.

2Slider3y

To my mind the reduced set has ω2 elements which is less than ω. But yeah its part of a bigger pattern where I don't think cardinality is a very exhaustive concept when it comes to infinite set sizes. But I don't have that much knowledge to have a good alternative working conception around "ordinalities".

On infinite ethics

JonasMoss3y30

I don't understand what you mean. The upgraded individuals are better off than the non-upgraded individuals, with everything else staying the same, so it is an application of Pareto.

Now, I can understand the intuition that (a) and (b) aren't directly comparable due to identity of individuals. That's what I mean with the caveat "(Unless we add an arbitrary ordering relation on the utilities or some other kind of structure.)"

2Slider3y

Okay the pareto thing applies but the formal contradiction has a problem in the (b) prong. Consider 1N which is 1,2,3,4,5,6,7... if you took each other out from that yu would get 1,3,5,7... There is no 2 in there so there is no copy of 1N remaining in there. Sure if you have 0N which is 0,0,0,0,0... you can have it as a multiset but multisets track amounts. It is not sufficient that the members are of the same object the amount needs to match too. And in that dropping the amounts (atleast ought to) change. So 0N2 is not the same as 0N. So you get 2N∪0N≺N∪0N2 which is not an exact mirror of the (a) prong.

On infinite ethics

JonasMoss3y10

Pareto: If two worlds (w1 and w2) contain the same people, and w1 is better for an infinite number of them, and at least as good for all of them, then w1 is better than w2.

As far as I can see, the Pareto principle is not just incompatible with the agent-neutrality principle, it's incompatible with set theory itself. (Unless we add an arbitrary ordering relation on the utilities or some other kind of structure.)

Let's take a look at, for instance, $N \cup 0 N$ vs $2 N \cup 0 N$ , where $n N$ is the multiset containing $n, 2 n, 3 n, \dots$ and $\cup$ ... (read more)

1MichaelStJules3y

Multisets don't track identity and effectively bake agent-neutrality into them, so they don't have enough structure to express the Pareto principle properly. For Pareto, it's better to represent your worlds/universes/outcomes as vectors with a component for each individual, or functions mappings identities (or labels) to real numbers. Your set of identities can just be the natural numbers, or integers or whatever.

2Slider3y

In (b) the remaining copy of 0N is specifically missing those "upgraded individuals". They might contain the same number of people but it is not clear to me that they contain the same people. Thus (b) is not an instance of applying pareto.

2Zach Stein-Perlman3y

Yeah, so Pareto seems to require that we don't just think about the people in the universe in terms of set theory as you do, but instead maybe have something like a canonical order in which to compare people between universes... that seems to work for comparing worlds where (roughly) the people are the same but their utilities change; I'm not sure how to compare universes with people arranged differently using something like this set theory. Ideally we could think of infinite utilities as hyperreal numbers rather than in terms of sets; then there's no contradiction of this form.

A Bayesian Aggregation Paradox

JonasMoss3y10

Okay, thanks for the clarification! Let's see if I understand your setup correctly. Suppose we have the probability measures $p_{E}$ and $p_{1}$ , where $p_{E}$ is the probability measure of the expert. Moreover, we have an outcome $x \in {A, B, C} .$

In your post, you use $p_{1} (x ∣ z) \propto p_{E} (z ∣ x) p_{1} (x)$ , where $z$ is an unknown outcome known only to the expert. To use Bayes' rule, we must make the assumption that $p_{1} (z ∣ x) = p_{E} (z ∣ x)$ . This assumption doesn't sound right to be, but I suppose some strange assumption is necessary for this simple framework. In this model, I agree with your calculation... (read more)

Harms and possibilities of schooling

JonasMoss3y10

Do you have a link to the research about the effect of a bachelor of education?

A Bayesian Aggregation Paradox

JonasMoss3y*10

I find the beginning of this post somewhat strange, and I'm not sure your post proves what you claim it does. You start out discussing what appears to be a combination of two forecasts, but present it as Bayesian updating. Recall that Bayes theorem says $p (θ ∣ x) = \frac{p (x ∣ θ) p (θ)}{p (x)}$ . To use this theorem, you need both an $x$ (your data / evidence), and a $θ$ (your parameter). Using “posterior $\propto$ prior $\times$ likelihood” (with priors $p_{1}, p_{2}, p_{3}$ and likelihoods $e_{1}, e_{2}, e_{3}$ ), you're talking as if your expert's likelihood equals&... (read more)

2Jsevillamol3y

Thanks for engaging! Parameters are abstractions we use to simplify modelling. What we actually care about is the probability of unkown events given past observations. To clarify: this is not what I wanted to discuss. The expert is reporting how you should update your priors given the evidence, and remaining agnostic on what the priors should be. The whole point of Bayesianism is that it offer a precise, quantitative answer to how you should update your priors given some evidence - and that is multiplying by the likelihoods. This is why it is often recommend in social sciences and elsewhere to report your likelihoods. I agree this is not common in judgemental forecasting, where the whole updating process is very illegible. I think it holds for most Bayesian-leaning scientific reporting. I am not, I am talking about evidence = likelihood vectors. One way to think about this is that the expert is just informing us about how we should update our beliefs. "Given that the pandemic broke out in Wuhan, your subjective probability of a lab break should increase and it should increase by this amount". But the final probability depends on your prior beliefs, that the expert cannot possibly know. Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy, and neccessarily loses some information.

Harms and possibilities of schooling

JonasMoss3y*30

Children became grown-ups 200 years ago too. I don't think we need to teach them anything at all, much less anything in particular.

According to this SSC post, kids can easily catch up in math even if they aren't taught any math at all in the 5 first years of school.

In the Benezet experiment, a school district taught no math at all before 6th grade (around age 10-11). Then in sixth grade, they started teaching math, and by the end of the year, the students were just as good at math as traditionally-educated children with five years of preceding math educati

... (read more)

-1Phil Scadden3y

200 years ago was different world - reading wasnt required. Ask anyone who cant read as an adult how tough that is. The 10% with dyslexia need intervention fast.

Magna Alta Doctrina

JonasMoss3y10

I found this post interesting, especially the first part, but extremely difficult to understand (yeah, that hard). I believe some of the analogies might be valuable, but it's simply too hard for me to confirm / disconfirm most of them. Here are some (but far from all!) examples:

1. About local optimizers. I didn't understand this section at all! Are you claiming that gradient descent isn't a local optimizer? Or are you claiming that neural networks can implement mesa-optimizers? Or something else?

2. The analogy to Bayesian reasoning feels forced and unrela... (read more)

Ordinary and unordinary decision theory

JonasMoss3y10

I disagree. Sometimes your entire payoffs also change when you change your action space (in the informal description of the problem). That is the point of the last example, where precommitment changes the possible payoffs, not only restricts the action space.

Ordinary and unordinary decision theory

JonasMoss3y50

Paradoxical decision problems are paradoxical in the colloquial sense (such as Hilbert's hotel or Bertrand's paradox), not the literal sense (such as "this sentence is false"). Paradoxicality is in the eye of the beholder. Some people think Newcomb's problem is paradoxical, some don't. I agree with you and don't find it paradoxical.

3TLW3y

In that case why are people spending so much effort on it[1]? Ditto, why is there so much argumentation based around applying game theory to Newcomb's problem[2] (or variants) when much of game theory does not apply to it? ***** I think that there is a lot of effort being wasted due to not clearly distinguishing counterintuative results and self-contradictory results in general, and this is a prime example. 1. ^ The LessWrong "Newcomb's Problem" tag has 41 entries with a combined total of over 1,000 comments, for instance. 2. ^ See also any argument[3] that says that X is invalid because it provides 'the wrong solution' to Newcomb's Problem. 3. ^ Such as "Evidential decision theories are inadequate rational decision theories. For either they provide wrong solutions to Newcomb's problem [...]", which is a direct quote from the overview of Newcomb's problem linked in the description of the tag 'Newcomb's Problem' on this site.

JonasMoss's Shortform

JonasMoss3y10

Ah! Edited version: "there's no *obvious* distribution $p (a, x)$ " (which could have been "natural distribution" or "canonical distribution"). The point is that you need more information than what should be sufficient (the effect of the action) to do evidential decision theory.

JonasMoss's Shortform

JonasMoss3y*10

Evidential decision theory boggles my mind.

I have some sympathy for causal decision theory, especially when the causal description matches reality. But evidential decision theory is 100% bonkers.

The most common argument against evidential decision theory is that it does not care about the consequence of your action. It cares about correlation (broadly speaking), not causality, and acts as if both were same. This argument is sufficient to thoroughly discredit evidential decision theory, but philosophers keep giving it screen time.

Even if we lived in a ... (read more)

3Alexander Gietelink Oldenziel3y

I agree I am quite confused by EDT OTOH you can have a distribution over actions that have never been done

Do, Then Think

JonasMoss3y30

Just like this classic! https://slatestarcodex.com/2014/03/24/should-you-reverse-any-advice-you-hear/

Theses on Sleep

JonasMoss3y40

About that paper.

The p-values relevant for testosterone are on the lower side, with one them 0.049 (which screams p-hacking) and another at 0.02 (also really shitty). A reasonable back-of-the-envelope method to correct for p-hacking and publication bias involves multiplying the p-values with 20 (the reasoning is not super-involved. think about what happens to the truncated normal distribution in the case of complete publication bias); in that case, none of the testosterone-related p-values in said paper are significant. I feel comfortable ignoring it.

Open problem: how can we quantify player alignment in 2x2 normal-form games?

JonasMoss4y10

It's a game, just a trivial one. Snakes and Ladders is also a game, and its payoff matrix is similar to this one, just with a little bit of randomness involved.

My intuition says that this game not only has maximal alignment, but is the only game (up to equivalence) game with maximal alignment for any set of strategies $s, r$ . No matter what player 1 and player 2 does, the world is as good as it could be.

The case can be compared to the $R^{2}$ when the variance of the dependent variable is 0. How much of the variance in the dependent variable does th... (read more)

Variables Don't Represent The Physical World (And That's OK)

JonasMoss4y40

This reminds me of the propensity of social scientists to drop inference when studying the entire population, claiming that confidence intervals do not make any sense when we have every single existing data point. But confidence intervals do make sense even then, as the entire observed population isn't equal to the theoretical population. The observed population does not give us exact knowledge about any properties of the data generating mechanism, except in edge cases.

(Not that confidence intervals are very useful when looking at linear regressions with millions of data points anyway, but make sure to have your justification right.)

Open problem: how can we quantify player alignment in 2x2 normal-form games?

JonasMoss4y20

I believe the upper right-hand corner of $a$ shouldn't be 1; even if both players are acting in each other's best interest, they are not acting in their own best interest. And alignment is about having both at the same time. The configuration of Prisoner's dilemma makes it impossible to make both players maximally satisfied at the same time, so I believe it cannot have maximal alignment for any strategy.

Anyhow, your concept of alignment might involve altruism only, which is fair enough. In that case, Vanessa Kosoy has a similar proposal to mi... (read more)

Open problem: how can we quantify player alignment in 2x2 normal-form games?

Answer by JonasMossJun 17, 202120

Alright, here comes a pretty detailed proposal! The idea is to find out if the sum of expected utility for both players is “small” or “large” using the appropriate normalizers.

First, let's define some quantities. (I'm not overly familiar with game theory, and my notation and terminology are probably non-standard. Please correct me if that's the case!)

$A .$ The payoff matrix for player 1.
$B .$ The payoff matrix for player 2.
$s, r$ the mixed strategies for players 1 and 2. These are probability vectors, i.e., vectors of non-negative numbers summing to

... (read more)

1Forged Invariant4y

I like how this proposal makes explicit the player strategies, and how they are incorporated into the calculation. I also think that the edge case where the agents actions have no effect on the result I think that this proposal making alignment symmetric might be undesirable. Taking the prisoner's dilemma as an example, if s = always cooperate and r = always defect, then I would say s is perfectly aligned with r, and r is not at all aligned with s. The result of 0 alignment for the Nash equilibrium of PD seems correct. I think this should be the alignment matrix for pure-strategy, single-shot PD: a=[1,11,00,10,0] Here the first of each ordered pair represents A's alignment with B. (assuming we use the [0,1] interval) I think in this case the alignments are simple, because A can choose to either maximize or to minimize B's utility.