JonasMoss

LESSWRONG
LW

JonasMoss — LessWrong

I just became aware of the good regulator theorem through John Wentworth's post. Then I tried to read the paper he talked about, Every Good Regulator Of A System Must Be A Model Of That System. This is among the worst papers I've ever skimmed. It's barely comprehensible. Reading this paper is not like reading a regular text. I had to try to read their intentions instead.

This post is my attempt to make sense of the setup and the results of the paper. Summed up, my conclusions are:

Their main theorem is correct but trivial and not very interesting. (It can be strengthened too, see the proposition below.) Their proof is a mess I

... (read 1149 more words →)

Replying toOn infinite ethics

JonasMoss4y

On infinite ethics

The number of elements in $0 N$ won't change when removing every other element from it. The cardinality of $0 N$ is countable. And when you remove every other element, it is still countable, and indistinguishable from $0 N$ . If you're unconvinced, ask yourself how many elements $0 N$ with every other element removed contains. The set is certainly not larger than $N$ , so it's at most countable. But it's certainly not finite either. Thus you're dealing with a set of countably many 0s. As there is only one such multiset, $0 N$ equals $0 N$ with every other element removed.

That there is only one such multiset follows from the definition of a multiset, a set of pairs $(a, c)$ , where $a$ is an element and $c$ is its cardinality. It would also be true... (read more)

Replying toOn infinite ethics

JonasMoss4y

On infinite ethics

I don't understand what you mean. The upgraded individuals are better off than the non-upgraded individuals, with everything else staying the same, so it is an application of Pareto.

Now, I can understand the intuition that (a) and (b) aren't directly comparable due to identity of individuals. That's what I mean with the caveat "(Unless we add an arbitrary ordering relation on the utilities or some other kind of structure.)"

Replying toOn infinite ethics

JonasMoss4y

On infinite ethics

Pareto: If two worlds (w1 and w2) contain the same people, and w1 is better for an infinite number of them, and at least as good for all of them, then w1 is better than w2.

As far as I can see, the Pareto principle is not just incompatible with the agent-neutrality principle, it's incompatible with set theory itself. (Unless we add an arbitrary ordering relation on the utilities or some other kind of structure.)

Let's take a look at, for instance, vs $2 N \cup 0 N$ , where $n N$ is the multiset containing $n, 2 n, 3 n, \dots$ and $\cup$ is the disjoint union. Now consider the following scenarios:

(a) Start out with $N \cup 0 N$ and multiply every utility by $2$ to get $2 N \cup 0 N$ . Since infinitely many people are better off and no one is worse off, $N \cup 0 N ≺ 2 N \cup 0 N$ .

(b) Start out with $2 N \cup 0 N$ and take every other of the $0$ -utilities from $0 N$ and change them to $1, 3, 5, \dots = 2 N - 1$ . Since a copy of $0 N$ is still left over, this operation leaves us with $N \cup 0 N$ . Again, since infinitely many are better off and no one worse off, $2 N \cup 0 N ≺ N \cup 0 N$ .

In conclusion, both $2 N \cup 0 N ≺ N \cup 0 N$ and $N \cup 0 N ≺ 2 N \cup 0 N$ , a contradiction.

Replying toA Bayesian Aggregation Paradox

JonasMoss4y

A Bayesian Aggregation Paradox

Okay, thanks for the clarification! Let's see if I understand your setup correctly. Suppose we have the probability measures $p_{E}$ and $p_{1}$ , where $p_{E}$ is the probability measure of the expert. Moreover, we have an outcome $x \in {A, B, C} .$

In your post, you use $p_{1} (x ∣ z) \propto p_{E} (z ∣ x) p_{1} (x)$ , where $z$ is an unknown outcome known only to the expert. To use Bayes' rule, we must make the assumption that $p_{1} (z ∣ x) = p_{E} (z ∣ x)$ . This assumption doesn't sound right to be, but I suppose some strange assumption is necessary for this simple framework. In this model, I agree with your calculations.

Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy,

JonasMoss4y

Harms and possibilities of schooling

Do you have a link to the research about the effect of a bachelor of education?

Replying toA Bayesian Aggregation Paradox

JonasMoss4y*

A Bayesian Aggregation Paradox

I find the beginning of this post somewhat strange, and I'm not sure your post proves what you claim it does. You start out discussing what appears to be a combination of two forecasts, but present it as Bayesian updating. Recall that Bayes theorem says $p (θ ∣ x) = \frac{p (x ∣ θ) p (θ)}{p (x)}$ . To use this theorem, you need both an $x$ (your data / evidence), and a $θ$ (your parameter). Using “posterior $\propto$ prior $\times$ likelihood” (with priors $p_{1}, p_{2}, p_{3}$ and likelihoods $e_{1}, e_{2}, e_{3}$ ), you're talking as if your expert's likelihood equals $p (x ∣ θ)$ – but is that true in any sense? A likelihood isn't just something you multiply with your prior, it is a conditional pmf or pdf with a different outcome than your prior.

I can see two interpretations of what you're doing at... (read more)

Replying toHarms and possibilities of schooling

JonasMoss4y*

Harms and possibilities of schooling

Children became grown-ups 200 years ago too. I don't think we need to teach them anything at all, much less anything in particular.

According to this SSC post, kids can easily catch up in math even if they aren't taught any math at all in the 5 first years of school.

In the Benezet experiment, a school district taught no math at all before 6th grade (around age 10-11). Then in sixth grade, they started teaching math, and by the end of the year, the students were just as good at math as traditionally-educated children with five years of preceding math education.

That would probably work for reading too, I guess. (Reading appears to require more purpose-built brain circuitry than math. At least I got that impression from reading Henrich's WEIRD. I don't have any references though.)

Replying toMagna Alta Doctrina

JonasMoss4y

Magna Alta Doctrina

I found this post interesting, especially the first part, but extremely difficult to understand (yeah, that hard). I believe some of the analogies might be valuable, but it's simply too hard for me to confirm / disconfirm most of them. Here are some (but far from all!) examples:

1. About local optimizers. I didn't understand this section at all! Are you claiming that gradient descent isn't a local optimizer? Or are you claiming that neural networks can implement mesa-optimizers? Or something else?

2. The analogy to Bayesian reasoning feels forced and unrelated to your other points in the Bayes section. Moreover, Bayesian statistics typically doesn't work (it's inconsistent) when you ignore the normalizing... (read more)

Replying toOrdinary and unordinary decision theory

JonasMoss4y

Ordinary and unordinary decision theory

I disagree. Sometimes your entire payoffs also change when you change your action space (in the informal description of the problem). That is the point of the last example, where precommitment changes the possible payoffs, not only restricts the action space.

Replying toOrdinary and unordinary decision theory

JonasMoss4y

Ordinary and unordinary decision theory

Paradoxical decision problems are paradoxical in the colloquial sense (such as Hilbert's hotel or Bertrand's paradox), not the literal sense (such as "this sentence is false"). Paradoxicality is in the eye of the beholder. Some people think Newcomb's problem is paradoxical, some don't. I agree with you and don't find it paradoxical.

Evidential decision theory boggles my mind.

I have some sympathy for causal decision theory, especially when the causal description matches reality. But evidential decision theory is 100% bonkers.

The most common argument against evidential decision theory is that it does not care about the consequence of your action. It cares about correlation (broadly speaking), not causality, and acts as if both were same. This argument is sufficient to thoroughly discredit evidential decision theory, but philosophers keep giving it screen time.

Even if we lived in a world where correlation and causality were always the same (if that is possible), evidential decision theory would be wrong. Why? Because evidential decision theory requires distributions over actions and outcomes.

When... (read more)

JonasMoss's Shortform

JonasMoss

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Ordinary and unordinary decision theory

JonasMoss

There are a couple of paradoxes in decision theory, typically involving beings making perfect predictions, copies of yourself, and similar shenanigans. If you're unfamiliar with all of this, take a look at Newcomb's problem. Attempted resolutions of these paradoxes usually involve either the construction of complex decision theories (such as FDT ) or copious hand-waving and informal arguments (the philosophy literature on the subject).

This post has two messages.

Most decision problems, including paradoxes, are ordinary decision problems.
Unordinary decision problems can be found by testing with Jensen's inequality. These typically fall into two categories
1. Counterfactual game-theory and
2. Precommitment problems, when the size of the action space affects the utility.

I have three guiding principles.

Formulate your problems mathematically.

... (read 1875 more words →)

LESSWRONG
LW

LESSWRONG
LW

JonasMoss

Thoughts on the good regulator theorem

JonasMoss's Shortform

Ordinary and unordinary decision theory

JonasMoss

JonasMoss

Thoughts on the good regulator theorem

JonasMoss's Shortform

Ordinary and unordinary decision theory