
Why you maybe should lift weights, and How to.

I'm curious about your thoughts on lifting vs. endurance training. I thought that, for general health optimization, a combination of the two would be better than just lifting.

samusasuke · 3mo

For "optimization" I totally agree. That wasn't the goal here, but yeah, I think if you can only do one, you should lift, since it will improve your cardio to above baseline levels. Doing both is probably better than just lifting, though.

Bayesians Commit the Gambler's Fallacy

Mlxa · 1y

I agree with @James Camacho, at least intuitively, that stickiness in the space of observations is equivalent to switchiness in the space of their prefix XORs, and vice versa. Also, I tried to replicate the result and didn't observe the effect, so perhaps one of us has a bug in the simulation.

Kevin Dorst · 1y

Mathematica notebook is here! Link in the full paper. How did you define Switchy and Sticky? The switchiness or stickiness needs to build up over at least 2 steps, so the effect won't appear with one-step matrices such as

Switchy = (0.4, 0.6; 0.6, 0.4), Sticky = (0.6, 0.4; 0.4, 0.6)

But it WILL appear if they build up to (say) 60% switchiness over two steps, e.g.:

Switchy = (0.4, 0, 0.6, 0; 0.45, 0, 0.55, 0; 0, 0.55, 0, 0.45; 0, 0.6, 0, 0.4)
Sticky = (0.6, 0, 0.4, 0; 0.55, 0, 0.45, 0; 0, 0.45, 0, 0.55; 0, 0.4, 0, 0.6)
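To make the two-step matrices easier to read, here is a minimal Python sketch that checks they are valid transition matrices and computes how often each chain switches. It rests on an assumption that is ours, not stated in the comment: each of the 4 states encodes the last two outcomes as (previous, current), with state 1 = (A,A), 2 = (B,A), 3 = (A,B), 4 = (B,B), so entering state 2 or 3 means the outcome just switched. All function names are hypothetical.

```python
# Sketch of the 4-state "build-up" chains from the comment above.
# ASSUMPTION (not in the comment): state k encodes (previous, current) outcome:
# 1=(A,A), 2=(B,A), 3=(A,B), 4=(B,B). Entering state 2 or 3 = a switch.

SWITCHY = [
    [0.4, 0.0, 0.6, 0.0],
    [0.45, 0.0, 0.55, 0.0],
    [0.0, 0.55, 0.0, 0.45],
    [0.0, 0.6, 0.0, 0.4],
]
STICKY = [
    [0.6, 0.0, 0.4, 0.0],
    [0.55, 0.0, 0.45, 0.0],
    [0.0, 0.45, 0.0, 0.55],
    [0.0, 0.4, 0.0, 0.6],
]
SWITCH_STATES = (1, 2)  # zero-based indices of states 2 and 3


def is_stochastic(m, tol=1e-9):
    """True if every row is a probability distribution over next states."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) < tol for row in m)


def switch_prob(m, state):
    """P(next outcome differs from the current one | zero-based state)."""
    return sum(m[state][j] for j in SWITCH_STATES)


def stationary(m, iters=2000):
    """Stationary distribution by power iteration (chain is irreducible and aperiodic)."""
    n = len(m)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * m[i][j] for i in range(n)) for j in range(n)]
    return pi


def long_run_switch_rate(m):
    """Average one-step switch probability under the stationary distribution."""
    pi = stationary(m)
    return sum(pi[s] * switch_prob(m, s) for s in range(len(m)))


for name, m in (("Switchy", SWITCHY), ("Sticky", STICKY)):
    assert is_stochastic(m)
    print(name, [round(switch_prob(m, s), 2) for s in range(4)],
          round(long_run_switch_rate(m), 4))
```

Under this reading, Switchy switches with probability 0.6 after two equal outcomes and 0.55 right after a switch, so the tendency builds up over two steps; its long-run switch rate works out to 4/7 ≈ 0.571, and Sticky's to 8/19 ≈ 0.421.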

Predictive model agents are sort of corrigible

Mlxa · 1y

I agree that if a DT (Decision Transformer) is trained on trajectories that sometimes contain suboptimal actions, but the effect of such a mistake is small compared to the inherent randomness of the return, the learned policy will also take the suboptimal action, though somewhat less often. (By inherent randomness I mean, in this case, that Player 2 pushes the button randomly and independently of Player 1's actions.)

But where do these trajectories come from? If you take them from another RL algorithm, then the optimization is already there. And if you start...
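The "less often, but still" claim can be illustrated with a toy Bayes calculation. This is a sketch under hypothetical assumptions, not the setup from the post: Player 1 has one binary action, the return is binary and driven mostly by Player 2's random button press, and the optimal action only raises P(return = 1) by a small `edge`. A DT trained to convergence is idealized as sampling from P(action | return = 1); the function name and all numbers are made up for illustration.

```python
# Toy model (hypothetical): how often does an idealized return-conditioned policy
# still choose the suboptimal action when conditioned on the good return?


def p_suboptimal_given_success(q_sub, base, edge):
    """P(suboptimal action | return = 1) by Bayes' rule.

    q_sub: frequency of the suboptimal action in the training trajectories
    base:  P(return = 1 | suboptimal action), set mostly by Player 2's randomness
    edge:  extra P(return = 1) contributed by the optimal action
    """
    joint_sub = q_sub * base
    joint_opt = (1 - q_sub) * (base + edge)
    return joint_sub / (joint_sub + joint_opt)


# Small edges barely move the suboptimal action's share away from q_sub = 0.2.
for edge in (0.0, 0.01, 0.05, 0.2):
    print(edge, round(p_suboptimal_given_success(q_sub=0.2, base=0.5, edge=edge), 4))
```

With `edge = 0.05`, the suboptimal action's share drops only from 20% to 0.1 / 0.54 ≈ 18.5%, which matches the intuition that the mistake persists when its effect is small relative to the noise in the return.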

Agents which are EU-maximizing as a group are not EU-maximizing individually

Mlxa · 1y

The example with traders was meant to show that, in the limit, these non-EU-maximizers actually become EU-maximizers, now with linear utility instead of logarithmic. In the other sections I tried to demonstrate that they are not EU-maximizers for a finite number of agents.

First, in the expression for their utility in terms of the outcome distribution, you integrate something of the form $f(x_1, x_2)\,p(x_1)\,p(x_2)\,dx_1\,dx_2$, a quadratic form in $p$, instead of $f(x)\,p(x)\,dx$ as you do when computing expected utility. By itself this doesn't prove that there is no utility function, because the...
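A small discrete sketch of the linear-vs-quadratic distinction: expected utility $\sum_x u(x)\,p(x)$ respects mixtures of lotteries, while a quadratic form $\sum_{x_1,x_2} f(x_1,x_2)\,p(x_1)\,p(x_2)$ generally does not. The two-outcome setup and the choices of `u` and `f` are hypothetical, picked only for illustration.

```python
from itertools import product

# Hypothetical two-outcome example; u and f are illustrative, not from the post.
OUTCOMES = (0, 1)
u = {0: 0.0, 1: 1.0}
f = {(a, b): u[a] * u[b] for a, b in product(OUTCOMES, repeat=2)}


def expected_utility(p):
    """Linear in p: sum over x of u(x) p(x)."""
    return sum(u[x] * p[x] for x in OUTCOMES)


def quadratic_value(p):
    """Quadratic in p: sum over (x1, x2) of f(x1, x2) p(x1) p(x2)."""
    return sum(f[(a, b)] * p[a] * p[b] for a, b in product(OUTCOMES, repeat=2))


def mix(p, q, lam):
    """The lottery lam * p + (1 - lam) * q."""
    return {x: lam * p[x] + (1 - lam) * q[x] for x in OUTCOMES}


p = {0: 1.0, 1: 0.0}
q = {0: 0.0, 1: 1.0}
m = mix(p, q, 0.5)

# Expected utility of the mixture equals the mixture of expected utilities (0.5 vs 0.5).
print(expected_utility(m), 0.5 * expected_utility(p) + 0.5 * expected_utility(q))
# The quadratic form does not (0.25 vs 0.5).
print(quadratic_value(m), 0.5 * quadratic_value(p) + 0.5 * quadratic_value(q))
```

Note that with this particular product-form `f`, `quadratic_value(p)` is just `expected_utility(p) ** 2`, which ranks lotteries with nonnegative expected utility the same way. That is one concrete reason a quadratic form alone does not rule out an underlying utility function.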

LESSWRONG

All of Mlxa's Comments + Replies