Agents which are EU-maximizing as a group are not EU-maximizing individually
Introduction

Why Subagents? and Why Not Subagents? explore whether a group of expected-utility maximizers is itself a utility maximizer. Here I want to discuss the converse: if a group as a whole wants to maximize some utility function, what can be said about the individual agents? Of course, if...
One specific instance of the feature-learning mystery is an MLP learning sparse parities: the output is the XOR of some k bits of an n-bit input, and an MLP can learn this in close to O(n^k) steps, which is actually the computational limit here. In this paper they give a very nice intuition (Section 4.1) for why, even in a network with a single layer (and a ReLU on top of it), the gradients will contain some information about the solution. TLDR: the gradient of "ReLU of the sum of incoming activations", if we take all the incoming weights to be one (that's the example they study), is just a...
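The truncated intuition above can be probed numerically. Here is a minimal Monte Carlo sketch (my own illustration, not code from the paper; n = 9, k = 4, and the sample count are arbitrary illustrative choices): for a single ReLU unit with all-one incoming weights, the per-coordinate gradient signal is E[y · x_i · ReLU'(Σx)], and estimating it shows that the k relevant coordinates carry a visibly larger signal than the irrelevant ones — i.e., the gradient does contain information about which bits matter.

```python
import random

def sparse_parity_gradient_signal(n=9, k=4, num_samples=500_000, seed=0):
    """Estimate the per-coordinate gradient signal E[y * x_i * ReLU'(sum x)]
    for a single ReLU unit with all-one incoming weights on the (n, k)
    sparse-parity task (inputs in {-1, +1}, label = product of k bits)."""
    rng = random.Random(seed)
    relevant_set = range(k)  # WLOG the first k coordinates are the relevant ones
    grad = [0.0] * n
    for _ in range(num_samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = 1
        for i in relevant_set:
            y *= x[i]  # label: parity (XOR in +/-1 encoding) of the k bits
        if sum(x) > 0:  # ReLU'(w . x) with w = all ones; n odd, so no ties
            for i in range(n):
                grad[i] += y * x[i]
    return [g / num_samples for g in grad]

g = sparse_parity_gradient_signal()
relevant = [abs(v) for v in g[:4]]
irrelevant = [abs(v) for v in g[4:]]
print(min(relevant), max(irrelevant))  # expect relevant > irrelevant
```

The signal on each coordinate works out to (half of) a Fourier coefficient of the majority function, whose magnitude shrinks with set size — which is why the relevant coordinates, sitting at a lower level, stand out even at this crude initialization.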