Agents which are EU-maximizing as a group are not EU-maximizing individually
Introduction

Why Subagents? and Why Not Subagents? explore whether a group of expected-utility maximizers is itself a utility maximizer. Here I want to discuss the converse: if a group as a whole wants to maximize some utility function, what can be said about the individual agents? Of course, if...
One specific instance of the feature-learning mystery is an MLP learning sparse parities: the output is the XOR of some k bits of an n-bit input, and an MLP can learn this in close to O(n^k) steps, which is actually the computational limit here. In this paper they give a very nice intuition (Section 4.1) for why, even in a network with a single layer (and a ReLU on top of it), the gradients will contain some information about the solution. TLDR: the gradient of "ReLU of the sum of incoming activations", if we take all the incoming weights to be one (that's the example they study), is just a...
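The truncated intuition above can be probed numerically. Here is a minimal Monte Carlo sketch (my own illustration, not code from the paper; n = 9, k = 4, and the sample count are arbitrary illustrative choices): for a single ReLU unit with all-one incoming weights, the per-coordinate gradient signal is E[y · x_i · ReLU'(Σx)], and estimating it shows that the k relevant coordinates carry a visibly larger signal than the irrelevant ones — i.e., the gradient does contain information about which bits matter.

```python
import random

def sparse_parity_gradient_signal(n=9, k=4, num_samples=500_000, seed=0):
    """Estimate the per-coordinate gradient signal E[y * x_i * ReLU'(sum x)]
    for a single ReLU unit with all-one incoming weights on the (n, k)
    sparse-parity task (inputs in {-1, +1}, label = product of k bits)."""
    rng = random.Random(seed)
    relevant_set = range(k)  # WLOG the first k coordinates are the relevant ones
    grad = [0.0] * n
    for _ in range(num_samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = 1
        for i in relevant_set:
            y *= x[i]  # label: parity (XOR in +/-1 encoding) of the k bits
        if sum(x) > 0:  # ReLU'(w . x) with w = all ones; n odd, so no ties
            for i in range(n):
                grad[i] += y * x[i]
    return [g / num_samples for g in grad]

g = sparse_parity_gradient_signal()
relevant = [abs(v) for v in g[:4]]
irrelevant = [abs(v) for v in g[4:]]
print(min(relevant), max(irrelevant))  # expect relevant > irrelevant
```

The signal on each coordinate works out to (half of) a Fourier coefficient of the majority function, whose magnitude shrinks with set size — which is why the relevant coordinates, sitting at a lower level, stand out even at this crude initialization.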