All of Eli Sennesh's Comments + Replies

>"If I value apples at 3 units and oranges at 1 unit, I don't want at 75%/25% split. I only want apples, because they're better! (I have no diminishing returns.)"

I think what I'd have to ask here is: if you only want apples, why are you spending your money on oranges? If you will not actually pay me 1 unit for an orange, why do you claim you value oranges at 1 unit?

Another construal: you value oranges at 1 orange per 1 unit because if I offer you a lottery over those and let you set the odds yourself, you will choose to set t... (read more)
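To make the corner-solution point concrete, here's a toy calculation (my own sketch with made-up numbers, assuming linear utility and no diminishing returns):

```python
# Hypothetical illustration (not from the thread): with linear utility and a
# fixed budget, expected utility is maximized at a corner, not at a 75%/25% mix.
def expected_utility(p_apples, u_apple=3.0, u_orange=1.0):
    """Utility of spending fraction p_apples of the budget on apples."""
    return p_apples * u_apple + (1.0 - p_apples) * u_orange

for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"{p:.2f} apples -> utility {expected_utility(p):.2f}")
# 1.00 apples -> utility 3.00 is the maximum. Without diminishing returns,
# "valuing oranges at 1 unit" only shows up as a willingness-to-pay,
# never as an optimal purchase mix.
```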

1Connor McCormick
The problem with the typeface on LW comments is that I, l and 1 look really damn similar. 

So we could quibble over the details of Friston 2009, *buuuuut*...

I don't find it useful to take Friston at 110% of his word. I find it more useful to read him like I read all other cognitive modelers: as establishing a language and a set of techniques whose scientific rigor he demonstrates via their application to novel experiments and known data.

He's no more an absolute gold-standard than, say, Dennett, but his techniques have a certain theoretical elegance in terms of positing that the brain is built out of very few, very efficient core mecha... (read more)

Oh hey, so that's the original KL control paper. Saved!

Oh, I wasn't really trying at all to talk about what prediction-error minimization "really does" there, more to point out that it changes radically depending on your modeling assumptions.

The "distal causes" bit is also something I really want to find the time and expertise to formalize. There are studies of causal judgements grounding moral responsibility of agents and I'd really like to see if we can use the notion of distal causation to generalize from there to how people learn causal models that capture action-affordances.

>But this definitely seems like the better website to talk to Eli Sennesh on :)

Somewhat honored, though I'm not sure we've met before :-).

I'm mostly posting here by now, because I'm... somewhat disappointed with people saying things like, "it's bullshit" or "the mathematical parts of this model are pulled directly from the posterior".

IMHO, there's a lot to the strictly neuroscientific, biological aspects of the free-energy theory, and it integrates well with physics (good prediction resists disorder, "... (read more)

3Charlie Steiner
I'm not qualified to comment on the literature in general or how research goes - if you say that treating the brain as drawing actions from a Boltzmann distribution on this weird divergence is useful, I believe you.

But it seems like you can extract very specific claims from Friston 2009, like the brain having a model from perceptions to a distribution over "causes" (model parameters), and each step of learning in the brain reducing the KL divergence (specifically!) between a mutable internal generative model of "causes" and the fixed sense-inferred "causes." This is the sort of thing that I failed to find a justification for, and therefore am treating as having a tenuous relation to real brains.

And I don't think this is just nitpicking, because fixed inference of causes is used to get fixed motivations that have preferences over causes.

>I wonder if the conversion from mathematics to language is causing problems somewhere. The prose description you are working with is 'take actions that minimize prediction error' but the actual model is 'take actions that minimize a complicated construct called free energy'. Sitting in a dark room certainly works for the former but I don't know how to calculate it for the latter.

There's absolutely trouble here. "Minimizing surprise" always means, to Friston, minimizing sensory surprise under a generative m... (read more)
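A toy decomposition of what "surprise under a generative model" buys you over raw prediction error (my own Gaussian example, not Friston's code):

```python
import math

# Free energy = complexity + inaccuracy, for the toy model
#   prior:      z ~ N(0, 1)
#   likelihood: x | z ~ N(z, 1)
#   approx. posterior: q(z) = N(m, s^2)
def free_energy(x, m, s):
    complexity = 0.5 * (s**2 + m**2 - 1.0 - math.log(s**2))   # KL(q || prior)
    inaccuracy = 0.5 * math.log(2 * math.pi) + 0.5 * ((x - m)**2 + s**2)
    return complexity + inaccuracy  # = -ELBO; an upper bound on -log p(x)

x = 2.0
# Pure prediction-error minimization would set m = x; free energy also pays
# the complexity cost of moving q away from the prior:
for m in [0.0, 1.0, 2.0]:
    print(f"m={m:.1f}  F={free_energy(x, m, s=0.5):.3f}")
# The minimizing m (here m=1.0) sits between the prior mean and the data --
# not at the raw prediction-error optimum -- which is exactly the gap between
# the prose slogan and the actual construct.
```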

Ok, now a post on motivation, affect, and emotion: attempting to explain sex, money, and pizza. Then I’ll try a post on some of my own theories/ideas regarding some stuff. Together, I’m hoping these two posts address the Dark Room Problem in a sufficient way. HEY SCOTT, you’ll want to read this, because I’m going to link a paper giving a better explanation of depression than I think Friston posits.

The following ideas come from one of my advisers who studies emotion. I may bungle it, because our class on the embodied neuroscience of this stuff ha... (read more)

Ok, now the post where I go into my own theory on how to avoid the Dark Room Problem, even without physiological goals.

The brain isn’t just configured to learn any old predictive or causal model of the world. It has to learn the distal causes of its sensory stimuli: the ones that reliably cause the same thing, over and over again, which can be modeled in a tractable way.

If I see a sandwich (which I do right now, it’s lunchtime), one of the important causes is that photons are bouncing off the sandwich, hitting my eyes, and stimulating my retina. Howe... (read more)
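A toy contrast between distal and proximal causes (my construction, just to pin down the intuition):

```python
import random

# A *distal* cause is a single stable latent -- "there is a sandwich" -- that
# explains many noisy retinal frames; the *proximal* causes (per-frame photon
# noise) vary every time and don't repeat.
random.seed(0)

sandwich_brightness = 5.0                 # distal cause: stable over time
frames = [sandwich_brightness + random.gauss(0, 1) for _ in range(100)]

# A model keyed to the distal cause compresses all frames into one parameter:
estimate = sum(frames) / len(frames)
print(f"distal-cause estimate: {estimate:.2f} (one number for 100 frames)")

# A model keyed to the proximal causes would need one parameter per frame --
# maximally accurate, intractable, and useless for acting on the sandwich.
```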

2Charlie Steiner
The thing you are minimizing by going outside isn't prediction error for sense data, it's a sort of expected prediction error over a spatial extent in your model. I think both of these are valid concepts to think about, so it's not like this argument shows that prediction error is "really" about building a model of the world and then ensuring that it's both correct and complete - it's an argument about what's more reasonable to model humans as doing.

Of course, once you have two possibilities, that usually means you have infinite possibilities. I see where this could lead to people generating a whole family of formalisms. But I still feel like this route leads to oversimplification.

For example, sometimes people are happy to just fool their sense-data - we take anesthetics, or look at pornography, or drink diet soda. But sometimes people aren't - the pictures-of-relationships industry is much smaller than the porn industry, people buy free-range beef, or a genuine Rembrandt.

Hi,

I now work in a lab allied to both the Friston branch of neuroscience, and the probabilistic modeling branch of computational cognitive science, so I now feel arrogant enough to comment even more fluently.

I’m gonna leave a bunch of comments over the day as I get the spare time to actually respond coherently to stuff.

The first thing is that we have to situate Friston’s work in its appropriate context of Marr’s Three Levels of cognitive analysis: computational (what’s the target?), algorithmic (how do we want to hit it?), and implementational (how do we... (read more)

1do7777
(point 2) Why P(D|D′)P(D′|H) and not P(D|D′,H)P(D′|H)?
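One hedged reading (mine, not a reply from the thread): the two factorizations differ by a conditional-independence assumption.

```latex
% Marginalizing over D' exactly gives
%   P(D | H) = \sum_{D'} P(D | D', H) P(D' | H).
% Replacing P(D | D', H) with P(D | D') is the extra assumption
% that D' screens H off from D:
\[
  P(D \mid D', H) = P(D \mid D')
  \quad\Longleftrightarrow\quad
  D \perp H \mid D'.
\]
```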

Actually, here's a much simpler, more intuitive way to think about probabilistically specified goals.

Visualize a probability distribution as a heat map of the possibility space. Specifying a probabilistic goal then just says, "Here's where I want the heat to concentrate", and submitting it to active inference uses the available inferential machinery to actually squeeze the heat into that exact concentration as best you can.

When our heat-map takes the form of "heat" over dynamical trajectories, possible "timelines"... (read more)
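A minimal sketch of the heat-map reading (my own toy numbers, using KL divergence to the goal distribution as the stand-in for "squeezing the heat"):

```python
import math

# A goal is a target distribution over outcomes; control picks the action
# whose predicted outcome distribution sits closest (in KL) to that target.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

goal = [0.75, 0.25]                      # "concentrate the heat here"
predicted = {                            # hypothetical action -> outcome dist.
    "pick_apples":  [0.95, 0.05],
    "pick_oranges": [0.10, 0.90],
    "mix":          [0.70, 0.30],
}
for a, p in predicted.items():
    print(f"{a:12s} KL(pred || goal) = {kl(p, goal):.3f}")
best = min(predicted, key=lambda a: kl(predicted[a], goal))
print("chosen:", best)  # "mix": matching the target, not maximizing a scalar
```

Note how this recovers the 75%/25% apples/oranges behavior upthread: under a distribution-matching goal, the split is the optimum rather than a mistake.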

6Bird Concept
I'm confused so I'll comment a dumb question hoping my cognitive algorithms are sufficiently similar to other LW:ers, such that they'll be thinking but not writing this question. "If I value apples at 3 units and oranges at 1 unit, I don't want a 75%/25% split. I only want apples, because they're better! (I have no diminishing returns.)" Where does this reasoning go wrong?

The Complete Class Theorem says that bounded cost/utility functions are isomorphic to poster... (read more)

Honestly, I've just had to go back and forth banging my head on Friston's free-energy papers, non-Friston free-energy papers, and the ordinary variational inference literature -- for the past two years, prior to which I spent three years banging my head on the Josh Tenenbaum-y computational cog-sci literature and got used to seeing probabilistic models of cognition.

I'm now really fucking glad to be in a PhD program where I can actually use that knowledge.

Oh, and btw, everyone at MIRI was exactly as confused as Scott is when I presented a bunch of free-energy stuff to them last March.

>The various papers don't all even implement the same model - the free energy principle seems to be more a design principle than a specific model.

Bingo. Friston trained as a physicist, and he wants the free-energy principle to be more like a physical law than a computer program. You can write basically any computer program that implements or supports variational inference, throw in some action states as variational parameters, and you've "implemented" the free-energy principle _in some way_.
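To make that concrete, here's a deliberately minimal toy (my own construction, nobody's published model) that "implements" the principle by descending a free-energy-like quantity in both a belief and an action:

```python
# World: sensation x = hidden + a (acting shifts what you sense).
# Model: prior over sensations N(prior_mean, 1); point belief m.
hidden, prior_mean = 5.0, 0.0
m, a, lr = 0.0, 0.0, 0.1

def F(m, a):
    x = hidden + a                       # sensation produced by acting
    accuracy = 0.5 * (x - m) ** 2        # sensory prediction error
    complexity = 0.5 * (m - prior_mean) ** 2
    return accuracy + complexity

for _ in range(200):
    x = hidden + a
    m -= lr * ((m - x) + (m - prior_mean))   # perception: descend dF/dm
    a -= lr * (x - m)                        # action: descend dF/da
print(f"m={m:.2f}, a={a:.2f}, F={F(m, a):.4f}")
# Action drags sensations toward the prior (a -> -5) and beliefs follow
# (m -> 0). That's the whole trick, which is why "implements the free-energy
# principle" is such a cheap claim to make about almost any program.
```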

Overall, the Principle is more of a dom... (read more)

7Daniel Kokotajlo
Can you please link me to more on this? I was under the impression that Pascal's mugging happens for any utility function that grows at least as fast as the probabilities shrink, and the probabilities shrink exponentially for normal probability functions. (For example: In the toy model of the St. Petersburg problem, the utility function grows exactly as fast as the probability function shrinks, resulting in infinite expected utility for playing the game.)

Also: As I understand them, utility functions aren't of the form "I want to see X P often and Y 1-P often." They are more like "X has utility 200, Y has utility 150, Z has utility 24..." Maybe the form you are talking about is a special case of the form I am talking about, but I don't yet see how it could be the other way around. As I'm thinking of them, utility functions aren't about what you see at all. They are just about the world.

The point is, I'm confused by your explanation & would love to read more about this.
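For reference, the standard St. Petersburg arithmetic behind that parenthetical (textbook material, not from the thread):

```latex
% The payoff doubles exactly as fast as the probability halves, so
\[
  \mathbb{E}[U]
  = \sum_{n=1}^{\infty}
    \underbrace{2^{-n}}_{P(\text{first heads at toss } n)}
    \cdot
    \underbrace{2^{n}}_{U(\text{payoff})}
  = \sum_{n=1}^{\infty} 1
  = \infty .
\]
```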
6jamii
That was much more informative than most of the papers. Did you learn this by parsing the papers or from another better source?
5habryka
Sorry for the bold, sometimes our editor does weird things with copy-paste and bolds everything you pasted. Working on a fix for that, but it’s an external library and that’s always a bit harder than fixing our code.