All of FordO's Comments + Replies

FordO
20

I have never thought much about reversibility. Do humans make heavy use of reversible processes? And what is the relation between cross entropy and reversibility?

3Ben
I don't know if GR or some cosmological thing (inflation) breaks reversibility. But classical and quantum mechanics are both reversible. So I would say that all of the lowest-level processes used by human beings are reversible. (Although of course thermodynamics does the normal counter-intuitive thing where the reversibility of the underlying steps is the reason why the overall process is, for all practical purposes, irreversible.) This paper looks at mutual information (which I think relates to the cross entropy you mention), and how it connects to reversibility and entropy.  https://bayes.wustl.edu/etj/articles/gibbs.vs.boltzmann.pdf (Aside, their is no way that whoever maintains the website hosting that paper and the LW community don't overlap. The mutual information is too high.)
FordO
30

Do you count gradient descent as a blackbox optimization, and isn't backpropagation guided by the gradient (at least in ANNs)?

3johnswentworth
Gradient descent requires somehow computing the gradient. There are both blackbox and non-blackbox ways of doing that, and the blackbox methods are much more expensive. Backpropagation is the main non-blackbox method for computing the gradient; it looks at the steps used to compute the function and propagates information backward through those steps. If the function takes m steps to compute, backprop takes O(m) to compute the gradient. The main blackbox method for computing gradients is to evaluate the function itself, then evaluate it with a small change in one coordinate, then evaluate it with a small change in the next coordinate, and so forth - one evaluation for each coordinate. If the function takes m steps to compute and has n inputs, then this takes O(mn).
FordO
20

Care to elaborate on how you got to the formula O(1/sqrt(n))?

3Unnamed
The distance between the n-dimensional points (0,0,...,0) and (1,1,...,1) is sqrt(n). So if you move sqrt(n) units along that diagonal, you move 1 unit along the dimension that matters. Or if you move 1 unit along the diagonal, you move 1/sqrt(n) units along that dimension. 1/sqrt(n) efficiency. If you instead move 1 unit in a random direction then sometimes you'll move more than that and sometimes you'll move less, but I figured that was unimportant enough on net to leave it O(1/sqrt(n)).