Introduction to French AI Policy

Lucie Philippon

The more important aim of this conversion is that now the minima of the term in the exponent, $K(w)$, are equal to 0. If we manage to find a way to express $K(w)$ as a polynomial, this lets us pull in the powerful machinery of algebraic geometry, which studies the zeros of polynomials. We've turned our problem of probability theory and statistics into a problem of algebra and geometry.

Wait... but $K(w)$ just isn't a polynomial most of the time. Right? From its definition above, $K(w)$ differs by a constant from the log-likelihood...
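To make this point concrete, here is a minimal sketch using a hypothetical one-parameter model (not from the post): the true distribution $q$ is Bernoulli(0.3) and the model is $p(y \mid w) = \text{Bernoulli}(\text{sigmoid}(w))$. It assumes the standard singular learning theory convention that $K(w)$ is the Kullback-Leibler divergence from $q$ to the model, so $K(w) = L(w) - S$, where $L(w)$ is the expected negative log-likelihood and $S$ is the entropy of $q$. $K(w)$ is zero at the true parameter and positive elsewhere, yet it is built from logarithms of sigmoids, not a polynomial in $w$.

```python
import math

# Hypothetical toy model (not from the post): the true distribution q is
# Bernoulli(P0), and the model is p(y | w) = Bernoulli(sigmoid(w)).
P0 = 0.3


def sigmoid(w: float) -> float:
    return 1.0 / (1.0 + math.exp(-w))


def expected_nll(w: float) -> float:
    """L(w): expected negative log-likelihood of the model under q."""
    s = sigmoid(w)
    return -(P0 * math.log(s) + (1.0 - P0) * math.log(1.0 - s))


# S: entropy of q. It does not depend on w, so K and L differ by a constant.
ENTROPY = -(P0 * math.log(P0) + (1.0 - P0) * math.log(1.0 - P0))


def K(w: float) -> float:
    """K(w) = KL(q || p(. | w)) = L(w) - S."""
    return expected_nll(w) - ENTROPY


# The true parameter w0 = logit(P0) makes the model match q exactly.
W0 = math.log(P0 / (1.0 - P0))

if __name__ == "__main__":
    print(f"K(w0) = {K(W0):.2e}")  # zero up to floating-point rounding
    for w in (-2.0, 0.0, 2.0):
        # Non-negative everywhere, but made of log-sigmoid terms, not a polynomial
        print(f"K({w:+.1f}) = {K(w):.4f}")
```

Running it should print a value at rounding-error level for $K(w_0)$ and a strictly positive value at each of the other sample points.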

Neural networks generalize because of this one weird trick

Frank Seidl, 2y

Jesse Hoogland, 2y

To take a step back, the idea of a Taylor expansion is that we can express any analytic function as an (infinite) polynomial. If you're close enough to the point you're expanding around, then a finite polynomial can be an arbitrarily good fit. The central challenge here is that K(w) is pretty much never a polynomial. So the idea is to find a mapping, g, that lets us re-express w in terms of a new coordinate system, w = g(u). If we do this right, then we can express K(g(u)) (locally) as a polynomial in the new coordinates, u. What we're doing here is "fixing" the singularities of K(w) so that we can do a kind of Taylor expansion in the new coordinates. That's why we have to introduce this new manifold, U, and the mapping g.
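Here is a minimal sketch of such a reparametrization, under stated assumptions: the loss K(w1, w2) = tanh(w1^2 + w2^2)^2 is a hypothetical example chosen for illustration (not one from the post), and g is one chart of the standard blow-up of the origin. The minimum at the origin is degenerate (the Hessian there is zero), but in the new coordinates K(g(u)) = u1^4 · b(u) with b strictly positive. A further analytic change of coordinates, u1' = u1 · b(u)^(1/4), would absorb b and leave the pure monomial u1'^4, which is the shape resolution of singularities aims for.

```python
import math


# Hypothetical loss chosen for illustration (not from the post). It is analytic
# but not a polynomial, and its minimum at the origin is degenerate: near 0 it
# behaves like (w1^2 + w2^2)^2, so its Hessian there is the zero matrix.
def K(w1: float, w2: float) -> float:
    return math.tanh(w1 ** 2 + w2 ** 2) ** 2


# One chart of the blow-up of the origin: w = g(u) = (u1, u1 * u2).
# A second chart, w = (u1 * u2, u2), covers the remaining points (w1 = 0, w2 != 0).
def g(u1: float, u2: float) -> tuple[float, float]:
    return u1, u1 * u2


def b(u1: float, u2: float) -> float:
    """Strictly positive factor in K(g(u)) = u1^4 * b(u).

    With s = u1^2 * (1 + u2^2), K(g(u)) = tanh(s)^2
    = u1^4 * (1 + u2^2)^2 * (tanh(s) / s)^2, and tanh(s) / s -> 1 as s -> 0.
    """
    s = u1 ** 2 * (1.0 + u2 ** 2)
    ratio = math.tanh(s) / s if s != 0.0 else 1.0
    return (1.0 + u2 ** 2) ** 2 * ratio ** 2


if __name__ == "__main__":
    for u1, u2 in [(0.1, 0.5), (0.01, -2.0), (0.5, 1.0)]:
        direct = K(*g(u1, u2))
        factored = u1 ** 4 * b(u1, u2)
        print(f"u=({u1}, {u2}): K(g(u)) = {direct:.3e}, u1^4 * b(u) = {factored:.3e}")
```

The two printed values should agree up to floating-point rounding. Writing b in closed form, rather than dividing K by u1^4, keeps it well defined on the exceptional divisor u1 = 0, where b(0, u2) = (1 + u2^2)^2 > 0.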

LESSWRONG

All of Frank Seidl's Comments + Replies