michael_h — LessWrong

Good point. Thank you for bringing this up. I just had a closer look in my notes at how the complexity penalty is derived, and there is an additional assumption that I left out.

The derivation uses a matrix $X$ with $n$ rows and $p + 1$ columns which has entry $x_{i}^{j - 1}$ in the $i^{\text{th}}$ row and $j^{\text{th}}$ column (where $x_{1}, x_{2}, \dots, x_{n}$ is the training set). In the derivation it is assumed that $X$ has rank $p + 1$, which is true most of the time provided that $n \geq p + 1$ (specifically, whenever at least $p + 1$ of the $x_{i}$ are distinct). For simplicity I won't add a mention of this matrix to the original post, but I will add the assumption $n \geq p + 1$.
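
To make the rank assumption concrete, here is a minimal sketch in Python (using NumPy) that builds a matrix of this form and checks whether it has rank $p + 1$. The function names `design_matrix` and `has_full_column_rank` are my own for illustration and do not come from the original derivation.

```python
import numpy as np


def design_matrix(x, p):
    """Build the n x (p + 1) matrix whose (i, j) entry is x_i ** (j - 1) (1-indexed)."""
    # increasing=True orders the columns as x**0, x**1, ..., x**p.
    return np.vander(np.asarray(x, dtype=float), N=p + 1, increasing=True)


def has_full_column_rank(x, p):
    """Return True if the matrix built from x has rank p + 1."""
    X = design_matrix(x, p)
    return bool(np.linalg.matrix_rank(X) == p + 1)


# n = 4 distinct points, p = 2: the assumption holds.
print(has_full_column_rank([0.0, 1.0, 2.0, 3.0], p=2))

# n = 2 < p + 1 = 3: the rank can be at most 2, so the assumption fails.
print(has_full_column_rank([0.0, 1.0], p=2))

# n = 4 >= p + 1, but only two distinct values: the assumption still fails.
print(has_full_column_rank([1.0, 1.0, 1.0, 2.0], p=2))
```

The last case is why $n \geq p + 1$ on its own only makes full rank typical and does not guarantee it: repeated training inputs give repeated rows in $X$.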
