Zetetic comments on Safety Culture and the Marginal Effect of a Dollar
My understanding was that CEV is a meta-level approach to stable self-improvement, aiming to design code that outputs what we would want an FAI's code to look like (or something along those lines). I could certainly be wrong, and I have very little to go on here: the Knowability of FAI and CEV documents are both vaguer than I would like (since, of course, the problems are still wide open) and several years old, so I have to piece the picture together indirectly.
If that interpretation is correct, it seems (and I stress that I might be totally off base here) that stable recursive self-improvement is not the biggest conceptual concern. Rather, the central difficulty is how to extract each individual person's utility function, and then how to derive a coherent goal set from the resulting collection of Bayesian utility maximizers, or something like that. A stable self-improving program would then (hopefully) be extrapolated by the resulting CEV, which serves as the initial dynamic.
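To make the aggregation half of that problem concrete, here is a toy sketch (entirely my own construction, not anything from the CEV documents): assume the extraction step is already solved, normalize each person's utility function, and sum. Even this trivial version forces contestable choices about normalization and weighting:

```python
# Toy sketch of aggregating individual utility functions into one "group"
# utility. The outcomes, people, and numbers are all hypothetical.

outcomes = ["A", "B", "C"]

# Hypothetical extracted utility functions (outcome -> utility).
person_utils = {
    "alice": {"A": 10.0, "B": 0.0, "C": 5.0},
    "bob":   {"A": 0.0,  "B": 1.0, "C": 0.5},
}

def normalize(u):
    """Rescale one person's utilities to [0, 1] so they can be compared.
    This step is already a substantive assumption: utilities have no
    natural common scale."""
    lo, hi = min(u.values()), max(u.values())
    if hi == lo:
        return {o: 0.0 for o in u}
    return {o: (v - lo) / (hi - lo) for o, v in u.items()}

def group_utility(utils, weights=None):
    """Weighted sum of normalized utilities -- one arbitrary way to force
    'coherence' out of many separate maximizers."""
    weights = weights or {p: 1.0 for p in utils}
    total = {o: 0.0 for o in outcomes}
    for person, u in utils.items():
        nu = normalize(u)
        for o in outcomes:
            total[o] += weights[person] * nu[o]
    return total

print(group_utility(person_utils))
# {'A': 1.0, 'B': 1.0, 'C': 1.0}
```

The output shows the problem: after per-person normalization, Alice's strong preferences and Bob's mild ones carry exactly the same weight, so the "coherent" group utility comes out flat across all three outcomes. Every fix (different normalization, different weights) is itself a value judgment, which is a miniature of the difficulty described above.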
My comment wasn't directed towards CEV at all -- CEV sounds like a sensible working definition of "friendly enough", and I agree that it's probably computationally hard.
I was suggesting that any program, AI or no, that is coded to rewrite critical parts of itself in substantial ways is likely to go "splat", not "FOOM" -- to degenerate into something that doesn't work at all.
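As a toy illustration of that "splat" intuition (my own construction, nothing to do with any real AI architecture): take a trivial working program, apply random single-character rewrites to its source, and count how many of the rewritten versions still run at all:

```python
# Toy sketch: a "program" that rewrites part of its own source at random.
# Nearly every unvetted rewrite of a critical region yields code that
# fails to run, rather than an improved version.
import random

random.seed(0)

source = "def step(x):\n    return x + 1\n"

def run(src):
    """Try to compile and execute the candidate source; return the step
    function, or None if the rewrite broke it."""
    env = {}
    try:
        exec(compile(src, "<self>", "exec"), env)
        fn = env["step"]
        fn(0)  # smoke test
        return fn
    except Exception:
        return None

alive = 0
for trial in range(1000):
    # "Substantial rewrite": replace one random character with another.
    i = random.randrange(len(source))
    mutant = source[:i] + random.choice("abcxyz+-*/(): ") + source[i + 1:]
    if run(mutant) is not None:
        alive += 1

print(f"{alive}/1000 random rewrites still run")
```

Most random edits to a critical region don't yield a worse-but-working program; they yield one that won't compile or crashes immediately. Undirected self-rewriting buys you degradation long before it buys you improvement, which is why I'd expect "splat" by default.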