It was a relatively fringe topic that only recently got the attention of a large number of real researchers. And parts of it could need large amounts of computational power afforded only by superhuman narrow AI.
There have been a few random PhD dissertations saying the topic is hard, but as far as I can tell there has only recently been a push for a group effort by capable and well-funded actors (e.g. OpenAI's interpretability research).
As an outsider, I don't trust the older alignment research much. It seems to me that Yud has built a cult of personality around...
And, sure, but it's not clear why any of this matters. What is the thing that we're going to (attempt to) do with AI, if not use it to solve real-world problems?
It matters because the original poster isn't saying we won't use it to solve real-world problems, but rather that real-world constraints (e.g. the laws of physics) will limit its speed of advancement.
An AI likely cannot easily predict a chaotic system unless it can simulate reality at high fidelity, because small errors in its model of the initial state grow quickly. I guess the OP is assuming the TAI won't have this capability, so even if we do solve real-world problems with AI, it is still limited by real-world experimentation requirements.
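To make that constraint concrete, here's a toy sketch (my own illustration, nothing specific to TAI): in a chaotic system like the logistic map, a tiny error in the estimated initial state compounds until the prediction is no better than a guess, which is why "simulate reality at high fidelity" is such a strong requirement.

```python
# Toy illustration of sensitivity to initial conditions (a generic chaotic
# system, not a claim about any real AI): a one-in-a-million error in the
# initial state grows until the "prediction" says nothing about the true trajectory.
def logistic_map(x, r=4.0):
    """One step of the logistic map, a standard chaotic toy system."""
    return r * x * (1.0 - x)

true_state = 0.400000
model_state = 0.400001  # the predictor's estimate is off by 1e-6

for step in range(1, 51):
    true_state = logistic_map(true_state)
    model_state = logistic_map(model_state)
    if step % 10 == 0:
        print(f"step {step:2d}: |error| = {abs(true_state - model_state):.6f}")
# Within a few dozen steps the error is as large as the state itself, so a
# predictor without a near-exact model of the initial conditions is stuck
# waiting on real-world measurement and experimentation instead.
```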
“I'd better predict words with the appropriate amount of green-coloured objects”, and write about green-coloured objects even more frequently, and then also notice that, and, in the end, write exclusively about green objects.
Can you explain this logic to me? Why would it write more and more about green-coloured objects even if its training data was biased towards green-coloured objects? If there is a bad trend in its output, then without reinforcement, why would it make that trend stronger? Do you mean it recognizes, incorrectly, that improving said bad trend...
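In case it helps pin down what I'm asking, here is a toy sketch of the loop I think the quoted passage describes (the whole setup, including the over-reactive update, is my own assumption, not something from the original comment):

```python
import math
import random

# Toy sketch (my reading of the quoted scenario): the model sets its next
# propensity to write about green objects from the frequency of green objects
# in its own recent output, and it over-reacts to deviations from the neutral
# baseline of 0.5, so a small initial bias amplifies with no external reinforcement.
random.seed(0)

def next_green_propensity(observed_fraction, gain=12.0):
    """Over-sensitive update: any gain above 4 makes the 0.5 baseline unstable."""
    return 1.0 / (1.0 + math.exp(-gain * (observed_fraction - 0.5)))

p_green = 0.55  # slight initial bias toward green-coloured objects
for generation in range(1, 11):
    # The model writes 1000 snippets, each about green objects with prob. p_green.
    snippets = [random.random() < p_green for _ in range(1000)]
    observed = sum(snippets) / len(snippets)
    # It then "notices" the green frequency in its own output and overshoots.
    p_green = next_green_propensity(observed)
    print(f"generation {generation:2d}: p(green) = {p_green:.3f}")
# The bias grows each round until the model writes almost exclusively about
# green objects.
```

If the update simply matched the observed frequency instead of overshooting, the bias wouldn't grow on its own, which is the part of the claim I'm trying to understand.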
I think my point is lowering the bar to just there being a non-trivial probability of it following the rule. Fully aligning AIs to near certainty may be a higher bar than just potentially aligning AI.