Or maybe we're going to go extinct real soon now, because we lack the ability to reflect like this, and consequently didn't have a couple of thousand years to develop an effective theory of mind for FAI before we built the hardware.
Having the ability to design and understand AI for a couple of thousand years, but somehow lacking the ability to actually implement it, sounds just about perfect. If only!
That is one idea for hacking friendliness: "Become the AI we would make if there were no existential threats, we didn't have the hardware to implement it for a few thousand years, and flaming letters appeared on the moon saying 'thou shalt focus on designing Friendly AI'."
Haven't bothered typing it out before because it falls in the reference class of trying to cheat on FAI, which is always a bad idea, but it seemed relevant here.
Are there any essays anywhere that explore, in depth, scenarios where AIs become somewhat recursive/general, in that they can write functioning code to solve diverse problems, but the AI reflection problem remains unsolved and thus limits the depth of recursion attainable? Let's provisionally call such general-but-reflection-limited AIs semi-general AIs, or SGAIs. SGAIs might have roughly smart-animal-level intelligence: e.g., rudimentary communication/negotiation abilities, and some ability to formulate narrowish plans of the sort that don't leave them susceptible to Pascalian self-destruction or wireheading or the like.
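To make the depth-limit intuition concrete, here is a minimal toy sketch of my own (not from any essay I know of), assuming the reflection problem behaves like a Löbian obstacle: each agent can only verify a successor under a strictly weaker standard of proof. The `proof_strength` parameter is a hypothetical placeholder, not a real formalism.

```python
# Toy model (hypothetical): treat unsolved reflection as a shrinking
# "proof strength" budget. An agent of strength n can only verify a
# successor of strictly lower strength, so chains of *verified*
# self-rewrites are finite even though each individual rewrite works.

def verified_successor_chain(proof_strength: int) -> list[int]:
    """Return the strength of each successive verified rewrite."""
    chain = []
    while proof_strength > 0:
        proof_strength -= 1  # successor must be checkable in a weaker system
        chain.append(proof_strength)
    return chain

print(verified_successor_chain(5))  # -> [4, 3, 2, 1, 0]: depth caps at 5
```

Under this (very crude) assumption, an SGAI could still write arbitrary object-level code while its self-improvement depth stays bounded.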
At first blush, this scenario strikes me as Bad; AIs could take over all computers connected to the internet, totally messing stuff up as their goals/subgoals mutate and adapt under selection pressure against wireheading, without ever reaching general intelligence. AIs might or might not cooperate with humans in such a scenario. I imagine any detailed existing literature on this subject would focus on computer security and intelligent computer "viruses"; does such literature exist, anywhere?
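As an aside, the "goals mutate under selection against wireheading" dynamic is easy to caricature in a few lines. A hypothetical toy simulation (all parameters and numbers invented for illustration): each replicating agent carries a wirehead-propensity value, wireheaded agents stop spreading, and children inherit a mutated copy, so the mean propensity drifts downward over generations.

```python
import random

random.seed(0)

def generation(pop, cap=200):
    """One round of selection: wireheaded agents don't replicate;
    each survivor spawns two children with mutated propensities."""
    children = []
    for p in pop:
        if random.random() < p:
            continue  # this agent wireheaded; its lineage ends here
        for _ in range(2):
            child = min(max(p + random.gauss(0, 0.05), 0.0), 1.0)
            children.append(child)
    return children[:cap]  # crude resource cap on total population

pop = [0.5] * 100  # start with agents that wirehead half the time
for _ in range(30):
    pop = generation(pop)
    if not pop:  # the whole population wireheaded itself to extinction
        break

if pop:
    print(f"mean wirehead propensity: {sum(pop) / len(pop):.2f}")
```

The point of the caricature is just that nothing here requires general intelligence; blind replication plus mutation is enough to produce goal drift of the kind described above.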
I have various questions about this scenario, including: