One thing I'm interested in, but don't know where to start looking for, is people working in the reverse direction instead: mathematical approaches which show aligned AI is impossible or unlikely. By this I mean formal work that suggests something like "almost all AGIs are unsafe", in the same way that the chance of picking a rational number at random from [0,1] is zero because almost all real numbers are irrational.
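(To make the analogy concrete, here's a minimal sketch of the standard measure-zero argument, assuming X is drawn uniformly from [0,1]:)

```latex
% Q \cap [0,1] is countable: enumerate it as q_1, q_2, ...
% Cover each q_n by an open interval of length \varepsilon / 2^n.
\[
  \lambda\bigl(\mathbb{Q}\cap[0,1]\bigr)
    \le \sum_{n=1}^{\infty}\frac{\varepsilon}{2^{n}}
    = \varepsilon
  \quad\text{for every }\varepsilon>0,
  \qquad\text{so}\qquad
  \lambda\bigl(\mathbb{Q}\cap[0,1]\bigr)=0 .
\]
% For X drawn uniformly from [0,1], the probability of landing in a set
% equals its Lebesgue measure:
\[
  \Pr\bigl[X\in\mathbb{Q}\bigr]
    = \lambda\bigl(\mathbb{Q}\cap[0,1]\bigr)
    = 0 .
\]
```

The analogous claim would be that safe AGIs occupy a measure-zero (or vanishingly small) subset under some natural parameterization of the space of possible AGIs.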
I don't say this to be a downer! I mean it in the sense of a mathematician who spent 7 years attempting to prove X exists, ...
I have been working on an argument from that angle.
I've been developing it independently, drawing on my background in autonomous safety-critical hardware/software systems, but I recently discovered that it's very similar to Drexler's CAIS from 2019, except with more focus on the low-level evidence and rationale for why certain claims are justified.
It isn't so much a pure mathematical approach as it is a systems engineering or systems safety perspective on all of the problems[1][2] that would remain even if someone showed up tomorrow and dropped a formally verified...
No known solutions can solve our hardest problems—that’s why they’re the hardest ones.
I like the energy, but I have to register a note of dissent here.
Quite a few of our hardest problems do have known solutions - it's just that those known solutions are, or appear, too hard to implement.
I think your trust-o-meter is looking for people who have an unusually low level of self-deception. The energy is "Great if you share my axioms or moral judgments, but for Pete's sake, at least be consistent with your own."
What suggests this to me is the Breaking Bad example, because Walter White really does move on a slow gradient from more to less self-deceived throughout the show, in my read of his character - it just so happens that the less self-deceived he is, the more at home he becomes with perpetrating monstrous acts as a result of the previous history...