(Reposted from Facebook)
Hey Weibing Wang! Thanks for sharing. I just started skimming your paper, and I appreciate the effort you put into this; it brings together a lot of the isolated work people have been doing.
I also appreciate the humility of your acknowledgement that the proposed solution has not undergone experimental validation, and your suggestion that these proposals need to be tested and iterated on as soon as possible, given the practicalities of the real world.
I want to look into your paper again when I have time, but some quick comments:
https://www.lesswrong.com/tag/ai-services-cais?sortedBy=new
https://www.lesswrong.com/posts/LxNwBNxXktvzAko65/reframing-superintelligence-llms-4-years
You should break the paper into a digestible set of sub-projects that you can post separately to find collaborators, so that parts of it can be verified experimentally. You could also work with some governance folks to turn some of your thoughts into a report that will get the eyes of important people on it.
It needs more technical elaboration on how to do x, not just what needs to be done.
edit: I took a quick look, and this looks really good! Big upvote. Definitely an impressive body of work. And the actual alignment proposal is along the lines of the ones I find most promising on the current trajectory toward AGI. I don't see a lot of references to existing alignment work, but I do see a lot of references to technical results, which is really useful and encouraging. Look at Daniel Kokotajlo's work, and other work emphasizing faithful chain of thought, for similar suggestions.
edit continued: I find your framing a bit odd, in that it starts from an unaligned, uninterpretable AGI that is presumably nonetheless under control. I wonder if you're thinking of something like o1, which basically does what it's told WRT answering questions/providing data but can't be considered aligned overall, and which isn't readily interpretable because we can't see its chain of thought? A brief post situating your proposal in relation to current or near-future systems would be interesting, at least to me.
Original:
Interesting. 100 pages is quite a time commitment. And you don't reference any existing work in your brief pitch here - that often signals that people haven't read the literature, so most of their work is redundant with existing stuff or missing big considerations that are part of the public discussion. But it seems unlikely that you'd put in 100 pages of writing without doing some serious reading as well.
Here's what I suggest: relate this to existing work, and reduce the reading-time ask, by commenting on related posts with a link to and summary of the relevant sections of your paper.
I have a lot of ideas about AGI/ASI safety. I've written them down in a paper, which I'm sharing here in the hope that it will be helpful.
Title: A Comprehensive Solution for the Safety and Controllability of Artificial Superintelligence
Abstract:
As artificial intelligence technology rapidly advances, Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI) are likely to be realized in the future. Highly intelligent ASI systems could be manipulated by malicious humans or independently develop goals misaligned with human interests, potentially leading to severe harm or even human extinction. To mitigate the risks posed by ASI, it is imperative to implement measures that ensure its safety and controllability. This paper analyzes the intellectual characteristics of ASI, identifies three conditions under which ASI could cause catastrophes (harmful goals, concealed intentions, and strong power), and proposes a comprehensive safety solution. The solution includes three risk prevention strategies (AI alignment, AI monitoring, and power security) to eliminate the three conditions for AI to cause catastrophes, and four power balancing strategies (decentralizing AI power, decentralizing human power, restricting AI development, and enhancing human intelligence) to maintain equilibrium in AI-to-AI, AI-to-human, and human-to-human relations, building a stable and safe society in which humans and AI coexist. Based on these strategies, the paper proposes 11 major categories comprising 47 specific safety measures. For each measure, detailed methods are designed, and its benefit, cost, and resistance to implementation are evaluated to assign a corresponding priority. Furthermore, to ensure effective execution of these measures, a governance system is proposed spanning international, national, and societal levels, ensuring coordinated global efforts and effective implementation within nations and organizations, so as to build safe and controllable AI systems that bring benefits to humanity rather than catastrophes.
Content:
The paper is quite long (over 100 pages), so I can only share links here. If you're interested, you can download the PDF at this link: https://www.preprints.org/manuscript/202412.1418/v1
or you can read the online HTML version at this link:
https://wwbmmm.github.io/asi-safety-solution/en/main.html