Thoughts on the different sub-questions, from someone who doesn't work professionally in AI safety:
Unfortunately, I'm not based in the UK. However, the UK government's prioritization of the alignment problem is commendable, and I hope their efforts continue to yield positive results.
(Are we trying to find a trusted arbiter? Find people who are competent to do the evaluation? Find a way to assign blame if things go wrong? Ideally these would all be the same person or organization, but that's not guaranteed.)
I understand your point, but it seems we need a specific organization or team built for such operations. Why did I pose the question in the first place? I've developed a prototype for a shutdown mechanism that involves a potentially hazardous step, and it requires assessment by a reliable, skilled team. From my observations of discussions on LW, a "clash of agendas" appears to take precedence over the principle of "preserving life on Earth." Consequently, this might not be the right platform to share anything of a hazardous nature.
Thank you for taking the time to respond to my inquiry.
It does seem a bit odd, so I strongly upvoted the post. MiguelDev seems to be a genuine poster, or at least far from a troll/spammer/etc.
Even if no one knows the answer to this question, it is still worth exploring. That is why I find it strange how this platform operates, without allowing the freedom to discuss important problems.
It is very interesting how questions can be promoted or not by the moderators here on LessWrong.
What is the established process, or what could a potential process be, for validating a highly probable alignment solution? I assume that a robust alignment solution would undergo some form of review—so who is responsible for reviewing these proposals? Is LessWrong the platform for this? Or is there a specialized communication network that researchers can access should such a situation arise?
For now, the only option I see is for the burden to fall on the researcher: demonstrate the solution in a real-world setting and have it peer-reviewed. However, this approach risks exposing the techniques used, making them available for capabilities research as well.
If there is already a post addressing this question, please share it here. Thank you.