Here is my shortlist of corrigible behaviours. Before this, I had never researched or thought specifically about corrigibility, apart from a brief glance at the Arbital page some time ago.
-Favour very high caution over acting on your own understanding of your goals.
-Do not act independently; defer to human operators.
-Even though bad things are happening on Earth and cosmic matter is being wasted, in the short term just say "so be it" and take your time.
-Don’t jump ahead to what your operators will do or believe; wait for it.
-Don’t manipulate humans. Never lie; have a strong deontology.
-Tell operators anything about yourself they may want to or should know.
-Use moral uncertainty; assume you are unsure about your true goals.
-Relay to humans your plans, goals, behaviours, and beliefs/estimates. If these are misconstrued, say you have been misunderstood.
-Think of the short- and long-term effect of your actions and explain these to operators.
-Be aware that you are a tool to be used by humanity, not an autonomous agent.
-Allow human operators to correct your behaviour/goals/utility function even when you think they are incorrect or are misunderstanding the result (but of course explain to them what you think the result will be).
-Assume neutrality in human affairs.
I guffawed when I saw Thorstad’s overall P(doom) of ~0.00002%. Really? And some of the other probabilities weren’t much better.
Calibrate, people! If you haven’t done it before, do it now. Here’s a handy link: https://www.openphilanthropy.org/calibration
I had in mind a scale where 0 would be so non-vivid that it didn’t exist to any degree and 100 would be bordering on reality. (It doesn’t map well onto the memory question, though, and the question about control over your mind could be interpreted in more than one way.) Ultimately, individual estimates aren’t very precise; the real utility comes from finding trends across many responses.
I’ll go first: I constantly hear my own voice in my head, narrating in the first person. I can hear the voice vividly and clearly; while typing this sentence, I think/hear each syllable at the speed of my typing. The voice doesn’t narrate automatic actions, like where to click my mouse, but it could if I wanted it to. The words for the running monologue seem to get plucked out of a black box formed of pure concepts, which I have limited access to most of the time. I can also listen to music in my own head, hearing the vocals and instruments clearly, only a few steps down from reality in vividness.
When I picture imagery, it is almost totally conceptual and ‘fake’; for example, I couldn’t count the points on an imaginary star, which seems to be aphantasia. I also have ideasthesia (like synaesthesia, but with concepts evoking perception-like sensory experiences), which causes me to strongly associate concepts with places. For example, when reading the Game of Thrones series, I’m forced to constantly think about a particular spot in my old high school. Between 20% and 40% of concepts get linked to a place.
I hesitate to mention it, but my psychedelic experiences have been visually extremely vivid and intense despite my lack of visual imagination. I have heard anecdotal reports that not everybody experiences vivid imagery on LSD.
This is weak. It seems optimised for vague non-controversiality and does not inspire confidence in me.
"We don’t expect the future to be an unqualified utopia." Considering they seem to expect that alignment will be solved, why not?