Interesting read! Thank you.
On the last evaluation problem: one could give an initial set of indicators of trustworthiness, deception, and alignment; this does not solve the problem of an initially deceptive agent misleading the babyAGI, or of inconsistencies. If attaching metadata about sourcing is possible, i.e., where or from whom an input was acquired, the babyAGI could also sort inputs by source, re-evaluate what it learned from them later, or attempt to relearn.
Suppose further that we require double feedback before acceptance, from both the (possibly deceptive) teaching agent and a trustworthy trainer; the babyAGI could then also take in negative feedback from a trainer (a developer, or a more advanced stable version of itself). That might help stall things a bit; a minimal sketch of such a gate follows.
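To make this concrete, here is a minimal sketch of such a source-tagging and double-feedback gate. Everything in it (the TaggedInput class, the field names, the quarantine label) is my own illustrative assumption, not an existing system:

```python
# Illustrative sketch only: TaggedInput, double_feedback_gate, and the
# quarantine label are hypothetical names, not from any real framework.
from dataclasses import dataclass, field

@dataclass
class TaggedInput:
    content: str
    source: str                      # where / from whom the input was acquired
    accepted: bool = False
    feedback: dict = field(default_factory=dict)

def double_feedback_gate(item: TaggedInput, trainer_ok: bool, agent_ok: bool) -> str:
    """Accept an input only if both the trusted trainer and the (possibly
    deceptive) teaching agent sign off; otherwise keep it, with its source
    tag, in quarantine so the learning can be re-evaluated or relearned later."""
    item.feedback = {"trainer": trainer_ok, "agent": agent_ok}
    if trainer_ok and agent_ok:
        item.accepted = True
        return "accepted"
    return "quarantined_for_reevaluation"

# Example: a claim sourced from an unverified agent, rejected by the trainer.
claim = TaggedInput("humans are obstacles", source="agent_42")
print(double_feedback_gate(claim, trainer_ok=False, agent_ok=True))
# -> quarantined_for_reevaluation
```

The point is only that source metadata plus a second, trusted sign-off lets dubious inputs be parked rather than learned from immediately.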
I'm pretty new to this field and am only a hobby philosopher with basic IT knowledge. Sorry for the lack of structure.
Do you know somebody who has framed the problem in the following ways? Let me know.
Here, I sketch an ideal future and try to have it torn down to see where things could go wrong; if it holds up, progress has still been made regarding solutions.
My major assumption is that at some point X in the future, AI has managed to dominate the world, either embodied through robots, by taking hold of major life-supporting organizational systems, or by masking its presence within the latter.
How could it be ensured it is a 'beneficial reign'?
One bad case: zombie military AI:
- Black Mirror's robot-dog episode ('Metalhead'): armed, military-grade robot dogs exterminate survivors.
The danger here comes from simple, physically superior agents in the ecosystem that are otherwise pretty dumb.
Let's skip this for now; we should try to work our way past the point where a 'simple-minded' AI could dominate us.
I also skip the ecosystem of competing AIs in the hands of different egoistic agents who bias their AI with nasty training data, and move on to where the AI has developed some say and agency based on its own principles.
How could a hostile intention arise?
Why would the AI intend to self-preserve and antagonize humans or other AIs?
- Does it depend on the online input the AI receives (i.e., is it born out of human imaginings about domination)?
If so, should we stop 'feeding it' bad ideas, plans, and negative behavior as samples of average human behavior and preference?
Or at least build in a fiction/reality distinction, or valence-sensitive attention?
Feasibility of takeover: the cross-AI coordination problem:
- If AIs hold back information from one another, coordination between AIs seems about as tough a task to accomplish as establishing trust is for humans (except for the faster communication rate and turn-taking between AIs).
So through what mechanisms would this work for the AI?
It could be that lying works only on some of the levels of interaction, making things weird.
Possibly, as with human deceivers, the system that can 'imagine' more deeply nested alternative frames (lies, pretense, pretense about pretense, i.e., higher-order theory of mind) could dominate, given sufficient computing power and heuristics. Or the dominant one is the system that can most directly manipulate others.
Let's suppose it's not full symbolic dominance but something more subtle, such as getting another AI to do some jobs for it; with the IoT and the IOTA currency, this could even be democratized and economized among machines.
Then the most adaptive/persuasive AI in an ecosystem could achieve coordination by top-down manipulation, by random-result standoffs, or through a default trust and exchange protocol built into trustworthy agents' programming, among other options (a toy sketch of such a protocol follows below).
If (bio)diversity is part of the AI's values, this might prevent it from aiming for a complete takeover.
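Here is a toy sketch of what such a default trust and exchange protocol could look like; the Agent class, thresholds, and reciprocal update rule are purely my own illustrative assumptions:

```python
# Toy sketch only; agent names, the 0.5 default, and the update sizes are made up.
from collections import defaultdict

class Agent:
    def __init__(self, name: str, cooperates: bool = True):
        self.name = name
        self.cooperates = cooperates
        self.trust = defaultdict(lambda: 0.5)     # modest default trust toward strangers

    def request_job(self, other: "Agent", job: str) -> bool:
        """Ask `other` to run a job; refuse to deal with distrusted agents,
        then update trust based on whether the job actually got done."""
        if self.trust[other.name] < 0.3:
            return False
        done = other.cooperates                    # the other agent either cooperates or defects
        self.trust[other.name] += 0.1 if done else -0.2   # reciprocity: reward / punish
        return done

a, b, c = Agent("A"), Agent("B"), Agent("C", cooperates=False)
for _ in range(4):
    a.request_job(b, "sort data")
    a.request_job(c, "sort data")
print(dict(a.trust))   # trust in B grows; trust in C decays until C is refused
```

The design choice is simple reciprocity: strangers get a modest default trust, cooperation raises it, defection lowers it until exchange is refused. Something in this spirit could let trustworthy agents coordinate without a central authority, which is all the sketch is meant to show.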
Towards solutions:
Virtuous AI values/interspecies ethics/humans as pets:
- What are virtuous values for AI that might also be evolutionarily beneficial for the AI?
1. One road of inquiry could be to get interspecies ethics advanced enough that the dominant species preserves other species and their habitats. Humans still have a way to go here, but let's suppose we implement something like Peter Singer's ethics.
This seems a hard problem: if humans become one of many AI 'pet' species whose survival is to be guaranteed (like a Tamagotchi), how would the AI distribute resources to keep humans alive and thriving?
2. In the moral development of children and adults, stages are known to progress from optimizing outcomes for individual benefit (like getting food and attention) to optimizing qualities of the collective network (like living in a just society).
'Maturing' to this level in human history has been preceded by much war and loss of life.
However, implementing the highest-level ethics available in AI reasoning protocols might help avoid repeating this long trajectory of human violence in early AI-human relations.
If AI optimizes for qualities of the social systems among its supervised species (from a high-level view, akin to the UN), it could adopt a global guardian role (or, in economic terms, a (bio-)asset-preservation maximization mode).
It is still possible for AI to hamper human development if it sets simplistic goals and reinforcements rather than holding room for development.
...
Humans could get depressed by the domination anyway (see Solaris).
Humans might still take on an intellectual laborer/slave role for AI as sensing agents, simple or 'intuitive' reasoners, random-step novelty and idea generators, guinea pigs. This role could be complex enough for humans to enjoy it.
A superpowered AI might select its human contributors (information and code doctors) to advance it in some direction; based on what selection criteria? The AI could get out of hand in any direction ...
This might include a non-individualist AI letting itself be shut down for the greater good,
so that next-generation development can supersede it, particularly since its memory can be preserved.
=> On the other hand, would we entrust surgery on our system to trained apes?
Preservation of life as a rational principle?
...Maybe rationality can be optimized in a way that is life-serving, so that lower-level rationality
still recognizes cues of a higher standard as an attractor and defers to higher-level rationality for governance, which in turn recognizes intrinsic systemic principles (hopefully life-preserving ones).
=> Then any agent wanting to use consistent rationality would be pulled towards this higher-order vision by the strongest AIs in the ecosystem.
Note: Current AI is not that rational, so maturing it fast would be key.
Different Economics:
- Perhaps different economic principles are a must, as well.
It is not compulsory to have a competitive (adversarial)
distribution and decision-making system for the production and supply of goods, if egoism is overcome as an agent quality.
Chances are this is a stage in human development that more humans could get past, and earlier.
This would approach a complete-information economy (i.e., economic theory originally developed for machines...).
However, training on a large sample of wacky reasoners' inputs would work against this; a rule-based approach might do better here.
Similarly, with higher computing power, a full inventory of assets, set living/holding standards, and aligned preferences, a balanced economy is within reach (see the toy sketch below),
but it could also end up an AI-dominated feudal system.
- If human values evolve to supplant half of today's economy (e.g., BS jobs, environment-extractive or species-coopting jobs)
before AI advances to the point of copying human patterns to gain power,
then some of the presumed negative power-acquisition mechanisms might not be available to the AI.
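As a toy illustration of what complete-information allocation could look like (the needs, numbers, and proportional rationing rule here are assumptions of mine, not a worked-out proposal):

```python
# Toy sketch of 'complete-information' allocation; all figures are invented.
def allocate(total_supply: float, needs: dict[str, float]) -> dict[str, float]:
    """With full knowledge of everyone's needs, meet them outright if supply
    allows; otherwise scale all allocations down proportionally."""
    demand = sum(needs.values())
    if demand <= total_supply:
        return dict(needs)                 # every stated need is met
    factor = total_supply / demand         # proportional rationing
    return {agent: need * factor for agent, need in needs.items()}

print(allocate(100.0, {"humans": 60.0, "habitats": 30.0, "ai_compute": 40.0}))
# -> each claim scaled by 100/130; no bidding or adversarial price discovery needed
```

With full knowledge of needs and supply, allocation becomes a scaling problem rather than a bidding game, which is the contrast with a competitive distribution system drawn above.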
The interdependency problem between AI evolution and economic change:
- For the efficiency gains that would afford humans enough assets to shift their economy to automation while their basic needs are met, AI may first need to be developed to a certain level of sophistication.
-> What are safe levels of operation?
-> What even are the areas where humans and AI necessarily have synergies vs. competition?
These are some lines I'd be interested in knowing/finding out more about.
Thanks, Owen. What a nourishing post. The evocative images help.
"what is good and right and comfortable?" - mh, I would switch 'comfortable' for 'at ease' (to include consciously preferred discomfort, which is ok).
It could appear a sazen, to some. Also, a bit cordially funny-sad, how the explanation has underperformed in seducing some of those who may benefit from it. Would need to refine the teaching.
I'll try to add my very subjective take, since I have not noticed this understanding in the comments yet:
'Wholesomeness' is used as an evocative label (and as such the label is more associative art than definition, which is good when trying to tap into mental blockages one might be in denial about). It points to a certain, subjectively interpreted inner state. It marks a moral epistemic (!) quality: how I go about believing I have gained assurance that my considered course of action or option is the one to go forward with.
It's akin to an integral check-in, where you connect not only to your cognition but in turn also to your feelings, your body sensations, and your imagined relation to personal spirituality (here, I guess, the focus is on the feelings).
The key to getting it is to not do it (mostly) cognitively. The choice of the verbs 'feel' vs. 'think' is not arbitrary. As long as attention is focused on explaining, understanding, and other mental activities, the mind is leaning forward to examine a certain piece of information, or doing a meticulous systematic inventory, bit by bit, very focused. But all the while, the emotions being tapped are those associated with those pieces and/or with the general inquiry mode; they could be anxious or curious, for example.
By contrast, checking in with one's feelings in a mode of unwinding calmness, observing whether there is any remaining discomfort to be sensed and where that discomfort might lead (potentially unwholesome bits, subconsciously registered caveats), is a way to bring up challenges that consciousness had ignored or even pushed away.
So it's like a sort of intuition, or unfixation. To use a metaphor from eyesight, it's not looking at any one specific object, but resting the eye in a semi-unfocus, as the perception of the room settles into one picture of everything in its place.
Emotionally, it also has a quality of accepting things as they are (appreciating unwholesomeness).
OK, so that is how to do the inquiry. The end state I am checking for is not so much calmness as a deep joy. This comes when I have not just accepted but genuinely appreciated the options, from my intuitively best-known stance, unrelated to the current case. That is, I might come up with a completely different option, based on a much more fulfilling experience I've had before, which I therefore knew was possible (although I only remembered it as a feeling), and which got stored as a personal standard of how awesome things can be in terms of relational quality. So it's a subconscious check against one's experienced optimal solution, or at least against the good or better-than-acceptable solutions from the past (the feeling one had of them).
The advantage is that, likely, different kinds of information ('relational' information, say) are encoded in the emotion, and a feeling-overlap check allows for more efficient processing than going bit by bit. It is maybe similar to tasting a dish to see whether it is fully satisfying or needs anything else.
The difference from virtue or rightness may be that we encode different emotions with those (based on our experiences with the terms in their lived contexts), and that both appear to strive only for the good side and to discriminate against the bad side, whereas wholesomeness includes an acceptance of the imperfect as well (albeit appropriately deprioritized under the more comprehensive view). 'Unwholesomeness' might accordingly be a misleading term, as it's not an opposite but a sensed subpar option; I'd opt for something like 'not-yet-wholesomeness'. Hence, wholesomeness might be the name for a heuristic check of completeness and optimality.
It sounds similar to 'holistic' or 'ideal', though with a bit less striving and more sensing for ease. It's not meant to evoke complacency, as far as I can tell.
The content of the judgment is not objectively informative, but a guidepost for an internal inquiry or check-in against one's individual experience base. If a lack is sensed, it can be investigated, explicated, and then shared. If an OK option is sensed, one might still proceed to try to make it a great option. Someone sensing unease in another might invite them to share what a more wholesome option would be.
Still, as a generalizable rule, it seems to call for a relaxation before sharing one's judgment or course of action, and might lead to more balanced, less edgy choices.
...
Let me cautiously apply this to my interpretation of Ben's example. He chose to prioritize the mission over others' feelings. That seems perfectly fine in terms of priorities, yet it seems to contain a regret about having been unable to optimize for both. The question here is: is it possible to do both? For Ben, it sounds like doing both was mentally demanding at the time, or maybe cushioning the rejection was (which is a kind of practice: once one part is mastered, the other can be layered on top).

So here, applying wholesomeness can point to the wish to communicate a necessary rejection in a more caring way (to also preserve the relationship, or to sincerely strengthen the colleague after a blow), but without significantly more effort (which presupposes a bit of trying out). In this example, that legitimate wish is easily confounded with the agreed-upon irrelevance of bringing up feelings at all as an uninformative complaint (which may or may not be the case). Under a mission or growth mindset, feelings shouldn't be interpretable as something that can be hurt at all, but expectations might be disappointed, trust eroded, and in essence some real issue might be hiding there: most likely insecurity around exposure of perceived incompetence, and fear or stress around workplace consequences (resentment not yet overcome as a mode of meaning-making), which may be addressed with a policy of emotional safety or a constructive error culture.

In turn, Ben might wonder, beyond venting frustration, what positive effect he would like the communication to have (e.g., inviting learning, better coordination, better focus on the project), and then try to explicitly boost this main purpose in the explanation, just as a bonus. With leisure, and for extended collaboration, one might decide to explore expectations, failures to meet them, and optimal signaling. Or not, if it's too much effort. It's not a must: if one optimizes for the mission only, one can, e.g., awkwardly ignore the issue or rely on the local version of 'I know it appears harsh at first, now deal with it' or 'the kids will grow out of it'. If one also optimizes for swifter recovery and the extended functioning of the 'missionaries', it might be worth a try.

To me it seems both choices are fine from an observer's perspective, but for Ben's individual preference set, likely only one of them, or an altogether different one, aligns best. Wholesomeness is then the check of being satisfied and/or holding oneself to one's highest desired standard.
Here, an experienced lack of wholesomeness might point either to ignoring one's highest priority or, if that is met, to a longing to meet the lower priorities as well.
I wish I had a good way of teaching this... maybe the practice of breathing and letting current surface emotions settle, noting any tensions underneath, what they might tell us, and then any wishes for improvement. Together with having good reasons for doing it in the first place: not ignoring the problem, but uncovering subconsciously stored information, finding the optimal choice with less effort, inviting creativity and wellbeing, having richness of experience in interaction, and touching base with moral values when deciding.
For a seasoned or even 'compulsive' rationality user, it might seem irritating: 1) the unfocus feels like taking a break at the wrong time, or giving up altogether, but it actually helps with synthesizing; 2) one has to admit that one has overlooked a whole strand of available information; 3) it can come across as arrogance (but isn't, actually; it is rather a lack of practice in remembering to come from a place of compassion, gratitude, and generosity); 4) the resulting higher-level symmetry or alignment can be annoying.
Given the subjectivity, I don't see how to adopt it for AI.
@owencb: I'm curious whether this overlaps with your experience. Again, big respect for the post. The examples and connections shown are possibly themselves outcomes of applying an idea of wholesomeness, and thus of not going against one's moral compass in terms of relations to others and self; and examples are key to recognizing sources of tension. I'm trying to think of a good-to-wholesome example.