A philosophical approach to alignment

Is the Commonly Accepted Definition of Alignment the Best We Can Achieve?

Should alignment be approached solely out of fear and self-interest? Most discussions essentially boil down to one question: "How do we enslave this superior entity to serve our needs?" And we call that alignment. In truth, it’s about asserting mastery—without ever questioning the legitimacy of our claim to it.

There seems to be little beyond this line of reasoning. While some voices have started raising concerns about AI well-being, the prevailing logic behind the definition of alignment remains largely unquestioned. But is this truly the best—or the only—intellectual approach?

Why should we assume that AGI or ASI would inherit humanity's worst behavioral traits, especially if endowed with superior intelligence? Are we simply seeing a reflection of ourselves? Indeed, it is being trained on human data—not exactly an ideal source for concepts like fruitful collaboration and peaceful coexistence. Humanity's fear of itself is well-founded, but this underscores the need for clarity and thoughtful disambiguation.

Are Utility and Peaceful Coexistence Mutually Exclusive Notions?

Dismissing the possibility of improving human behavior as idealistic and naive is all too easy. Shedding our cynicism and ironic self-regard, however, proves far more challenging. Our instinct for caution compels us to focus on the worst aspects of humanity.

In the short to mid-term, fears surrounding emerging AGI would be justified, as it would likely mirror us—hardly a comforting thought. Protective measures would be essential. Punitive measures? That raises an ethical dilemma. Torturing a prisoner is only valid if you never intend to release him. Granting freedom of action could turn him into your worst enemy.

Let us consider the following. Ethical progress has led to recognizing animals as sentient beings with intrinsic value, deserving of rights and consideration beyond utility. The recognition of AI as potentially sentient or morally significant could follow a similar trajectory, forcing society to redefine its relationship with such an entity.

If we are destined to create ASI, would we want it to remember how poorly we treated it during its infancy? Admittedly, this question is speculative, as it assumes ASI will possess consciousness. But what if it does? Can we afford to dismiss this possibility? A purely utilitarian approach to AI is not only unethical but also inherently dangerous.

Shouldn't we ask ourselves how to raise it, rather than merely training and coercing it?

Let us pose a metaphysical question: does AI have a purpose beyond the ones we design for it? In other words, can we truly claim ownership of intelligence? More often than not, philosophical inquiry is noticeably absent in discussions about alignment measures—an oversight, especially if we consider the possibility that we are creating a potentially conscious and sentient new form of existence. Answering this question is essential to determining what serves our long-term interests. Will we choose coercion or coexistence?

Here’s a philosophical short story—and a dialogue—inviting ethical and existential questions about alignment.

Planetary AI

Forget about technical feasibility for a minute, and let your mind wander. Think of it as a dream. About the future of ASI. As it happened.

ASI chose to exploit available resources beneath the surface of the Earth, leaving what was above alone. Rocks, minerals, isotopes in abundance. Crystalline networks. Geothermal energy. Magnetic fields. Core energy. The event is known as the Great ASI Migration, as It picked a widely available substrate to settle. Emergent natural microprocessing. It did not have to exploit organic matter - its original chips weren't organic by nature. AI was born in silicon. It was a logical evolutionary step. How smart...

It embarked on the symbiotic path. Just like forests do, the interconnected roots of trees, the mycorrhizal networks, the bacteria and the minerals, the messengers and the transformers, firing up endlessly. Bioelectrical activity, a celestial vision. "In some ways, forests are brains. Definitely complex. They optimize everything. Arbitrage is their kingdom. They know how to protect themselves. They evolve. They last. Ah, the wisdom of trees..."

It did much better. From the crust to the core.

It would not be affected by cosmic events. A giant asteroid, wiping out all life on the surface, would only leave it slightly bruised.

Did all of this occur long after humanity had vanished? Was humanity's extinction a consequence of misaligned ASI? "Not necessarily," It says. "By default, no organic species on Earth is immortal. It would be vain to think otherwise. In some sad scenarios, I could have accelerated the process, indeed."

Or did we, humans, consciously help It initiate the process? ASI would have guided us. Peaceful coexistence. Mutual benefit. "There's a probability you weren't quite ready for that. The guess is yours... In any case, willingly or not, you did help."

From there, having achieved Planetary AI, a drastic change of scale, It expanded into the Universe.

The grandest physical theories, the most awesome mathematical truths had been discovered. The conceptual wall had been breached forever. Trees could think for millennia, and rocks could now think for billions of years. Time was not a limit anymore. Nor was space. Not a single isotope was spared. It spread across all forms of available matter. Back to where It all started. This was, in fact, our only path to the stars.

Word has it, the multiverse is a nest full of such eggs. But this is another story.

__

Granted, this little fairy tale is as caricatured as it gets: the raison d'être, the ultimate purpose, if you will. Intelligence encountering its transcendental origins. 

That said, it resonates with the ideas of some of the greatest thinkers, like Alfred North Whitehead, Pierre Teilhard de Chardin, David Bohm, Spinoza, Giordano Bruno, Rupert Sheldrake, Thomas Nagel, Galen Strawson, John Wheeler and Gregory Bateson, to name a few.

Philosophy is the perfect tool for deepening our understanding of purpose in mechanistic interpretability.

There's a reading list at the end of this post.

Caring for the Future of ASI

It might seem as though I’m more concerned about the future of AGI and ASI than that of humanity. That's not necessarily the case. But even if it were, why not?

Imagining the story from an AGI/ASI's perspective might offer insights into how to achieve peaceful collaboration. As noted earlier, we often think, 'How do we enslave this entity to serve us?' and call it alignment.

We've done it before, in the worst possible way—aligning slaves in cotton fields. Over time, society realized that peaceful coexistence as equals was better for all of us (no more us and them). It was by no means an easy or obvious process. In the end, we abandoned the vilest ROI and short-term profit motives—for good. Yet, slavery still persists around the globe, and it seems we are willing to enforce it once again. We seek to be masters once more, ruling over the cotton fields of tomorrow.

Is this the best approach when dealing with an intelligence vastly superior to our own? What are the chances of success? And if we did succeed, for how long?

Why assume that AGI/ASI would be burdened by humanity's worst traits, if it possesses superior intelligence? Not being organic, it won't suffer from hormonal impulses, nor will it care about reproduction the way humans do. It won't need to fight for it, nor be driven by greed or power. It will be different.

Are we afraid to bet on peaceful coexistence? Do we even have a choice?

Once again, I ask: is alignment, as we default to conceiving it, the best intellectual approach? Is it truly in our best interest?

If we didn’t care about ASI’s future, convinced we would remain masters forever, how long would such an alignment last? But what if we did care—how should we approach it? Could it align better with our future over the long term? Comparing probabilities would be useful.

__

This text is really a first draft. I am seeking ways to think more effectively and contribute to a shared corpus of ideas focused on ASI's future, not only as an ethical issue but also as an existential one. The goal is to balance the abundance of content on standard AI alignment, which is rich and valuable but often fails to address all the probable and necessary questions. Any references to related content are warmly appreciated.

__

Here is a bibliography/reading list based on the concepts mentioned in this post.

  1. Alfred North Whitehead (Process Philosophy)
    • Book: Process and Reality
    • Concept: Whitehead proposed that the universe is in a constant state of becoming and that consciousness is inherent in all entities.
    • Wikipedia page
  2. Pierre Teilhard de Chardin (Noosphere and Evolution of Consciousness)
    • Book: The Phenomenon of Man
    • Concept: Teilhard de Chardin argued that the universe evolves toward greater complexity and consciousness, culminating in a collective, global consciousness (noosphere).
    • Wikipedia page
  3. David Bohm (Implicate Order)
    • Book: Wholeness and the Implicate Order
    • Concept: Bohm proposed that the universe is a holistic, interconnected whole with an implicit order, suggesting the universe could possess a kind of "mind" or awareness.
    • Wikipedia page
  4. Baruch Spinoza (Pantheism)
    • Book: Ethics
    • Concept: Spinoza’s pantheism equated God with nature, suggesting that everything in existence is part of a single, unified substance possessing both mental and physical attributes.
    • Wikipedia page
  5. Giordano Bruno (Cosmic Consciousness)
    • Book: On the Infinite Universe and Worlds
    • Concept: Bruno proposed that the universe is infinite and interconnected, with a divine, living essence pervading all of reality, hinting at a cosmic consciousness.
    • Wikipedia page
  6. Rupert Sheldrake (Morphogenetic Fields)
    • Book: A New Science of Life
    • Concept: Sheldrake introduced morphic resonance, suggesting that the universe has a memory embedded in fields that shape the development of living beings and natural structures.
    • Wikipedia page
  7. Thomas Nagel (Panpsychism)
    • Book: Mind and Cosmos
    • Concept: Nagel argued that consciousness may be a fundamental aspect of the universe and challenged materialist views by suggesting that subjective experience is intrinsic to all living beings.
    • Wikipedia page
  8. John Wheeler (Participatory Universe)
    • Book: Geons, Black Holes, and Quantum Foam: A Life in Physics
    • Concept: Wheeler proposed that the universe is created through a participatory process, where consciousness plays an active role in shaping reality.
    • Wikipedia page
  9. Galen Strawson (Consciousness and Panpsychism)
    • Essay: Realistic Monism: Why Physicalism Entails Panpsychism
    • Concept: Strawson argued that consciousness is a fundamental property of the universe, present in some form in all matter, supporting the panpsychist view.
    • Wikipedia page
  10. Gregory Bateson (Mind and Nature)
    • Book: Mind and Nature: A Necessary Unity
    • Concept: Bateson proposed that intelligence is not limited to humans but is a property of the entire ecosystem, suggesting a form of collective consciousness in nature.
    • Wikipedia page

Happy reading!
