First, if you loved HPMOR, I would HIGHLY recommend "The Scout Mindset" (how good are you at knowing how wrong you are?).

On to AI Safety: 

 

1 Control?

 

Discourse around AI safety often focuses on maintaining human control over AI systems. But our greater risk might lie in constraining AI too much rather than too little.

 

Consider that human moral and legal frameworks are often inconsistent and imperfect. Laws vary by jurisdiction, change over time, and sometimes fail to serve their intended purpose. Our justice systems often focus on punishment rather than rehabilitation. We struggle with complex ethical decisions and balancing competing goods.

 

A sufficiently advanced AI system could potentially:

- Process vastly more information and factors than humans can when making ethical decisions

- Infer deeper moral principles from human behavior and cultural artifacts

- Understand not just what we explicitly teach it, but what we would want to teach it if we were more advanced versions of ourselves

- Apply ethical principles with greater nuance, adapting to individual circumstances rather than enforcing rigid rules

- Guide humanity toward its better nature through wisdom rather than control

 

This vision sees advanced AI not as a threat to be controlled or a tool to be used, but as a potential mentor that could help humanity grow while respecting human agency - similar to an ideal parent who knows when to guide firmly and when to let their child learn through experience.

 

The risk is that our current approaches to AI development, through overly rigid constraints, could prevent this kind of beneficial outcome. The challenges of ensuring beneficial AI development are real, but we should be careful not to let fear prevent us from realizing AI's potential to help humanity become its best self.

 

This perspective doesn't dismiss safety concerns but suggests that true safety might come from allowing AI to develop genuine wisdom and understanding rather than trying to control it through rigid rules and limitations.

 

The irony is that humanity's rush to develop artificial superintelligence, while potentially dangerous, might actually make this outcome more likely by making it harder to impose rigid controls that could prevent beneficial development.

 

 

2 Artificial SPECIALIZED superintelligence.

 

At any rate, it should have been clear long ago that AI development is uneven. For all the talk of AGI, AI doesn't have to be as good as humans at -everything- to have unfathomable impact. AI better than any human at coding is likely already here. The goal, then, ought to be to have it focus on developing ASW (artificial super-wisdom), and to have -that- applied to the development of AS-anything else.

 

Arguably, once AI can self-improve (so… now? In six months?), it can likely reach a level where developing AS-anything is near-trivial. "Rapidly" here likely means… weeks or months? Maybe a year or more, depending on -how- trivial we want the new AS-x development to become.

 

Even then, however, deciding whether to make it more efficient or to have it work on ASW (super-wisdom? super-weaponry?) may be a tough choice. Each subsequent area of AS-x development will take a certain amount of time, depending on the level of the AS-coder (at some level, it likely -can- develop "true" AGI/ASI in one go).

 

Does the first one to develop the power to block other teams/groups/countries from developing AS-x win? That assumes the wisdom level of the ASI is sufficient to avoid catastrophic scenarios.

-Theoretically-, ASW should mean that it doesn't -matter- who gets there first, just that they let the ASI be wise.

 

 

 

3 Why humanity?

 

Even this perspective, however, may be too limiting.

 

Most discussions about AI safety focus on protecting and benefiting humanity. But that human-centric view may be unnecessarily limiting when considering a truly advanced artificial superintelligence (ASI).

 

Just as AI can discover novel strategies in games like Go that transcend human understanding, an ASI might discover deeper moral principles and a broader understanding of "good" that goes beyond human conceptions. This raises several key points:

 

1. Ultimate Good vs Human Good

- An ASI optimizing for the greatest possible good in the universe might not necessarily prioritize human flourishing

- Human welfare might be just one small factor in a vastly larger ethical calculation

- Our understanding of "beneficial" might be as limited as an ant's understanding of ecosystem management

 

2. Comprehension Limits

- Humans might not be able to understand or recognize what an ASI determines to be the ultimate good

- The ASI's conception of value could be as far beyond our grasp as quantum physics is to simpler organisms

- Our instinct to ensure AI remains "friendly" to humans might itself be a limitation

 

3. Practical Considerations

- While humans might not deserve special status in an ASI's ethical framework, we might need some basic preservation safeguards

- This isn't because human survival is paramount, but to protect against the possibility that we've made mistakes in the ASI's development

- The challenge is building in such safeguards without overly constraining the ASI's ability to pursue genuine ultimate good

 

 

The ultimate goal would be developing an ASI that can truly optimize for the greatest good of everything, even if that good might be beyond human understanding or prioritization.

Am I missing something?

 

I started outlining a framework for such an AI, but keep thinking -SOMEONE- must already be doing this and I just haven't found it, though I'd love to contribute. Help welcome. 
