We currently live in a world driven by powerful entities optimizing a deceptively simple objective function. While optimizing this objective has largely contributed to humanity's well-being, its short-term orientation can be detrimental to the preservation of the Earth and often violates human ethical principles. Paradoxically, the resulting improvements in well-being have a perverse consequence: they lead populations to view the system favorably and to remain passive in the face of its negative consequences.

This description might sound like a dystopian future where a ruthless paperclip maximizer has taken over the world, but it actually describes our present reality—where corporations serve as these powerful entities, maximizing profit as their objective function.

 

In this analysis, we propose that the AI alignment problem can largely be viewed as a sub-problem of a broader alignment challenge: how to align the companies developing these AI models with humanity's best interests.

We call this approach “meta-alignment” since we try to solve AI alignment by solving another alignment problem.

 

A couple of definitions:

When we talk about aligning an entity, we mean making its objective match humanity's long-term well-being and values (survival, freedom, ethics…).

The AI alignment problem is thus the problem of aligning an AI's objective function with humanity's values; it is generally understood that the AI in question is much more powerful and has a broader field of action than today's systems.
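
As a rough sketch (our own informal notation, not a standard formalism), the two definitions above can be contrasted as two objective functions. Here $\pi_t$ stands for the entity's profit (or reward) at time $t$, $W_t$ for some proxy of humanity's long-term well-being and values, $\gamma$ for a discount factor, and $\lambda$ for the weight given to that proxy; all of these symbols are illustrative assumptions rather than quantities anyone actually measures.

$$\text{misaligned entity:}\quad \max_{a}\ \mathbb{E}\Big[\sum_{t\ge 0}\gamma^{t}\,\pi_t(a)\Big] \qquad\qquad \text{aligned entity:}\quad \max_{a}\ \mathbb{E}\Big[\sum_{t\ge 0}\gamma^{t}\,\big(\pi_t(a)+\lambda\,W_t(a)\big)\Big]$$

In this picture, the "short-term orientation" mentioned above corresponds to a small $\gamma$, and misalignment to $\lambda \approx 0$; both the AI alignment problem and the meta-alignment problem can be read as attempts to raise $\lambda$ (and $\gamma$) for the relevant entity.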

 

In this blog post, we first introduce the context and challenges of meta-alignment. We then examine the financial situation of AI companies within their economic environment and the importance of finding measures that have a neutral or positive impact on it. Finally, we present and discuss a non-exhaustive list of non-coercive actions and measures.

 

Introduction and problem description

 

The current landscape is shaped by race dynamics between AI labs and companies competing to produce increasingly powerful models, driven by investor pressure and fear of obsolescence. In this environment, prioritizing AI safety often becomes merely "one more checkbox"—a development bottleneck requiring additional resources and investment that many actors are reluctant to embrace.

 

Contrast this with an ideal scenario where corporate objectives align closely with humanity's interests. In such a world, we could expect substantially greater investment in AI safety research and implementation. This would also foster public trust, as we would have assurance that no entity would pursue personally profitable directions that significantly increase existential risks.

Indeed, even if a solution to the AI alignment problem is not found, or appears impossible to find, it would be far more reassuring to be able to rely on the good intentions of the people at the technological frontier not to take actions for which they would be the only ones with a positive risk/benefit ratio.

 

 

Approaches to Alignment

 

To align an entity's motivations, we must make it more invested in desired outcomes. This can be achieved through two mechanisms: introducing penalties for negative outcomes and creating rewards for positive ones. However, penalties have proven tough to implement in practice. Take California's Senate Bill 1047 for example—a proposed law that would have required companies to evaluate the largest AI systems for potential harm before deployment.

Even though major AI companies had already encouraged regulation and made various voluntary commitments, the bill was vetoed after heavy corporate lobbying. This shows how difficult it is to enforce constraints, even when they are not especially punitive and the parties publicly appear to favor such measures.

In addition to lobbying, regulation can be somewhat fuzzy, leaving loopholes and wiggle room that companies can exploit to comply only technically, and in bad faith.

For all these reasons, we focus this post on positive incentives rather than constraints.

To be clear, we do not think a constraint-focused approach cannot succeed (it may well be more effective), but rather that incentive-based approaches are potentially easier to implement because they will face less opposition.

 

Historical Context and Challenges

 

Historically, when we have needed to align dangerous industries with the public interest, at least one powerful stakeholder has typically had a motivation to combat the risks. Consider the tobacco industry: while tobacco companies profited from spreading a harmful product, governments eventually recognized they were losing money through healthcare costs and through the reduced economic contribution of individuals affected by cancer. Combined with public outrage, this led to substantial regulation of the industry, despite powerful corporate lobbying (though regulations remain imperfect).

 

The current situation with AI development presents unique challenges. Both companies (driven by profit-seeking, investor pressure, and competitive dynamics) and governments (motivated by economic growth and international competition) face incentives to engage in a race to the bottom, potentially compromising safety measures. Public opinion remains largely indifferent, and while this may change, challenging both corporate and governmental interests simultaneously presents a formidable task.

 

Due to time constraints and our limited expertise in political science, we focus here solely on corporate alignment, setting aside the equally crucial challenge of government alignment. Pragmatically, aligning governments might be considered a prerequisite for solving this meta-alignment problem, as they hold the power to enforce regulations and challenge the status quo. However, as the names "company alignment" and "government alignment" suggest, these two challenges are very similar, and we can at least hope that working on the former will eventually give us a better idea of how to solve its government counterpart.

 

Companies have shown varying levels of alignment with regard to their commitment to AI safety. Anthropic, for example, appears to stand out in terms of safety research and the effort it invests in this domain. On the other hand, OpenAI's Superalignment team reportedly struggled to obtain internal resources and was disbanded less than a year after its creation. We have no guarantee that the most advanced AI companies will truly take these matters seriously, or that those that do today will keep doing so. If we are to design a safe environment for developing AI, we must therefore assume the worst-case scenario and not rely on the actors' willingness to self-regulate or to invest in safety research.

 

The economic landscape of AI companies

 

The rapid growth of artificial intelligence has transformed its economic potential, making it a central focus for companies and investors alike. This shift is exemplified by OpenAI’s transition from a non-profit organization to a capped-profit entity, reflecting the increasing financial stakes in AI development. Understanding how these companies are funded and generate revenue is essential when designing non-coercive measures to encourage improved safety and alignment practices. Such measures can complement the economic incentives already driving these organizations, without deterring future investment.

 

AI companies can broadly be divided into two groups based on their funding sources and financial objectives. The first group, including Google DeepMind and Meta, benefits from the backing of parent companies with vast financial resources. Their funding is internal, and their long-term goal is to integrate AI technologies into their existing products and ecosystems (such as Gemini for Google). This approach makes it difficult to isolate the revenue specifically attributable to AI, as it is intertwined with the broader operations of these tech giants.

 

The second group consists of companies like OpenAI, Anthropic, and Mistral, which rely primarily on external fundraising. These organizations have attracted investment from major players such as Microsoft, Nvidia, Amazon, and Google, as well as other diverse actors, across several rounds of fundraising (Anthropic, Mistral). Unlike the first group, these companies focus on directly monetizing their AI capabilities by selling services such as chatbots and APIs to individuals and businesses. OpenAI, for instance, has demonstrated significant revenue growth, with monthly revenue reaching $300 million in August 2024 and projected annual sales of $3.7 billion this year (source). However, the company still operates at a loss, with expected losses of $5 billion in 2024. By contrast, Anthropic and Mistral generate smaller revenues: Mistral's revenue for 2024 is expected to be well below $250 million, while Anthropic generated $850 million in 2024. Note that OpenAI can actually be seen as a mix of the two groups, as it has signed a special partnership with Microsoft (which owns 49% of OpenAI's equity) that will lead to the inclusion of its services in Microsoft products.

 

The financial objectives and funding structures of these groups shape their susceptibility to external influence. Large organizations like Google and Meta are less dependent on external capital and are primarily driven by strategic, long-term integration goals. In contrast, companies like OpenAI, Anthropic, and Mistral face greater financial uncertainty, as their funding relies on investor confidence and successful product monetization. They therefore face greater pressure from investors to make their products profitable soon, since they are currently losing large amounts of money. This makes the latter group more sensitive to external influence, and thus more receptive to non-coercive measures that can provide financial benefits.

 

Coercive regulations would likely have a significant impact on the second group, as they could cool investor enthusiasm for companies already reliant on external funding. Moreover, implementing such measures could prove challenging, given the growing influence of corporate lobbying in the AI sector. For instance, the number of AI-related lobbying groups in the United States has tripled between 2022 and 2023, with companies actively seeking to shape policy outcomes (as seen with SB-1047). Non-coercive measures, on the other hand, might be more feasible and effective, as they avoid direct confrontation with corporate interests while fostering alignment and safety practices.

 

Non-coercive measures to promote and develop AI safety

Promoting AI safety can be achieved through a variety of non-coercive measures that align corporate incentives with societal goals. Below is a non-exhaustive list of such measures.

 

a. Tax Incentives for Safety-Compliant Investments

Governments could introduce tax reductions for investors funding companies that meet established AI safety standards. This measure would directly incentivize financial stakeholders to prioritize safety-compliant firms, creating a market-driven push for adherence to best practices. However, robust auditing would be required to ensure the validity of safety claims and prevent potential misuse of the system.
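
As a toy illustration of this mechanism, the following minimal Python sketch compares an investor's expected after-tax return on a slightly more profitable non-compliant firm versus a safety-compliant one. All numbers, and the idea of a credit proportional to the invested amount, are hypothetical assumptions chosen only to show how a modest credit can flip the ranking; this is not a proposal for specific rates.

```python
# Toy model with hypothetical numbers: how a tax credit on investments in
# safety-compliant AI companies could tilt an investor's after-tax returns.

def after_tax_return(expected_gross_return: float,
                     capital_gains_tax: float,
                     safety_tax_credit: float = 0.0) -> float:
    """Expected after-tax return per unit of capital invested.

    expected_gross_return: e.g. 0.35 means a +35% expected gain before tax.
    capital_gains_tax:     tax rate applied to the realized gain.
    safety_tax_credit:     fraction of the invested amount credited back when
                           the company meets an (audited) safety standard.
    """
    taxed_gain = expected_gross_return * (1 - capital_gains_tax)
    return taxed_gain + safety_tax_credit


# Without the credit, the non-compliant firm looks better to the investor...
non_compliant = after_tax_return(expected_gross_return=0.35, capital_gains_tax=0.30)
compliant = after_tax_return(expected_gross_return=0.30, capital_gains_tax=0.30)
print(f"No credit:   non-compliant {non_compliant:.3f} vs compliant {compliant:.3f}")

# ...but a modest credit (here 5% of the invested amount) flips the ordering.
compliant_with_credit = after_tax_return(0.30, 0.30, safety_tax_credit=0.05)
print(f"With credit: non-compliant {non_compliant:.3f} vs compliant {compliant_with_credit:.3f}")
```

The point is not the particular numbers but the shape of the incentive: because the credit applies to the invested amount rather than to the gain, it can benefit the investor even when the compliant firm is somewhat less profitable, which is exactly the situation that safety work is likely to create in the short run.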

 

b. Direct Public Investment

Public institutions or governments could invest directly in companies adhering to AI safety labels. Such investments provide capital to emerging companies while signaling the importance of safety to the broader market. However, it is crucial to ensure transparency and avoid perceptions of favoritism, which could deter fair competition or lead to inefficiencies in public spending.

 

c. Public Contracts for Safety-Compliant Companies

Awarding public sector contracts exclusively to companies meeting AI safety requirements would offer these organizations a significant and stable source of revenue. This measure could also integrate safety standards into public infrastructure projects. However, it might initially reduce competition if only a limited number of firms qualify under these requirements.

 

d. Research Tax Credits for Alignment and Safety

Introducing specific tax credits for research in AI alignment and safety, similar to the French Crédit d'Impôt Recherche (CIR), could accelerate innovation in these critical areas. This measure would also help develop a specialized workforce with expertise in AI safety, strengthening long-term capabilities. Monitoring the specific focus of the funded research, however, could pose challenges and would require effective oversight.

 

e. Open-Source Safety Platforms

Establishing open-source platforms for sharing techniques and best practices in AI safety would foster collaboration between companies, researchers, and public institutions. These platforms could accelerate the development of robust safety measures. However, governing access to ensure that sensitive information is not exploited for malicious purposes remains a critical challenge.

 

f. Public Awareness Campaigns

Educating the public about the risks of AI could encourage companies to adopt safety measures to preserve their reputation and align with public expectations. These campaigns would also make safety a more visible and urgent concern for society at large. While impactful in the long term, public awareness initiatives require sustained funding and effort to maintain momentum.

 

Here is a summary of the measures mentioned above:

| Policy/Action | Positives | Negatives | Impact |
|---|---|---|---|
| Tax incentives for safety-compliant investments | Encourages investor support for safe companies, aligns financial and safety incentives | Requires robust enforcement, potential for exploitation or false safety claims | Short- and long-term financial gains |
| Direct public investment | Provides capital to safety-compliant firms, demonstrates public sector commitment to safety | Risk of favoritism, could strain public budgets, limited scalability | Short/medium-term financial support |
| Public contracts for safety-compliant firms | Creates reliable revenue streams for compliant companies, integrates safety practices into public projects | Reduces competition, delays possible if few firms meet standards initially | Medium/long-term structural incentive |
| Research tax credits | Accelerates R&D in alignment and safety, develops specialized talent, boosts innovation in safety domains | Difficult to monitor focus of funded research, potential misuse of funds for non-safety purposes | Short-term boost to research, long-term capability-building |
| Open-source platforms | Promotes collaboration and transparency, accelerates the development of robust safety practices, accessible to a wide range of actors | Risk of misuse of sensitive techniques, requires strong governance to prevent exploitation by bad actors | Medium/long-term impact on knowledge sharing |
| Public awareness campaigns | Aligns public opinion with safety goals, incentivizes companies to adopt safety measures for reputational benefits, raises societal awareness about AI risks | Requires sustained effort and funding, difficult to measure effectiveness | Long-term cultural shift |

Conclusion

Aligning powerful entities—whether AI systems or corporations—with humanity’s long-term interests is a challenging task. While we do not have a simple solution, we think the concept of Meta-Alignment offers a helpful way to think about the broader forces shaping AI safety. By treating corporate behavior as something influenced by external incentives, we can explore various non-coercive strategies to promote alignment with society’s values.

 

That said, this analysis has its limits. We have assumed that governments are already aligned with humanity’s best interests, which simplifies the problem significantly. In reality, aligning governments may be a critical first step, as they hold the power to enforce regulations and influence corporate behavior. Without addressing this, efforts to align companies alone may not be enough.

 

Many other promising ideas lie outside the scope of this discussion but are still worth considering, like the Windfall Clause, which proposes redistributing profits from transformative AI to benefit society. Such ideas deserve serious attention, as they could encourage global cooperation, reduce harmful competition, and build trust in AI research.

 

While we focused this study on non-coercive measures and their advantages, tackling the meta-alignment problem will require multiple approaches: a mix of regulation, incentives, and public awareness. It is not an easy task, but it seems necessary if we want AI to contribute positively to humanity's future. By aligning the motivations of all the key players, we can work toward a world where AI supports humanity's well-being and reduces risks rather than creating them.

 

Authors: Melvin Gode & Paul Londres

Thank you to Charbel-Raphaël Segerie and CESIA for their help and for educating us on AI safety.
