Complex Intervention Development and Evaluation Framework: A Blueprint for Ethical and Responsible AI Development and Evaluation

The rapid evolution of artificial intelligence (AI) presents significant opportunities, but it also raises serious concerns about its societal impact. Ensuring that AI systems are fair, responsible, and safe is more critical than ever, as these technologies become embedded in various aspects of our lives. In this post, I’ll explore how a framework used for developing and evaluating complex interventions (1;2), common in public health and the social sciences, can offer a structured approach for navigating the ethical challenges of AI development.

 

I used to believe that by applying first principles and establishing a clear set of rules, particularly in well-defined environments like games, we could guarantee AI safety. However, when I encountered large language models (LLMs) that interact with vast bodies of human knowledge, I realised how limited this view was. These models function in the "wild": a world characterised by dynamic rules, fluid social contexts, and hard-to-predict human behaviour and interactions. This complexity calls for approaches that embrace uncertainty and acknowledge the messiness of real-world environments. This realisation led me to draw from methods used in social science and public health, particularly those related to complex interventions designed to influence behaviour and create change (1;2).

 

Some AI systems, such as Gemini, ChatGPT, and Llama, much like complex interventions, are developed to interact with humans as agents and potentially influence human behaviour in dynamic, real-world contexts. First principles alone are insufficient to ensure fairness and safety in these systems. Foreseeing and eliminating bias in such a complex setting poses a challenge, and it is inefficient to pinpoint biases in a complex system in a top-down manner. Instead, we need rigorous, evidence-based approaches that place stakeholder acceptability and complex system design at their heart. Several such methodologies have been introduced in medicine and healthcare; these approaches (1;2) can systematically assess the knowns and set out the uncertainties and unknowns of how these systems interact with the world.

 

Complex interventions are designed to bring about change within complex systems. They often involve multiple interacting components at various levels of organisation or analysis (individual, systemic, population-based, etc.) and aim to influence the behaviour of the system and, subsequently, targeted outcomes. Examples include public health programmes (e.g., the five-a-day fruit and vegetable programme, tobacco bans), educational reforms (e.g., the introduction of a new curriculum), or social policy initiatives (e.g., universal basic income programmes). Developing and evaluating these interventions requires a holistic approach that takes into account context, stakeholder engagement, and iterative refinement. From the outset, the needs and constraints of the target group are considered, ensuring the intervention is not only safe, causing no or minimal harm, but also feasible, acceptable to stakeholders, and designed to achieve the outcomes stakeholders prefer.

 

1. Applying the Complex Intervention Framework to AI

AI systems, particularly those like LLMs, share many characteristics with complex interventions. They operate within layered social and cultural contexts, interact with diverse individuals and communities, and might influence decision-making processes. Therefore, the principles and best practices for complex interventions can be adapted to inform responsible AI design and implementation. Here are several key considerations for applying this framework to AI systems’ development:

 

1.1. Contextual Awareness

AI systems must be developed with a deep understanding of the social, cultural, and political contexts in which they operate. This includes acknowledging potential biases, interpersonal dynamics, and ethical concerns that vary depending on the specific context.

  • Example from Complex Interventions: A smoking cessation program tailored for Indigenous communities might need to consider the cultural significance of tobacco and incorporate traditional healing practices (Cultural Context).
  • Example in AI: An AI model used for loan applications must recognize the historical biases in lending practices and ensure that it does not perpetuate those inequalities (Historic Context).

1.2. Underlying Theory/Mechanism of Action for AI

Just as with complex interventions, AI systems should have a well-defined "program theory" that outlines how the AI will function, interact with its environment, and influence user behaviour. This theory should be transparent and accessible to stakeholders.

  • Example from Complex Interventions: A childhood obesity reduction programme might integrate education about healthy eating, increased physical activity, and parental involvement as part of its theory of change, relying on existing theories and models that outline the expected mechanism of action (e.g., Social Learning Theory).
  • Example in AI: An AI-driven personalised learning system might rely on a theory that adapts educational content and pacing to meet the unique needs and learning styles of individual students.

1.3. Stakeholder Engagement

Involving diverse stakeholders, including representatives from affected communities as well as social scientists, is critical for developing AI systems in an inclusive and responsible way. This engagement ensures that different perspectives are incorporated, promoting fairness and equity.

  • Example from Complex Interventions: A community-based program to improve vegetable and fruit intake might include input from residents, schools, and social service agencies.
  • Example in AI: Engagement across all levels and stakeholders is crucial. For example, in the development of an AI-powered tool for predictive policing, input from law enforcement, civil rights groups, local communities, legal experts, and policymakers would be essential. This broad engagement ensures the system is not only technically sound but also ethically aligned, transparent, and considerate of the potential social impacts, particularly on vulnerable communities.

1.4. Addressing Uncertainties

AI systems, like complex interventions, operate in uncertain environments and can have unintended consequences. Ongoing monitoring and evaluation are essential to identify and mitigate these risks.

  • Example from Complex Interventions: A hospital readmission reduction programme might describe and summarise uncertainties in the empirical evidence through rigorous evidence synthesis, and require real-time monitoring to prevent unforeseen adverse outcomes.
  • Example in AI: An AI system used in hiring decisions should be continually monitored to ensure it does not inadvertently introduce unforeseen bias in the recruitment process.
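As an illustration of what such ongoing monitoring could look like for the hiring example above, here is a minimal sketch in Python. The data schema, group labels, and tolerance threshold are all hypothetical; the point is only to show selection rates being tracked per group over time and flagged when they diverge, not to prescribe a specific fairness test.

```python
from collections import defaultdict

# Hypothetical monthly logs of model recommendations: (month, group, recommended 0/1).
logs = [
    ("2024-01", "group_a", 1), ("2024-01", "group_a", 0),
    ("2024-01", "group_b", 1), ("2024-01", "group_b", 1),
    ("2024-02", "group_a", 0), ("2024-02", "group_a", 0),
    ("2024-02", "group_b", 1), ("2024-02", "group_b", 1),
]
TOLERANCE = 0.2  # assumed maximum acceptable gap in selection rates

recommended = defaultdict(int)
total = defaultdict(int)
for month, group, rec in logs:
    recommended[(month, group)] += rec
    total[(month, group)] += 1

for month in sorted({m for m, _, _ in logs}):
    groups = sorted({g for m, g, _ in logs if m == month})
    rates = {g: recommended[(month, g)] / total[(month, g)] for g in groups}
    gap = max(rates.values()) - min(rates.values())
    flag = " <-- review" if gap > TOLERANCE else ""
    print(month, {g: round(r, 2) for g, r in rates.items()}, f"gap={gap:.2f}{flag}")
```

A flag here is a prompt for human review of the model and its deployment context, not an automatic verdict of bias.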

1.5. Iterative Refinement

AI systems should be designed for continuous improvement based on real-world feedback, data, and evaluation. This iterative process ensures that the system adapts over time and remains aligned with ethical standards and societal needs.

  • Example from Complex Interventions: A program promoting healthy ageing might be revised based on participant feedback and new evidence regarding effective strategies.
  • Example in AI: A customer service chatbot should evolve over time by incorporating user feedback and refining its responses to enhance effectiveness.
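For the chatbot example, one lightweight and purely illustrative way to close the feedback loop is sketched below: user ratings are aggregated by topic, and topics falling below an assumed satisfaction target are flagged for attention in the next iteration, whether that means revised prompts, better retrieval data, or further fine-tuning.

```python
from collections import defaultdict

# Hypothetical feedback records: (topic, rating), where rating is 1 (helpful) or 0 (not helpful).
feedback = [
    ("billing", 1), ("billing", 0), ("billing", 0),
    ("returns", 1), ("returns", 1),
    ("warranty", 0), ("warranty", 0),
]
TARGET_SATISFACTION = 0.7  # assumed service-level target

scores = defaultdict(list)
for topic, rating in feedback:
    scores[topic].append(rating)

for topic, ratings in sorted(scores.items()):
    satisfaction = sum(ratings) / len(ratings)
    status = "refine in next iteration" if satisfaction < TARGET_SATISFACTION else "ok"
    print(f"{topic}: {satisfaction:.2f} ({status})")
```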

1.6. Feasibility and Acceptability

AI systems, like complex interventions, must be feasible and acceptable to the target audience. This requires careful consideration of technical, economic, social, and ethical factors.

Ensuring that AI systems are both feasible and acceptable is critical to their success and responsible deployment. These considerations echo those found in the design of complex interventions, where practical constraints and user acceptance shape outcomes. Feasibility encompasses technical, economic, and operational dimensions. AI systems must be implementable given current technological capabilities, scalable within resource limits, and sustainable over time.

 

1.6.1. Technical Feasibility: Can the system be built and deployed effectively? Can the AI system be reliably deployed with available infrastructure and data?

  • Example in Complex Intervention: A telehealth program in rural areas may face challenges if local internet infrastructure is inadequate.
  • Example in AI: An AI-powered diagnostic tool requires access to high-quality medical data, as well as sufficient computing power to ensure accurate predictions.

1.6.2. Economic Feasibility: Is the system cost-effective and sustainable?

  • Example in Complex Intervention: A health intervention must be deliverable within a constrained budget, ensuring that the program can reach a wide population without excessive costs.
  • Example in AI: Is the cost of developing and maintaining an AI system for traffic management manageable for a municipality? Autonomous driving technology must be affordable enough to justify its large-scale deployment, both for manufacturers and consumers, making it accessible to different economic groups.

1.6.3. Social Acceptability: Will the system be trusted and embraced by its users? Will users trust an AI-powered virtual assistant to handle sensitive personal information?

  • Example in Complex Intervention: A new educational curriculum will only succeed if teachers and students accept it as a useful tool. Similarly, a walking group intervention designed for people living with arthritis will only be effective if the intensity, mode, format, and context are acceptable for someone living with arthritis.
  • Example in AI: If a chatbot designed for mental health support refuses to respond to certain sensitive queries due to safety restrictions, users might become frustrated or lose trust in the system, perceiving it as unhelpful or overly restrictive in critical moments.

1.6.4. Equity: Ensuring equitable access is essential. 

  • Example in Complex Intervention: For instance, when advising on integrating a treadmill in a home environment for physical therapy or fitness, considerations must include cost, space requirements, and the economic capacity of users, ensuring that recommendations are practical and accessible to those from lower-income households.
  • Example in AI: Consider a health monitoring application that provides personalised recommendations. For equitable access, the app should be available on low-cost smartphones and offer offline functionality to accommodate users with limited internet access.

1.7. Economic and Societal Impact

The broader economic and societal impact of AI must be carefully evaluated, especially with regard to job displacement, privacy concerns, and the equitable distribution of benefits and risks.

  • Example from Complex Interventions: A welfare program might need to assess how it impacts employment rates and social mobility.
  • Example in AI: The widespread adoption of autonomous vehicles could have far-reaching effects on jobs in transportation and urban planning.

 

2. Evidence Synthesis and Rigorous Evaluation

For responsible AI development, rigorous evaluation is as crucial as it is for complex interventions. This involves designing robust evaluation protocols and establishing an evidence base for best practices.

2.1. Designing Evaluation Protocols and Building an Evidence Base

AI systems should be assessed using rigorous methodology (e.g., RCTs, meta-analyses of robust evidence) and real-world outcome measures. Evaluations should use real-world testing environments, control groups, and methods from medical research, as well as those traditionally used in epidemiology.

AI safety and ethics should be informed by systematic reviews, empirical research, and collaboration between academia, industry, and other stakeholders, while adhering to best practices for transparent reporting. In medical and healthcare research, examples of frameworks for transparent and clear reporting include EQUATOR (Enhancing the Quality and Transparency of Health Research), CONSORT (Consolidated Standards of Reporting Trials), and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). These frameworks help ensure that research findings are transparent, reproducible, and methodologically sound, and similar reporting standards should be adapted for AI safety research to maintain integrity and clarity in the evaluation protocols of AI systems.
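To make the flavour of such a protocol concrete, below is a minimal sketch assuming a hypothetical two-arm trial in which participants are randomised to usual practice or an AI-assisted workflow and a binary primary outcome is recorded for each participant. It compares outcome rates with a two-proportion z-test; a real protocol would pre-specify outcomes, sample size, and analysis plans, and report them against standards such as CONSORT.

```python
import math
import random

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in outcome proportions between two trial arms."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal approximation
    return p_a - p_b, z, p_value

# Simulated trial data; the effect sizes are hypothetical and for illustration only.
random.seed(42)
control = [random.random() < 0.62 for _ in range(500)]  # usual practice
ai_arm = [random.random() < 0.68 for _ in range(500)]   # AI-assisted workflow

diff, z, p = two_proportion_z_test(sum(ai_arm), len(ai_arm), sum(control), len(control))
print(f"Absolute difference in outcome rate: {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```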

2.2. Bias Auditing

Regularly auditing AI models for bias, diversifying data sources, and monitoring AI-human interactions can help mitigate biases. Regular audits (e.g., the work of Joy Buolamwini) should be conducted to assess models for bias, ensuring that AI systems remain fair and inclusive.
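As a sketch of what a routine audit might compute, the Python below takes a hypothetical set of audit records (model prediction, true label, and protected group for each case) and reports per-group selection rates and true-positive rates, plus the ratio of selection rates that is often compared against the informal four-fifths rule. It illustrates the bookkeeping only; a real audit would also cover data provenance, intersectional groups, and qualitative review.

```python
from collections import defaultdict

def audit_by_group(records):
    """Per-group selection rates and true-positive rates.

    `records` is a list of dicts with keys 'group', 'prediction' (0/1),
    and 'label' (0/1) -- a purely hypothetical schema for illustration.
    """
    selected, total = defaultdict(int), defaultdict(int)
    true_pos, actual_pos = defaultdict(int), defaultdict(int)
    for r in records:
        g = r["group"]
        total[g] += 1
        selected[g] += r["prediction"]
        if r["label"] == 1:
            actual_pos[g] += 1
            true_pos[g] += r["prediction"]
    return {
        g: {
            "selection_rate": selected[g] / total[g],
            "true_positive_rate": true_pos[g] / actual_pos[g] if actual_pos[g] else None,
        }
        for g in total
    }

# Tiny hypothetical audit set.
records = [
    {"group": "A", "prediction": 1, "label": 1},
    {"group": "A", "prediction": 0, "label": 1},
    {"group": "B", "prediction": 1, "label": 0},
    {"group": "B", "prediction": 0, "label": 1},
]
report = audit_by_group(records)
rates = [v["selection_rate"] for v in report.values()]
print(report)
print("Selection-rate ratio (min/max):", round(min(rates) / max(rates), 2))
```

A ratio well below 0.8, or a large gap in true-positive rates, would prompt further investigation rather than an automatic conclusion.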

 

3. Complex Adaptive Systems in AI: Key Properties

 

3.1. Emergence

Emergent properties in AI systems are often unanticipated outcomes that arise from complex interactions between the system’s components and its environment. As in social interventions, this makes predicting all outcomes difficult, particularly as AI systems interact with a vast and varied user base.

  • Example in Complex Intervention: Group-based interventions for at-risk youth might inadvertently increase exposure to risky behaviours as social dynamics shift within the group (e.g., online discussion groups of adolescents living with anorexia or bulimia).
  • Example in AI: An AI system designed for content moderation could lead to unforeseen bias when deployed in different cultural contexts, where it misinterprets certain social cues or local norms, resulting in harmful exclusions or misclassifications.

To address emergence, AI developers must consider the system’s potential interactions with its social environment and remain vigilant about unintended outcomes. This necessitates a flexible approach to refinement, allowing the system to evolve based on real-world feedback.

 

3.2. Feedback Loops

Feedback occurs when a change in one part of the system influences other parts, either reinforcing or mitigating the original change. For AI systems, feedback mechanisms can influence user behaviour in ways that amplify both positive and negative outcomes.

  • Example in Complex Intervention: An alcohol ban reduces the visibility and social acceptability of drinking, creating a feedback loop that encourages more people to quit. However, it may also encourage underground binge drinking.
  • Example in AI: Recommendation algorithms on social media platforms create feedback loops by promoting content based on user engagement, which can inadvertently amplify polarising or extremist content if not carefully managed.

Designing responsible AI requires understanding and managing these feedback loops. Developers should regularly audit how AI systems influence user behaviour and be prepared to adjust algorithms to avoid reinforcing harmful patterns.
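The toy simulation below (made-up parameters, standard library only) illustrates the mechanism: items are recommended in proportion to their past engagement, and every item is equally likely to be engaged with when shown, yet exposure still ends up concentrated on a few early winners. Any concentration therefore comes from the loop itself, not from differences in item quality.

```python
import random

random.seed(0)
N_ITEMS = 20
ROUNDS = 5000
engagement = [1] * N_ITEMS  # every item starts with the same baseline engagement

for _ in range(ROUNDS):
    # Recommend in proportion to past engagement: this is the feedback loop.
    item = random.choices(range(N_ITEMS), weights=engagement, k=1)[0]
    # Items are identical in appeal (engagement probability 0.5 when shown).
    if random.random() < 0.5:
        engagement[item] += 1

total = sum(engagement)
top_share = sum(sorted(engagement, reverse=True)[:3]) / total
print(f"Share of engagement held by the top 3 of {N_ITEMS} items: {top_share:.0%}")
```

Auditing a deployed recommender therefore needs to track distributional outcomes over time, not just per-recommendation accuracy.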

 

3.3. Adaptation

AI systems, like any part of a complex adaptive system, must be capable of adjusting to changes in the environment, whether those are technical, social, or regulatory shifts.

  • Example in Complex Intervention: Retailers adapted to a ban on multi-buy alcohol discounts by reducing the prices of individual products, maintaining competitiveness while complying with new regulations. Similarly, phone push notifications targeted at the early stages of smoking cessation differ from those targeting maintenance of behaviour or relapse prevention.
  • Example in AI: AI systems used for legal case evaluations might need to adapt to changes in law or policy, adjusting their decision-making processes to remain in compliance with updated standards.

Adaptation in AI systems requires developers to anticipate the ways in which both users and other stakeholders may respond to the technology. By maintaining flexibility in design and deployment, AI systems can evolve alongside changing societal expectations or regulations.

3.4. Self-Organisation

Self-organisation refers to the spontaneous order that arises from local interactions, rather than from top-down control. In AI, this can manifest in decentralised systems or in how users co-opt technologies for new purposes.

  • Example in Complex Intervention: The creation of charity and patient groups represents a self-organised effort that emerges organically in response to unmet needs within formal health systems. This may lead to positive health outcomes but can also lead to the spread of medical misinformation.
  • Example in AI: Open-source AI platforms allow communities to self-organise around development, troubleshooting, and innovation. This may lead to expected positive outcomes as well as unexpected harms (e.g., capabilities being exploited by adversarial actors).

 

AI developers can foster beneficial self-organisation by providing flexible, open systems that enable communities to contribute, iterate, and innovate. However, this must be balanced with governance mechanisms to prevent harmful or unethical uses.

Conclusion

The framework for complex intervention development and evaluation provides a valuable blueprint for managing the ethical and practical challenges posed by AI. By embracing principles such as contextual awareness, stakeholder engagement, iterative refinement, stakeholder acceptability and rigorous evaluation, we can guide the development of AI systems that are not only innovative but also fair, responsible, and safe for all.

 

Consideration of the properties of complex adaptive systems (emergence, feedback, adaptation, and self-organisation) offers a valuable paradigm for the responsible development of AI. When combined with a focus on feasibility and acceptability, these principles help ensure that AI systems are not only technically sound but also socially and ethically aligned with the needs of the communities they serve. By adopting this comprehensive approach, we can create AI systems that are adaptive, equitable, and sustainable in the long term.

 

1.  Skivington K, Matthews L, Simpson SA, Craig P, Baird J, Blazeby JM, et al. A new framework for developing and evaluating complex interventions: update of Medical Research Council guidance. BMJ. 2021 Sep 30;374:n2061.

2.  Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M, et al. Developing and evaluating complex interventions: The new Medical Research Council guidance. BMJ. 2008 Sep 29;337:a1655.

Comments

I did wonder if this was AI written before seeing the comment thread. It takes a lot of effort for a human to write like an AI. Upvoted for effort.

I think this also missed the mark with the LW audience because it is about AI safety, which is largely separate from AGI alignment. LW is mostly focused on the latter, which addresses future systems that have their own goals, whereas AI safety addresses tool systems that don't really have goals of their own. Those issues are important, but around here we tend to worry that we'll all die when a new generation of systems with goals are introduced, so we mostly focus on those. There is a lot of overlap between the two, but that's mostly in technical implementations rather than choosing desired behavior.

The main point I am trying to make is that AGI risks cannot be deduced or theorised solely in abstract terms. They must be understood through rigorous empirical research on complex systems. If you view AI as an agent in the world, then it functions as a complex intervention. It may or may not act as intended by its designer, it may or may not deliver preferred outcomes, and it may or may not be acceptable to the user. There are ways to estimate uncertainty in each of these parameters through empirical research. Actually, there are degrees to which it acts as intended, degrees to which it is acceptable, and so on. This calls for careful empirical research and system-level understanding.

I write academic papers in healthcare, psychology, and epidemiology for peer review. I don't write blog posts every day, so thank you for your patience with this particular style, which was devised for guidelines and frameworks.

Thank you for sharing your thoughts on AI alignment, AI safety, and imminent threats. I posted this essay to demonstrate how public health guidelines and system thinking can be useful in preventing harm, inequality, and avoiding unforeseen negative outcomes in general. I wanted the LessWrong audience to gain perspectives from other fields that have been addressing rapidly emerging innovations—along with their benefits and harms—for centuries, with the aim of minimising risk and maximising benefit, keeping the wider public in mind.

I am aware of the narrative around the 'paperclip maximiser' threat. However, I believe it is important to recognise that the risks AI brings should not be viewed in the context of a single threat, a single bias, or one path to extinction. AI is a complex system, used in a complex setting—the social structure. It should be studied with due rigour, with a focus on understanding its complexity.

If you can suggest literature on AGI alignment that recognises the complexity of the issue and applies systems thinking to the problem, I would be grateful.

@Mitchell_Porter What made you think that I am not a native English speaker and what made you think that this post was written by AI? 

@RobertM @Mitchell_Porter

 

I guess the standardised language for framework development fails the Turing Test.

The title is a play on words, merging the title of the guidelines authored by the Medical Research Council, "Complex Intervention Development and Evaluation Framework" (1), with the World Economic Forum's "A Blueprint for Equity and Inclusion in Artificial Intelligence" (2). The blog I wrote closely follows the standardised structure for frameworks and guidelines, with specific subheadings that are easy to quote.

"Addressing Uncertainties" is a major requirement in the iterative process of development and refinement of complex intervention. I did not come up with it; it is an agreed-upon requirement in high-risk health application and research. 

Would you like to engage with the content of the post? I thought LessWrong was about engaging in debate where people learn and attempt to reach consensus.

My apologies. I'm usually right when I guess that a post has been authored by AI, but it appears you really are a native speaker of one of the academic idioms that AIs have also mastered. 

As for the essay itself, it involves an aspect of AI safety or AI policy that I have neglected, namely, the management of socially embedded AI systems. I have personally neglected this in favor of SF-flavored topics like "superalignment" because I regard the era in which AIs and humans have a coexistence in which humans still have the upper hand as a very temporary thing. Nonetheless, we are still in that era right now, and hopefully some of the people working within that frame will read your essay and comment. I do agree that the public health paradigm seems like a reasonable source of ideas, for the reasons that you give.

Not Mitchell, but at a guess:

  • LLMs really like lists
  • Some parts of this do sound a lot like LLM output:
    • "Complex Intervention Development and Evaluation Framework: A Blueprint for Ethical and Responsible AI Development and Evaluation"
    • "Addressing Uncertainties"
  • Many people who post LLM-generated content on LessWrong often wrote it themselves in their native language and had an LLM translate it, so it's not a crazy prior, though I don't see any additional reason to have guessed that here.

Having read more of the post now, I do believe it was at least mostly human-written (without this being a claim that it wasn't at least partially written by an LLM). It's not obvious that it's particularly relevant to LessWrong. The advice on the old internet was "lurk more"; now we show users warnings like this when they're writing their first post.

(edit: looks like I spoke too soon and this essay is 100% pure, old-fashioned, home-grown human)

This appears to be yet another post that was mostly written by AI. Such posts are mostly ignored. 

This may be an example of someone who is not a native English speaker, using AI to make their prose more idiomatic. But then we can't tell how many of the ideas come from the AI as well. 

If we are going to have such posts, it might be useful to have them contain an introductory note, that says something about the process whereby they were generated, e.g. "I wrote an outline in my native language, and then [specific AI] completed the essay in English", or, "This essay was generated by the following prompt... this was the best of ten attempts", and so on. 

Hey, be civil! That is not nice. I am a human, I did not use AI. 
