Nora_Ammann

Current:

Past: 

  • Director and Co-Founder of "Principles of Intelligent Behaviour in Biological and Social Systems" (pibbss.ai)
  • Research Affiliate and PhD student with the Alignment of Complex Systems group, Charles University (acsresearch.org)
  • Programme Manager, Future of Humanity Institute, University of Oxford
  • Research Affiliate, Simon Institute for Longterm Governance

Sequences

The Value Change Problem (sequence)
Thoughts in Philosophy of Science of AI Alignment

Wiki Contributions

Comments


[Edited a bit for clarity]
 

(To clarify: I co-founded and have led PIBBSS since 2021, but stepped down from leadership in June this year to work with davidad on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing of what I say below should be taken as an authoritative source on what PIBBSS is going to do. I do serve on the board.)

Ryan - I appreciate the donation, and in particular that you shared your reasoning here.

I agree with a lot of what you write. Especially "connectors" (point 2) and bringing in relatively more senior academics from non-CS/non-ML fields (point 3) are IMO valuable things that PIBBSS has a good track record of delivering on.

Regarding both your point 1 and reservation 1 (while trying to abstract a bit from the fact that terms like 'blue sky' research are somewhat fuzzy, and that I expect at least some parts of a disagreement here might turn out to disappear when considering concrete examples), I do think there has been some change in PIBBSS' research thinking & prioritization, which has been unfolding in my head since summer 2023 and finding its way more concretely into PIBBSS' strategy since the start of 2024. Lucas is the best person to speak to this in more detail, but I'll still share a few thoughts that were on my mind back when I was still leading PIBBSS.

I continue to believe that there is a lot of value to be had in investigating the underlying principles of intelligent behaviour (what one might refer to as blue sky or basic research), and in doing so with a good dose of epistemic pluralism (studying such intelligent behaviour from/across a range of different systems, substrates and perspectives). I think this is a, or the, core aspect of the PIBBSS spirit. However, after the first 2 years of PIBBSS, I also wasn't entirely happy with our concrete research outputs. I thought a lot about what was going on here (first mainly by myself, and later together with Lucas) and about how to do better - all the while staying true to the roots/generators/principles of PIBBSS, as I see them.

One of the key axes of improvement we've grown increasingly confident about is what we sometimes refer to as "bridging the theory-practice gap". I'm pretty bullish on theory(!) -- but theory alone is not the answer. Theory on its own isn't in a great position to know where to go next/what to prioritise, or whether it's making progress at all. I have seen many theoretical threads that felt intriguing/compelling, but failed to bottom out in something tangibly productive because they lacked feedback loops to guide them and to force them to operationalise abstract notions into something of concrete empirical and interventionist value. (EDIT: 'interventionist' is me trying to point to when a theory is good enough to allow you to intervene in the world or design artifacts in such a way that they reliably lead to what you intended.)

This is not an argument against theory, in my eyes, but an argument that theorizing about intelligent behaviour (or any given phenomenon) will benefit a lot from having productive feedback loops with the empirical. As such, what we've come to want to foster at PIBBSS is (a) ambitious, 'theory-first' AI safety research, (b) with an interdisciplinary angle, (c) that is able to find meaningful and iterative empirical feedback loops. It's not so much that a theoretical project that doesn't yet have a meaningful way of making contact with reality should be disregarded -- and more that a key source of progress for said project will be to find ways of articulating/operationalising that theory, so that it starts making actual (testable) predictions about a real system and can come to inform design choices.

These updates in our thinking led to a few different downstream decisions. One of them was trying to have our fellowship cohort include some empirical ML profiles/projects (I endorse roughly a 10-15% fraction, similar to what Joseph suggests). The reasons for this are both that we think this work is likely to be useful, and that it changes (and IMO improves) the cohort dynamics (compared to, say, 0-5% ML). That said, I agree that once that fraction goes above 20%, I would start to worry that something essential about the PIBBSS spirit might get lost, and I'm not excited about that from an ecosystem perspective (given that e.g. MATS is doing what it's doing).

Another downstream implication (though it's somewhat earlier days for PIBBSS on that one) is that I've become pretty excited about trying to help move ambitious ideas from (what I call) an 'Idea Readiness Level' (IDL; borrowing from the notion of 'Technology Readiness Levels') of 1-3 to an IDL of 5-7. My thinking here is that once an idea/research agenda is at IDLs 5-7, it typically has been able to enter the broader epistemic discourse and has some initial legible evidence/output it can rely on to make its own case -- and at that point I would say it is no longer in the area where PIBBSS has the greatest comparative advantage to support it. On the other hand, I think there aren't many (if any) obvious places where IDL 1-3 ideas get a chance to be iterated on quickly and stress-tested so they can develop into a more mature & empirically grounded agenda. (I think the way we were able to support Adam Shai & co in developing the computational mechanics agenda is a pretty great example of the use case I'm painting here - though notably their ideas were already relatively mature compared to other things PIBBSS might ambitiously help to mature.)

I'd personally be very excited for a PIBBSS that becomes excellent at filling that gap, and I think PIBBSS has a bunch of the necessary ingredients for that already. I see this as a potentially critical investment in the medium-term robustness of the research ecosystem, and in what I think is an undeniable need to base AI Safety on rigorous scientific understanding. (Though notably Lucas/Dusan, PIBBSS' new leadership, might disagree & should have a chance to speak for themselves here.)

Yes, we upload them to our YouTube account, modulo the speaker agreeing to it. The first few recordings from this series should be uploaded very shortly.

While I don't think it's so much about selfishness as such, I think this points at something important, also discussed eg here: The self-unalignment problem


Does it seem like I'm missing something important if I say "Thing = Nexus" gives a "functional" explanation of what a thing is, i.e. it serves the function of being an "inductive nexus of reference"? This is not a foundational/physicalist/mechanistic explanation, but it is very much a sort of explanation that I can imagine being useful in some cases/for some purposes.

I'm suggesting this as a possibly different angle on "what sort of explanation is Thing=Nexus, and why is it plausibly not fraught despite its somewhat-circularity?" It seems like it maps onto / doesn't contradict anything you say (note: I only skimmed the post so might have missed some relevant detail, sorry!), but I wanted to check whether, even if not conflicting, it misses something you think is or might be important somehow.

Yeah, I would be pretty keen to see more work trying to do this for AI risk/safety questions specifically: contrasting what different lenses "see" and emphasize, and what productive critiques they have to offer to each other.

Over the last couple of years, valuable progress has been made towards stating the (more classical) AI risk/safety arguments more clearly, and I think that's very productive for leading to better discourse (including critiques of those ideas). I think we're a bit behind on developing clear articulations of the complex systems/emergent risk/multi-multi/"messy transitions" angle on AI risk/safety, and also that progress on this would be productive on many fronts.

If I'm not mistaken there is some work on this in progress from CAIF (?), but I think more is needed. 

To follow up on this, we'll be hosting John's talk on Dec 12th, 9:30AM Pacific / 6:30PM CET

Join through this Zoom Link.

Title: AI would be a lot less alarming if we understood agents

Description:  In this talk, John will discuss why and how fundamental questions about agency - as they are asked, among others, by scholars in biology, artificial life, systems theory, etc. - are important to making progress in AI alignment. John gave a similar talk at the annual ALIFE conference in 2023, as an attempt to nerd-snipe researchers studying agency in a biological context.

--

To be informed about future Speaker Series events, subscribe to our SS Mailing List here. You can also add the PIBBSS Speaker Events to your calendar through this link.
 

I have no doubt Alexander would shine!

Happy to run a PIBBSS speaker event for this, record it and make it publicly available. Let me know if you're keen and we'll reach out to find a time.

FWIW I also think the "Key Phenomena of AI risk" reading curriculum (h/t TJ) does some of this, at least indirectly (it doesn't set out to directly answer this question, but I think a lot of the answers to the question are contained in the curriculum).

(Edit: fixed link)

How confident are you that it wasn't recorded? If not very, it seems probably worth checking again.

Re whether messy goal-seekers can be schemers: you may address this in a different place (and if so forgive me, and I'd appreciate you pointing me to where), but I keep wondering what notion of scheming (or deception, etc.) we should be adopting here; in particular:

  • an "internalist" notion, where 'scheming' is defined via the "system's internals", i.e. roughly: the system has goal A, acts as if it has goal B, until the moment is suitable to reveal it's true goal A.
  • an "externalist" notion, where 'scheming' is defined, either, from the perspective of an observer (e.g. I though the system has goal B, maybe I even did a bunch of more or less careful behavioral tests to raise my confidence in this assumption, but in some new salutation, it gets revealed that the system pursues B instead)
  • or an externalist notion, but defined via the effects on the world that manifest (e.g. from a more 'bird's-eye' perspective, we can observe that the system had a number of concrete (harmful) effects on one or several agents via the mechanism that those agents misjudged what goal the system is pursuing (therefore e.g. mispredicting its future behaviour, and basing their own actions on this wrong assumption))

It seems to me like all of these notions have different upsides and downsides. For example:

  • the internalist notion seems (?) to assume/bake into its definition of scheming a high degree of non-sphexishness/consequentialist cognition
  • the observer-dependent notion comes down to being a measure of the observer's knowledge about the system 
  • the effects-on-the-world-based notion seems plausibly too weak/non-mechanistic to be helpful in the context of crafting concrete alignment proposals/safety tooling