What's the case for it being a Swiss cheese approach? That doesn't match how I think of it.
Donated 1k USD, and might donate more after some further reflection.
I have myself benefited a lot, and have seen many others benefit, from the many things you've done over the years, in particular LW and Lighthaven. But beyond that, I also particularly value and respect you for the integrity and intellectual honesty I have consistently found you to demonstrate.
[Edited a bit for clarity]
(To clarify: I co-founded and led PIBBSS since 2021, but stepped down from leadership in June this year to work with davidad on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing of what I say below should be taken as an authoritative source on what PIBBSS is going to do. I do serve on the board.)
Ryan - I appreciate the donation, and in particular you sharing your reasoning here.
I agree with a lot of what you write. Especially "c...
Yes, we upload them to our YouTube account, modulo the speaker agreeing to it. The first few recordings from this series should be uploaded very shortly.
While I don't think it's so much about selfishness as such, I think this points at something important, also discussed eg here: The self-unalignment problem
Does it seem like I'm missing something important if I say "Thing = Nexus" gives a "functional" explanation of what Thing is, i.e. it serves the function of being an "inductive nexus of reference"? This is not a foundational/physicalist/mechanistic explanation, but it is very much a sort of explanation that I can imagine being useful in some cases/for some purposes.
I'm suggesting this as a possibly different angle on "what sort of explanation is Thing=Nexus, and why is it plausibly not fraught despite its somewhat-circularity?" It seems like it maps on to...
Yeah, would be pretty keen to see more work trying to do this for AI risk/safety questions specifically: contrasting what different lenses "see" and emphasize, and what productive critiques they have to offer each other.
Over the last couple of years, valuable progress has been made towards stating the (more classical) AI risk/safety arguments more clearly, and I think that's very productive for leading to better discourse (including critiques of those ideas). I think we're a bit behind on developing clear articulations of the complex systems/...
To follow up on this, we'll be hosting John's talk on Dec 12th, 9:30AM Pacific / 6:30PM CET.
Join through this Zoom Link.
Title: AI would be a lot less alarming if we understood agents
Description: In this talk, John will discuss why and how fundamental questions about agency - as they are asked, among others, by scholars in biology, artificial life, systems theory, etc. - are important to making progress in AI alignment. John gave a similar talk at the annual ALIFE conference in 2023, as an attempt to nerd-snipe researchers studying agency in a b...
I have no doubt Alexander would shine!
Happy to run a PIBBSS speaker event for this, record it and make it publicly available. Let me know if you're keen and we'll reach out to find a time.
FWIW I also think the "Key Phenomena of AI risk" reading curriculum (h/t TJ) does some of this at least indirectly (it doesn't set out to directly answer this question, but I think a lot of the answers to the question are contained in the curriculum).
(Edit: fixed link)
How confident are you about it not having been recorded? If not very, it seems probably worth checking again.
Re whether messy goal-seekers can be schemers, you may address this in a different place (and if so forgive me, and I'd appreciate you pointing me to where), but I keep wondering what notion of scheming (or deception, etc.) we should be adopting when, in particular:
Yeah neat, I haven't yet gotten to reading it but it's definitely on my list. It seems (and some folks suggested to me) that it's quite related to the sort of thing I'm discussing in the value change problem too.
Roughly... refers to/emphasizes the dynamic interaction between agent and environment and understands behavior/cognition/agency/... to emerge through that interaction/at that interface (rather than, e.g., trying to understand them as an internal property of the agent only)
Related to my point above (and this quoted paragraph), a fundamental nuance here is the distinction between "accidental influence side effects" and "incentivized influence effects". I'm happy to answer more questions on this difference if it's not clear from the rest of my comment.
Thanks for clarifying; I agree it's important to be nuanced here!
I basically agree with what you say. I also want to say something like: whether it's best counted as a side effect or as incentivized depends on what optimizer we're looking at/where you draw the boundary around the...
A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case.
Yes, I'd agree (and didn't make this clear in the post, sorry) -- the pressure towards predictability comes from a combination of the logic of performative prediction AND the "economic logic" that provides the context in which these performative predictors are being used/applied. This is certainly an important thing to be clear about!
(Though it also can only give us s...
Agree! Examples abound. You can never escape your local ideological context - you can only try to find processes that have some hope of occasionally bumping into the bounds of your current ideology and pressing beyond them. There is no reliable recipe (just like there is no reliable recipe for making yourself notice your own blind spots) - but there is hope for things that, in expectation and intertemporally, can help us with this.
Which poses a new problem (or clarifies the problem we're facing): we don't get to answer the question of value change legitimacy in a...
Yeah, interesting point. I do see the pull of the argument. In particular, the example seems well chosen -- where the general form seems to be something like: we can think of cases where our agent can be said to be better off (according to some reasonable standards/from some reasonable vantage point) if the agent can commit themselves to continue doing a thing/undergoing a change for at least a certain amount of time.
That said, I think there are also some problems with it. For example, I'm wary of reifying "I-as-in-CEV" more than what is war...
yes, sorry! I'm not making it super explicit, actually, but the point is that, if you read e.g. Paul's or Callard's accounts of value change (via transformative experiences and via aspiration respectively), a large part of how they even set up their inquiries is with respect to the question of whether value change is irrational or not (or what problem value change poses to rational agency). The rationality problem comes up because it's unclear from what vantage point one should evaluate the rationality (i.e. the "keeping with what expected utility theory tells you t...
Yes, as stated in the requirements section, affiliates are expected to attend retreats, and I expect events to be split roughly 50/50 between the US and Europe.
We're not based in a single location. We are open to accepting affiliates based anywhere, and are also happy to help them relocate (within some constraints) to somewhere else (e.g. where they have access to a more lively research/epistemic community) if that would be beneficial for them. That said, I also think we are best placed to help people (and have historically tended to run things) in either London/Oxford, Prague or Berkeley.
Starting dates might indeed differ depending on candidates' situation. That said, we expect affiliates of this round will start sometime between mid-December and mid-January. We'll be running a research retreat to onboard affiliates within that same time frame.
Right, but I feel like I want to say something like "value grounding" as its analogue.
Also... I do think there is a crucial epistemic dimension to values, and the "[symbol/value] grounding" thing seems like one place where this shows quite well.
The process that invents democracy is part of some telotect, but is it part of a telophore? Or is the telophore only reached when democracy is implemented?
Musing about how (maybe) certain telophemes impose constraints on the structure (logic) of their corresponding telophores and telotects. E.g. democracy, freedom, autonomy, justice, corrigibility, rationality, ... (though plausibly you'd not want to count (some of) those examples as telophemes in the first place?)
Curious whether the following idea rhymes with what you have in mind: telophore as (sort of) doing ~symbol grounding, i.e. the translation (or capacity to translate) from description to (worldly) effect?
Indeed that wasn't intended. Thanks a lot for spotting & sharing it! It's fixed now.
Good point! We are planning to gauge time preferences among the participants and fix slots then. Maybe most relevantly, we intend to accommodate all time zones. (We have been doing this with PIBBSS fellows as well, so I am pretty confident we will be able to find time slots that work pretty well across the globe.)
Here is another interpretation of what can cause a lack of robustness to scaling down:
(Maybe this is what you have in mind when you talk about single-single alignment not (necessarily) scaling to multi-multi alignment - but I am not sure that is the case, and even if it is, I feel pulled to state it again as I don't think it comes out as clearly as I would want it to in the original post.)
Taking the example of an "alignment strategy [that makes] the AI find the preferences and values of humans, and then pursu[e] that", robustness to scaling ...
Curious what different aspects the "duration of seclusion" is meant to be a proxy for?
You definitely point at things like "when are they expected to produce intelligible output" and "what sorts of questions appear most relevant to them". Another dimension that came to mind - but I am not sure whether or not you mean to include it in the concept - is something like "how often are they allowed/able to peek directly at the world, relative to the length of periods during which they reason about things in ways that are removed from empirical data"?
PIBBSS Summer Research Fellowship -- Q&A event
I think it's a shame that these days for many people the primary connotation of the word "tribe" is connected to culture wars. In fact, our decision to use this term was in part motivated by wanting to re-appropriate the term to something less politically loaded.
As you can read in our post (see "What is a tribe?"), we mean something particular. Like any collective of human beings, it can in principle be subject to excessive in-group/out-group dynamics, but that's far from the only, or the most interesting, part of it.
Context: (1) Motivations for fostering EA-relevant interdisciplinary research; (2) "domain scanning" and "epistemic translation" as a way of thinking about interdisciplinary research
[cross-posted to the EA forum in shortform]
The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started to compile this list to provide some anchorage for evaluating the value of interdisciplinary research for EA causes, specifical...
Glad to hear it seemed helpful!
FWIW I'd be interested in reading you spell out in more detail what you think you learnt from it about simulacra levels 3+4.
Re "writing the bottom line first": I'm not sure. I think it might be, but at least this connection didn't feel salient, or like it would buy me anything in terms of understanding, when thinking about this so far. Again interested in reading more about where you think the connections are.
To maybe say more about why (so far) it didn't seem clearly relevant to me: "Writing the bottom line first", to ...
Regarding "Staying grounded and stable in spite of the stakes":
I think it might be helpful to unpack the virtue/skill(s) involved according to the different timescales at which emergencies unfold.
For example:
1. At the timescale of minutes or hours, there is a virtue/skill of "staying level-headed in a situation of acute crisis". This is the sort of skill you want your emergency doctor or firefighter to have. (When you pointed to the military, I think you in part pointed to this scale, but I assume not only to this.)
From talking to people who do ...
Re language as an example: parties involved in communication using language have comparable intelligence (and even there I would say someone just a bit smarter can cheat their way around you using language).
Mhh, yeah - I agree these are examples of ways in which language "fails". But I think they don't bother me too much?
I put them in the same category as "two agents with good faith sometimes miscommunicate - and still, language overall is pragmatically reliable", or "works well enough". In other words, even though there is potential for exploitation, that ...
a cascade of practically sufficient alignment mechanisms is one of my favorite ways to interpret Paul's IDA (Iterated Distillation and Amplification)
Yeah, great point!
However, I think its usefulness hinges on the ability to robustly quantify the required alignment reliability / precision for various levels of optimization power involved.
I agree and think this is a good point! I think on top of quantifying the required alignment reliability "at various levels of optimization" it would also be relevant to take the underlying territory/domain into account. We can say that a territory/domain has a specific epistemic and normative structure (which e.g. defines the error margin that is acceptable, or tracks the co-evolutionary dynamics).
Pragmatically reliable alignment
[taken from On purpose (footnotes); sharing this here because I want to be able to link to this extract specifically]
AI safety-relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research.
Language works, evidenced by the striking success of human civilisations made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a w...
The question you're pointing at is definitely interesting. A Freudian, slightly pointed way of phrasing it is something like: are humans' deepest desires, in essence, good and altruistic, or violent and selfish?
My guess is that this question is wrong-headed. For example, I think this is making the mistake of drawing a dichotomy and rivalry between my "oldest and deepest drives" and "reflective reasoning", and depending on your conception of which of these two wins, your answer to the above question ends up being positive or negative. I don't...
[I felt inclined to look for observations of this thing outside of the context of the pandemic.]
Some observations:
I experience this process (either in full or the initial stages of it) for example when asked about my work (as it relates to EA, x-risks, AI safety, rationality and the like), or when sharing ~unconventional plans (e.g. "I'll just spend the next few months thinking about this") when talking to, e.g., old friends from when I was growing up, or people in the public sphere like a dentist, physiotherapist, etc. This used to be also somewhat the cas...
As far as I can tell, I agree with what you say - this seems like a good account of how the cryptographer's constraint cashes out in language.
To your confusion: I think Dennett would agree that it is Darwinian all the way down, and that their disagreement lies elsewhere. Dennett's account of how "reasons turn into causes" is made on Darwinian grounds, and it compels Dennett (but not Rosenberg) to conclude that purposes deserve to be treated as real, because (compressing the argument a lot) they have the capacity to affect the causal world.
Not sure this is useful?
I'm inclined to map your idea of "reference input of a control system" onto the concept of homeostasis, homeostatic set points and homeostatic loops. Does that capture what you're trying to point at?
(Assuming it does) I agree that homeostasis is an interesting puzzle piece here. My guess for why this didn't come up in the letter exchange is that D/R are trying to resolve a related but slightly different question: the nature and role of an organism's conscious, internal experience of "purpose".
Purpose and its pursuit have a special role in how hu...
In regards to "the meaning of life is what we give it", that's like saying "the price of an apple is what we give it". While true, it doesn't tell the whole story. There's actual market forces that dictate apple prices, just like there are actual darwinian forces that dictate meaning and purpose.
Agree; the causes that we create ourselves aren't all that governs us - in fact, it's a small fraction of that, considering physical, chemical, biological, game-theoretic, etc. constraints. And yet, there appears to be an interesting difference between the causes t...
I'm confused about the "purposes don't affect the world" part. If I think my purpose is to eat an apple, then there will not be an apple in the world that would have otherwise still been there if my purpose wasn't to eat the apple. My purpose has actual effects on the world, so my purpose actually exists.
So, yes, basically this is what Dennett reasons in favour of, and what Rosenberg is skeptical of.
I think the thing here that needs reconciliation - and what Dennett is trying to do - is to explain why, in your apple story, it's justified to use...
Thanks :)
> I will note that I found the "Rosenberg's crux" section pretty hard to read, because it was quite dense.
Yeah, you're right - thanks for the concrete feedback!
I wasn't originally planning to make this a public post and later failed to take a step back and properly model what it would be like as a reader without the context of having read the letter exchange.
I'll consider adding a short intro paragraph to partially remedy this.
While I'm not an expert, I did study political science and am Swiss. I think this post paints an accurate picture of important parts of the Swiss political system. I also think (and admire) that it explains very nicely the basic workings of a naturally fairly complicated system.
If people are interested in reading more about Swiss Democracy and its underlying political/institutional culture (which, as pointed out in the post, is pretty informal and shaped by its historic context), I can recommend this book: https://www.amazon.com/Swiss-Democracy-Solut...
Are there any existing variolation projects that I can join?
FWIW, this is the one I know of: https://1daysooner.org/
That said, the last time I got an update from them (~1 month ago), any execution of these trials was still at least a few months away. (You could reach out to them via the website for more up-to-date information.) Also, there is a limited number of places where the trials can actually take place, so you'd have to check whether there is anything close to where you are.
(Meta: This isn't necessarily an endorsement of your main question.)
That's cool to hear!
We are hoping to write up our current thinking on ICF at some point (although I don't expect it to happen within the next 3 months) and will make sure to share it.
Happy to talk!
I found this article ~very poor. Many of the rhetorical moves adopted in the piece seem largely optimised for making it easy to stay on the "high horse". Talking about a singular AI doomer movement is one of them. Taking the stance that AGI is not near and thus there is nothing to worry about is another. Whether or not that's true, it certainly makes it easy to point your finger at folks who are worried and say "look what silly theater".
I think it's somewhat interesting to ask whether there should be more coherence across safety efforts, an...