All of Nora_Ammann's Comments + Replies

I found this article ~very poor. Many of the rhetorical moves adopted in the piece seem largely optimised for making it easy to stay on the "high horse". Talking about a singular AI doomer movement is one of them. Having the stance that AGI is not near and thus there is nothing to worry about is another. Whether or not that's true, it certainly makes it easy to point your finger at folks who are worried and say 'look what silly theater'. 

I think it's somewhat interesting to ask whether there should be more coherence across safety efforts, an... (read more)

What's the case for it being a swiss cheese approach? That doesn't match how I think of it. 

2Quinn
I'm surprised to hear you say that, since you write I kinda think anything which is not a panacea is swiss cheese, that those are the only two options. It's a matter of what sort of portfolio can lay down slices of swiss cheese at what rate and with what uncorrelation. And I think in this way GSAI is antifragile to next year's language models, which is why I can agree mostly with Zac's talk and still work on GSAI (I don't think he talks about my cruxes). Specifically, I think the guarantees of each module and the guarantees of each pipe (connecting the modules) isolate/restrict the error to the world-model gap or the world-spec gap, and I think the engineering problems of getting those guarantees are straightforward / not conceptual problems. Furthermore, I think the conceptual problems with reducing the world-spec gap below some threshold presented by Safeguarded's TA1 are easier than the conceptual problems in alignment/safety/control.

Donated 1k USD, and might donate more after some further reflection.

I have myself benefited, and seen many others benefit, a lot from the many things you've done over the years, in particular LW and Lighthaven. But beyond that, I also particularly value and respect you for the integrity and intellectual honesty I have consistently found you to demonstrate. 

[Edited a bit for clarity]

(To clarify: I co-founded PIBBSS in 2021 and led it until stepping down from leadership in June this year to work with davidad on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing of what I say below should be taken as an authoritative source on what PIBBSS is going to do. I do serve on the board.)

Ryan -  I appreciate the donation, and in particular you sharing your reasoning here. 

I agree with a lot of what you write. Especially "c... (read more)

Yes, we upload them to our YouTube account, modulo the speaker agreeing to it. The first few recordings from this series should be uploaded very shortly. 

While I don't think it's so much about selfishness as such, I think this points at something important, also discussed e.g. here: The self-unalignment problem

Does it seem like I'm missing something important if I say "Thing = Nexus" gives a "functional" explanation of what the thing is, i.e. it serves the function of being an "inductive nexus of reference"? This is not a foundational/physicalist/mechanistic explanation, but it is very much a sort of explanation that I can imagine being useful in some cases/for some purposes.

I'm suggesting this as a possibly different angle at "what sort of explanation is Thing=Nexus, and why is it plausibly not fraught despite its somewhat-circularity?" It seems like it maps on to... (read more)

2TsviBT
I'm not sure I understand your question at all, sorry. I'll say my interpretation and then answer that. You might be asking: My answer is no, that doesn't sum up the essay. The essay makes these claims:
1. There are many different directions in conceptspace that could be considered "more foundational", each with their own usefulness and partial coherence.
2. None of these directions gives a total ordering that satisfies all the main needs of a "foundational direction".
3. Some propositions/concepts not only fail to go in [your favorite foundational direction], but are furthermore circular; they call on themselves.
4. At least for all the "foundational directions" I listed, circular ideas can't be going in that direction, because they are circular.
5. Nevertheless, a circular idea can be pretty useful.
I did fail to list "functional" in my list of "foundational directions", so thanks for bringing it up. What I say about foundational directions would also apply to "functional".

Yeah, would be pretty keen to see more work trying to do this for AI risk/safety questions specifically: contrasting what different lenses "see" and emphasize, and what productive critiques they have to offer to each other. 

Over the last couple of years, valuable progress has been made towards stating the (more classical) AI risk/safety arguments more clearly, and I think that's very productive for leading to better discourse (including critiques of those ideas). I think we're a bit behind on developing clear articulations of the complex systems/... (read more)

To follow up on this, we'll be hosting John's talk on Dec 12th, 9:30AM Pacific / 6:30PM CET

Join through this Zoom Link.

Title: AI would be a lot less alarming if we understood agents

Description:  In this talk, John will discuss why and how fundamental questions about agency - as they are asked, among others, by scholars in biology, artificial life, systems theory, etc. - are important to making progress in AI alignment. John gave a similar talk at the annual ALIFE conference in 2023, as an attempt to nerd-snipe researchers studying agency in a b... (read more)

2Alex_Altair
FYI this link redirects to a UC Berkeley login page.

I have no doubt Alexander would shine!

Happy to run a PIBBSS speaker event for this, record it and make it publicly available. Let me know if you're keen and we'll reach out to find a time.

FWIW I also think the "Key Phenomena of AI risk" reading curriculum (h/t TJ) does some of this at least indirectly (it doesn't set out to directly answer this question, but I think a lot of the answers to the question are contained in the curriculum). 

(Edit: fixed link)

How confident are you about it not having been recorded? If not very, it seems probably worth checking again.

6rorygreig
The workshop talks from the previous year's ALIFE conference (2022) seem to be published on YouTube, so I'm following up on whether John's talk from this year's conference can be released as well.
5johnswentworth
I mean, I could always re-present it and record if there's demand for that. ... or we could do this the fun way: powerpoint karaoke. I.e. you make up the talk and record it, using those slides. I bet Alexander could give a really great one.

Re whether messy goal-seekers can be schemers, you may address this in a different place (and if so forgive me, and I'd appreciate you pointing me to where), but I keep wondering what notion of scheming (or deception, etc.) we should be adopting; in particular: 

  • an "internalist" notion, where 'scheming' is defined via the "system's internals", i.e. roughly: the system has goal A, acts as if it has goal B, until the moment is suitable to reveal it's true goal A.
  • an "externalist" notion, where 'scheming' is defined, either, from the perspective of an
... (read more)

Yeah neat, I haven't yet gotten to reading it but it's definitely on my list. Seems (and some folks have suggested to me) that it's quite related to the sort of thing I'm discussing in the value change problem too.

2Richard_Ngo
There are some similarities, although I'm focusing on AI values not human values. Also, seems like the value change stuff is thinking about humanity on the level of an overall society, whereas I'm thinking about value systematization mostly on the level of an individual AI agent. (Of course, widespread deployment of an agent could have a significant effect on its values, if it continues to be updated. But I'm mainly focusing on the internal factors.)

Roughly... refers to/emphasizes the dynamic interaction between agent and environment and understands behavior/cognition/agency/... to emerge through that interaction/at that interface (rather than, e.g., trying to understand them as an internal property of the agent only)

1M. Y. Zuo
Can you link to a source for a definition of 'enactive'? 
2Bird Concept
I can't quite tell how that's different from embeddedness. (Also if you have links to other places it's explained feel free to share them.)

Related to my point above (and this quoted paragraph), a fundamental nuance here is the distinction between "accidental influence side effects"  and "incentivized influence effects". I'm happy to answer more questions on this difference if it's not clear from the rest of my comment.

Thanks for clarifying; I agree it's important to be nuanced here!

I basically agree with what you say. I also want to say something like: whether it's best to count it as a side effect or as incentivized depends on what optimizer we're looking at/where you draw the boundary around the... (read more)

A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case.

Yes, I'd agree (and didn't make this clear in the post, sorry) -- the pressure towards predictability comes from a combination of the logic of performative prediction AND the "economic logic" that provides the context in which these performative predictors are being used/applied. This is certainly an important thing to be clear about! 

(Though it also can only give us s... (read more)

Agree! Examples abound. You can never escape your local ideological context - you can only try to find processes that have some hope of occasionally bumping into the bounds of your current ideology and pressing beyond them - there is no reliable recipe (just like there is no reliable recipe for making yourself notice your own blind spots) - but there is hope for things that, in expectation and intertemporally, can help us with this. 

Which poses a new problem (or clarifies the problem we're facing): we don't get to answer the question of value change legitimacy in a... (read more)

Yeah, interesting point. I do see the pull of the argument. In particular the example seems well chosen -- where the general form seems to be something like: we can think of cases where our agent can be said to be better off (according to some reasonable standards/from some reasonable vantage point) if the agent can commit themselves to continuing to do a thing/undergoing a change for at least a certain amount of time. 

That said, I think there are also some problems with it. For example, I'm wary of reifying "I-as-in-CEV" more than what is war... (read more)

Yes, sorry! I'm not making it super explicit, actually, but the point is that, if you read e.g. Paul's or Callard's accounts of value change (via transformative experiences and via aspiration, respectively), a large part of how they even set up their inquiries is with respect to the question of whether value change is irrational or not (or what problem value change poses to rational agency). The rationality problem comes up because it's unclear from what vantage point one should evaluate the rationality (i.e. the "keeping with what expected utility theory tells you t... (read more)

Yes, as stated in the requirements section, affiliates are expected to attend retreats, and I expect roughly a 50/50 split of events happening in the US and Europe. 

We're not based in a single location. We are open to accepting affiliates based anywhere, and are also happy to help them relocate (within some constraints) to somewhere else (e.g. where they have access to a more lively research/epistemic community) if that would be beneficial for them. That said, I also think we are best placed to help people (and have historically tended to run things) in either London/Oxford, Prague or Berkeley. 

1RGRGRG
If I were to be accepted for this cycle, would I be expected to attend any events in Europe?  To be clear, I could attend all events in and around Berkeley.

Starting dates might indeed differ depending on candidates' situations. That said, we expect affiliates of this round will start sometime between mid-December and mid-January. We'll be running a research retreat to onboard affiliates within that same time frame.

Right, but I feel like I want to say something like "value grounding" as its analogue. 

Also... I do think there is a crucial epistemic dimension to values, and the "[symbol/value] grounding" thing seems like one place where this shows quite well.

2TsviBT
Ok yeah I agree with this. Related: https://tsvibt.blogspot.com/2023/09/the-cosmopolitan-leviathan-enthymeme.html#pointing-at-reality-through-novelty

And an excerpt from a work in progress:

Example: Blueberries

For example, I reach out and pick up some blueberries. This is some kind of expression of my values, but how so? Where are the values? Are the values in my hands? Are they entirely in my hands, or not at all in my hands? The circuits that control my hands do what they do with regard to blueberries by virtue of my hands being the way they are. If my hands were different, e.g. really small or polydactylous, my hand-controller circuits would be different and would behave differently when getting blueberries. And the deeper circuits that coordinate visual recognition of blueberries, and the deeper circuits that coordinate the whole blueberry-getting system and correct errors based on blueberrywise success or failure, would also be different.

Are the values in my visual cortex? The deeper circuits require some interface with my visual cortex, to do blueberry find-and-pick-upping. And having served that role, my visual cortex is specially trained for that task, and it will even promote blueberries in my visual field to my attention more readily than yours will to you. And my spatial memory has a nearest-blueberries slot, like those people who always know which direction is north.

It may be objected that the proximal hand-controllers and the blueberry visual circuits are downstream of other deeper circuits, and since they are downstream, they can be excluded from constituting the value. But that's not so clear. To like blueberries, I have to know what blueberries are, and to know what blueberries are I have to interact with them. The fact that I value blueberries relies on me being able to refer to blueberries. Certainly, if my hands were different but comparably versatile, then I would learn to use them to refer to blueberries about as well as my real hands

The process that invents democracy is part of some telotect, but is it part of a telophore? Or is the telophore only reached when democracy is implemented?

Musing about how (maybe) certain telophemes impose constraints on the structure (logic) of their corresponding telophores and telotects. E.g. democracy, freedom, autonomy, justice, corrigibility, rationality, ... (though plausibly you'd not want to count (some of) those examples as telophemes in the first place?)

2TsviBT
I think that your question points out how the concepts as I've laid them out don't really work. I now think that values such as liking a certain process or liking mental properties should be treated as first-class values, and this pretty firmly blurs the telopheme / telophore distinction.

Curious whether the following idea rhymes with what you have in mind: telophore as (sort of) doing ~symbol grounding, i.e. the translation (or capacity to translate) from description to (worldly) effect? 

2TsviBT
It's definitely like symbol grounding, though symbol grounding is usually IIUC about "giving meaning to symbols", which I think has the emphasis on epistemic signifying?

Indeed that wasn't intended. Thanks a lot for spotting & sharing it! It's fixed now.

Good point! We are planning to gauge time preferences among the participants and fix slots then. Maybe most relevant: we intend to accommodate all time zones. (We have been doing this with PIBBSS fellows as well, so I am pretty confident we will be able to find time slots that work pretty well across the globe.)

Here is another interpretation of what can cause a lack of robustness to scaling down: 

(Maybe this is what you have in mind when you talk about single-single alignment not (necessarily) scaling to multi-multi alignment - but I am not sure that is the case, and even if it is, I feel pulled to state it again, as I don't think it comes out as clearly as I would want it to in the original post.)

Taking the example of an "alignment strategy [that makes] the AI find the preferences and values of humans, and then pursu[e] that", robustness to scaling ... (read more)

Curious what different aspects the "duration of seclusion" is meant to be a proxy for? 

You definitely point at things like "when are they expected to produce intelligible output" and "what sorts of questions appear most relevant to them". Another dimension that came to mind - but I am not sure whether or not you mean to include that in the concept - is something like "how often are they allowed/able to peek directly at the world, relative to the length of periods during which they reason about things in ways that are removed from empirical data"? 

5Duncan Sabien (Deactivated)
As Henry points out in his comment, certainly at least some 1,000 and 10,000-day monks must need to encounter the territory daily. I think that for some monks there is probably a restriction to actually not look for the full duration, but for others there are probably more regular contacts.

I think that one thing the duration of seclusion is likely to be a firm proxy for is "length of time between impinging distractions." Like, there is in fact a way in which most people can have longer, deeper thoughts while hiking on a mountainside with no phone or internet, which is for most people severely curtailed even by having phone or internet for just 20min per day at a set time.

So I think that even if a monk is in regular contact with society, the world, etc., there's something like a very strong protection against other people claiming that the monk owes them time/attention/words/anything.

PIBBSS Summer Research Fellowship -- Q&A event

  • What? Q&A session with the fellowship organizers about the program and application process. You can submit your questions here.
  • For whom? For everyone curious about the fellowship and for those uncertain whether they should apply.
  • When? Wednesday 12th January, 7 pm GMT
  • Where? On Google Meet, add to your calendar

I think it's a shame that these days for many people the primary connotation of the word "tribe" is connected to culture wars. In fact, our decision to use this term was in part motivated by wanting to re-appropriate the term to something less politically loaded.

As you can read in our post (see "What is a tribe?"), we mean something particular. Like any collective of human beings, it can in principle be subject to excessive in-group/out-group dynamics, but that's far from the only, or the most interesting, part of it. 

2jbash
I did actually read that. I admit that I didn't read all the detailed advice about how to make one work, since I have no intention of doing so... but I did read the definition and the introductory part. It wouldn't have mattered what word you'd used. Your groups are actually smaller than most things called tribes anyway. I am reacting to the substance.

I doubt that humans are, in a practical way, capable of tightening up their in-groups like that without at the same time increasing hostility to out-groups (or at least people who are out-of-the-group). Not in principle, but in practice. If nothing else, you have to start by giving some kind of preference to members of the tribe. And, since it's about mutual aid with certain costs, you have to enforce its boundaries. And set up norms about what you can and can't do and still be "in" (which will not all be formally considered, will not all be under organized control, and yet will involve enough people that they can't easily be changed, challenged, or made too complicated).

I suspect that the specific scale of "up to the limit of the number of people who can all personally know each other" is a particularly dangerous scale. For one thing, that means that at the edges of the group, you will often know, and have some special duty toward, the person or people on one side of some brewing conflict... but you will NOT know or feel any special duty toward the person or people on the other side. For another, it's probably the scale at which people most often had occasion to attack each other in the "evolutionary environment". For a third, it means you're always at the risk of growing to the point of having to split the group, with no obvious way to handle that without generating acrimony. You may address that last one in your detailed material; I don't know.

It's true, though, that the word "tribe" is kind of attached to that kind of concern. And there must be a reason why the word got a bad name, as well as a reason you

Context:  (1) Motivations for fostering EA-relevant interdisciplinary research; (2) "domain scanning" and "epistemic translation" as a way of thinking about interdisciplinary research

[cross-posted to the EA forum in shortform]
 

List of fields/questions for interdisciplinary AI alignment research

The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started to compile this list to provide some anchorage for evaluating the value of interdisciplinary research for EA causes, specifical... (read more)

Glad to hear it seemed helpful!

FWIW I'd be interested in reading you spell out in more detail what you think you learnt from it about simulacra levels 3+4.

Re "writing the bottom line first": I'm not sure. I think it might be, but at least this connection didn't feel salient, or like it would buy me anything in terms of understanding, when thinking about this so far. Again interested in reading more about where you think the connections are. 

To maybe say more about why (so far) it didn't seem clearly relevant to me: "Writing the bottom line first", to ... (read more)

3romeostevensit
I meant that empty expectations are another anchor for antidoting writing the bottom line first. As for simulacra levels: This highlights how we switch abstraction levels when we don't know how to solve a problem on the level we're on. This is a reasonable strategy in general that sometimes backfires.

Regarding "Staying grounded and stable in spite of the stakes": 
I think it might be helpful to unpack the vritue/skill(s) involved according to the different timescales at which emergencies unfold. 

For example: 

1. At the time scale of minutes or hours, there is a virtue/skill of "staying level-headed in a situation of acute crisis". This is the sort of skill you want your emergency doctor or firefighter to have. (When you pointed to the military, I think you in part pointed to this scale, but I assume not only.)

From talking to people who do ... (read more)

Re language as an example: parties involved in communication using language have comparable intelligence (and even there I would say someone just a bit smarter can cheat their way around you using language). 

Mhh yeah, so I agree these are examples of ways in which language "fails". But I think they don't bother me too much? 
I put them in the same category as "two agents acting in good faith sometimes miscommunicate - and still, language overall is pragmatically reliable", or "works well enough". In other words, even though there is potential for exploitation, that ... (read more)

a cascade of practically sufficient alignment mechanisms is one of my favorite ways to interpret Paul's IDA (Iterated Distillation-Amplification)

Yeah, great point!

However, I think its usefulness hinges on ability to robustly quantify the required alignment reliability / precision for various levels of optimization power involved. 

I agree and think this is a good point! I think on top of quantifying the required alignment reliability "at various levels of optimization" it would also be relevant to take the underlying territory/domain into account. We can say that a territory/domain has a specific epistemic and normative structure (which e.g. defines the error margin that is acceptable, or tracks the co-evolutionary dynamics). 


 

Pragmatically reliable alignment
[taken from On purpose (footnotes); sharing this here because I want to be able to link to this extract specifically]

AI safety-relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research. 

Language works, as evidenced by the striking success of human civilisations made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a w... (read more)

The question you're pointing at is definitely interesting. A Freudian, slightly pointed way of phrasing it is something like: are humans' deepest desires, in essence, good and altruistic, or violent and selfish? 

My guess is that this question is wrong-headed. For example, I think this is making the mistake of drawing a dichotomy and rivalry between my "oldest and deepest drives" and "reflective reasoning", and depending on your conception of which of these two wins, your answer to the above question ends up being positive or negative. I don't... (read more)

[I felt inclined to look for observations of this thing outside of the context of the pandemic.]

Some observations: 

I experience this process (either in full or the initial stages of it) for example when asked about my work (as it relates to EA, x-risks, AI safety, rationality and the like), or when sharing ~unconventional plans (e.g. "I'll just spend the next few months thinking about this") when talking to e.g. old friends from when I was growing up, or people in the public sphere like a dentist, physiotherapist, etc. This used to be also somewhat the cas... (read more)

As far as I can tell, I agree with what you say - this seems like a good account of how the cryptographer's constraint cashes out in language. 

To your confusion: I think Dennett would agree that it is Darwinian all the way down, and that their disagreement lies elsewhere. Dennett's account of how "reasons turn into causes" is made on Darwinian grounds, and it compels Dennett (but not Rosenberg) to conclude that purposes deserve to be treated as real, because (compressing the argument a lot) they have the capacity to affect the causal world.

Not sure this is useful?

I'm inclined to map your idea of "reference input of a control system" onto the concept of homeostasis, homeostatic set points and homeostatic loops. Does that capture what you're trying to point at?

(Assuming it does) I agree that homeostasis is an interesting puzzle piece here. My guess for why this didn't come up in the letter exchange is that D/R are trying to resolve a related but slightly different question: the nature and role of an organism's conscious, internal experience of "purpose". 

Purpose and its pursuit have a special role in how hu... (read more)

2Richard_Kennaway
I would map "homeostasis" onto "control system", but maybe that's just a terminological preference.

The internal experience of purpose is a special case of internal experience, explaining which is the Hard Problem of Consciousness, which no-one has a solution for. I don't see a reason to deny this sort of purpose to animals, except to the extent that one would deny all conscious experience to them. I am quite willing to believe that (for example) cats, dogs, and primates have a level of consciousness that includes purpose.

The evolutionary explanation does not make any predictions. It looks at what is, says "it was selected for", and confabulates a story about its usefulness. Why do we have five fingers? Because every other number was selected against. Why were they selected against? Because they were less useful. How were they less useful? They must have been, because they were selected against. Even if some content were put into that, it still would not explain the thing that was to be explained: what is purpose? It is like answering the question "how does a car work?" by expatiating upon how useful cars are.

In regards to "the meaning of life is what we give it", that's like saying "the price of an apple is what we give it". While true, it doesn't tell the whole story. There's actual market forces that dictate apple prices, just like there are actual darwinian forces that dictate meaning and purpose.

Agree; the causes that we create ourselves aren't all that governs us - in fact, it's a small fraction of that, considering physical, chemical, biological, game-theoretic, etc. constraints. And yet, there appears to be an interesting difference between the causes t... (read more)

I'm confused about the "purposes don't affect the world" part. If I think my purpose is to eat an apple, then there will not be an apple in the world that would have otherwise still been there if my purpose wasn't to eat the apple. My purpose has actual effects on the world, so my purpose actually exists.

So, yes, basically this is what Dennett reasons in favour of, and what Rosenberg is skeptical of. 

I think the thing here that needs reconciliation - and what Dennett is trying to do - is to explain why,  in your apple story, it's justified to use... (read more)

Thanks :)

> I will note that I found the "Rosenberg's crux" section pretty hard to read, because it was quite dense. 

Yeah, you're right - thanks for the concrete feedback! 

I wasn't originally planning to make this a public post, and later failed to take a step back and properly model what it would be like for a reader without the context of having read the letter exchange. 

I'll consider adding a short intro paragraph to partially remedy this.

3Bird Concept
Makes sense! An intro paragraph could be good :) 

While I'm not an expert, I did study political science and am Swiss. I think this post paints an accurate picture of important parts of the Swiss political system. Also, I think (and admire) how it explains very nicely the basic workings of a naturally fairly complicated system.

If people are interested in reading more about Swiss Democracy and its underlying political/institutional culture (which, as pointed out in the post, is pretty informal and shaped by its historic context), I can recommend this book: https://www.amazon.com/Swiss-Democracy-Solut... (read more)

Answer by Nora_Ammann50
Are there any existing variolation projects that I can join?

FWIW, there is this one I know of: https://1daysooner.org/

That said, the last time I got an update from them (~1 month ago), any execution of these trials was still at least a few months away. (You could reach out to them via the website for more up-to-date information.) Also, there is a limited number of places where the trials can actually take place, so you'd have to check whether there is anything close to where you are.

(Meta: This isn't necessarily an endorsement of your main question.)

That's cool to hear!

We are hoping to write up our current thinking on ICF at some point (although I don't expect it to happen within the next 3 months) and will make sure to share it.

Happy to talk!
