What's the case for it being a Swiss cheese approach? That doesn't match how I think of it.
Donated 1k USD, and might donate more after some further reflection.
I have myself benefited a lot, and have seen many others benefit, from the many things you've done over the years, in particular LW and Lighthaven. But beyond that, I also particularly value and respect you for the integrity and intellectual honesty I have consistently found you to demonstrate.
[Edited a bit for clarity]
(To clarify: I co-founded and led PIBBSS since 2021, but stepped down from leadership in June this year to work with davidad on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing of what I say below should be taken as an authoritative source on what PIBBSS is going to do. I do serve on the board.)
Ryan - I appreciate the donation, and in particular you sharing your reasoning here.
I agree with a lot of what you write. Especially "c...
Yes, we upload them to our YouTube account, modulo the speaker agreeing to it. The first few recordings from this series should be uploaded very shortly.
While I don't think it's so much about selfishness as such, I think this points at something important, also discussed eg here: The self-unalignment problem
Does it seem like I'm missing something important if I say "Thing = Nexus" gives a "functional" explanation of what Thing is, i.e. it serves the function of being an "inductive nexus of reference"? This is not a foundational/physicalist/mechanistic explanation, but it is very much a sort of explanation that I can imagine being useful in some cases/for some purposes.
I'm suggesting this as a possibly different angle on "what sort of explanation is Thing=Nexus, and why is it plausibly not fraught despite its somewhat-circularity?" It seems like it maps on to...
Yeah, would be pretty keen to see more work trying to do this for AI risk/safety questions specifically: contrasting what different lenses "see" and emphasize, and what productive critiques they have to offer each other.
Over the last couple of years, valuable progress has been made towards stating the (more classical) AI risk/safety arguments more clearly, and I think that's very productive for leading to better discourse (including critiques of those ideas). I think we're a bit behind on developing clear articulations of the complex systems/...
To follow up on this, we'll be hosting John's talk on Dec 12th, 9:30AM Pacific / 6:30PM CET.
Join through this Zoom Link.
Title: AI would be a lot less alarming if we understood agents
Description: In this talk, John will discuss why and how fundamental questions about agency - as they are asked, among others, by scholars in biology, artificial life, systems theory, etc. - are important to making progress in AI alignment. John gave a similar talk at the annual ALIFE conference in 2023, as an attempt to nerd-snipe researchers studying agency in a b...
I have no doubt Alexander would shine!
Happy to run a PIBBSS speaker event for this, record it and make it publicly available. Let me know if you're keen and we'll reach out to find a time.
FWIW I also think the "Key Phenomena of AI risk" reading curriculum (h/t TJ) does some of this at least indirectly (it doesn't set out to directly answer this question, but I think a lot of the answers to the question are contained in the curriculum).
(Edit: fixed link)
How confident are you about it not having been recorded? If not very, it seems probably worth checking again.
Re whether messy goal-seekers can be schemers, you may address this in a different place (and if so forgive me, and I'd appreciate you pointing me to where), but I keep wondering what notion of scheming (or deception, etc.) we should be adopting when, in particular:
Yeah neat, I haven't yet gotten to reading it but it's definitely on my list. It seems (and some folks suggested to me) that it's quite related to the sort of thing I'm discussing in the value change problem too.
Roughly... refers to/emphasizes the dynamic interaction between agent and environment and understands behavior/cognition/agency/... to emerge through that interaction/at that interface (rather than, e.g., trying to understand them as an internal property of the agent only)
Related to my point above (and this quoted paragraph), a fundamental nuance here is the distinction between "accidental influence side effects" and "incentivized influence effects". I'm happy to answer more questions on this difference if it's not clear from the rest of my comment.
Thanks for clarifying; I agree it's important to be nuanced here!
I basically agree with what you say. I also want to say something like: whether it's best counted as a side effect or as incentivized depends on what optimizer we're looking at/where you draw the boundary around the...
A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case.
Yes, I'd agree (and didn't make this clear in the post, sorry) -- the pressure towards predictability comes from a combination of the logic of performative prediction AND the "economic logic" that provides the context in which these performative predictors are being used/applied. This is certainly an important thing to be clear about!
(Though it also can only give us s...
Agree! Examples abound. You can never escape your local ideological context - you can only try to find processes that have some hope of occasionally bumping into the bounds of your current ideology and pressing beyond them. There is no reliable recipe (just like there is no reliable recipe for making yourself notice your own blind spots) - but there is hope for things that, in expectation and intertemporally, can help us with this.
Which poses a new problem (or clarifies the problem we're facing): we don't get to answer the question of value change legitimacy in a...
Yeah, interesting point. I do see the pull of the argument. In particular, the example seems well chosen -- where the general form seems to be something like: we can think of cases where our agent can be said to be better off (according to some reasonable standards/from some reasonable vantage point) if the agent can commit themselves to continue doing a thing/undergoing a change for at least a certain amount of time.
That said, I think there are also some problems with it. For example, I'm wary of reifying "I-as-in-CEV" more than what is war...
yes, sorry! I'm not making it super explicit, actually, but the point is that, if you read e.g. Paul's or Callard's accounts of value change (via transformative experiences and via aspiration respectively), a large part of how they even set up their inquiries is with respect to the question of whether value change is irrational or not (or what problem value change poses to rational agency). The rationality problem comes up because it's unclear from what vantage point one should evaluate the rationality (i.e. the "keeping with what expected utility theory tells you t...
Yes, as stated in the requirements section, affiliates are expected to attend retreats, and I expect events to be split roughly 50/50 between the US and Europe.
We're not based in a single location. We are open to accepting affiliates based anywhere, and are also happy to help them relocate (within some constraints) to somewhere else (e.g. where they have access to a more lively research/epistemic community) if that would be beneficial for them. That said, I also think we are best placed to help people (and have historically tended to run things) in either London/Oxford, Prague or Berkeley.
Starting dates might indeed differ depending on candidates' situation. That said, we expect affiliates of this round will start sometime between mid-December and mid-January. We'll be running a research retreat to onboard affiliates within that same time frame.
Right, but I feel like I want to say something like "value grounding" as its analogue.
Also... I do think there is a crucial epistemic dimension to values, and the "[symbol/value] grounding" thing seems like one place where this shows quite well.
The process that invents democracy is part of some telotect, but is it part of a telophore? Or is the telophore only reached when democracy is implemented?
Musing about how (maybe) certain telophemes impose constraints on the structure (logic) of their corresponding telophores and telotects. E.g. democracy, freedom, autonomy, justice, corrigibility, rationality, ... (though plausibly you'd not want to count (some of) those examples as telophemes in the first place?)
Curious whether the following idea rhymes with what you have in mind: telophore as (sort of) doing ~symbol grounding, i.e. the translation (or capacity to translate) from description to (worldly) effect?
Indeed that wasn't intended. Thanks a lot for spotting & sharing it! It's fixed now.
Good point! We are planning to gauge time preferences among the participants and fix slots then. Maybe most relevantly, we intend to accommodate all time zones. (We have been doing this with PIBBSS fellows as well, so I am pretty confident we will be able to find time slots that work pretty well across the globe.)
Here is another interpretation of what can cause a lack of robustness to scaling down:
(Maybe this is what you have in mind when you talk about single-single alignment not (necessarily) scaling to multi-multi alignment - but I am not sure that is the case, and even if it is, I feel pulled to state it again as I don't think it comes out as clearly as I would want it to in the original post.)
Taking the example of an "alignment strategy [that makes] the AI find the preferences and values of humans, and then pursu[e] that", robustness to scaling ...
Curious what different aspects the "duration of seclusion" is meant to be a proxy for?
You definitely point at things like "when are they expected to produce intelligible output" and "what sorts of questions appear most relevant to them". Another dimension that came to mind - but I am not sure whether or not you mean to include it in the concept - is something like "how often are they allowed/able to peek directly at the world, relative to the length of periods during which they reason about things in ways that are removed from empirical data"?
PIBBSS Summer Research Fellowship -- Q&A event
I think it's a shame that these days for many people the primary connotation of the word "tribe" is connected to culture wars. In fact, our decision to use this term was in part motivated by wanting to re-appropriate the term to something less politically loaded.
As you can read in our post (see "What is a tribe?"), we mean something particular. Like any collective of human beings, it can in principle be subject to excessive in-group/out-group dynamics, but that's far from the only, or the most interesting, part of it.
Context: (1) Motivations for fostering EA-relevant interdisciplinary research; (2) "domain scanning" and "epistemic translation" as a way of thinking about interdisciplinary research
[cross-posted to the EA forum in shortform]
The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started to compile this list to provide some anchorage for evaluating the value of interdisciplinary research for EA causes, specifical...
Glad to hear it seemed helpful!
FWIW I'd be interested in reading you spell out in more detail what you think you learnt from it about simulacra levels 3+4.
Re "writing the bottom line first": I'm not sure. I think it might be, but at least this connection didn't feel salient, or like it would buy me anything in terms of understanding, when thinking about this so far. Again interested in reading more about where you think the connections are.
To maybe say more about why (so far) it didn't seem clearly relevant to me: "Writing the bottom line first", to ...
Regarding "Staying grounded and stable in spite of the stakes":
I think it might be helpful to unpack the virtue/skill(s) involved according to the different timescales at which emergencies unfold.
For example:
1. At the timescale of minutes or hours, there is a virtue/skill of "staying level-headed in a situation of acute crisis". This is the sort of skill you want your emergency doctor or firefighter to have. (When you pointed to the military, I think you in part pointed to this scale, but I assume not only to this.)
From talking to people who do ...
Re language as an example: parties involved in communication using language have comparable intelligence (and even there I would say someone just a bit smarter can cheat their way around you using language).
Mhh, yeah - I agree these are examples of ways in which language "fails". But I think they don't bother me too much?
I put them in the same category as "two agents with good faith sometimes miscommunicate - and still, language overall is pragmatically reliable", or "works well enough". In other words, even though there is potential for exploitation, that ...
a cascade of practically sufficient alignment mechanisms is one of my favorite ways to interpret Paul's IDA (Iterated Distillation and Amplification)
Yeah, great point!
However, I think its usefulness hinges on the ability to robustly quantify the required alignment reliability / precision for various levels of optimization power involved.
I agree and think this is a good point! I think on top of quantifying the required alignment reliability "at various levels of optimization" it would also be relevant to take the underlying territory/domain into account. We can say that a territory/domain has a specific epistemic and normative structure (which e.g. defines the error margin that is acceptable, or tracks the co-evolutionary dynamics).
Pragmatically reliable alignment
[taken from On purpose (footnotes); sharing this here because I want to be able to link to this extract specifically]
AI safety-relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research.
Language works, evidenced by the striking success of human civilisations made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a w...
The question you're pointing at is definitely interesting. A Freudian, slightly pointed way of phrasing it is something like: are humans' deepest desires, in essence, good and altruistic, or violent and selfish?
My guess is that this question is wrong-headed. For example, I think this is making the mistake of drawing a dichotomy and rivalry between my "oldest and deepest drives" and "reflective reasoning", and depending on your conception of which of these two wins, your answer to the above question ends up being positive or negative. I don't...
[I felt inclined to look for observations of this thing outside of the context of the pandemic.]
Some observations:
I experience this process (either in full or the initial stages of it) for example when asked about my work (as it relates to EA, x-risks, AI safety, rationality and the like), or when sharing ~unconventional plans (e.g. "I'll just spend the next few months thinking about this") when talking to, e.g., old friends from when I was growing up, or people in the public sphere like a dentist, physiotherapist, etc. This used to be also somewhat the cas...
As far as I can tell, I agree with what you say - this seems like a good account of how the cryptographer's constraint cashes out in language.
To your confusion: I think Dennett would agree that it is Darwinian all the way down, and that their disagreement lies elsewhere. Dennett's account of how "reasons turn into causes" is made on Darwinian grounds, and it compels Dennett (but not Rosenberg) to conclude that purposes deserve to be treated as real, because (compressing the argument a lot) they have the capacity to affect the causal world.
Not sure this is useful?
I'm inclined to map your idea of "reference input of a control system" onto the concept of homeostasis, homeostatic set points and homeostatic loops. Does that capture what you're trying to point at?
(Assuming it does) I agree that homeostasis is an interesting puzzle piece here. My guess for why this didn't come up in the letter exchange is that D/R are trying to resolve a related but slightly different question: the nature and role of an organism's conscious, internal experience of "purpose".
Purpose and its pursuit have a special role in how hu...
In regards to "the meaning of life is what we give it", that's like saying "the price of an apple is what we give it". While true, it doesn't tell the whole story. There's actual market forces that dictate apple prices, just like there are actual darwinian forces that dictate meaning and purpose.
Agree; the causes that we create ourselves aren't all that governs us - in fact, it's a small fraction of that, considering physical, chemical, biological, game-theoretic, etc. constraints. And yet, there appears to be an interesting difference between the causes t...
I'm confused about the "purposes don't affect the world" part. If I think my purpose is to eat an apple, then there will not be an apple in the world that would have otherwise still been there if my purpose wasn't to eat the apple. My purpose has actual effects on the world, so my purpose actually exists.
So, yes, basically this is what Dennett reasons in favour of, and what Rosenberg is skeptical of.
I think the thing here that needs reconciliation - and what Dennett is trying to do - is to explain why, in your apple story, it's justified to use...
Thanks :)
> I will note that I found the "Rosenberg's crux" section pretty hard to read, because it was quite dense.
Yeah, you're right - thanks for the concrete feedback!
I wasn't originally planning to make this a public post and later failed to take a step back and properly model what it would be like as a reader without the context of having read the letter exchange.
I'll consider adding a short intro paragraph to partially remedy this.
While I'm not an expert, I did study political science and am Swiss. I think this post paints an accurate picture of important parts of the Swiss political system. I also think (and admire) that it explains very nicely the basic workings of a naturally fairly complicated system.
If people are interested in reading more about Swiss Democracy and its underlying political/institutional culture (which, as pointed out in the post, is pretty informal and shaped by its historic context), I can recommend this book: https://www.amazon.com/Swiss-Democracy-Solut...
Are there any existing variolation projects that I can join?
FWIW, this is the one I know of: https://1daysooner.org/
That said, the last time I got an update from them (~1 month ago), any execution of these trials was still at least a few months away. (You could reach out to them via the website for more up-to-date information.) Also, there is a limited number of places where the trials can actually take place, so you'd have to check whether there is anything close to where you are.
(Meta: This isn't necessarily an endorsement of your main question.)
That's cool to hear!
We are hoping to write up our current thinking on ICF at some point (although I don't expect it to happen within the next 3 months) and will make sure to share it.
Happy to talk!
I found this article ~very poor. Many of the rhetorical moves adopted in the piece seem largely optimised for making it easy to stay on the "high horse". Talking about a singular AI doomer movement is one of them. Taking the stance that AGI is not near and thus there is nothing to worry about is another. Whether or not that's true, it certainly makes it easy to point your finger at folks who are worried and say "look what silly theater".
I think it's somewhat interesting to ask whether there should be more coherence across safety efforts, an...