All of particlemania's Comments + Replies

I expect it matters to the extent we care about whether the generalization to the new question is taking place in the expensive pretraining phase or in the active in-context phase.

Not to pick on you specifically, but as a general comment, I'm getting a bit worried about rationalist decontextualized content policing. It usually seems to go like this: someone cultivates an epistemological practice (say, how to extract conceptual insights from diverse practices) -> they decide to cross-post their thoughts on a community blog interested in epistemology -> somebody else unfamiliar with the former's body of work comes across it -> interprets it as fitting a pattern they might rightfully have identified as critique-worthy ->... (read more)

[comment deleted]

I would agree that it would be good and reasonable to have a term to refer to the family of scientific and philosophical problems spanned by this space. At the same time, as the post says, the issue arises when there is semantic dilution, people talking past each other, and coordination-inhibiting ambiguity.

P3 seems helpful but insufficient for good long-term outcomes.

Now take a look at something I could check with a simple search: an ICML Workshop that uses the term 'alignment' mostly to mean P3 (task-reliability): https://arlet-workshop.github.io/

One might want t... (read more)


First of all, these are all meant to denote very rough attempts at demarcating research tastes.

It seems possible to be aiming to solve P1 without thinking much about P4, if (a) you advocate a ~Butlerian pause, or (b) you are working on aligned paternalism as the target behavior (where AI(s) are responsible for keeping humans happy, and humans have no residual agency or autonomy remaining).

Also, a lot of people who approach the problem from a P4 perspective tend to focus on the human-AI interface, where most of the relevant technical problems lie, but this might reduce their attention to issues of mesa-optimizers or emergent agency, despite the massive importance of those issues to their project in the long run.

Okasha's paper addresses emerging discussions in biology that treat organisms-as-agents in particular, otherwise referred to as the 'Return of the Organism' turn in philosophy of biology.

In the paper, he adds "Various concepts have been offered as ways of fleshing out this idea of organismic autonomy, including goal-directedness, functional organization, emergence, self-maintenance, and individuality. Agency is another possible candidate for the job."

This seems like a reasonable stance so far as I can tell, since organisms seem to have some str... (read more)

My understanding of Steel Late Wittgenstein's response would be that you could agree that words and concepts are distinct, and that the mapping is not always 1-1, but hold that which concepts get used is also significantly influenced by which features of the world are useful in some contexts of language (/word) use.

Rewards and Utilities are different concepts. To reject that reward is necessary to get/build agency is not the same thing as rejecting EU maximization as a basin of idealized agency.
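
To make the distinction concrete (a rough sketch in my own notation; the symbols below are mine, not drawn from the post or the paper): reward is a training-time signal used to select a policy, whereas a utility function is something we attribute to an agent whose choices are coherent in the decision-theoretic sense.

$$J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\left[\sum_{t} \gamma^{t} r_t\right] \qquad \text{(reward: a signal } r_t \text{ in an objective used to shape the policy during training)}$$

$$A \succeq B \;\iff\; \mathbb{E}[U \mid A] \;\ge\; \mathbb{E}[U \mid B] \qquad \text{(utility: a function } U \text{ rationalising coherent preferences, à la VNM)}$$

An agent can be well described by the second without having been produced via the first, and vice versa, which is why rejecting the necessity of the first leaves the EU-maximization idealization untouched.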

Roman Leventov
The relevant paragraph that I quoted refutes exactly this. In the bolded sentence, "value function" is used as a synonym for "utility function". You simply cannot represent an agent that always seeks to maximise "empowerment" (as defined in the paper for Self-preserving agents), for example, or always seeks to minimise free energy (as in Active Inference agents), as maximising some quantity over its lifetime: if you integrate empowerment or free energy over time, you don't get a sensible information quantity that you can label as "utility".

This is an uncontroversial idea, and is not a contribution of the paper. What the paper contributes is a formal demonstration that such agents are "stable", "self-preserving"; previously, this hadn't been shown formally for arbitrary Active Inference agents.

Note that the fact that these agents are not utility maximisers doesn't mean they don't instrumentally converge. Cf. https://www.lesswrong.com/posts/ostLZyhnBPndno2zP/active-inference-as-a-formalisation-of-instrumental. I haven't read the full paper yet; maybe I will see how the framework in there could admit mild optimisation, but so far I don't see how.
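
Roughly, and in notation that is mine rather than the paper's: an expected-utility maximiser can be written as solving

$$\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} U(s_t, a_t)\right],$$

with a single fixed function $U$ of states and actions whose lifetime sum is maximised, whereas an Active Inference agent at each step solves something like

$$\pi_t \;=\; \arg\min_{\pi}\; G(\pi;\, q_t),$$

where the expected free energy $G$ depends on the agent's current beliefs $q_t$ and is recomputed at every step, so there is in general no fixed $U$ whose lifetime sum these choices maximise.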

As an addendum, it seems to me that you may not necessarily need a 'long-term planner' (or 'time-unbounded agent') in the environment. A similar outcome may also be attainable if the environment contains a tiling of time-bound agents who can all trade with each other in ways such that the overall trade network implements long-term power-seeking.

Concept Dictionary.


Concepts that I intend to use or invoke in my writings later, or that are part of my reasoning about AI risk or related complex-systems phenomena.