All of PaulK's Comments + Replies

PaulK10

I wonder if you can recover Kelly from linear utility in money, plus a number of rounds unknown to you and chosen probabilistically from a distribution.

1SimonM
No, it's fairly straightforward to see this won't work. Let N be the random variable denoting the number of rounds. Let x = p*w+(1-p)*l, where p is the probability of winning and w=1-f+o*f, l=1-f are the factors our wealth gets multiplied by when we win or lose betting a fraction f of our wealth at odds o. Then the value we care about is E[x^N], which is the moment generating function of N evaluated at log(x). Since the mgf is increasing, we still just want to maximise x, i.e. the random horizon doesn't change what linear utility tells us to do.
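A worked version of that argument (a sketch, assuming the number of rounds N is independent of how you bet):

```latex
% Per-round wealth multiplier when betting a fraction f at odds o:
%   w = 1 - f + o f  with probability p,   l = 1 - f  with probability 1 - p,
% so the expected multiplier per round is x = p w + (1 - p) l.
\mathbb{E}[W_N \mid N = n] = W_0 \, x^{\,n}
\quad\Longrightarrow\quad
\mathbb{E}[W_N] = W_0 \, \mathbb{E}\!\left[x^{N}\right] = W_0 \, M_N(\log x),
```

where M_N is the moment generating function of N. Since N is nonnegative, M_N is increasing, so maximizing expected wealth reduces to maximizing x; and x is linear in f, so the optimum is still all-in or nothing rather than a Kelly fraction.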
PaulK20

In the soaking-up-extra-compute case? Yeah, for sure, I can only really picture it (a) on a very short-term basis, for example maybe while linking up tightly for important negotiations (but even here, not very likely). Or (b) in a situation with high power asymmetry. For example maybe there's a story where 'lords' delegate work to their 'vassals', but the workload intensity is variable, so the vassals have leftover compute, and the lords demand that they spend it on something like blockchain mining. To compensate for the vulnerability this induces, the lords would also provide protection.

PaulK10
  1. Yup, all that would certainly make it more complicated. In a regime where this kind of tightly-controlled delegation were really important, we might also demand our counterparties standardize their hardware so they can't play tricks like this.
  2. I was picturing a more power-asymmetric situation, more like a feudal lord giving his vassals lots of busywork so they don't have time to plot anything.
Answer by PaulK*2-1

We might develop schemes for auditable computation, where one party can come in at any time and check the other party's logs. The logs should conform to the source code that the second party is supposed to be running, and also to any observable behavior that the second party has displayed. It's probably possible to have logging and behavioral signalling be sufficiently rich that the first party can be convinced that that code is indeed being run (without it being too hard to check -- maybe with some kind of probabilistically checkable proof).

However, this only... (read more)
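A toy sketch of the spot-checkable-logging idea (illustration only, under simplifying assumptions -- the full state is logged at every step and the agreed-upon code is deterministic; this is not the scheme from the answer, and nothing like a real probabilistically checkable proof):

```python
# The audited party logs its state after every step of an agreed-upon program;
# an auditor later samples random steps and re-executes just those steps to
# check the logged transitions are consistent with the agreed source code.
import random

def agreed_step(state: int) -> int:
    """Stand-in for the source code the second party is supposed to run."""
    return (state * 31 + 7) % 10_007

def run_and_log(initial_state: int, n_steps: int) -> list[int]:
    log = [initial_state]
    state = initial_state
    for _ in range(n_steps):
        state = agreed_step(state)
        log.append(state)
    return log

def audit(log: list[int], n_samples: int = 20) -> bool:
    """Re-execute a random sample of steps and check them against the log."""
    indices = random.sample(range(len(log) - 1), k=min(n_samples, len(log) - 1))
    return all(agreed_step(log[i]) == log[i + 1] for i in indices)

log = run_and_log(initial_state=1, n_steps=1_000)
assert audit(log)   # honest logs pass
log[500] += 1       # a single tampered entry...
print(audit(log))   # ...is only caught if one of its two adjacent transitions is sampled
```

Sparse sampling only catches tampering near the sampled steps, which is where heavier machinery of the PCP flavor would have to come in.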

3O O
Can the agent just mute their capabilities when they do this computation? There are very slick ways to speed up computation and likewise slick ways to slow it down. The agent could, say, mess up cache coherency in its hardware, store data types differently, ignore the outputs of some of its compute, or maybe run faster than the other agent expects by devising a faster algorithm, using hardware-level optimizations that exploit strange physics the other agent hasn't thought of, etc. Secondly, how would an agent convince another to run expensive code that takes up their entire compute? If you were some nation in medieval Europe, and an adjacent nation demanded that every able-bodied person enter a triathlon to measure their net strength, would any sane leader agree to that?
2Capybasilisk
Wouldn’t that also leave them pretty vulnerable?
PaulK32

Sorry, I guess I didn't make the connection to your post clear. I substantially agree with you that utility functions over agent-states aren't rich enough to model real behavior. (Except, maybe, at a very abstract level, a la predictive processing? (which I don't understand well enough to make the connection precise)). 

Utility functions over world-states -- which is what I thought you meant by 'states' at first -- are in some sense richer, but I still think inadequate.

And I agree that utility functions over agent histories are too flexible.

I was sort ... (read more)

PaulK20

Oh, huh, this post was on the LW front page, and dated as posted today, so I assumed it was fresh, but the replies' dates are actually from a month ago.

3DragonGod
Reposted it because I didn't get any good answers last time; I'm currently working on a post that's a successor to this one and would really appreciate the good answers I did not get.
4the gears to ascension
lesswrong has a bug that allows people to restore their posts to "new" status on the frontpage by moving them to draft and then back.
Answer by PaulK10

(A somewhat theologically inspired answer:)

Outside the dichotomy of values (in the shard-theory sense) vs. immutable goals, we could also talk about valuing something that is in some sense fixed, but "too big" to fit inside your mind. Maybe a very abstract thing. So your understanding of it is always partial, though you can keep learning more and more about it (and you might shift around, feeling out different parts of the elephant). And your acted-on values would appear mutable, but there would actually be a, perhaps non-obvious, coherence to them.

It's po... (read more)

2DragonGod
My claim is mostly that real-world intelligent systems do not have values that can be well described by a single fixed utility function over agent states. I do not see this answer as engaging with that claim at all. If you define utility functions over agent histories, then everything is an expected utility maximiser for the function that assigns positive utility to whatever action the agent actually took and zero utility to every other action. I think such a definition of utility function is useless. If however you define utility functions over agent states, then your hypothesis doesn't engage with my claim at all. The reason that real-world intelligent systems aren't well described by utility functions isn't that the utility function is too big to fit inside them, or that they have incomplete knowledge of it. My claim is that no such utility function exists that adequately describes the behaviour of real-world intelligent systems. I am claiming that there is no such mathematical object, no single fixed utility function over agent states that can describe the behaviour of humans or sophisticated animals. Such a function does not exist.
2PaulK
Oh, huh, this post was on the LW front page, and dated as posted today, so I assumed it was fresh, but the replies' dates are actually from a month ago.
PaulK21

I still don't know exactly what parts of my comment you're responding to. Maybe talking about a concrete sub-agent coordination problem would help ground this more.

But as a general response: in your example it sounds like you already have the problem very well narrowed down, to 3 possibilities with precise probabilities. What if there were 10^100 possibilities instead? Or uncertainty where the full real thing is not contained in the hypothesis space?

PaulK10

This is for logical coordination? How does it help you with that?

3quetzal_rainbow
It helps the same way it helps anywhere there's uncertainty. Imagine a problem: "You are in a Prisoner's Dilemma with such-and-such payoffs; find the optimal strategy if the distribution of your possible opponents is 25% CooperateBots, 33% DefectBots and 42% agents who actually know decision theory".
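A minimal expected-value sketch of that example (the payoffs and the behaviour of the decision-theorist opponents are assumptions for illustration, not from the comment: standard (T, R, P, S) = (5, 3, 1, 0), and the decision theorists mirror whatever strategy you commit to):

```python
# Expected payoff of committing to Cooperate vs Defect against the stated
# mix of opponents, under the assumed payoffs and opponent models.
T, R, P, S = 5, 3, 1, 0
opponents = {"CooperateBot": 0.25, "DefectBot": 0.33, "Mirror": 0.42}

def payoff(my_move: str, their_move: str) -> int:
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(my_move, their_move)]

def opponent_move(kind: str, my_move: str) -> str:
    return {"CooperateBot": "C", "DefectBot": "D", "Mirror": my_move}[kind]

for my_move in ("C", "D"):
    ev = sum(prob * payoff(my_move, opponent_move(kind, my_move))
             for kind, prob in opponents.items())
    print(my_move, ev)   # C: 2.01, D: 2.00 under these assumed payoffs
```

Which move wins depends entirely on those probabilities and payoffs, which is the sense in which putting distributions over the unknowns gives you something concrete to optimize.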
PaulK204

IMO, coordination difficulties among sub-agents can't be waved away so easily. The solutions named, side-channel trades and counterfactual coordination, are both limited.

I would frame the nature of their limits, loosely, like this. In real minds (or at least the human ones we are familiar with), the stuff we care about lives in a high-dimensional space. A mind could be said to be, roughly, a network spanning such a space. A trade between elements (~sub-agents) that are nearby in this space will not be too hard to do directly. But for long-distance trades, ... (read more)

1eapi
Loosely related to this, it would be nice to know if systems which reliably don't turn down 'free money' must necessarily have almost-magical levels of internal coordination or centralization. If the only things which can't (be tricked into) turn(ing) down free money when the next T seconds of trade offers are known are Matrioshka brains at most T light-seconds wide, does that tell us anything useful about the limits of that facet of dominance as a measure of agency?
4quetzal_rainbow
It seems to me that this is basically solved by "you put probability distributions over all things that you don't actually know and may have disagreement about"
PaulK20

Probably some students will actually be quite bothered by this and be left with lingering, subtle confusion and discomfort. It is, in a sense, taking a shortcut past all the objections and alternatives that real humans had historically to these ideas. And IMO some students will be much better served by going the long way around, studying the ideas along with their history.

PaulK10

One response to frame-control-y situations is, instead of making accusations that, as you say, can lead to a he-said-she-said situation, to personally fall back to a more careful, defensive posture vis-à-vis framing: accepting that there seem to be strong framing differences among the people here, and communicating this posture to others. In other words, accepting when it seems to be too hard to directly create common knowledge about what is happening at the level of framing.

PaulK132

Random question, tangential to this post in particular (but not the series): should we expect genes to be doing something like geometric rationality in their propagation? When a new gene emerges and starts to spread, even if it greatly increases host fitness on average, its # of copies could easily drop to 0 by chance. So it "should want" to be cautious, like a Kelly bettor, and maximize its growth geometrically rather than arithmetically.

Not sure quite how that logic should cash out though. For one, genes that make their hosts more cautious (reduce fitnes... (read more)

PaulK60

Is there an arithmetic vs. geometric rationality thing (a la Scott Garrabrant's recent series) going on with genes?

Like, at equilibrium, the ratio of different genetic variants should be determined by the arithmetic expectation of the number of copies they pass on to the next generation. But for new variants just starting out, the population size (# of copies of that variant) could easily hit 0 and get wiped out, so it should be more cautious -- the population should want to maximize the geometric expectation of its growth rate, like a Kelly bettor.

Does this make sense? I don't know actual population genetics math.
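A toy check of the "could easily hit 0" intuition (not real population-genetics math; the offspring distribution is made up for illustration):

```python
# Each copy of a new variant leaves 0 copies with probability q and 2 copies
# otherwise, so the arithmetic growth rate is 2*(1-q) per generation. Even
# when that rate exceeds 1, a lineage starting from a single copy often dies
# out by chance.
import random

def extinction_rate(q: float, generations: int = 60, trials: int = 10_000, cap: int = 1_000) -> float:
    extinct = 0
    for _ in range(trials):
        n = 1
        for _ in range(generations):
            if n == 0 or n >= cap:   # 0 = dead; >= cap we call "established"
                break
            n = sum(2 for _ in range(n) if random.random() >= q)
        if n == 0:
            extinct += 1
    return extinct / trials

if __name__ == "__main__":
    # q = 0.4 gives arithmetic growth of 1.2x per generation, yet roughly two
    # thirds of lineages still go extinct (theory: q / (1 - q) = 2/3).
    print(extinction_rate(0.4))
```

The arithmetic expectation says the variant is a clear winner, but early on the quantity that matters to the variant is the chance of ever getting established, which is what the Kelly/geometric framing is gesturing at.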

PaulK40

Wow, I came here to say literally the same thing about commensurability: that perhaps AM is for what's commensurable, and GM is for what's incommensurable.

Though, one note is that to me it actually seems fine to consider different epistemic viewpoints as incommensurate. These might be like different islands of low K-complexity, that each get some nice traction on the world but in very different ways, and where the path between them goes through inaccessibly-high K-complexity territory.

PaulK10

Another setting that seems natural and gives rise to multiplicative utility is if we are trying to cover as much of a space as possible, and we divide it dimension-wise into subspaces, each tracked by a subagent. To get the total size covered, we multiply together the sizes covered within each subspace.

We can kinda shoehorn unequal weighting in here if we have each sub-agent track not just the fractional or absolute coverage of their subspace, but the per-dimension geometric average of their coverage.

For example, say we're trying to cover a 3D cube that's 10... (read more)

PaulK20

These are super interesting ideas, thanks for writing the sequence!

I've been trying to think of toy models where the geometric expectation pops out -- here's a partial one, which is about conjunctivity of values:

Say our ultimate goal is to put together a puzzle (U = 1 if we can, U = 0 if not), for which we need 2 pieces. We have sub-agents A and B who care about the two pieces respectively, each of whose utility for a state is its probability estimate for finding its piece there. Then our expected utility for a state is the product of their utilities (ass... (read more)

1PaulK
Another setting that seems natural and gives rise to multiplicative utility is if we are trying to cover as much of a space as possible, and we divide it dimension-wise into subspaces, each tracked by a subagent. To get the total size covered, we multiply together the sizes covered within each subspace. We can kinda shoehorn unequal weighting in here if we have each sub-agent track not just the fractional or absolute coverage of their subspace, but the per-dimension geometric average of their coverage. For example, say we're trying to cover a 3D cube that's 10x10x10, with subagent A minding dimension 1 and subagent B minding dimensions 2 and 3. A particular outcome might involve A having 4/10 coverage and B having 81/100 coverage, for a total coverage of (4/10)*(81/100), which we could also phrase as (4/10)*(9/10)^2. I'm not sure how to make uncertainty work correctly within each factor though.
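A minimal sketch of that bookkeeping (toy code; the report format and names are made up for illustration):

```python
# Each subagent reports the fraction of its subspace it covers plus how many
# dimensions that subspace has; total coverage is the product of the fractions,
# and the "per-dimension geometric average" reweights each factor by dimension count.
from math import prod

def total_coverage(reports: list[tuple[float, int]]) -> float:
    """reports: list of (fraction_covered, n_dimensions), one per subagent."""
    return prod(frac for frac, _ in reports)

def per_dimension_factors(reports: list[tuple[float, int]]) -> list[float]:
    return [frac ** (1 / dims) for frac, dims in reports]

reports = [(4 / 10, 1), (81 / 100, 2)]   # subagents A and B from the example
print(total_coverage(reports))            # 0.324 = (4/10) * (81/100)
print(per_dimension_factors(reports))     # [0.4, 0.9] (up to float error), i.e. (4/10)*(9/10)^2
```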
PaulK54

I also think that the fact that AI safety thinking is so much driven by these fear + distraction patterns, is what's behind the general flail-y nature of so much AI safety work. There's a lot of, "I have to do something! This is something! Therefore, I will do this!"

3Valentine
I agree… and also, I want to be careful of stereotypes here. Like, I totally saw a lot of flail nature in what folk were doing when I was immersed in this world years ago. But I also saw a lot of faux calmness and reasonableness. That's another face of this engine. And I saw some glimmers of what I consider to be clear lucidity. And I saw a bunch that I wasn't lucid enough at the time to pay proper attention to, and as such I don't have a clear opinion about now. I just lack data because I wasn't paying attention to the people standing in front of me. :-o But with that caveat: yes, I agree.
PaulK90

I think your diagnosis of the problem is right on the money, and I'm glad you wrote it. 

As for your advice on what a person should do about this, it has a strong flavor of: quit doing what you're doing and go in the opposite direction. I think this is going to be good for some people but not others. Sometimes it's best to start where you are. Like, one can keep thinking about AI risk while also trying to become more aware of the distortions that are being introduced by these personal and collective fear patterns.

That's the individual level though, and... (read more)

2Valentine
I mostly just agree. I hesitate to ever give a rationalist the advice to keep thinking about something that's causing them to disembody while they work on embodiment. Even if there's a good way for them to do so, my impression is that most who would be inclined to try cannot do that. They'll overthink. It's like suggesting an alcoholic not stop cold-turkey but leaving them to decide how much to wean back. But I do think there's a balance point that if it could be enacted would actually be healthier for quite a few people. I'm just not holding most folks' hands here! So the "cold turkey" thing strikes me as better general advice for those going at it on their own with minimal support.
5PaulK
I also think that the fact that AI safety thinking is so much driven by these fear + distraction patterns, is what's behind the general flail-y nature of so much AI safety work. There's a lot of, "I have to do something! This is something! Therefore, I will do this!"
PaulK20

Nice essay, makes sense to me! Curious how you see this playing into machine intelligence.

One thought is that "help maintain referential stability", or something in that ballpark, might be a good normative target for an AI. Such an AI would help humans think, clarify arguments, recover dropped threads of meaning. (Of course, done naively, this could be very socially disruptive, as many social arrangements depend on the absence of clear flows of meaning.)

PaulK10

As a slightly tangential point, I think if you start thinking about how to cast survival / homeostasis in terms of expected-utility maximization, you start having to confront a lot of funny issues, like, "what happens if my proxies for survival change because I self-modified?", and then more fundamentally, "how do I define / locate the 'me' whose survival I am valuing? what if I overlap with other beings? what if there are multiple 'copies' of me?". Which are real issues for selfhood IMO.

2tailcalled
In the evolutionary case, the answer is that this is out of distribution, so it's not evolved to be robust to such changes.
PaulK40

> There is no way for the pursuit of homeostasis to change through bottom-up feedback from anything inside the wrapper. The hierarchy of control is strict and only goes one way.

Note that people do sometimes do things like starve themselves to death or choose to become martyrs in various ways, for reasons that are very compelling to them. I take this as a demonstration that homeostatic maintenance of the body is in some sense "on the same level" as other reasons / intentions / values, rather than strictly above everything else.

2tailcalled
"No way" is indeed an excessively strong phrasing, but it seems clear to me that pursuit of homeostasis is much more robust to perturbations than most other pursuits.
PaulK10

I do see the inverse side: a single fixed goal would be something in the mind that's not open to critique, hence not truly generally intelligent from a Deutschian perspective (I would guess; I don't actually know his work well).

To expand on the "not truly generally intelligent" point: one way this could look is if the goal included some tacit assumptions about the universe that turned out later not to be true in general -- e.g. if the agent's goal was something involving increasingly long-range simultaneous coordination, before the discovery of relativity -- and if the goal were really unchangeable, then it would bar or at least complicate the agent's updating to a new, truer ontology.

PaulK30

I've been thinking along the same lines, very glad you've articulated all this!

1Sune
Same, it’s what I tried to ask here, just elaborated better than I could have done myself.
1Jeff Rose
Same here.
PaulK30

The way I understand the intent vs. effect thing is that the person doing "frame control" will often contain multitudes: an unconscious, hidden side that's driving the frame control, and then the more conscious side that may not be very aware of it, and would certainly disclaim any such intent.

PaulK30

Small typo: you have two sections numbered [7.2]

5Mark Xu
both of those sections draw from section 7.2 of the original paper
PaulK10

(I assume that by "gears-level models" you mean a combination of reasoning about actors' concrete capabilities; and game-theory-style models of interaction where we can reach concrete conclusions? If so,)

I would turn this around, and say instead that "gears-level models" alone tend to not be that great for understanding how power works. 

The problem is that power is partly recursive. For example, A may have power by virtue of being able to get B to do things for it, but B's willingness also depends on A's power. All actors, in parallel, are looking aro... (read more)

2johnswentworth
I do not think they're far from being able to describe most important power dynamics, although one does need to go beyond game-theory-101. In particular, Schelling's work is key for things like "rumor starting a stampede" or abstractions having causal power, as well as properly understanding threats, the importance of risk tolerance, and various other aspects particularly relevant to bargaining dynamics.
PaulK50

Interesting essay!

In your scenario where people deliberate while their AIs handle all the competition on their behalf, you note that persuasion is problematic: this is partly because, with intent-aligned AIs, the system is vulnerable to persuasion in that "what the operator intends" can itself become a target of attack during conflict.

Here is another related issue. In a sufficiently weird or complex situation, "what the operator intends" may not be well-defined -- the operator may not know it, and the AI may not be able to infer it with confidence. In this... (read more)

PaulK30

The next time you are making a complicated argument, if you can, try and watch yourself recalling bits and pieces at a time. To me, it feels viscerally like I have the whole argument in mind, but when I look closely, it's obviously not the case. I'm just boldly going on and putting faith in my memory system to provide the next pieces when I need them. And usually it works out.

Yes! And, I would offer an additional, alternative way of phrasing this: "you" actually do have the whole argument in mind, but it's a higher-level "you", a slower but more inclusive ... (read more)

PaulK30

I disagree that mesa optimization requires explicit representation of values. Consider an RL-type system that (1) learns strategies that work well in its training data, and then (2) generalizes to new strategies that in some sense fit well or are parsimonious with respect to its existing strategies. Strategies need not be explicitly represented. Nonetheless, it's possible for those initially learned strategies to implicitly bake in what we could call foundational goals or values, that the system never updates away from.

For another angle, consider that valu... (read more)

PaulK50

(tl;dr: I think a lot of this is about one-way (read-only) vs. two-way communication)

As a long-term meditator and someone who takes contents of phenomenal consciousness as quite "real" in their own way, I enjoyed this post -- it helped me clarify some of my disagreements with these ideas, and to just feel out this conceptual-argumentative landscape.

I want to draw out something about "access consciousness" that you didn't mention explicitly, but that I see latent in both your account (correct me if I'm wrong) and the SEP's discussion of it (ctrl-F for "acce... (read more)

PaulK30

Also, on your description of designs factorizing into parts, maybe you already know this, but I wanted to highlight that often "factorization", even when neat, isn't just a straightforward decomposition into separate parts. For example, say you're designing a distributed system. You might have a kind of "vertical" decomposition into roles like leader and follower. But then also a "horizontal" decomposition into different kinds of data that get shared in different ways. The logic of roles and kinds of data might then interact, so that the algorithm is really conceptually two-dimensional.

(These kinds of issues make cognition harder to factorize)
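A minimal illustration of that two-dimensional structure (a toy example; the roles and message kinds are made up):

```python
# Behaviour is indexed both by role (the "vertical" decomposition) and by kind
# of data (the "horizontal" one), so neither axis alone gives a clean
# decomposition into independent parts.
handlers = {
    ("leader",   "membership"):   lambda msg: f"broadcast new view: {msg}",
    ("follower", "membership"):   lambda msg: f"adopt view: {msg}",
    ("leader",   "client_write"): lambda msg: f"sequence and replicate: {msg}",
    ("follower", "client_write"): lambda msg: f"forward to leader: {msg}",
}

def handle(role: str, kind: str, msg: str) -> str:
    return handlers[(role, kind)](msg)

print(handle("follower", "client_write", "x=1"))
```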

2Alex Flint
Yes I agree with this. Another example is the way a two-by-four length of timber is a kind of "interface" between the wood mill and the construction worker. There is a lot of complexity in the construction of these at the wood mill, but the standard two-by-four means that the construction worker doesn't have to care. This is also a kind of factorization that isn't about decomposition into parts or subsystems.
PaulK20

Thanks for the thought-provoking post, Alex.

Thinking about how exactly design stories help create trust, I came upon what might be a useful distinction: whether the design is good according to the considerations known to the designer, vs. whether all relevant considerations are present. A good design story lets us check both of these. The first being false means the designer just did a bad job, or perhaps is hiding something. The second being false means there are actually just considerations the designer didn't know about -- for example because they ... (read more)

3Alex Flint
Well said, friend. Yes when we have a shared understanding of what we're building together, with honest and concise stories flowing in both directions, we have a better chance of actually understanding what all the stakeholders are trying to achieve, which at least makes it possible to find a design that is good for everyone. The distinction you point out between a design error and missing information seems like a helpful distinction to me. Thank you. It reminds me of the idea of interaction games that CHAI is working on. Instead of having a human give a fully-specified objective to a machine right up front, the idea is to define some protocol for ongoing communication between human and machine about what the objective actually is. When I design practical things I notice this kind of thing happening between me and the world as I start with a vague sense of what my objective is and gradually clarify it through experimentation. I'm curious to hear about your experience designing and building things, and how it matches up with the model we're talking about here.
PaulK40

On the first, more philosophical part of your post: I think your notion of "freedom-as-arbitrariness" is actually also what allows for "freedom-as-optimization", in the following way.

Suppose I have an abstract set of choices. These can be instantiated in a concrete situation, which then carries its own set of considerations. When I go to do my optimizing in a given concrete situation, the more constrained or partisan my choice is in the abstract, the more difficult is my total optimization. Conversely, the freer, the more arbitrary the... (read more)

PaulK20

Cool. I've had one brief, spontaneous experience, while circling, of that sort of concept -> vision 'synaesthesia': seeing dark halos around people, that I think represented their anxiety and desire to avoid talking about certain things.

But I'd never imagined working deliberately with vision in that way.

4ChristianKl
I think that's the direction. The perception for emotional states that I developed myself is very kinesthetic in nature. I can see how the phrase "dark halos" would map there, but for myself it has a slightly different quality. Aliveness might be a name for my qualia. It's certainly very different from the kind of aura-glow effects that get created by shutting down error correction. From what I heard from other people, it seems that some people have experiences where there are colors as well. When I'm doing circling or leading a meditation, the kind of information you point to is an important guide for me.
PaulK140

So is this a fair summary?

Contemplative practitioners sometimes have great psyche-refactoring experiences, "insights". But, when interpreting & integrating them, they fail to keep a strong enough epistemic distinction between their experience and the ultimate reality it arises from. And then they make crazy inferences about the nature of that ultimate reality.

5romeostevensit
Right. As several commenters have pointed out, I might be giving too much benefit of the doubt with the complicated explanation given that most people's empirical and inferential engines aren't exactly v8's. That said, I hope the network refactoring angle is helpful to those who do have decent epistemic standards.
PaulK120
When this happens with parts of the network that are involved with the visual system, for instance, the visual field can actually dissolve into a bunch of vibrations temporarily as you refactor parts of the network related to extremely low level things like edge or motion detection (this is also where 'auras' come from imo)

Wow, I've never heard of this, and it sounds really interesting. Would you care to elaborate, on what kind of refactoring is going on, and what the resulting 'auras' are / mean?

If you shut down certain error correction mechanisms you might see a red glow around a green object, an orange glow around a blue object, and similar effects.

When people are then very surprised about witnessing those effects, it's often easy to sell them the idea that there are mysterious powers involved in aura seeing.

When it comes to more advanced work with the concepts, some people seem to have a kind of synesthesia where their brain manages to express intuitive information that's available to them in the form of colors.

PaulK21

You can get into some weird, loopy situations when people reflect enough to lift up the floorboards, infer some "player-level" motivations, and then go around talking or thinking about them at the "character level". Especially if they're lacking in tact or social sophistication. I remember as a kid being so confused about charitable giving -- because, doesn't everyone know that giving is basically just a way of trying to make yourself look good? And doesn't everyone know that that's Wrong? So shouldn't everyone ... (read more)

PaulK10

Yeah, I think costly signalling is definitely part of it. I think there's really several different things going on in the birthday example. One, the friend knows that you decided to spend the evening with them, so they can infer that you want to perform friendship, and/or anticipate having a good time with them, enough to make you decide that. This is the costly signalling part. But then there's also the stuff that actually happens at the party: talking, laughing together, etc. I think this is what actually accounts for most of the "feeling closer". (Or perhaps these two effects act on different levels of "feeling closer").

Anyway this is maybe getting unnecessarily analytical.

3Raemon
Nod. FWIW, I was actually in part referring to costly-signal-to-yourself. I also agree that there's probably multiple different levels of feeling closer.
PaulK20
A ritual is about making a sacrifice to imbue a moment with symbolic power, and using that power to transform yourself.

I'm really curious where you're getting the sacrifice part from! Or how important you think it is. Because my experience with rituals doesn't generally include sacrificing anything; and the bits of sociology I've read about ritual (mostly Randall Collins' book Interaction Ritual Chains) don't mention it much. It does resonate with perhaps a western-magical perspective?

5Raemon
Yeah, I think the most important bit is the "investing a moment with symbolic power". Definitions vary, and come to think of it I'm not sure which piece I read that emphasized the sacrifice element. But I remember the context being "all rituals involve sacrifice – some minor, some major. The default sacrifice is time, even if all you're doing is getting together to wish someone happy birthday. More significant and resonant sacrifices tend to make the experience more powerful." In the birthday example, you're all sacrificing an evening to transform your relationship with a person [i.e make yourself closer], and transform that person. I think you can argue that that's more of a word game than a real argument. But still seems worth noting that rituals with more expenditures of time seem to be more potent. If I stop by my friend's party for 5 minutes or the office just gathers everyone together to eat a cake before returning to work, that makes us less close than if I spend a whole evening with my friend.
PaulK130

Great essay!

Another aspect of this divide is about articulability. In a nurturing context, it's possible to bring something up before you can articulate it clearly, and even elicit help articulating it.

For example, "Something about <the proposal we're discussing> strikes me as contradictory -- like it's somehow not taking into account <X>?". And then the other person and I collaborate to figure out if and what exactly that contradiction is.

Or more informally, "There's something about this that feels uncomfor... (read more)