Oh. As I read those first lines, I thought, "Isn't it obvious?!!! How the hell did the author not notice that at, like, 5 years old?" I mean, it's such a common plot in fiction. And paradoxically, I thought I wasn't good at reading social dynamics. But maybe that's an exception: I have a really good track record at guessing werewolves in the game Werewolf. So maybe I'm just good at theory (very relevant to this game) but still bad at reading people.
The idea of applying it to wealth is interesting, though.
Excellent post! I think it's closely related to (but not reducible to) the general concept of Pressure. Both pressure as...
It's not about bad intentions in most practical cases, but about biases. Hanlon's razor doesn't apply (or applies only very weakly) to systemic issues.
It's fixed ;)
Exercise 1:
The empty set is the only one. For any nonempty set X, you could pick as a counterexample:
Exercise 2:
The agent will choose an option which scores better than the threshold.
It's a generalization of satisficers: these latter are thresholders such that the set of above-threshold options is nonempty.
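A minimal sketch of how I picture that distinction (toy code; the names and the strict inequality are my own assumptions):

```python
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")

def thresholder(options: Iterable[X], u: Callable[[X], float], t: float) -> list[X]:
    """A thresholder: any option scoring better than the threshold t is acceptable.
    The acceptable set may well be empty."""
    return [x for x in options if u(x) > t]

def satisficer(options: Iterable[X], u: Callable[[X], float], t: float) -> X:
    """A satisficer, read as a thresholder whose above-threshold set is nonempty:
    it just returns some acceptable option."""
    acceptable = thresholder(options, u, t)
    assert acceptable, "a satisficer presupposes at least one good-enough option"
    return acceptable[0]
```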
Exercise 3:
Exercise 4:
I have discovered a truly marvelous-but-infinite solution to this, which this finite comment is too narrow to contain.
Exercise 5:
The generalisable
Even with finite sets, it doesn't work, because the idea of looking for the "closest to" option is not what we're after.
Take a class of students, each scoring within some range, and let the (uniform) better-than-average optimizer stand for the professor picking any student who scores better than the mean.
Suppose Charlie is excluded from consideration (the professor despises Charlie and ignores him totally).
If u(Alice) = 5 and u(Bob) = 4, their average is 4.5, so only Alice should be picked by the constrained optimisation.
However, with your propos...
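For concreteness, here is that counterexample as toy code (the numbers are the ones above; the helper name is made up):

```python
def better_than_average(scores: dict) -> list:
    """The (uniform) better-than-average optimizer: pick every considered student
    scoring strictly above the mean of the considered set."""
    mean = sum(scores.values()) / len(scores)
    return [name for name, score in scores.items() if score > mean]

# Charlie is ignored entirely, so only Alice and Bob are considered.
considered = {"Alice": 5, "Bob": 4}
print(better_than_average(considered))  # ['Alice'] -- the mean is 4.5, so only Alice is picked
```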
Natural language is lossy because the communication channel is narrow, hence the need for lower-dimensional representations (see ML embeddings) of what we're trying to convey. Lossy representations are also what Abstractions are about.
But in practice, do you expect that Natural Abstractions (if discovered) cannot be expressed in natural language?
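(A toy numerical illustration of the "lossy because low-dimensional" point, entirely my own and not from the post: once a message goes through a narrow linear bottleneck, it cannot be reconstructed exactly.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)           # the "high-dimensional" thing we want to convey
W = rng.normal(size=(5, 100))      # a narrow, 5-dimensional channel / embedding
z = W @ x                          # the lossy, lower-dimensional representation
x_hat = np.linalg.pinv(W) @ z      # best linear guess of x from the embedding alone
print(np.linalg.norm(x - x_hat))   # clearly nonzero: information was lost in the bottleneck
```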
There is one catch: in principle, there could be multiple codes/descriptions which decode to the same message. The obvious thing to do is then to add up the implied probabilities of each description which produces the same message. That indeed works great. However, it turns out that just taking the minimum description length - i.e. the length of the shortest code/description which produces the message - is a good-enough approximation of that sum in one particularly powerful class of codes: universal Turing machines.
Is this about K-complexity...
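(If I'm not mixing things up, the formal statement behind "the min length is a good-enough approximation of the sum" is Levin's coding theorem; writing it from memory:)

$$ m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|}, \qquad K(x) \;=\; -\log_2 m(x) + O(1), $$

where $U$ is a universal prefix Turing machine, $m(x)$ is the summed implied probability of all programs that output $x$, and $K(x)$ is the length of the shortest such program.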
Do we really have such good interpretations for such examples? It seems to me that we have big problems in the real world because we don't.
We do have very high-level interpretations, but not enough to get solid guarantees. After all, we have a very high-level, trivial interpretation of our ML models: they learn! The challenge is not just to have clues, but clues that are relevant enough to address safety concerns at the scale of impact involved (which is the unprecedented feature of the AI field).
Pretty cool!
Just to add, although I think you already know this: we don't need a reflective understanding of your DT to put it into practice, because we are messy brains rather than provable algorithms, etc.
And I always feel it's kinda unfair to dismiss "valuing friendliness or a sense of honor" as orthogonal motivations, because they might be evolutionarily selected heuristics that (sort of) implement exactly such acausal DT concerns!
Great! Isn't it generalizable to any argmin/argmax issue? I'm especially thinking of the argmax framework in decision theories, which is a well-known difficulty for safety concerns.
Similarly, in EA/action-oriented discussions, there is a recurrent pattern like:
Eager-to-act padawan: If world model/moral theory X is most likely to be true (given evidence y, z, ...), we need to act accordingly with the controversial Z! Seems like the best EU action!
Experienced jedi: Wait a minute. You have to be careful with this way of thinking, because there are unknown unknowns, ...
I'm confused, could you clarify? I interpret your "Wawaluigi" as two successive layers of deception within a simulacrum, which is unlikely if the WE is reliable, right?
I didn't say anything about Wawaluigis and I agree that they are not Luigis, because as I said, a layer of Waluigi is not a one-to-one operator. My guess is about a normal Waluigi layer, but with a desirable Waluigi rather than a harmful Waluigi.
"Good" simply means "our targeted property" here. So my point is, if WE is true to any property P, we could get a P-Waluigi through some anti-P (pseudo-)naive targeting.
I don't get your second point: we're talking about simulacra, not agents, and obviously this idea would only be part of a larger solution at best. For any property P, I expect several anti-P, so you don't have to instantiate an actually bad Luigi; my idea is more about trapping deception as one layer only.
Honest why-not-just question: if the WE is roughly "you'll get exactly one layer of deception" (aka a Waluigi), why not just anticipate it by steering through that effect, i.e. choose an anti-good Luigi to get a good Waluigi?
This seems to generalize your recent Confusing the goal and the path take:
To quote your post:
For it conjures obstacles that were never there.
Or, recalling the wisdom of good old Donald Knuth:
Premature optimization is the root of all evil
Agreed. It's the opposite assumption (i.e. no embeddedness) that I wrote this for; fixed.
Cool! I like this as an example of how difficult-to-notice problems generally get left unsolved. I'm not sure how serious this one is, though.
And it feels like becoming a winner means consistently winning.
Reminds me strongly of the difficulty of accepting commitment strategies in decision theory, as in Parfit's Hitchhiker: one gets the impression that a win-oriented rational agent should win in every single situation (being greedy); but in reality, that is not always what winning looks like (optimal policy rather than optimal actions).
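A toy sketch of that last parenthesis, with made-up payoffs (rescue worth 1,000,000, paying costs 100, and the driver is a reliable predictor who only rescues agents predicted to pay):

```python
RESCUE_VALUE, PAYMENT = 1_000_000, 100

def outcome(pays_once_in_town: bool) -> int:
    """The driver, a reliable predictor, rescues only agents predicted to pay."""
    rescued = pays_once_in_town
    return (RESCUE_VALUE - PAYMENT) if rescued else 0

print(outcome(pays_once_in_town=False))  # 0       -- locally "optimal action", but never rescued
print(outcome(pays_once_in_town=True))   # 999900  -- the committed, optimal-policy agent wins
```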
For it conjures obstacles that were never there.
Let's try applying this to a more confused topic. Risky. Recently I've slightly updated away from the mesa paradigm, rea...
Surprisingly, this echoes a recent thought of mine: "frames blind us". It comes to mind when I think of how traditional agent models (RL, game theory...) blinded us for years to their implicit assumption of no embeddedness. As with the non-Euclidean shift, the change came from relaxing the assumptions.
This post seems to complete your good old Traps of Formalization. Maybe it's a Hegelian dialectic:
I) intuition domination (system 1)
II) formalization domination (system 2)
III) flexible formalization (improved system 1 / cautious system 2)
Deconfusion is rea...
In my view you did misunderstand JW's ideas, indeed. His expression "far away relevant"/"distance" is not limited to spatial or even spatiotemporal distance. It's a general notion of distance which is not fully formalized yet (the work isn't done).
We do indeed have concerns about inner properties (like your examples), and that's something JW is fully aware of. So (relevant) inner structures could be framed as relevantly "far away", given the right formulation.
Find [...] to maximize the predictive accuracy on the observed data, [...], where [...]. Call the result [...].
Isn't the z in the sum on the left a typo? I think it should be n
Is the adversarial perturbation not, in itself, a mis-specification? If not, I would be glad to have your intuitive explanation of it.
Funny meta: I'm reading this just after finishing your two sequences about Abstraction, which I find very exciting! But surprise, your plan changed! Did I read all that for nothing? Fortunately, I think it's mostly robust, indeed :)
The difference (here) between "Heuristic" and "Cached-Solutions" seems to me analogous to the difference between lazy evaluation and memoization:
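Roughly, as a toy sketch of the analogy (my own code): lazy evaluation only delays the work and may redo it on every use, whereas memoization does the work once and then reuses the stored answer.

```python
from functools import lru_cache

def lazy_heuristic(x: int):
    """Lazy evaluation: return a thunk; the answer is recomputed each time it is forced."""
    return lambda: sum(i * i for i in range(x))

@lru_cache(maxsize=None)
def cached_solution(x: int) -> int:
    """Memoization: the answer is computed once, then looked up on later calls."""
    return sum(i * i for i in range(x))

thunk = lazy_heuristic(10_000)
print(thunk(), thunk())                                   # forced twice, computed twice
print(cached_solution(10_000), cached_solution(10_000))   # computed once, then a cache hit
```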
Yeah, I'll be there, so I'd be glad to see you, especially Adam!
We are located in London.
Great! Is there a co-working space or something? If so, where? Also, are you planning to attend EAG London as a team?
Thanks for these last three posts!
Just sharing some vibes I got from your... framing!
Minimalism ~ path ~ inside-focused ~ the signal/reward
Maximalism ~ destination ~ outside-focused ~ the world
These two opposing aesthetics are a well-known source of confusion within agent-foundations-style research. The classical way to model an agent is to think of it as maximizing outside-world variables. Conversely, we can think of minimization ~ inside-focused (a reward-hacking-type error), like a drug addict accomplishing "nothing".
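A crude toy sketch of the contrast (everything here is made up): the first agent changes an outside-world variable and the reward follows; the second optimizes its own signal directly and accomplishes "nothing".

```python
# A made-up toy setting: one outside-world variable plus the agent's internal reward signal.
world = {"apples_grown": 0}
reward_signal = 0

def outward_step():
    """Maximalism ~ destination ~ outside-focused: act on the world; reward is a side effect."""
    global reward_signal
    world["apples_grown"] += 1
    reward_signal += 1

def wirehead_step():
    """Minimalism ~ inside-focused failure mode: tamper with the signal, leave the world untouched."""
    global reward_signal
    reward_signal += 1

for _ in range(10):
    wirehead_step()
print(world, reward_signal)  # {'apples_grown': 0} 10 -- high reward, nothing accomplished
```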
Feels there is also something to s...