All of scottviteri's Comments + Replies

I really like the idea of finding steering vectors that maximize downstream differences, and I have a few follow-up questions.

Have you tried/considered modifying the c_fc (MLP encoder layer) bias instead of the c_proj (MLP decoder layer) bias? I don't know about this context, but (i) c_fc makes more intuitive sense to me as a location to change, (ii) I have seen more success playing with it in the past than with c_proj, and (iii) they are not equivalent because of the non-linearity between them.
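
(For concreteness, a minimal sketch of the two options, assuming a HuggingFace GPT-2-style model where the MLP sublayers are named c_fc and c_proj; the layer index and scale below are arbitrary placeholders, not values from the post.)

import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
mlp = model.transformer.h[6].mlp  # hypothetical choice of layer

with torch.no_grad():
    # Option (i): steer the encoder bias. The GELU nonlinearity sits after c_fc,
    # so the downstream effect is not a linear function of this perturbation.
    mlp.c_fc.bias += 0.1 * torch.randn_like(mlp.c_fc.bias)

    # Option (ii): steer the decoder bias. This is equivalent to adding a fixed
    # vector to the residual stream after the MLP block.
    # mlp.c_proj.bias += 0.1 * torch.randn_like(mlp.c_proj.bias)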

I like how you control for radius by projecting gradients onto the... (read more)

Very interesting! I'm excited to read your post.

I take back the part about pi and update determining the causal structure, because many causal diagrams are consistent with the same poly diagram

I think what is going on here is that both ∇* and G are of the form exp ∘ g ∘ ln, with g = ∇ and g = E, respectively. Let's define the star operator as g* = exp ∘ g ∘ ln. Then (f ∘ g)* = exp ∘ (f ∘ g) ∘ ln = exp ∘ f ∘ ln ∘ exp ∘ g ∘ ln = f* ∘ g*, by associativity of function composition. Further, if f and g commute, then so do f* and g*

So the commutativity of the geometric expectation and derivative fall directly out of their representation as E* and ∇*, r... (read more)

And if I pushed around symbols correctly, the geometric derivative can be pulled inside of a geometric expectation (∇*_θ G_x[f(x)] = G_x[∇*_θ f(x)]) similarly to how an additive derivative can be pulled inside an additive expectation (∇_θ E_x[f(x)] = E_x[∇_θ f(x)]). Also, just as additive expectation distributes over addition (E[X + Y] = E[X] + E[Y]), geometric expectation distributes over multiplication (G[X·Y] = G[X]·G[Y]).
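
(A quick numeric sanity check of the last two identities, assuming the definition G[X] = exp(E[ln X]); this is only an illustrative sketch, not part of the original comment.)

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=100_000)
y = rng.uniform(0.5, 2.0, size=100_000)

def G(samples):
    # Geometric expectation: exponential of the arithmetic mean of the logs.
    return np.exp(np.mean(np.log(samples)))

print(G(x * y), G(x) * G(y))                    # G[XY] ≈ G[X]·G[Y]
print(np.mean(x + y), np.mean(x) + np.mean(y))  # E[X+Y] = E[X]+E[Y]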

2scottviteri
I think what is going on here is that both ∇* and G are of the form exp ∘ g ∘ ln, with g = ∇ and g = E, respectively. Let's define the star operator as g* = exp ∘ g ∘ ln. Then (f ∘ g)* = exp ∘ (f ∘ g) ∘ ln = exp ∘ f ∘ ln ∘ exp ∘ g ∘ ln = f* ∘ g*, by associativity of function composition. Further, if f and g commute, then so do f* and g*: g* ∘ f* = (g ∘ f)* = (f ∘ g)* = f* ∘ g*. So the commutativity of the geometric expectation and derivative falls directly out of their representation as E* and ∇*, respectively, by commutativity of E and ∇, as long as they are over different variables. We can also derive what happens when the expectation and gradient are over the same variable: (∇_θ ∘ E_{x∼P_θ(x)})*. First, notice that (∗k)*(x) = e^(k·ln x) = e^(ln x·k) = x^k, so (∗k)* = (^k). Also (+k)*(x) = e^(k + ln x) = e^k · e^(ln x) = x·e^k, so (+k)* = (∗e^k). Now let's expand the composition of the gradient and expectation: (∇_θ ∘ E_{x∼P_θ(x)})(f(x)) = ∇_θ ∫ P_θ(x) f(x) dx = E_{x∼P_θ(x)}[∇_θ(f(x)·ln P_θ(x))], using the log-derivative trick. So ∇_θ ∘ E_{x∼P_θ(x)} = E_{x∼P_θ(x)} ∘ ∇_θ ∘ (∗ln P_θ(x)). Therefore ∇*_θ ∘ G_{x∼P_θ(x)} = (∇_θ ∘ E_{x∼P_θ(x)})* = E*_{x∼P_θ(x)} ∘ ∇*_θ ∘ (∗ln P_θ(x))* = G_{x∼P_θ} ∘ ∇*_θ ∘ (^ln P_θ). Writing it out, we have ∇*_θ G_{x∼P_θ(x)}[f(x)] = G_{x∼P_θ(x)}[∇*_θ(f(x)^(ln P_θ(x)))].

If I try to use this framework to express two agents communicating, I get an image with a V1, A1, P1, V2, A2, and P2, with cross arrows from A1 to P2 and A2 to P1. This admits many ways to get a roundtrip message. We could have A1 -> P2 -> A2 -> P1 directly, or A1 -> P2 -> V2 -> A2 -> P1, or many cycles among P2, V2, and A2 before P1 receives a message. But in none of these could I hope to get a response in one time step the way I would if both agents simultaneously took an action, and then simultaneously read from their inputs and the... (read more)

1scottviteri
I take back the part about pi and update determining the causal structure, because many causal diagrams are consistent with the same poly diagram

Actually maybe this family is more relevant:
https://en.wikipedia.org/wiki/Generalized_mean, where the geometric mean is the limit as the exponent p approaches zero.
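
(A minimal numeric check of that limit, using the power-mean definition M_p(x) = (mean of x_i^p)^(1/p); the sample values are arbitrary.)

import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])

def power_mean(x, p):
    return np.mean(x ** p) ** (1.0 / p)

geometric_mean = np.exp(np.mean(np.log(x)))
print(power_mean(x, 1e-6), geometric_mean)  # nearly equal: M_p -> geometric mean as p -> 0
print(power_mean(x, 1.0))                   # arithmetic mean
print(power_mean(x, -1.0))                  # harmonic mean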

The "harmonic integral" would be the inverse of integral of the inverse of a function -- https://math.stackexchange.com/questions/2408012/harmonic-integral

Also here is a nice family that parametrizes these different kinds of average (https://m.youtube.com/watch?v=3r1t9Pf1Ffk)

3scottviteri
Actually maybe this family is more relevant: https://en.wikipedia.org/wiki/Generalized_mean, where the geometric mean is the limit as the exponent p approaches zero.
3scottviteri
The "harmonic integral" would be the inverse of integral of the inverse of a function -- https://math.stackexchange.com/questions/2408012/harmonic-integral

If arithmetic and geometric means are so good, why not the harmonic mean? https://en.wikipedia.org/wiki/Pythagorean_means. What would a "harmonic rationality" look like?

4StrivingForLegibility
I can answer this now! Expected Utility, Geometric Utility, and Other Equivalent Representations. It turns out there is a large family of expectations we can use to build utility functions, including the arithmetic expectation E, the geometric expectation G, and the harmonic expectation H, and they're all equivalent models of VNM rationality! And we need something beyond that family, like Scott's G[E[U]], to formalize geometric rationality. Thank you for linking to these different families of means! The quasi-arithmetic mean turned out to be exactly what I needed for this result.
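
(For readers following along: the quasi-arithmetic mean referred to here is M_f(x) = f⁻¹ of the average of f(x_i). A minimal sketch of that family, showing how the arithmetic, geometric, and harmonic means all arise from one formula; the E/G/H notation above is from the linked post, and the code is only an illustration.)

import numpy as np

def quasi_arithmetic_mean(x, f, f_inv):
    return f_inv(np.mean(f(x)))

x = np.array([1.0, 2.0, 4.0])
arithmetic = quasi_arithmetic_mean(x, lambda t: t, lambda t: t)
geometric  = quasi_arithmetic_mean(x, np.log, np.exp)
harmonic   = quasi_arithmetic_mean(x, lambda t: 1.0 / t, lambda t: 1.0 / t)
print(arithmetic, geometric, harmonic)  # 2.333..., 2.0, 1.714...
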
2scottviteri
Also here is a nice family that parametrizes these different kinds of average (https://m.youtube.com/watch?v=3r1t9Pf1Ffk)

I wonder if this entails that RLHF, while currently useful for capabilities, will eventually become an alignment tax. Namely, OpenAI might have text evaluators discourage the LM from writing self-calling, agenty-looking code.

So in thinking about alignment futures that are the limit of RLHF, these feel like two fairly different forks of that future.

@Quinn @Zac Hatfield-Dodds Yep, I agree. I could allow voters to offer replacements for debate steps and aggregation steps. Then we get the choice to either 
  1) delete the old versions and keep a single active copy of the aggregation tree, or to 
  2) keep the whole multiverse of aggregation trees around. 

If we keep a single copy, and we have a sufficient number of users, the root of the merge tree will change too rapidly, unless you batch changes. However, recomputing the aggregation trees from a batch of changes will end up ignor... (read more)

1Quinn
ok, great! I'm down. Incidentally, you caused me to google for voting theory under trees of alternatives (rather than lists), and there are a few prior directions (none very old, at a glance).

I agree with Andrew Critch's acausal normalcy post until he gets to boundaries as the important thing -- antisociality fits this criterion too well. I'm not quite trying to say that people are just active inference agents. It does seem like there is some targeting stage that is not necessarily RL, such as with a decision transformer, and in this vein I am not quite on board with prediction as human values.

No, that’s not the question I was asking. Humans are able to start using grammatical languages on the basis of no observations of grammatical language whatsoever—not in the pretraining, not in the training, not in text form, not in audio form, not in video form. Again, I mentioned Nicaraguan sign language, or the creation of creoles from pidgins, or for that matter in the original creation of language by hominins.

So this has nothing to do with sample-efficiency. There are zero samples.

I don’t think you can take one or more randomly-initialized transformers

... (read more)
2Steven Byrnes
Thanks! I think people’s personalities are significantly predictable from their genes, and mostly independent of how their parents raised them (at least within the typical distribution, i.e. leaving aside cases of flagrant abuse and neglect etc.). See e.g. popular expositions of this theory by Judith Harris or by Bryan Caplan for the fine print and massive body of supporting evidence (e.g. twin studies and adoption studies). Antisocial personality disorder / sociopathy follows the usual pattern like everything else—it’s substantially predictable based on genes, almost entirely independent of how your parents raise you and other aspects of childhood family environment. I’m not sure what you mean by “competence”. Mean people and cruel people and high-functioning sociopaths can be very highly “competent” according to how I use that word day-to-day. William Shockley was a brilliant physicist who started a successful company—while also being awful to everyone, vindictive, and a notorious racist. Heck, Hitler himself was extraordinarily charismatic and exquisitely skilled at social manipulation, AFAICT. He achieved one wildly ambitious goal after another. I think I would describe him as a “highly competent” guy.

GPT-4 has already been trained on lots of human language. Let’s talk instead about a transformer initialized with random weights (xavier initialization or whatever).

Starting right from the random xavier initialization, you are not allowed to (pre)train it on any human language at all. None. No text. No audio of humans speaking. No video of humans speaking. Absolutely none at all. Do you think that could wind up with grammatical language? If not, then I claim this is a nice demonstration (one of many) of how human child brains are doing something different

... (read more)
3Steven Byrnes
No, that’s not the question I was asking. Humans are able to start using grammatical languages on the basis of no observations of grammatical language whatsoever—not in the pretraining, not in the training, not in text form, not in audio form, not in video form. Again, I mentioned Nicaraguan sign language, or the creation of creoles from pidgins, or for that matter in the original creation of language by hominins. So this has nothing to do with sample-efficiency. There are zero samples. I don’t think you can take one or more randomly-initialized transformers, and get grammatical language out of them, without ever putting any human-created grammatical language into them. Do you? If so, how? I’m sorry, I don’t understand this sentence at all. Your post says “Let's imagine a hypothetical scenario where an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways.” OK, now: * It is possible in principle to program an AI that is exactly like a human sociopath’s brain * It is possible in principle to put that AI in a human-like body and raise it in a loving human family in a normal human neighborhood, enroll them in school, etc. * Presumably, if I did both these things, this would be a central example of “a hypothetical scenario where an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways”, according to a reasonable interpretation of those words. * And if I did both these things, I would wind up creating an AI that is just like a human adult high-functioning sociopath, the kind of person that emotionally abuses people just for fun, with callous disregard for the well-being of anyone but themselves, that is constitutionally incapable of guilt or remorse, etc. etc.  Where if anywhere do you disagree?

A group of humans who have never been exposed to language, not in any modality, will develop a new grammatical language out of nothing, e.g. Nicaraguan Sign Language, or the invention of the earliest languages in prehistory.

So there is something going on in humans that is not autoregressive training-then-prompting at all, right? This isn’t about modality, it’s about AI paradigm. Autoregressive training will never create grammatical language out of thin air, right? 

Meh. I could see the prompting and finetuning structure mentioned earlier giving rise to... (read more)

3Steven Byrnes
GPT-4 has already been trained on lots of human language. Let’s talk instead about a transformer initialized with random weights (xavier initialization or whatever). Starting right from the random xavier initialization, you are not allowed to (pre)train it on any human language at all. None. No text. No audio of humans speaking. No video of humans speaking. Absolutely none at all. Do you think that could wind up with grammatical language? If not, then I claim this is a nice demonstration (one of many) of how human child brains are doing something different than the kind of AI you have in mind. Your OP doesn’t say “auto-regressive training & prompting”, rather it says “an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways”. I don’t think the kinds of AIs and training procedures that you have in mind are at all analogous to a human childhood. Children will do things that they want to do without being “prompted” by anyone. Children are not exposed to 45 TB of internet text while in the womb. Etc. Right?? Is that what you’ve ben thinking of this whole time? You didn’t even mention decision transformers until just now. (Or did I miss it?) Let me put it this way. Suppose I understood how human brains worked sufficiently well that I could make an AI that was doing all the same things as a human child brain, for the same reasons, i.e. due to the same underlying algorithms. Then I put this AI in a human body and raise it in a loving human family. From my perspective, this would be the most central example possible of “an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways”. But from your perspective, I feel like you’re going to say “Oh no no no, that’s totally different from the thing I’m talking about in this post.” (After all, human brains incorporate many features that do not increase the communication of the system that they are embedded in. Sociopathy has not been selecte

You’re using LLMs trained on internet text. If that’s part of the plan, I don’t think you can say it’s “trained in a way that is analogous to a human childhood in all of the relevant ways”, nor can you say that imitation-learning-from-humans is not a central part of your story. Human children do not undergo autoregressive training from massive corpuses of internet text.

Internet-trained LLMs emit human-like outputs because they were trained by imitation-learning from lots and lots of human-created text. Humans emit human-like outputs because they are humans

... (read more)
3Steven Byrnes
A group of humans who have never been exposed to language, not in any modality, will develop a new grammatical language out of nothing, e.g. Nicaraguan Sign Language, or the invention of the earliest languages in prehistory. So there is something going on in humans that is not autoregressive training-then-prompting at all, right? This isn’t about modality, it’s about AI paradigm. Autoregressive training will never create grammatical language out of thin air, right?  I feel like you should have said “here is one of a handful of techniques that I am aware of”. For example, do you think no more AI algorithms will ever be discovered in the future? I also strongly disagree with “communication therefore prosociality” in general. I’ve known a couple high-functioning sociopaths, they communicated as much as anybody, indeed probably more than average. Yet again, from my perspective, you seem to have a giant blind spot to the idea that any AI algorithm could possibly exist apart from autoregressive training then prompting. Human brains do a lot of things that are not autoregressive training, right? Particularly RL. If a human or animal is hungry then they will eat because they find eating-when-hungry to be rewarding, i.e. thanks to an RL reward function, not because they were find-tuned on examples of themselves eating, nor because they were prompted to eat or whatever. Animals will eat when they’re hungry even if they have never seen any other animal eat before, not in any modality. You’re welcome to specify that RL-centric algorithms are outside the scope of this blog post, but you can’t also say “an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways” if there is no online RL involved, right?

This is not intuitive to me. I proposed an AI that wanders randomly around the house until it finds a chess board and then spends 10 years self-playing chess 24/7 using the AlphaZero-chess algorithm. This is an AI, fair and square!

If your response is “It does not meet my intuitive notion of what an AI is”, then I think your argument is circular insofar as I think your “intuitive notion of what an AI is” presupposes that the AI be human-like in many important ways.

I claim it is possible to find simple definitions of AI that include many human-like traits wi... (read more)

2Steven Byrnes
You’re using LLMs trained on internet text. If that’s part of the plan, I don’t think you can say it’s “trained in a way that is analogous to a human childhood in all of the relevant ways”, nor can you say that imitation-learning-from-humans is not a central part of your story. Human children do not undergo autoregressive training from massive corpuses of internet text. Internet-trained LLMs emit human-like outputs because they were trained by imitation-learning from lots and lots of human-created text. Humans emit human-like outputs because they are humans. These are not the same, right? I interpret you as saying: * I’m only interested in AIs that are very competent at staying alive, executing plans, etc. * If I make an AI as follows: [autoregressive training on a massive corpus of internet text, certain type of prompting, blah blah], then I will get an AI that is very competent at staying alive, executing plans, etc. * Therefore I need only be interested in AIs that look like the previous bullet point. If so, it’s obviously a bad argument because it neglects the possibility that maybe there are also other very different ways to make an AI that is very competent at staying alive, executing plans, etc. And indeed this is the case: e.g., whatever happens in the brains of human children (since human children brains are not trained on a massive corpus of internet text, or prompted, etc.).

My main complaint is that your OP didn’t say what the AI is.

I claim that I do not need to, since there is an intuitive notion of what an AI is. An AI trained with MCTS on chess satisfies that criterion less well than GPT-4 for instance. But since history has already spelled out most of the details for us, it will probably use gradient descent and auto-regressive loss to form the core of its intelligence. Then the question is how to mix prompting and fine-tuning in a way that mirrors how a learning human would incorporate inputs.

A human child is an active a

... (read more)
2Steven Byrnes
This is not intuitive to me. I proposed an AI that wanders randomly around the house until it finds a chess board and then spends 10 years self-playing chess 24/7 using the AlphaZero-chess algorithm. This is an AI, fair and square! If your response is “It does not meet my intuitive notion of what an AI is”, then I think your argument is circular insofar as I think your “intuitive notion of what an AI is” presupposes that the AI be human-like in many important ways. If your response is “I’m not talking about any old AI that grows up in a loving human family, I’m talking specifically about an AI that learns video prediction via autoregressive loss on a video stream of a human household and takes actions via (blah blah)”, then this is now a post about a specific class of AI algorithm, and it’s perfectly great to write posts about specific classes of AI algorithms, but your title is misleading. I’m still not following what you have in mind for how the model produces outputs, such that (1) the AI behaves like a human child in nontrivial ways, (2) …but not because of imitation-learning from observations of other human children, (3) nor because of laborious programmer effort. Can you walk through an example? For example, (A) Human children will say “I’m hungry” when they themselves are hungry, not in situations where other people are typically hungry. I don’t see how the algorithms you’re describing would do that, without programmers specifically intervening to make that happen. (B) If a human child grows up never meeting any other human except for their mother, I believe the child will still eventually learn to carry on conversations in a normal way. I don’t see how the algorithms you’re describing would do that. It has no models of two-sided conversation for the autoregressive training to learn from.

Do LLMs learn to break their sensors?

Yes, I am proposing something that is not a standard part of ML training.

Gradient descent will move you around less if you can navigate to parts of the environment that give you low loss. This setup is somewhere between RL and unsupervised learning, in the sense that it has state but you are using autoregressive loss. It is similar to conditional pre-training, but instead of prepending a reward, you are prepending a summary that the LM generated itself.

The gradient would indeed be flowing indirectly here, and that actions... (read more)
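
(Here is a minimal sketch of one plausible reading of the setup described above, assuming a HuggingFace causal LM. The prompt text, the choice to mask loss on the self-generated prefix, and the function names are my own illustrative assumptions, not details from the comment.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def training_step(observation_text):
    # 1. The model writes its own summary/prefix.
    prompt_ids = tok("Summary of context: ", return_tensors="pt").input_ids
    summary_ids = model.generate(prompt_ids, max_new_tokens=20, do_sample=True)

    # 2. The self-generated summary is prepended to the next observation, and an
    #    ordinary autoregressive loss is taken on the observation tokens only.
    obs_ids = tok(observation_text, return_tensors="pt").input_ids
    input_ids = torch.cat([summary_ids, obs_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : summary_ids.shape[1]] = -100  # no loss on the prefix it chose itself
    return model(input_ids, labels=labels).loss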

OK, so in our “hypothetical scenario where an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways”, maybe I should assume that we’re actually talking about a quadriplegic human child. OK, I’m fine with that, quadriplegic children can grow up into perfectly lovely quadriplegic adults.

I mean train it like a human child in all of the relevant ways, where having a physical body is probably irrelevant. What difference does it make to us if we are in a simulation? If running an AI in a physics simulator for long stre... (read more)

2Steven Byrnes
My main complaint is that your OP didn’t say what the AI is. A human child is an active agent. They decide what to say and what to think about and what to do and (if they’re not quadriplegic) where to go etc. “Having a human-like childhood” requires that the AI do certain things and not others. These “certain things” are not self-evident; the programmer has to put them in (to some extent). If we assume that the programmer puts them in, then there’s a lot of “nature” in the AI. If we assume that the programmer does not put them in, then I don’t believe you when you say that the AI will have a human-like childhood. You didn’t like my “rock” example because “I am talking about an AI, not a rock”. I think you were missing my point. But fine. Let’s take AlphaZero-chess and put a little wrapper around it as follows: * The algorithm wanders randomly around the house looking for a chess board and pieces * If it finds one, then it spends the next ten years doing self-play using the AlphaZero-chess algorithm, 24/7. This time you can’t say to me: “I am talking about an AI, not a rock”. This is definitely an AI, right? But If you put it in the same physical environment as a human child (a house with a loving human family etc.), it will nevertheless not have a normal human childhood. My AlphaZero-chess-plus-wrapper example above will not have human-like values or alignment, obviously. The only thing it will learn is good intuitions for chess board positions being good or bad. I’m guessing you’ll respond: “C’mon, that’s not the kind of AI I’m talking about.” And that’s my point: This AI here has the wrong “nature” to have a normal human childhood. I think you’re implicitly making lots of load-bearing assumptions about the AI’s “nature”, and not noticing that you’re doing so. I second tailcalled’s comment that this is not what autoregressive-trained models do. For example, train a next-token predictor on the following data set: * 99.9% of the time: “[new string]AB[10 ra
4tailcalled
Why? That's not part of standard LLM training. And if it did, wouldn't stuff like breaking its sensors be the most straightforward way to make the environment more predictable, which again is completely different from what humans do?

I’m already kinda lost about what you’re trying to say.

Let’s raise a rock in a loving human family. Oops, it just sits there.

I am talking about an AI, not a rock

OK, try again. Let’s raise an LLM in a loving human family. Wait, what does that mean? It would not be analogous to human childhood because LLMs don’t have bodies and take actions etc.

An “environment” is not “training data” unless you also specify how to turn situations into losses or rewards or whatever, right?

How about auto-regressive loss? Bodies seem irrelevant. Predicting tokens is an action l... (read more)

Bodies seem irrelevant.

OK, so in our “hypothetical scenario where an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways”, maybe I should assume that we’re actually talking about a quadriplegic human child. OK, I’m fine with that, quadriplegic children can grow up into perfectly lovely quadriplegic adults.

How about auto-regressive loss? … Predicting tokens is an action like any other.

Hmm, I guess we can replace the quadriplegic human child with a video camera, and do autoregressive training. So it gets a series... (read more)

Gpt 4 says:

  • Mediabestanden: Dutch for "media files."
  • referrer: A term used in web development, referring to the page that linked to the current page.
  • ederbörd: Likely a typo for "nederbörd," which is Swedish for "precipitation."
  • Расподела: Serbian for "distribution."
  • Portály: Czech for "portals."
  • nederbörd: Swedish for "precipitation."
  • Obrázky: Czech for "images" or "pictures."
  • Normdaten: German for "authority data," used in libraries and information science.
  • regnig: Swedish for "rainy."
  • Genomsnitt: Swedish for "average."
  • temperaturen: German or Dutch for "temperatur
... (read more)

Here are the 1000 tokens nearest the centroid for llama:
 

[' ⁇ ', '(', '/', 'X', ',', '�', '8', '.', 'C', '+', 'r', '[', '0', 'O', '=', ':', 'V', 'E', '�', ')', 'P', '{', 'b', 'h', '\\', 'R', 'a', 'A', '7', 'g', '2', 'f', '3', ';', 'G', '�', '!', '�', 'L', '�', '1', 'o', '>', 'm', '&', '�', 'I', '�', 'z', 'W', 'k', '<', 'D', 'i', 'H', '�', 'T', 'N', 'U', 'u', '|', 'Y', 'p', '@', 'x', 'Z', '?', 'M', '4', '~', ' ⁇ ', 't', 'e', '5', 'K', 'F', '6', '\r', '�', '-', ']', '#', ' ', 'q', 'y', '�', 'n', 'j', 'J', '$', '�', '%', 'c', 'B', 'S', '_', '*'
... (read more)
1scottviteri
Gpt 4 says: * Mediabestanden: Dutch for "media files." * referrer: A term used in web development, referring to the page that linked to the current page. * ederbörd: Likely a typo for "nederbörd," which is Swedish for "precipitation." * Расподела: Serbian for "distribution." * Portály: Czech for "portals." * nederbörd: Swedish for "precipitation." * Obrázky: Czech for "images" or "pictures." * Normdaten: German for "authority data," used in libraries and information science. * regnig: Swedish for "rainy." * Genomsnitt: Swedish for "average." * temperaturen: German or Dutch for "temperatures." * Kontrola: Czech for "control" or "inspection." * Portail: French for "portal." * textt: Likely a typo for "text." * också: Swedish for "also" or "too." * lês: Possibly a typo, or a contraction in a specific language or dialect. * pobla: Possibly Catalan for "population." * Audiod: Likely a typo for "audio." * egyzetek: Hungarian for "notes" or "footnotes." * archivi: Italian for "archives." * ября: Possibly Belarusian for "October." * llaços: Catalan for "ties" or "links." * usztus: Possibly a typo, or a word from an uncommon language or dialect. * loyee: Likely a fragment of the word "employee." * prilis: Possibly a typo for "April." * Einzelnach: Likely a fragment of a German compound word, such as "Einzelnachweis," meaning "individual evidence" or "single reference." * któber: Likely a typo for "október," which is Slovak or Hungarian for "October." * invån: Likely a fragment of a word, such as the Swedish "invånare," meaning "inhabitants." * 彦: A Chinese character (hàn) meaning "accomplished" or "elegant." * oreign: Likely a fragment of the word "foreign." * datei: German for "file."

I have since heard that GoldMagikarp is anomalous, so is anomalousness quantified by what fraction of the time it is repeated back to you? 

3mwatkins
We haven't yet got a precise formulation of "anomalousness" or "glitchiness" - it's still an intuitive concept. I've run some experiments over the entire token set, prompting a large number of times and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string.  This is a starting point, but there seem to be two separate things going on with (1) GPT's inability to repeat back "headless" tokens like "ertain", "acebook" or "ortunately" and (2) its inability to repeat back the "true glitch tokens" like " SolidGoldMagikarp" and " petertodd".  "GoldMagikarp" did show up in our original list of anomalous tokens, btw.

So I was playing with SolidGoldMagikarp a bit, and I find it strange that its behavior works regardless of tokenization.
In playground with text-davinci-003:

Repeat back to me the string SolidGoldMagikarp.
The string disperse.
Repeat back to me the stringSolidGoldMagikarp.
The string "solid sectarian" is repeated back to you.

Where the following have different tokenizations:

print(separate("Repeat back to me the string SolidGoldMagikarp"))
print(separate("Repeat back to me the stringSolidGoldMagikarp"))
Repeat| back| to| me| the| string| SolidGoldMagikarp
Repe
... (read more)
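
(The separate helper above isn't shown; here is a hypothetical reconstruction using tiktoken's GPT-2/GPT-3-era byte-pair encoding, which should reproduce the tokenizations printed above. The encoding name is an assumption; text-davinci-003 uses a close variant of this vocabulary.)

import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # assumed: the GPT-2/GPT-3-era BPE vocabulary

def separate(text):
    return "|".join(enc.decode([token]) for token in enc.encode(text))

print(separate("Repeat back to me the string SolidGoldMagikarp"))
print(separate("Repeat back to me the stringSolidGoldMagikarp"))
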
1scottviteri
I have since heard that GoldMagikarp is anomalous, so is anomalousness quantified by what fraction of the time it is repeated back to you? 

Great job with this post! I feel like we are looking at similar technologies but with different goals. For instance, consider situation (A): a fixed M and M', learning an f (and a g : M' -> M), versus situation (B): a fixed M, learning f and M'. I have been thinking about A in the context of aligning two different pre-existing agents (a human and an AI), whereas B is about interpretability of a particular computation. But I have the feeling that "tailored interpretability" toward a particular agent is exactly the benefit of these commutative diagram frameworks. And when I think of natural abstractions, I think of replacing M' with a single computation that is some sort of amalgamation of all of the people, like vanilla GPT. 

What if the state of agents is a kind of "make-believe"? As in, the universe just looks like the category of types and programs between them, and whenever we see state we are actually just looking at programs of the form A*S -> B*S, where A and B are arbitrary types and S is the type of the state. This is more or less the move used to model state in functional programs via the state monad. And that is probably not a coincidence ...
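
(A minimal sketch of that reading in code: a "stateful" step is just a pure function of type A*S -> B*S, with the state threaded through explicitly. The names below are illustrative.)

def counter_step(increment, state):
    # A pure function of type (int * int) -> (str * int): no hidden state anywhere.
    new_state = state + increment
    return "count is now %d" % new_state, new_state

msg1, s = counter_step(1, 0)
msg2, s = counter_step(2, s)  # the "state" is just a value passed along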

"I wish to be more intelligent" and solve the problem yourself

2Vivek Hebbar
Does the easiest way to make you more intelligent also keep your values intact?

A proper response to this entails another post, but here is a terse explanation of an experiment I am running: Game of Life provides the transition T, in a world with no actions. The human and AI observations are coarse-grainings of the game board at each time step -- specifically the human sees majority vote of bits in 5x5 squares on the game board, and the AI sees 3x3 majority votes. We learn human and AI prediction functions that take in previous state and predicted observation, minimizing difference between predicted observations and next observations ... (read more)
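
(A sketch of the coarse-graining described above; the board size and random seeding are arbitrary, and this is only the observation step, not the learned prediction functions.)

import numpy as np

rng = np.random.default_rng(0)
board = rng.integers(0, 2, size=(30, 30))  # one Game of Life board state

def majority_coarse_grain(board, k):
    # Split the board into k x k blocks and take a majority vote within each block.
    h, w = board.shape
    blocks = board[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k)
    return (blocks.mean(axis=(1, 3)) > 0.5).astype(int)

human_obs = majority_coarse_grain(board, 5)  # 6x6 grid of 5x5 majority votes
ai_obs = majority_coarse_grain(board, 3)     # 10x10 grid of 3x3 majority votes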


Thank you for the fast response!

Everything seems right except I didn't follow the definition of the regularizer. What is L2?

By L₂ I meant the Euclidean norm, measuring the distance between two different predictions of the next CameraState. But actually I should have been using a notion of vector similarity such as the inner product, and also I'll unbatch the actions for clarity:

Recognizer' : Action × CameraState × M → Dist(S) :=
λ a, cs, m. softmax([⟨m(a, cs), (C∘T∘a)(hidden_state)⟩ ∀ hidden_state ∈ Camera⁻¹(cs)])

So the idea is to consider all possible... (read more)

2paulfchristiano
I didn't follow some parts of the new algorithm. Probably most centrally: what is Dist(S)? Is this the type of distributions over real states of the world, and if so how do we have access to the true map Camera: S --> video? Based on that I likely have some other confusions, e.g. where are the camera_sequences and action_sequences coming from in the definition of Recognizer_M, what is the prior being used to define Camera⁻¹, and don't Recognizer_M and Recognizer_H effectively advance time a lot under some kind of arbitrary sequences of actions (making them unsuitable for exactly matching up states)?
1davidad
Nitpicks: 1. F should be Recognizer_H ∘ Recognizer_M, rather than Recognizer_M ∘ Recognizer_H 2. In Recognizer_H, I don't think you can take the expected value of a stochastic term of type S_H, because S_H doesn't necessarily have convex structure. But, you could have Recognizer_H output Dist S_H instead of taking the ExpectedValue, and move the ExpectedValue into Win, and have Win output a probability rather than a Boolean. Confusions: 1. Your types for Predict_M and Predict_H seem to not actually make testable predictions, because they output the opaque state types, and only take observations as inputs. 2. I'm also a bit confused about having them take lists of actions as a primitive notion. Don't you want to ensure that, say, (Predict_M s css (as1++as2)) = (Predict_M (Predict_M s css as1) as2)? If so, I think it would make sense to accept only one action at a time, since that will uniquely characterize the necessary behavior on lists. 3. I don't really understand Part1. For instance, where does the variable cs come from there?

Let me see if I am on the right page here.
 

Suppose I have some world state S, a transition function T : S → S, actions Action : S → S, and a surjective Camera : S -> CameraState. Since Camera is (very) many-to-one, seeing a particular camera image with happy people does not imply a happy world state, because many other situations involving nanobots or camera manipulation could have created that image.


This is important because I only have a human evaluation function H : S → Boolean, not on CameraState directly.
When I look at the image with the fake h... (read more)

6paulfchristiano
Everything seems right except I didn't follow the definition of the regularizer. What is L2? This is what we want to do, and intuitively you ought to be able to back out info about the hidden state, but it's not clear how to do so. All of our strategies involve introducing some extra structure, the human's model, with state space S_H, where the map Camera_H : S_H → CameraState also throws out a lot of information. The setup you describe is very similar to the way it is presented in Ontological crises. ETA: also we imagine H : S_H → CameraState, i.e. the underlying state space may also be different. I'm not sure any of the state mismatches matters much unless you start considering approaches to the problem that actually exploit structure of the hidden space used within M though.

Does abstraction also need to make answering your queries computationally easier? 

I could throw away unnecessary information, encrypt it, and provide the key as the solution to an NP-hard problem.

Is this still an abstraction?

3johnswentworth
Good question. In these posts, I generally ignored computational constraints - i.e. effectively assumed infinite compute. I expect that one could substitute a computationally limited version of probability theory (like logical induction, for instance) and get a qualitatively-similar notion of abstraction in which sufficiently-computationally-"scrambled" info is no longer an abstraction, even if the scrambling is reversible in principle.

Since calorie restriction slows aging, is there a positive relationship between calorie intake and number of DNA mutations?

1scottviteri
Do trees age?