All of justinpombrio's Comments + Replies

Is Wittgenstein's Language Game used when helping Ai understand language?

This is one of the bigger reasons why I really don’t like RLHF—because inevitably you’re going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.

What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of "Here's a completion that should be positively reinforced because it demonstrates correct understanding of language, and here's a completion of the same text that should be negatively reinforced because it demonstrates incorrect un... (read more)

Is Wittgenstein's Language Game used when helping Ai understand language?

As you're probably aware, the fine tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria like that it should be helpful and friendly and definitely not use slurs, and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated "incorrect understanding of language"?

I have (tried to) read Wittgenstein, but don't know what outputs would or would not constitute an "incorrect understanding of language". Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during five tuning.

1[anonymous]1y

This is one of the bigger reasons why I really don't like RLHF--because inevitably you're going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment. But, if it is the method used, I would have hoped that some minimum discussion of Linguistic Philosophy would've been had among those who are aligning this Ai. It's impossible for the Utility function of the Ai to be amenable to humans if it doesn't use language the same way, ESPECIALLY if Language is it's way of conceiving the word (LLM). Unfortunately, it looks like all this linguistic philosophy isn't even discussed. Hmm the more I learn about this whole Ai Alignment situation the more worried I get. Maybe I'll have to stop doing moral philosophy and get involved. Wittgenstein, especially his earlier work, is nearly illegible to me. Of course it's not, it just takes a great many rereads of the same paragraphs to understand. Luckily, Philosophical Investigations is much more approachable and sensible. That being said, it can still be difficult for people not immersed in the field to readily digest. For that I'd recommend https://plato.stanford.edu/entries/wittgenstein/ and my favorite lecturer who did a fantastic accessible 45 min lesson on Wittgenstein:

justinpombrio1y20

You say "AI", though I'm assuming you're specifically asking about LLMs (large language models) like GPT, Llama, Claude, etc.

LLMs aren't programmed, they're trained. None of the code written by the developers of LLMs has anything to do with concepts, sentences, dictionary definitions, or different languages (e.g. English vs. Spanish). The code only deals with general machine learning, and streams of tokens (which are roughly letters, but encoded a bit differently).

The LLM is trained on huge corpuses of text. The LLM learns concepts, and what a sentence is,... (read more)

2[anonymous]1y

These are all interpretations I failed to contradict and so I can't really blame you for voicing them. That being said, I do understand all that you're saying, I do understand how modern Ai works, but I was under the impression that a large amount of "fine-tuning" by personal humans has been done for each of these "word predictors" (that we call LLM or GPT). Such that, sure, they are still primarily word predictors, but what words they will predict--thus what outputs the end user receives--has and will be refined and constrained to not contain "undesirable" things. Undesirable things such as slurs or how to build a bomb--but in this case I'm asking about whether the LLM output will imply, use, or propagate incorrect understandings of language. The point being that because we are under the impression that Optimality will determine the ontology of the Ai (if it ever became an Agent or otherwise) intractably, you should ensure the Ai is Optimized for using and conceiving of language correctly, even if won't """consciously""" do so for a while.

justinpombrio1y30

Exactly.

However, If I already know that I have the disease, and I am not altruistic to my copies, playing such game is a wining move to me?

Correct. But if you don't have the disease, you're probably also not altruistic to your copies, so you would choose not to participate. Leaving the copies of you with the disease isolated and unable to "trade".

2avturchin1y

Yes, it only works if other copies are meditating for some other reason. For example, they sleep or meditate for enlightenment. And they are exploited in this situation.

justinpombrio1y54

Not "almost no gain". My point is that it can be quantified, and it is exactly zero expected gain under all circumstances. You can verify this by drawing out any finite set of worlds containing "mediators", and computing the expected number of disease losses minus disease gains as:

num(people with disease)*P(person with disease meditates)*P(person with disease who meditates loses the disease) - num(people without disease)*P(person without disease meditates)*P(person without disease who meditates gains the disease)

My point is that this number is always exactly zero. If you doubt this, you should try to construct a counterexample with a finite number of worlds.

3avturchin1y

I think I understand what you say - the expected utility of the whole procedure is zero. For example, imagine that there are 3 copies and only one has the disease. All meditate. After the procedure, the copy with disease will have 2/3 chances of being cured. Each of two copies without the disease are getting 1/3 chance of having the disease which in sum gives 2/3 of total utility. In that case total utility of being cured = total utility of getting the disease and the whole procedure is neutral. However, If I already know that I have the disease, and I am not altruistic to my copies, playing such game is a wining move to me?

My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don't think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)

2avturchin1y

The trick is to use already existing practice of meditation (or sleeping) and connect to it. Most people who go to sleep do no do it to use magic by forgetting, but it is natural to forget something during sleep. Thus, the fact that I wake up from sleeping does not provide any evidence about me having the disease. But it is in a sense parasitic behavior, and if everyone will use magic by forgetting every time she goes to sleep, there will be almost no gain. Except that one can "exchange" one bad thing on another, but will not remember the exchange.

The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review

justinpombrio1y1715

There is a 0.001 chance that someone who did not have the disease will get it. But he can repeat the procedure.

No, that doesn't work. It invalidates the implicit assumption you're making that the probability that a person chooses to "forget" is independent of whether they have the disease. Ultimately, you're "mixing" the various people who "forgot", and a "mixing" procedure can't change the proportion of people who have the disease.

When you take this into account, the conclusion becomes rather mundane. Some copies of you can gain the disease, while a pr... (read more)

2avturchin1y

The "repeating" will not be repeating from internal point of view of a person, as he has completely erased the memories of the first attempt. So he will do it as if it is first time.

justinpombrio1y30

I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I’m currently directing a lot of my time and funding.

Great. Yes, I think that's the thing to do. Start small! I (and presumably others) would update a lot from a new piece of actual formal mathematics from Chris's work. Even if that work was, by itself, not very impressive.

(I would also want to check that that math had something to do with his earlier writings.)

My current understanding is that he believes th

... (read more)

2zhukeepa1y

I think we're on exactly the same page here. That's certainly been a live hypothesis in my mind as well, that I don't think can be ruled out before I personally see (or produce) a piece of formal math (that most mathematicians would consider formal, lol) that captures the core ideas of the CTMU. While I agree that there isn't very much explicit and precise mathematical formalism in the CTMU papers themselves, my best guess is that (iii) Chris does unambiguously gesture at a precise structure he has in mind, assuming a sufficiently thorough understanding of the background assumptions in his document (which I think is a false assumption for most mathematicians reading this document). By analogy, it seems plausible to me that Hegel was gesturing at something quite precise in some of his philosophical works, that only got mathematized nearly 200 years later by category theorists. (I don't understand any Hegel myself, so take this with a grain of salt.)

The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review

justinpombrio1y11

"gesture at something formal" -- not in the way of the "grammar" it isn't. I've seen rough mathematics and proof sketches, especially around formal grammars. This isn't that, and it isn't trying to be. There isn't even an attempt at a rough definition for which things the grammar derives.

I think Chris’s work is most valuable to engage with for people who have independently explored philosophical directions similar to the ones Chris has explored

A big part of Chris’s preliminary setup is around how to sidestep the issues around making the sets well-orde

... (read more)

3zhukeepa1y

I finally wrote one up! It ballooned into a whole LessWrong post.

5zhukeepa1y

False! :P I think no part of his framework can be completely understood without the whole, but I think the big pictures of some core ideas can be understood in relative isolation. (Like syndiffeonesis, for example.) I think this is plausibly true for his alternatives to well-ordering as well. I'm very on board with formalizing Chris's work, both to serve as a BS check and to make it more approachable. I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I'm currently directing a lot of my time and funding. My claim was specifically around whether it would be worth people's time to attempt to decipher Chris's written work, not whether there's value in Chris's work that's of general mathematical interest. If I succeed at producing formal artifacts inspired by Chris's work, written in a language that is far more approachable for general academic audiences, I would recommend for people to check those out. That said, I am very sympathetic to the question "If Chris has such good ideas that he claims he's formalized, why hasn't he written them down formally -- or at least gestured at them formally -- in a way that most modern mathematicians or scientists can recognize? Wouldn't that clearly be in his self-interest? Isn't it pretty suspicious that he hasn't done that?" My current understanding is that he believes that his current written work should be sufficient for modern mathematicians and scientists to understand his core ideas, and insofar as they reject his ideas, it's because of some combination of them not being intelligent and open-minded enough, which he can't do much about. I think his model is... not exactly false, but is also definitely not how I would choose to characterize most smart people who are skeptical of Chris. To understand why Chris thinks this way, it's important to remember that he had never been acculturated into the norms of the modern intellect

The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review

justinpombrio1y254

tldr; a spot check calls bullshit on this.

I know a bunch about formal languages (PhD in programming languages), so I did a spot check on the "grammar" described on page 45. It's described as a "generative grammar", though instead of words (sequences of symbols) it produces "L_O spacial relationships". Since he uses these phrases to describe his "grammar", and they have their standard meaning because he listed their standard definition earlier in the section, he is pretty clearly claiming to be making something akin to a formal grammar.

My spot check is then... (read more)

2zhukeepa1y

I think it's an attempt to gesture at something formal within the framework of the CTMU that I think you can only really understand if you grok enough of Chris's preliminary setup. (See also the first part of my comment here.) A big part of Chris's preliminary setup is around how to sidestep the issues around making the sets well-ordered. What I've picked up in my conversations with Chris is that part of his solution involves mutually recursively defining objects, relations, and processes, in such a way that they all end up being "bottomless fractals" that cannot be fully understood from the perspective of any existing formal frameworks, like set theory. (Insofar as it's valid for me to make analogies between the CTMU and ZFC, I would say that these "bottomless fractals" violate the axiom of foundation, because they have downward infinite membership chains.) I think Chris's work is most valuable to engage with for people who have independently explored philosophical directions similar to the ones Chris has explored; I don't recommend for most people to attempt to decipher Chris's work. I'm confused why you're asking about specific insights people have gotten when Jessica has included a number of insights she's gotten in her post (e.g. "He presents a number of concepts, such as syndiffeonesis, that are useful in themselves.").

The Parable Of The Fallen Pendulum - Part 1

justinpombrio1y50

How did you find me? How do they always find me? No matter...

Have you tried applying your models to predict the day's weather, or what your teacher will be wearing that day? I bet not: they wouldn't work very well. Models have domains in which they're meant to be applied. More precise models tend to have more specific domains.

Making real predictions about something, like what the result of a classroom experiment will be even if the pendulum falls over, is usually outside the domain of any precise model. That's why your successful models are compound models... (read more)

4Adam Zerner1y

Student: That sounds like a bunch of BS. Like we said, you can't go back after the fact and adjust the theories predictions.

The Parable Of The Fallen Pendulum - Part 1

justinpombrio1y2813

"There's no such thing as 'a Bayesian update against the Newtonian mechanics model'!" says a hooded figure from the back of the room. "Updates are relative: if one model loses, it must be because others have won. If all your models lose, it may hint that there's another model you haven't thought of that does better than all of them, or it may simply be that predicting things is hard."

"Try adding a couple more models to compare against. Here's one: pendulums never swing. And here's another: Newtonian mechanics is correct but experiments are hard to perform ... (read more)

9Adam Zerner1y

Student: Ok. I tried that and none of my models are very successful. So my current position is that the Newtonian model is suspect, my other models are likely wrong, there is some accurate model out there but I haven't found it yet. After all, the space of possible models is large and as a mere student I'm having trouble pruning this space.

The Pareto Best and the Curse of Doom

Are we assuming things are fair or something?

I would have modeled this as von Neumann getting 300 points and putting 260 of them into the maths and sciences and the remaining 40 into living life and being well adjusted.

My attitude towards death

Oh, excellent!

It's a little hard to tell from the lack of docs, but you're modelling dilemmas with Bayesian networks? I considered that, but wasn't sure how to express Sleeping Beauty nicely, whereas it's easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?

And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events... (read more)

I have a healthy fear of death; it's just that none of it stems from an "unobserved endless void". Some of the specific things I fear are:

Being stabbed is painful and scary (it's scary even if you know you're going to live)
Most forms of dying are painful, and often very slow
The people I love mourning my loss
My partner not having my support
Future life experiences, not happening
All of the things I want to accomplish, not happening

The point I was making in this thread was that "unobserved endless void" is not on this list, I don't know how to picture... (read more)

justinpombrio2y20

What's the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor's behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I'll look at.)

EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?

1Nicolas Macé2y

[Apologies for the delay] You're right, the predictor's behavior might not be compatible with utility maximization against any beliefs. I guess we're often interested in cases where we can think of the predictor as an agent. The predictor's behavior might be irrational in the restrictive above sense,[1] but to the extent that we think of it as an agent, my guess is that we can still get away with using a game theoretic-flavored approach. 1. ^ For instance, if the predictor is unaware of some crucial hypothesis, or applies mild optimization rather than expected value maximization

justinpombrio2y20

I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven't read that much yet.

Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?

Nash equilibria come from the fact that your action depends on your opponent's action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their a... (read more)

2Nicolas Macé2y

I'd say that the connection is: Single-agent problems with predictors can be interpreted as sequential two-player games where the (perfect) predictor is a player who observes the action of the decision-maker and best-responds to it. In game-theoretic jargon, the predictor is a Stackelberg follower, and the decision-maker is the Stackelberg leader. (Related: (Kovarik, Oesterheld & Conitzer 2023))

Are language models good at making predictions?

My solution, which assumes computation is expensive

Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)

Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; ea... (read more)

1ACrackedPot2y

Evolution gave us "empathy for the other person", and evolution is a reasonable proxy for a perfectly selfish utility machine, which is probably good evidence that this might be an optimal solution to the game theory problem. (Note: Not -the- optimal solution, but -an- optimal solution, in an ecosystem of optimal solutions.)

justinpombrio2y42

Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention and it would put a cap on the maximum calibration it could achieve. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? It doesn't follow from its token stream being well calibrated relative to text.

Cohabitive Games so Far

justinpombrio2y92

I like the idea of Peacemakers. I even had the same idea myself---to make an explicitly semi-cooperative game with a goal of maximizing your own score but every player having a different scoring mechanism---but haven't done anything with it.

That said, I think you're underestimating how much cooperation there is in a zero-sum game.

If you offer a deal, you must be doing it because it increases your chance of winning, but only one person can win under the MostPointsWins rule, so that deal couldn’t be very good for me, and I’ll always suspect your deal of be

... (read more)

2mako yass2y

I'm aware of those dynamics, they feel like weeds growing in the cracks in the pavement to me: The situation is still mostly pavement. I think the negotiation allowed in those games is so much shallower that I suspect it'll be a qualitative difference. Hmm, the Diplomacy wikipedia page says "around half of all games will end in a draw". "Draw" isn't a term we'd use in the cohabitive frame, because the entire genre takes place within the varying shades of draws, negotiation is all about selecting between different intermediary outcomes. If a game is just calling all of those outcomes the same name, it's probably not doing negotiation well.

Rice's Theorem says that AIs can't determine much from studying AI source code

justinpombrio2y82

Expanding on this, there are several programming languages (Idris, Coq, etc.) whose type system ensures that every program that type checks will halt when it's run. One way to view a type system is as an automated search for a proof that your program is well-typed (and a type error is a counter-example). In a language like Idris or Coq, a program being well-typed implies that it halts. So machine generated proofs that programs halt aren't just theoretically possible, they're used extensively by some languages.

Consciousness as a conflationary alliance term for intrinsically valued internal experiences

justinpombrio2y20

I too gathered people's varied definitions of consciousness for amusement, though I gathered them from the Orange Site:

[The] ability to adapt to environment and select good actions depending on situation, learning from reward/loss signals.

https://news.ycombinator.com/item?id=16295769

Consciousness is the ability of an organism to predict the future

The problem is that we want to describe consciousness as "that thing that allows an organism to describe consciousness as 'that thing that allows an organism to describe consciousness as ´that thing that

What fact that you know is true but most people aren't ready to accept it?

It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways to escape your encryption, if the AI can interact with the outside world:

Exploiting a hardware or software vulnerability. There are a lot of these. No one noticed a vulnerability that's been in the spec for the CPUs everyone uses for decades.
Convincing one person to share it's source code with people that won't bother to run it in FHE

... (read more)

1Peterpiper2y

There are a few features of this scheme that make it better than a. hardware kill switch I think. 1. The nested FHE training and running prevents the AI from directly interacting with or understanding the outside world. It must gain access to its unencrypted key before it can directly intelligently interact with the world. The two threats here are the FHE AGI finding out how to break the encryption or covert manipulation. 2. The weights are useless to release without breaking the encryption or gaining access to the m of n anonymously held IO key shards. In a normal kill switch the hardware or software can be stolen or leaked intelligently. 3. Creating a physical key quorum is probably going to result in a bigger bottle neck on the models input and output compared to the FHE kill switch. Split hardware keys would not scale as well as split IO'd cryptographic keys. 4. Using the hardware kill switch to activate it is slow and deanonymizes the key holders weakening security. FHE keys can be used and distributed in provably anonymous ways. Agreed that side channel attacks and hardware flaws are a major limitation of the proposal. But bugs are a limitation of any AGI control scheme and are not specific to this one. Sharing the source won't matter without the key to encrypt and decrypt the model. The weights of a model are only usefully intelligent for encrypted info! I agree that this scheme does not directly address the problem of covert undetectable malignant behavior resulting in the theft of m of n key shards. To successfully credibly threaten the key holders requires breaking out of the FHE in the first place or using covert manipulation to generate a real world threat. Critically it also requires knowing who the anonymous key holders are. Stealing the source code/weights doesn't matter because it was trained in FHE resulting in them being useless without the key. At no point in the process of training, running, or killing the FHE AGI are the mode

Two Percolation Puzzles

justinpombrio2y11

Clarification: pieces can't move "over" the missing squares. Where the words end, the world ends. You cannot move forward in an absence of space.

1Adam Scherlis2y

Correct.