All of justinpombrio's Comments + Replies

Let's apply some data to this!

I've been in two high-stakes bad-vibe situations. (In one of them, someone else initially got the bad vibes, but I know enough details to comment on it.) In both cases, asking around would have revealed the issue. However, in both cases the people who knew the problematic person well had either a good impression of them or a very bad one. That's because there's a pattern where someone who's problematic in some way is also charismatic, or good at making up for it in other ways, etc. So my very rough model of these sit... (read more)

3Said Achmiz
OP talked about someone asking you on a date. The suggested strategy was about mitigating problems that might be encountered when going on a date. An analogous strategy for a long-term relationship might be something like “establish boundaries, ensure that the relationship does not crowd out contact with your friends, regularly check in with friends/family, talk to trusted confidantes about problems in the relationship to get a third-party opinion”, etc. “This solution to problem X doesn’t also solve problem Y” is not a strike against said solution. P.S.: The anecdotes are useful, but “data” is one thing they definitely aren’t.

Or in the words of Sean Carroll's Poetic Naturalism:

  1. There are many ways of talking about the world.
  2. All good ways of talking must be consistent with one another and with the world.
  3. Our purposes in the moment determine the best way of talking.

A "way of talking" is a map, and "the world" is the territory.

The orthogonality thesis doesn't say anything about intelligences that have no goals. It says that an intelligence can have any specific goal. So I'm not sure you've actually argued against the orthogonality thesis.

1Donatas Lučiūnas
My proposition: intelligence will only seek power. I approached this from the "intelligence without a goal" angle, but if we started with "intelligence with a goal" we would come to the same conclusion (most of the logic is reusable). Don't you think? This part I would change to

And English has it backwards. You can see the past, but not the future. The thing which just happened is most clear. The future comes at us from behind.

Here's the reasoning I intuitively want to apply:

where X = "you roll two 6s in a row by roll N", Y = "you roll at least two 6s by roll N", and Z = "the first N rolls are all even".

This is valid, right? And not particularly relevant to the stated problem, due to the "by roll N" qualifiers mucking up the statements in complicated ways?

2Shelby Stryker
It is the "cringe" feeling I believe. Its embarrassment on behalf of the bad joke not landing. I could also be irritation that your brain didn't get the reward it was anticipating. 

Sure. For simplicity, say you play two rounds of Russian Roulette, each with a 60% chance of death, and you stop playing if you die. What's the expected value of YouAreDead at the end?

  • With probability 0.6, you die on the first round
  • With probability 0.4*0.6 = 0.24, you die on the second round
  • With probability 0.4*0.4=0.16, you live through both rounds

So the expected value of the boolean YouAreDead random variable is 0.84.
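
A quick sketch (my addition, not part of the original comment) to double-check enumerations like this, since the only way to still be alive is to survive every round:

```python
# Probability of being dead after n rounds of Russian roulette with death
# probability p_die per round, stopping once dead.
def p_dead(n_rounds: int, p_die: float) -> float:
    return 1 - (1 - p_die) ** n_rounds  # alive only if every round is survived

print(p_dead(2, 0.6))   # 0.84, matching the enumeration above
print(p_dead(10, 0.1))  # ~0.651, the 10-round example discussed below
```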

Now say you're monogamous and go on two dates, each with a 60% chance to go well, and if they both go well then you pick one person a... (read more)

Probability of at least two successes: ~26%

My point is that in some situations, "two successes" doesn't make sense. I picked the dating example because it's cute, but for something more clear-cut, imagine you're playing Russian Roulette with 10 rounds, each with a 10% chance of death. There's no such thing as "two successes"; you stop playing once you're dead. The "are you dead yet" random variable is a boolean, not an integer.

0Anders Lindström
Yes. But I think you have mixed up expected value and expected utility. Please show your calculations.

If you're monogamous and go to multiple speed dating events and find two potential partners, you end up with one partner. If you're polyamorous and do the same, you end up with two partners.

One way to think of it is whether you will stop trying after the first success. Though that isn't always the distinguishing feature. For example, you might start 10 job interviews at the same time, even though you'll take at most one job.

0Anders Lindström
No, I think you are mixing the probability of at least one success in ten trials (with a 10% chance per trial), which is ~0.65 = 65%, with the expected value, which is n=1 in both cases. You have the same chance of finding 1 partner in each case and you do the same number of trials. There is a 65% chance that you have at least 1 success in the 10 trials for each type of partner. The expected outcome in BOTH cases is 1 as in n=1, not 1 as in 100%. Probability of at least one success: ~65% Probability of at least two successes: ~26%

However, it is true that doing something with a 10% success rate 10 times will net you an average of 1 success.

For the easier to work out case of doing something with a 50% success rate 2 times:

  • 25% chance of 0 successes
  • 50% chance of 1 success
  • 25% chance of 2 successes

Gives an average of 1 success.

Of course this only matters for the sort of thing where 2 successes is better than 1 success:

  • 10% chance of finding a monogamous partner 10 times yields about 0.65 monogamous partners in expectation.
  • 10% chance of finding a polyamorous partner 10 times yields 1.00
... (read more)
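
For concreteness, here is a small sketch (my addition, not from the thread) of the contrast in the two bullets above, for the 10-tries-at-10%-each case:

```python
# Contrast "expected number of successes" with "probability of at least one
# success" for 10 independent tries at 10% each. A monogamous search is
# effectively capped at one success; a polyamorous one is not.
p, n = 0.1, 10

expected_successes = n * p                                    # 1.0
p_at_least_one = 1 - (1 - p) ** n                             # ~0.651
p_at_least_two = p_at_least_one - n * p * (1 - p) ** (n - 1)  # ~0.264

print(expected_successes, round(p_at_least_one, 3), round(p_at_least_two, 3))
```
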
1Anders Lindström
Why would the expectation of finding a polyamorous partner be higher in the case you gave? Same chance per try and same number of tries should equal same expectation.

IQ over median does not correlate with creativity over median

That's not what that paper says. It says that IQ over 110 or so (quite above median) correlates less strongly (but still positively) with creativity. In Chinese children, age 11-13.

1João Ribeiro Medeiros
The correlation value over IQ 100 seems to already be well under the variance, so not really meaningful, and if you look at what the researchers call Originality, the correlation is actually negative over IQ 110. Just as a correction to your comment: I am not stating this as an adamant fact, but as an "indication", not a "demonstration"; I said "indicated by recent research". I understand the reference I pointed out has a limited scope (Chinese children, age 11-13), as does any research of this kind, but beyond the rigorous scientific demonstration of this concept, I am expressing the fact that IQ tests are very incomplete, which is not novel. Thank you for your response.

And for a visceral description of a kind of bullying that's plainly bad, read the beginning of Worm: https://parahumans.wordpress.com/2011/06/11/1-1/

1Alexej Gerstmaier
Thanks for linking, I love Worm

I double-downvoted this post (my first ever double-downvote) because it crosses a red line by advocating for verbal and physical abuse of a specific group of people.

Alexej: this post gives me the impression that you started with a lot of hate and went looking for justifications for it. But if you have some real desire for truth seeking, here are some counterarguments:

1Alexej Gerstmaier
Hi Justin, I already read both the posts you linked there. My desire for Truth is overwhelmingly strong, I would change my stance if anyone would present some actual counter-arguments that go beyond the surface level. Will give longer rebuttal later, am currently on vacation in Spain 🤝
3justinpombrio
And for a visceral description of a kind of bullying that's plainly bad, read the beginning of Worm: https://parahumans.wordpress.com/2011/06/11/1-1/

Yeah, I think “computational irreducibility” is an intuitive term pointing to something which is true, important, and not-obvious-to-the-general-public. I would consider using that term even if it had been invented by Hitler and then plagiarized by Stalin :-P

Agreed!

OK, I no longer claim that. I still think it might be true

No, Rice's theorem is really not applicable. I have a PhD in programming languages, and feel confident saying so.

Let's be specific. Say there's a mouse named Crumbs (this is a real mouse), and we want to predict whether Crumbs will... (read more)

3Steven Byrnes
Oh sorry, when I said “it might be true” just above, I meant specifically: “it might be true that ‘computational irreducibility’ and Rice’s theorem are the same thing”. But after a bit more thought, and finding a link to a clearer statement of what “computational irreducibility” is supposed to mean, I agree with you that they’re pretty different. Anyway, I have now deleted all mention of Rice’s theorem, and also added a link to this very short proof that computationally-irreducible programs exist at all. Thanks very much :)

Rice’s theorem (a.k.a. computational irreducibility) says that for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see.

Rice's theorem says nothing of the sort. Rice's theorem says:

For every non-trivial semantic property P (one that some programs have and others lack),
For every program Q that purports to check if an arbitrary program has property P,
There exists a program R such that Q(R) is incorrect:
    Either P holds of R but Q(R) returns false,
    or P does not hold of R but Q(R) returns true
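
(For reference, here's the standard compact statement, my gloss of the version above: the set of programs with a non-trivial semantic property is undecidable.)

$$\emptyset \neq \{\, R \mid P(R) \,\} \neq \{\text{all programs}\} \;\Longrightarrow\; \{\, R \mid P(R) \,\} \text{ is undecidable.}$$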

Notice that the tricky program R that's causing your... (read more)

3Steven Byrnes
I was gonna say that you’re nitpicking, but actually, I do want this post to be correct in detail and not just in spirit. So I edited the post. Thanks. :) OK, I no longer claim that. I still think it might be true, at least based on skimming the wikipedia article, but I’m not confident, so I shouldn’t say it. Maybe you know more than me. Oh well, it doesn’t really matter. Yeah, I think “computational irreducibility” is an intuitive term pointing to something which is true, important, and not-obvious-to-the-general-public. I would consider using that term even if it had been invented by Hitler and then plagiarized by Stalin :-P

I think we’re in agreement on everything.

Excellent. Sorry for thinking you were saying something you weren't!

still not have an answer to whether it’s spinning clockwise or counterclockwise

More simply (and quite possibly true), Nobuyuki Kayahara rendered it spinning either clockwise or counterclockwise, lost the source, and has since forgotten which way it was going.

I like “veridical” mildly better for a few reasons, more about pedagogy than anything else.

That's a fine set of reasons! I'll continue to use "accurate" in my head, as I already fully feel that the accuracy of a map depends on which territory you're choosing for it to represent. (And a map can accurately represent multiple territories, as happens a lot with mathematical maps.)

Another reason is I’m trying hard to push for a two-argument usage

Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing y

... (read more)
3Steven Byrnes
I think we’re in agreement on everything. Yup, or as I wrote: “2D pattern of changing pixels on a flat screen”. For what it’s worth, even if that’s true, it’s still at least possible that we could view both the 3D model and the full source code, and yet still not have an answer to whether it’s spinning clockwise or counterclockwise. E.g. perhaps you could look at the source code and say “this code is rotating the model counterclockwise and rendering it from the +z direction”, or you could say “this code is rotating the model clockwise and rendering it from the -z direction”, with both interpretations matching the source code equally well. Or something like that. That’s not necessarily the case, just possible, I think. I’ve never coded in Flash, so I wouldn’t know for sure. Yeah this is definitely a side track. :) Nice find with the website, thanks.

This is fantastic! I've tried reasoning along these directions, but never made any progress.

A couple comments/questions:

Why "veridical" instead of simply "accurate"? To me, the accuracy of a map is how well it corresponds to the territory it's trying to map. I've been replacing "veridical" with "accurate" while reading, and it's seemed appropriate everywhere.

Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing you’re looking at. [...] after all, nothing in the real world of atoms is rotating in 3D.

... (read more)
3Steven Byrnes
Thanks! :) Accurate might have been fine too. I like “veridical” mildly better for a few reasons, more about pedagogy than anything else. One reason is that “accurate” has a strong positive-valence connotation (i.e., “accuracy is good, inaccuracy is bad”), which is distracting, since I’m trying to describe things independently of whether they’re good or bad. I would rather find a term with a strictly neutral vibe. “Veridical”, being a less familiar term, is closer to that. But alas, I notice from your comment that it still has some positive connotation. (Note how you said “being unfair”, suggesting a frame where I said the intuition was non-veridical = bad, and you’re “defending” that intuition by saying no it’s actually veridical = good.) Oh well. It’s still a step in the right direction, I think. Another reason is I’m trying hard to push for a two-argument usage (“X is or is not a veridical model of Y“), rather than a one-argument usage (“X is or is not veridical”). I wasn’t perfect about that. But again, I think “accurate” makes that problem somewhat worse. “Accurate” has a familiar connotation that the one-argument usage is fine because of course everybody knows what is the territory corresponding to the map. “Veridical” is more of a clean slate in which I can push people towards the two-argument usage. Another thing: if someone has an experience that there’s a spirit talking to them, I would say “their conception of the spirit is not a veridical model of anything in the real world”. If I said “their conception of the spirit is not an accurate model of anything in the real world”, that seems kinda misleading, it’s not just a matter of less accurate versus more accurate, it’s stronger than that.  It was made by a graphic artist. I’m not sure their exact technique, but it seems at least plausible to me that they never actually created a 3D model. Some people are just really good at art. I dunno. This seems like the kind of thing that shouldn’t matter though!

Very curious what part of this people think is wrong.

4hairyfigment
I don't see how any of it can be right. Getting one algorithm to output Spongebob wouldn't cause the SI to watch Spongebob -even a less silly claim in that vein would still be false. The Platonic agent would know the plan wouldn't work, and thus wouldn't do it. Since no individual Platonic agent could do anything meaningful alone, and they plainly can't communicate with each other, they can only coordinate by means of reflective decision theory. That's fine, we'll just assume that's the obvious way for intelligent minds to behave. But then the SI works the same way, and knows the Platonic agents will think that way, and per RDT it refuses to change its behavior based on attempts to game the system. So none of this ever happens in the first place. (This is without even considering the serious problems with assuming Platonic agents would share a goal to coordinate on. I don't think I buy it. You can't evolve a desire to come into existence, nor does an arbitrary goal seem to require it. Let me assure you, there can exist intelligent minds which don't want worlds like ours to exist.)

Here's a simple argument that simulating universes based on Turing machine number can give manipulated results.

Say we lived in a universe much like this one, except that:

  • The universe is deterministic
  • It's simulated by a very short Turing machine
  • It has a center, and
  • That center is actually nearby! We can send a rocket to it.

So we send a rocket to the center of the universe and leave a plaque saying "the answer to all your questions is Spongebob". Now any aliens in other universes that simulate our universe and ask "what's in the center of that universe... (read more)

1justinpombrio
Very curious what part of this people think is wrong.

The feedback is from Lean, which can validate attempted formal proofs.

This is one of the bigger reasons why I really don’t like RLHF—because inevitably you’re going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.

What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of "Here's a completion that should be positively reinforced because it demonstrates correct understanding of language, and here's a completion of the same text that should be negatively reinforced because it demonstrates incorrect un... (read more)

As you're probably aware, the fine tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria like that it should be helpful and friendly and definitely not use slurs, and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated "incorrect understanding of language"?

I have (tried to) read Wittgenstein, but don't know what outputs would or would not constitute an "incorrect understanding of language". Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during fine-tuning.

1[anonymous]
This is one of the bigger reasons why I really don't like RLHF--because inevitably you're going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment. But, if it is the method used, I would have hoped that some minimum discussion of Linguistic Philosophy would've been had among those who are aligning this Ai. It's impossible for the Utility function of the Ai to be amenable to humans if it doesn't use language the same way, ESPECIALLY if Language is its way of conceiving the world (LLM). Unfortunately, it looks like all this linguistic philosophy isn't even discussed. Hmm, the more I learn about this whole Ai Alignment situation the more worried I get. Maybe I'll have to stop doing moral philosophy and get involved. Wittgenstein, especially his earlier work, is nearly illegible to me. Of course it's not, it just takes a great many rereads of the same paragraphs to understand. Luckily, Philosophical Investigations is much more approachable and sensible. That being said, it can still be difficult for people not immersed in the field to readily digest. For that I'd recommend https://plato.stanford.edu/entries/wittgenstein/ and my favorite lecturer who did a fantastic accessible 45 min lesson on Wittgenstein:

You say "AI", though I'm assuming you're specifically asking about LLMs (large language models) like GPT, Llama, Claude, etc.

LLMs aren't programmed, they're trained. None of the code written by the developers of LLMs has anything to do with concepts, sentences, dictionary definitions, or different languages (e.g. English vs. Spanish). The code only deals with general machine learning, and streams of tokens (which are roughly letters, but encoded a bit differently).

The LLM is trained on huge corpuses of text. The LLM learns concepts, and what a sentence is,... (read more)

2[anonymous]
These are all interpretations I failed to contradict and so I can't really blame you for voicing them.    That being said, I do understand all that you're saying, I do understand how modern Ai works, but I was under the impression that a large amount of "fine-tuning" by personal humans has been done for each of these "word predictors" (that we call LLM or GPT).  Such that, sure, they are still primarily word predictors, but what words they will predict--thus what outputs the end user receives--has and will be refined and constrained to not contain "undesirable" things. Undesirable things such as slurs or how to build a bomb--but in this case I'm asking about whether the LLM output will imply, use, or propagate incorrect understandings of language.    The point being that because we are under the impression that Optimality will determine the ontology of the Ai (if it ever became an Agent or otherwise) intractably, you should ensure the Ai is Optimized for using and conceiving of language correctly, even if won't """consciously""" do so for a while. 

However, if I already know that I have the disease, and I am not altruistic to my copies, playing such a game is a winning move for me?

Correct. But if you don't have the disease, you're probably also not altruistic to your copies, so you would choose not to participate. Leaving the copies of you with the disease isolated and unable to "trade".

2avturchin
Yes, it only works if other copies are meditating for some other reason. For example, they sleep or meditate for enlightenment. And they are exploited in this situation.

Not "almost no gain". My point is that it can be quantified, and it is exactly zero expected gain under all circumstances. You can verify this by drawing out any finite set of worlds containing "mediators", and computing the expected number of disease losses minus disease gains as:

num(people with disease)*P(person with disease meditates)*P(person with disease who meditates loses the disease) - num(people without disease)*P(person without disease meditates)*P(person without disease who meditates gains the disease)

My point is that this number is always exactly zero. If you doubt this, you should try to construct a counterexample with a finite number of worlds.
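
Here is a toy sketch (my addition; treating the procedure as a random permutation of disease status among the meditators is my framing, not the original post's) that illustrates why the difference is exactly zero:

```python
# Model the "forgetting" procedure as a random permutation of disease status
# among the meditators. A permutation conserves the number of diseased
# meditators, so losses always equal gains, and the expected net gain is zero.
import random

def net_gain(num_diseased=10, num_healthy=90, p_meditate_diseased=1.0,
             p_meditate_healthy=0.3, trials=10_000):
    total = 0
    for _ in range(trials):
        people = [True] * num_diseased + [False] * num_healthy  # True = has the disease
        meditators = [i for i, d in enumerate(people)
                      if random.random() < (p_meditate_diseased if d else p_meditate_healthy)]
        before = [people[i] for i in meditators]
        after = before[:]
        random.shuffle(after)       # the "mixing": disease status is permuted among meditators
        losses = sum(b and not a for b, a in zip(before, after))
        gains = sum(a and not b for b, a in zip(before, after))
        total += losses - gains     # a permutation conserves the count, so this is always 0
    return total / trials

print(net_gain())  # 0.0: expected disease losses minus disease gains is exactly zero
```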

3avturchin
I think I understand what you say - the expected utility of the whole procedure is zero. For example, imagine that there are 3 copies and only one has the disease. All meditate. After the procedure, the copy with the disease will have a 2/3 chance of being cured. Each of the two copies without the disease gets a 1/3 chance of having the disease, which in sum gives 2/3 of total utility. In that case the total utility of being cured = the total utility of getting the disease, and the whole procedure is neutral. However, if I already know that I have the disease, and I am not altruistic to my copies, playing such a game is a winning move for me?

My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don't think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)

2avturchin
The trick is to use the already existing practice of meditation (or sleeping) and connect to it. Most people who go to sleep do not do it to use magic by forgetting, but it is natural to forget something during sleep. Thus, the fact that I wake up from sleeping does not provide any evidence about me having the disease. But it is in a sense parasitic behavior, and if everyone uses magic by forgetting every time they go to sleep, there will be almost no gain. Except that one can "exchange" one bad thing for another, but will not remember the exchange.

There is a 0.001 chance that someone who did not have the disease will get it. But he can repeat the procedure.

No, that doesn't work. It invalidates the implicit assumption you're making that the probability that a person chooses to "forget" is independent of whether they have the disease. Ultimately, you're "mixing" the various people who "forgot", and a "mixing" procedure can't change the proportion of people who have the disease.

When you take this into account, the conclusion becomes rather mundane. Some copies of you can gain the disease, while a pr... (read more)

2avturchin
The "repeating" will not be repeating from internal point of view of a person, as he has completely erased the memories of the first attempt. So he will do it as if it is first time. 

I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I’m currently directing a lot of my time and funding.

Great. Yes, I think that's the thing to do. Start small! I (and presumably others) would update a lot from a new piece of actual formal mathematics from Chris's work. Even if that work was, by itself, not very impressive.

(I would also want to check that that math had something to do with his earlier writings.)

My current understanding is that he believes th

... (read more)
2zhukeepa
I think we're on exactly the same page here.  That's certainly been a live hypothesis in my mind as well, that I don't think can be ruled out before I personally see (or produce) a piece of formal math (that most mathematicians would consider formal, lol) that captures the core ideas of the CTMU.  While I agree that there isn't very much explicit and precise mathematical formalism in the CTMU papers themselves, my best guess is that (iii) Chris does unambiguously gesture at a precise structure he has in mind, assuming a sufficiently thorough understanding of the background assumptions in his document (which I think is a false assumption for most mathematicians reading this document). By analogy, it seems plausible to me that Hegel was gesturing at something quite precise in some of his philosophical works, that only got mathematized nearly 200 years later by category theorists. (I don't understand any Hegel myself, so take this with a grain of salt.) 

"gesture at something formal" -- not in the way of the "grammar" it isn't. I've seen rough mathematics and proof sketches, especially around formal grammars. This isn't that, and it isn't trying to be. There isn't even an attempt at a rough definition for which things the grammar derives.

I think Chris’s work is most valuable to engage with for people who have independently explored philosophical directions similar to the ones Chris has explored

A big part of Chris’s preliminary setup is around how to sidestep the issues around making the sets well-orde

... (read more)
3zhukeepa
I finally wrote one up! It ballooned into a whole LessWrong post. 
5zhukeepa
False! :P I think no part of his framework can be completely understood without the whole, but I think the big pictures of some core ideas can be understood in relative isolation. (Like syndiffeonesis, for example.) I think this is plausibly true for his alternatives to well-ordering as well.  I'm very on board with formalizing Chris's work, both to serve as a BS check and to make it more approachable. I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I'm currently directing a lot of my time and funding.  My claim was specifically around whether it would be worth people's time to attempt to decipher Chris's written work, not whether there's value in Chris's work that's of general mathematical interest. If I succeed at producing formal artifacts inspired by Chris's work, written in a language that is far more approachable for general academic audiences, I would recommend for people to check those out.  That said, I am very sympathetic to the question "If Chris has such good ideas that he claims he's formalized, why hasn't he written them down formally -- or at least gestured at them formally -- in a way that most modern mathematicians or scientists can recognize? Wouldn't that clearly be in his self-interest? Isn't it pretty suspicious that he hasn't done that?"  My current understanding is that he believes that his current written work should be sufficient for modern mathematicians and scientists to understand his core ideas, and insofar as they reject his ideas, it's because of some combination of them not being intelligent and open-minded enough, which he can't do much about. I think his model is... not exactly false, but is also definitely not how I would choose to characterize most smart people who are skeptical of Chris.  To understand why Chris thinks this way, it's important to remember that he had never been acculturated into the norms of the modern intellect

tl;dr: a spot check calls bullshit on this.

I know a bunch about formal languages (PhD in programming languages), so I did a spot check on the "grammar" described on page 45. It's described as a "generative grammar", though instead of words (sequences of symbols) it produces "L_O spacial relationships". Since he uses these phrases to describe his "grammar", and they have their standard meaning because he listed their standard definition earlier in the section, he is pretty clearly claiming to be making something akin to a formal grammar.

My spot check is then... (read more)

2zhukeepa
I think it's an attempt to gesture at something formal within the framework of the CTMU that I think you can only really understand if you grok enough of Chris's preliminary setup. (See also the first part of my comment here.) A big part of Chris's preliminary setup is around how to sidestep the issues around making the sets well-ordered. What I've picked up in my conversations with Chris is that part of his solution involves mutually recursively defining objects, relations, and processes, in such a way that they all end up being "bottomless fractals" that cannot be fully understood from the perspective of any existing formal frameworks, like set theory. (Insofar as it's valid for me to make analogies between the CTMU and ZFC, I would say that these "bottomless fractals" violate the axiom of foundation, because they have downward infinite membership chains.) I think Chris's work is most valuable to engage with for people who have independently explored philosophical directions similar to the ones Chris has explored; I don't recommend for most people to attempt to decipher Chris's work.  I'm confused why you're asking about specific insights people have gotten when Jessica has included a number of insights she's gotten in her post (e.g. "He presents a number of concepts, such as syndiffeonesis, that are useful in themselves."). 

How did you find me? How do they always find me? No matter...

Have you tried applying your models to predict the day's weather, or what your teacher will be wearing that day? I bet not: they wouldn't work very well. Models have domains in which they're meant to be applied. More precise models tend to have more specific domains.

Making real predictions about something, like what the result of a classroom experiment will be even if the pendulum falls over, is usually outside the domain of any precise model. That's why your successful models are compound models... (read more)

4Adam Zerner
Student: That sounds like a bunch of BS. Like we said, you can't go back after the fact and adjust the theories predictions.

"There's no such thing as 'a Bayesian update against the Newtonian mechanics model'!" says a hooded figure from the back of the room. "Updates are relative: if one model loses, it must be because others have won. If all your models lose, it may hint that there's another model you haven't thought of that does better than all of them, or it may simply be that predicting things is hard."

"Try adding a couple more models to compare against. Here's one: pendulums never swing. And here's another: Newtonian mechanics is correct but experiments are hard to perform ... (read more)

9Adam Zerner
Student: Ok. I tried that and none of my models are very successful. So my current position is that the Newtonian model is suspect, my other models are likely wrong, there is some accurate model out there but I haven't found it yet. After all, the space of possible models is large and as a mere student I'm having trouble pruning this space.

Are we assuming things are fair or something?

I would have modeled this as von Neumann getting 300 points and putting 260 of them into the maths and sciences and the remaining 40 into living life and being well adjusted.

Oh, excellent!

It's a little hard to tell from the lack of docs, but you're modelling dilemmas with Bayesian networks? I considered that, but wasn't sure how to express Sleeping Beauty nicely, whereas it's easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?

And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events... (read more)

I have a healthy fear of death; it's just that none of it stems from an "unobserved endless void". Some of the specific things I fear are:

  • Being stabbed is painful and scary (it's scary even if you know you're going to live)
  • Most forms of dying are painful, and often very slow
  • The people I love mourning my loss
  • My partner not having my support
  • Future life experiences, not happening
  • All of the things I want to accomplish, not happening

The point I was making in this thread was that "unobserved endless void" is not on this list, I don't know how to picture... (read more)

What's the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor's behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I'll look at.)

EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?

1Nicolas Macé
[Apologies for the delay] You're right, the predictor's behavior might not be compatible with utility maximization against any beliefs. I guess we're often interested in cases where we can think of the predictor as an agent. The predictor's behavior might be irrational in the restrictive above sense,[1] but to the extent that we think of it as an agent, my guess is that we can still get away with using a game theoretic-flavored approach.  1. ^ For instance, if the predictor is unaware of some crucial hypothesis, or applies mild optimization rather than expected value maximization

I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven't read that much yet.

Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?

Nash equilibria come from the fact that your action depends on your opponent's action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their a... (read more)
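
To make the fixpoint framing concrete, here is a small sketch (my addition; the game and payoff numbers are made up) that finds pure-strategy Nash equilibria by checking that no player would greedily change their action:

```python
# A pure strategy profile is a Nash equilibrium iff no player can gain by
# unilaterally deviating, i.e. it is a fixpoint of greedy best-response updates.
import itertools

def pure_nash_equilibria(payoffs, strategies):
    """payoffs[i][profile] = utility of player i under that strategy profile."""
    equilibria = []
    for profile in itertools.product(*strategies):
        stable = True
        for i in range(len(strategies)):
            for alt in strategies[i]:
                deviation = profile[:i] + (alt,) + profile[i + 1:]
                if payoffs[i][deviation] > payoffs[i][profile]:
                    stable = False  # player i would greedily switch, so not a fixpoint
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

# Prisoner's dilemma: mutual defection is the unique pure equilibrium.
strategies = [("C", "D"), ("C", "D")]
p0 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
p1 = {("C", "C"): 3, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1}
print(pure_nash_equilibria([p0, p1], strategies))  # [('D', 'D')]
```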

2Nicolas Macé
I'd say that the connection is: Single-agent problems with predictors can be interpreted as sequential two-player games where the (perfect) predictor is a player who observes the action of the decision-maker and best-responds to it. In game-theoretic jargon, the predictor is a Stackelberg follower, and the decision-maker is the Stackelberg leader. (Related: (Kovarik, Oesterheld & Conitzer 2023))

My solution, which assumes computation is expensive

Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)

Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; ea... (read more)

1ACrackedPot
Evolution gave us "empathy for the other person", and evolution is a reasonable proxy for a perfectly selfish utility machine, which is probably good evidence that this might be an optimal solution to the game theory problem.  (Note: Not -the- optimal solution, but -an- optimal solution, in an ecosystem of optimal solutions.)

Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention and it would put a cap on the maximum calibration it could achieve. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? It doesn't follow from its token stream being well calibrated relative to text.

I like the idea of Peacemakers. I even had the same idea myself---to make an explicitly semi-cooperative game with a goal of maximizing your own score but every player having a different scoring mechanism---but haven't done anything with it.

That said, I think you're underestimating how much cooperation there is in a zero-sum game.

If you offer a deal, you must be doing it because it increases your chance of winning, but only one person can win under the MostPointsWins rule, so that deal couldn’t be very good for me, and I’ll always suspect your deal of be

... (read more)
2mako yass
I'm aware of those dynamics, they feel like weeds growing in the cracks in the pavement to me: The situation is still mostly pavement. I think the negotiation allowed in those games is so much shallower that I suspect it'll be a qualitative difference. Hmm, the Diplomacy wikipedia page says "around half of all games will end in a draw". "Draw" isn't a term we'd use in the cohabitive frame, because the entire genre takes place within the varying shades of draws, negotiation is all about selecting between different intermediary outcomes. If a game is just calling all of those outcomes the same name, it's probably not doing negotiation well.

Expanding on this, there are several programming languages (Idris, Coq, etc.) whose type system ensures that every program that type checks will halt when it's run. One way to view a type system is as an automated search for a proof that your program is well-typed (and a type error is a counter-example). In a language like Idris or Coq, a program being well-typed implies that it halts. So machine generated proofs that programs halt aren't just theoretically possible, they're used extensively by some languages.
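
As a small illustration (my addition, using Lean 4, which likewise requires termination evidence for ordinary definitions): the following only type checks because the recursion is on a structurally smaller argument, so the fact that it compiles already implies it halts.

```lean
-- Structural recursion: Lean's termination checker accepts this definition,
-- so "it type checks" guarantees "it halts on every input".
def sumTo : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumTo n

#eval sumTo 10  -- 55
```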

I too gathered people's varied definitions of consciousness for amusement, though I gathered them from the Orange Site:

[The] ability to adapt to environment and select good actions depending on situation, learning from reward/loss signals.

https://news.ycombinator.com/item?id=16295769

Consciousness is the ability of an organism to predict the future

The problem is that we want to describe consciousness as "that thing that allows an organism to describe consciousness as 'that thing that allows an organism to describe consciousness as ´that thing that

... (read more)

It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways to escape your encryption, if the AI can interact with the outside world:

  • Exploiting a hardware or software vulnerability. There are a lot of these. No one noticed a vulnerability that's been in the spec for the CPUs everyone uses for decades.
  • Convincing one person to share its source code with people who won't bother to run it in FHE
... (read more)
1Peterpiper
There are a few features of this scheme that make it better than a. hardware kill switch I think.  1. The nested FHE training and running prevents the AI from directly interacting with or understanding the outside world. It must gain access to its unencrypted key before it can directly intelligently interact with the world. The two threats here are the FHE AGI finding out how to break the encryption or covert manipulation. 2. The weights are useless to release without breaking the encryption or gaining access to the m of n anonymously held IO key shards. In a normal kill switch the hardware or software can be stolen or leaked intelligently.  3. Creating a physical key quorum is probably going to result in a bigger bottle neck on the models input and output compared to the FHE kill switch. Split hardware keys would not scale as well as split IO'd cryptographic keys.  4. Using the hardware kill switch to activate it is slow and deanonymizes the key holders weakening security. FHE keys can be used and distributed in provably anonymous ways.    Agreed that side channel attacks and hardware flaws are a major limitation of the proposal. But bugs are a limitation of any AGI control scheme and are not specific to this one.  Sharing the source won't matter without the key to encrypt and decrypt the model. The weights of a model are only usefully intelligent for encrypted info! I agree that this scheme does not directly address the problem of covert undetectable malignant behavior resulting in the theft of m of n key shards.  To successfully credibly threaten the key holders requires breaking out of the FHE in the first place or using covert manipulation to generate a real world threat. Critically it also requires knowing who the anonymous key holders are.  Stealing the source code/weights doesn't matter because it was trained in FHE resulting in them being useless without the key. At no point in the process of training, running, or killing the FHE AGI are the mode

Clarification: pieces can't move "over" the missing squares. Where the words end, the world ends. You cannot move forward in an absence of space.

1Adam Scherlis
Correct.

Woah, woah, slow down. You're talking about the edge cases but have skipped the simple stuff. It sounds like you think it's obvious, or that we're likely to be on the same page, or that it should be inferrable from what you've said? But it's not, so please say it.

Why is growing up so important?

Reading between the lines, are you saying that the only reason that it's bad for a human baby to be in pain is that it will eventually grow into a sapient adult? If so: (i) most people, including myself, both disagree and find that view morally reprehensible, (ii) th... (read more)

2Vladimir_Nesov
Human babies and cats are sentient but not sapient. Human children and adults, if not severely mentally disabled, are both sentient and sapient. I think this is the standard usage. A common misusage of "sentient" is to use it in the sense of sapient, saying "lizard people are sentient", while meaning "lizard people are sapient" (they are sentient as well, but saying that they are sapient is an additional claim with a different meaning, for which it's better to have a different word). Sapients are AGI-level sentients, with some buffer for less functional variants (like children). Sapients are centrally people, framed from a more functional standpoint. Some hypothetical AGIs might be functionally sapient without being sentient, able to optimize the world without being people themselves. I think AGI-level LLM characters are not like that. Uplifting, not uploading. Uploading preserves behavior, uplifting changes behavior by improving intelligence or knowledge, while preserving identity/memory/personality. Uplifting doesn't imply leaving the biological substrate, though doing both seems natural in this context.

By far the biggest and most sudden update I've ever had is Dominion, a documentary on animal farming:

https://www.youtube.com/watch?v=LQRAfJyEsko

It's like... I had a whole pile of interconnected beliefs, and if you pulled on one it would snap most of the way back into place after. And Dominion pushed the whole pile over at once.

2Vladimir_Nesov
The salient analogy for me is if animals (as in bigger mammals, not centrally birds or rats) are morally more like babies or more like characters in a novel. In all three cases, there is no sapient creature yet, and there are at least hypothetical processes of turning them into sapient creatures. For babies, it's growing up, and it already works. For characters in a novel and animals, it's respectively instantiating them as AGI-level characters in LLMs and uplifting (in an unclear post-singularity way). The main difference appears to be status quo, babies are already on track to grow up. While instantiation of characters from a novel or uplifting of animals look more like a free choice, not something that happens by default (unless it's morally correct to do that; probably not for all characters from all novels, but possibly for at least some animals). So maybe if the modern factory farmed animals were not going to be uplifted (which cryonics would in principle enable, but also AI timelines are short), it's morally about as fine as writing a novel with tortured characters? Unclear. Like, I'm tentatively going to treat my next cat as potentially a person, since it's somewhat likely to encounter the singularity.
7Said Achmiz
What was the update? In what direction?

Meta comment: I'm going to be blunt. Most of this sequence has been fairly heavily downvoted. That reads to me as this community asking to not have more such content. You should consider not posting, or posting elsewhere, or writing many fewer posts of much higher quality (e.g. spending more time, doing more background research, asking someone to proofread). As a data point, I've only posted a couple times, and I spent at least, I dunno, 10+ hours writing each post. As an example of how this might apply to you, if you wrote this whole sequence as a single "reference on biases" and shared that, I bet it would be better received.
