All of Adele Lopez's Comments + Replies

It's a great case, as long as you assume that AIs will never be beyond our control, and ignore the fact that humans have a metabolic minimum wage.

Could you tell them afterwards that it was just an experiment, that the experiment is over, that they showed admirable traits (if they did), and otherwise show kindness and care?

I think this would make a big difference to humans in an analogous situation. At the very least, it might feel more psychologically healthy for you.

nielsrolf

If LLMs are moral patients, there is a risk that every follow-up message causes the model to experience the entire conversation again, such that saying "I'm sorry I just made you suffer" causes more suffering.

JMiller
I definitely agree with this last point! I've been on the providing end of similar situations with people in cybersecurity education of all sorts of different technical backgrounds. I've noticed that both the tester and the "testee" (so to speak) tend to have a better and safer experience when the cards are compassionately laid out on the table at the end. It's even better when the tester is able to genuinely express gratitude toward the testee for having taught them something new, even unintentionally. 

I don't disagree that totalitarian AI would be real bad. It's quite plausible to me that the "global pause" crowd are underweighting how bad it would be.


I think an important crux here is on how bad a totalitarian AI would be compared to a completely unaligned AI. If you expect a totalitarian AI to be enough of an s-risk that it is something like 10 times worse than an AI that just wipes everything out, then racing starts making a lot more sense.

I think mostly we're on the same page then? Parents should have strong rights here, and the state should not.

I think that there's enough variance within individuals that my rule does not practically restrict genomic liberty much, while making it much more palatable to the average person. But maybe that's wrong, or it still isn't worth the cost.

Your rule might for example practically prevent a deaf couple from intentionally having a child who is deaf but otherwise normal. E.g. imagine if the couple's deafness alleles also carry separate health risks, but

... (read more)
TsviBT
This should be true for any trait that is highly polygenic and that we know many associated variants for, yeah. IDK, but if I had to make a guess I would guess that it's quite rare but does occur.

Another sort of example might be: say there's a couple whose child will likely get some disease D. Maybe the couple has a very high genetic predisposition for D that can't be attenuated enough using GE, or maybe it's a non-genetic disease that's transmissible. And say there's a rare variant that protects against D (which neither parent has). It would be a risk, and potentially a consent issue, to experiment with editing in the rare variant; but it might be good all things considered. (If this sounds like sci-fi, consider that this is IIUC exactly the scenario that happened with the first CRISPR-edited baby! In that case there were multiple methodological issues, and the edit itself might have been a bad idea even prospectively, but the background scenario was like that.)

However, the difference is especially salient because the person deciding isn't the person that has to live with said genes. The two people may have different moral philosophies and/or different risk preferences.

A good rule might be that the parents can only select alleles that one or the other of them has, and also have the right to do so as they choose, under the principle that they have lived with it. (Maybe with an exception for the unambiguously bad alleles, though even in that case it's unlikely that all four of the parents' alleles are the delet... (read more)

TsviBT
My current guess at the best regulatory stance--the one that ought to be acceptable to basically everyone, and that would result in good outcomes--is significantly more permissive, i.e. giving more genomic liberty to the parents. There should be rights to not genomically engineer at all, or to only GE along certain dimensions; and rights to normalize, or to propagate one's genes or traits, or to benefit the child, or to benefit others altruistically.

Your rule might for example practically prevent a deaf couple from intentionally having a child who is deaf but otherwise normal. E.g. imagine if the couple's deafness alleles also carry separate health risks, but there are other deafness alleles that the couple does not have but that lead to deafness without other health risks. I still haven't fully thought through the consent objection, though.

Restrictions on genomic liberty should be considered very costly: they break down walls against eugenics-type forces (i.e. forces on people's reproduction coming from state/collective power, and/or aimed at population targets). Like with other important values, this isn't 100% absolute. E.g. parents shouldn't be allowed to GE their children in order to make their children suffer a lot, or in a way that has a very high risk of creating a violent psychopath. But every such restriction rightfully invokes a big pile of "Wait, who decides what counts as a 'good' allele or as a 'disease'?".

https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies?commentId=ZeranH3yDBGWNxZ7h

What else did he say? (I'd love to hear even the "obvious" things he said.)

I'm ashamed to say I don't remember. That was the highlight. I think I have some notes on the conversation somewhere and I'll try to remember to post here if I ever find them.

I can spell out the content of his Koan a little, if it wasn't clear. It's probably more like: look for things that are (not there). If you spend enough time in a particular landscape of ideas, you can (if you're quiet and pay attention and aren't busy jumping on bandwagons) get an idea of a hole, which you're able to walk around but can't directly see. In this way new ideas appear as s... (read more)

Adele Lopez

Thank you for doing this research, and for honoring the commitments.

I'm very happy to hear that Anthropic has a Model Welfare program. Do any of the other major labs have comparable positions?

To be clear, I expect that compensating AIs for revealing misalignment and for working for us without causing problems only works in a subset of worlds and requires somewhat specific assumptions about the misalignment. However, I nonetheless think that a well-implemented and credible approach for paying AIs in this way is quite valuable. I hope that AI companies and

... (read more)
Answer by Adele Lopez

Well, I'm very forgetful, and I notice that I do happen to be myself so... :p

But yeah, I've bitten this bullet too, in my case, as a way to avoid the Boltzmann brain problem. (Roughly: "you" includes lots of information generated by a lawful universe. Any specific branch has small measure, but if you aggregate over all the places where "you" exist (say your exact brain state, though the real thing that counts might be more or less broad than this), you get more substantial measure from all the simple lawful universes that only needed 10^X coincidences to m... (read more)

James Camacho
I consider "me" to be a mapping from environments to actions, and weigh others by their KL-divergence from me.

I don't doubt that LLMs could do this, but has this exact thing actually been done somewhere?

Martin Randall
I've not read the paper but something like https://arxiv.org/html/2402.19167v1 seems like the appropriate experiment.

The "one weird trick" to getting the right answers is to discard all stuck, fixed points. Discard all priors and posteriors. Discard all aliefs and beliefs. Discard worldview after worldview. Discard perspective. Discard unity. Discard separation. Discard conceptuality. Discard map, discard territory. Discard past, present, and future. Discard a sense of you. Discard a sense of world. Discard dichotomy and trichotomy. Discard vague senses of wishy-washy flip floppiness. Discard something vs nothing. Discard one vs all. Discard symbols, discard signs, discard waves, discard particles. 

All of these things are Ignorance. Discard Ignorance.


Is this the same principle as "non-attachment"?

Unreal
Yes non-attachment points in the same direction.  Another way of putting it is "negate everything."  Another way of putting it is "say yes to everything."  Both of these work toward non-attachment. 

Make a letter addressed to Governor Newsom using the template here.

For convenience, here is the template:

September [DATE], 2024

The Honorable Gavin Newsom
Governor, State of California
State Capitol, Suite 1173
Sacramento, CA 95814
Via leg.unit@gov.ca.gov

Re: SB 1047 (Wiener) – Safe and Secure Innovation for Frontier Artificial Intelligence Models Act – Request for Signature

Dear Governor Newsom,

[CUSTOM LETTER BODY GOES HERE. Consider mentioning:

  • Where you live (this is useful even if you don’t live in California)
  • Why you care about SB 1047
  • What it would mean t
... (read more)
dsj
And mine.

I have no idea, but I wouldn't be at all surprised if it's a mainstream position.

My thinking is that long-term memory requires long-term preservation of information, and evolution "prefers" to repurpose things rather than starting from scratch. And what do you know, there's this robust and effective infrastructure for storing and replicating information just sitting there in the middle of each neuron!

The main problem is writing new information. But apparently, there's a protein evolved from a retrotransposon (those things which viruses use to insert their ... (read more)

jmh
I find this rather exciting -- and clearly the cryonics implications are positive. But beyond that, and yes, this is really scifi down-the-road thinking here, the implications for education/learning and treatment of things like PTSD seem huge. Assuming we can figure out how to control these. Of course I'm ignoring some of the real downsides like manipulation of memory for bad reasons or an Orwellian application. I am not sure those types of risks are that large in most open societies.

Do you know if fluid preservation preserves the DNA of individual neurons?

(DNA is on my shortlist of candidates for where long-term memories are stored)

jmh
Is that thought one that is generally shared by those working in the field of memory, or more something that is new/cutting edge? It's a very interesting statement, so if you have some pointers to a (not too difficult) paper on how that works, or just had the time to write something up, I for one would be interested and grateful.
Andy_McKenzie
This is an important question. While I don't have a full answer, my impression is that yes, it seems to preserve the important information present in DNA. More information here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11058410/#S4.4

Consider finding a way to integrate Patreon or similar services into the LW UI then. That would go a long way towards making it feel like a more socially acceptable thing to do, I think.

Viliam

That could be great especially for people who are underconfident and/or procrastinators.

For example, I don't think anyone would want to send any money to me, because my blogging frequency is like one article per year, and the articles are perhaps occasionally interesting, but nothing world-changing. I'm like 99% sure about this. But just in the hypothetical case that I am wrong... or maybe if in future my frequency and quality of blogging will increase but I will forget to set up a way to sponsor me... if I find out too late that I was leaving money on the... (read more)

Yeah, that's not what I'm suggesting. I think the thing I want to encourage is basically just to be more reflective on the margin of disgust-based reactions (when it concerns other people). I agree it would be bad to throw it out unilaterally, and probably not a good idea for most people to silence or ignore it. At the same time, I think it's good to treat appeals to disgust with suspicion in moral debates (which was the main point I was trying to make) (especially since disgust in particular seems to be a more "contagious" emotion for reasons that make se... (read more)

I meant wrong in the sense of universal human morality (to the extent that's a coherent thing). But yes, on an individual level your values are just your values.

I see that stuff as at best an unfortunate crutch for living in a harsher world, and which otherwise is a blemish on morality. I agree that it is a major part of what many people consider to be morality, but I think people who still think it's important are just straightforwardly wrong.

I don't think disgust is important for logic / reflectivity. Personally, it feels like it's more of an "unsatisfactory" feeling. A bowl with a large crack and a bowl with mold in it are both unsatisfactory in this sense, but only the latter is disgusting. Additionally, it se... (read more)

Raemon
I agree "unsatisfactory" is different from disgust. I think people vary in which emotions end up loadbearing for them. I know rationalists who feel disgust reactions to people who have unclean "epistemic hygiene", or who knowingly let themselves into situations where their epistemics will be reliably fucked.

For that matter, in the OP, some people are responding to regular ol' criminal morality with disgust, and while you (or Jim, or in fact, me) can say "man I really don't trust people who run their morality off disgust", it doesn't necessarily follow that it'd, for example, work well if you simply removed disgust from the equation for everyone – it might turn out to be loadbearing to how society functions.

I'm not sure if we disagree about a particular thing here, because, like, it's not like you're exactly proposing to snap your fingers and eliminate disgust from human morality unilaterally (but it sounds like you might be encouraging people to silence/ignore their disgust reactions, without tracking that this may be important for how some significant fraction of people are currently tracking morality, in a way that would destroy a lot of important information and coordination mechanism if you didn't more thoughtfully replace it with other things).

I agree high reflectivity people probably have less disgust-oriented morality (because yeah, disgust-morality is often not well thought out or coherent), but I just have a general precautionary principle against throwing out emotional information. I, uh, maybe want to summon @divia who might have more specific thoughts here.
Shankar Sivarajan
This seems obviously a value judgment that one cannot be "wrong" about.

That doesn't seem right to me. My thinking is that disgust comes from the need to avoid things which cause and spread illness. On the other hand, things I consider more central to morality seem to have evolved for different needs [these are just off-the-cuff speculations for the origins]:

  • Love - seems to be generalized from parental nurturing instincts, which address the need to ensure your offspring thrive
  • Friendliness - seems to have stemmed from the basic fact that cooperation is beneficial
  • Empathy - seems to be a side-effect of the way our brains mode
... (read more)
Raemon
I agree there's an important cooperator/friendly/love attractor, but it seems like dismissing disgust ignores a lot of what people actually use the word morality for. It might be right that it's not central to the parts of morality you care about, but historically morality clearly includes tons of:

  • dictating sexual mores ("homosexuality is disgusting")
  • how to cook food (i.e. keeping kosher)
  • I think Leviticus has stuff on how to handle disease [goes and checks... yep! "When anyone has a swelling or a rash or a bright spot on his skin that may become an infectious skin disease, he must be brought to Aaron the priest or to one of his sons who is a priest."]
  • The Untouchables in the caste system.

You can say "okay but those parts of morality are either actively bad, or, we can recover them through empathy", and maybe that's right, but it's still a significant part of how many people relate to morality and your story of what's going on with it needs to account for that.

I think that people have a sense of things that seem unhealthy that are to be avoided, and this originally was "literal disease" (which you do want to coordinate with your group to avoid), as well as "this social fabric feels sort of diseased and I don't want to be near it."

But, most importantly: I think "disgust" (or very similar emotions) are how logic / reflectivity gets implemented. This is conjecture, but my current bet is something like "we had a prior that elegant things tend to be healthy, inelegant things tend to be broken or diseased or fucked up somehow." And that translated into things like philosophers/priests/judges having a sense of "hmm, I notice our morality is being inconsistent. That feels off/wrong." And this is the mechanism by which reflective moral systems are able to bootstrap. (Then cultural apparatus gets layered on top such that disgust is often fairly removed from what's going on locally). (I sometimes feel like my own sense here feels disgust-oriented, and some

This is fascinating and I would love to hear about anything else you know of a similar flavor.

Caloric Vestibular Stimulation seems to be of a similar flavor, in case you haven't heard of it.

It decreases the granularity of the actions to which it applies. In other words, where before you had to solve a Sudoku puzzle to go to work, now you’ve got to solve a puzzle to get dressed, a puzzle to get in the car, a puzzle to drive, and a puzzle to actually get started working. Before all of those counted as a single action - ‘go to work’ - now they’re counted separately, as discrete steps, and each requires a puzzle.

This resonates strongly with my experience, though when I noticed this pattern I thought of it as part of my ADHD and not my depressi... (read more)

Fractalideation
Also resonates strongly with my own experience; in my case just replace "ADHD" with "ME/CFS". I think the OP's description is good but quite generic, i.e. it would probably resonate with most people who have a physical and/or mental health condition which is quite "taxing" in the sense that it significantly lowers the reward/effort ratio of every/most task. As mentioned in Daniel Samuel's comment, in the case of depression the "tax"/handicap would fall specifically on willpower (and/or enjoyment/pleasure/etc...). In the case of ADHD the tax/handicap would mostly fall on attention, in the case of ME/CFS it would mostly fall on energy, etc...

I imagine some of it is due to this part of the blog post UI making people feel like they might as well use some quickly generated images as an easy way to boost engagement. Perhaps worth rewording?

Raemon

Yeah we got that text from the EA Forum and didn't optimize it much, and having pointed that out: I'm sorry for giving you instructions and then yelling at you. I'll think about something to change there.

When I'm trying to understand a math concept, I find that it can be very helpful to try to invent a better notation for it. (As an example, this is how I learned linear logic: http://adelelopez.com/visual-linear-logic)

I think this is helpful because it gives me something to optimize for in what would otherwise be a somewhat rote and often tedious activity. I also think it makes me engage more deeply with the problem than I otherwise would, simply because I find it more interesting. (And sometimes, I even get a cool new notation from it!)

This principle like... (read more)

Thanks for the rec! I've been trying it out for the last few days, and it does seem to have noticeably less friction compared to LaTeX.

Sanskrit scholars worked for generations to make Sanskrit better for philosophy

That sounds interesting, do you know a good place to get an overview of what the changes were and how they approached it?

(To be clear, no I am not at all afraid of this specific thing, but the principle is crucial. But also, as Kevin Roose put it, perhaps let’s avoid this sort of thing.)


There are no doubt people already running literal cartoon supervillain characters on these models, given the popularity of these sorts of characters on character.ai.

I'm not worried about that with Llama-3.1-405B, but I believe this is an almost inevitable consequence of open source weights. Another reason not to do it.

What do we do, if the people would not choose The Good, and instead pick a universe with no value?


I agree this would be a pretty depressing outcome, but the experiences themselves still have quite a bit of value. 

antanaclasis
My benchmark for thinking about the experience machine: imagine a universe where only one person and the stuff they interact with exist (with any other “people” they interact with being non-sapient simulations) and said person lives a fulfilling life. I maintain that such a universe has notable positive value, and that a person in an experience machine is in a similarly valuable situation to the above person (both being sole-moral-patients in a universe not causally impacting any other moral patients). This does not preclude the possibility of improving on that life by e.g. interacting with actual sapient others. This view is fully compatible with non-experience-machine lives having much more value than experience-machine ones, but it’s a far cry from the experience-machine lives having zero value.

Still, it feels like there's an important difference between "happening to not look" and "averting your eyes".

Jiao Bu
This is probably true in an internal sense, where one needs to be self-honest. It might be very difficult to tell when any conscious person other than you was doing this, and it might be dicey to judge even in yourself. Especially given the finiteness of human attention.

In my personal life, I have spent recent months studying. Did I emotionally turn away from some things in the middle of this, so that to an outside observer I might have looked like I was burying my head or averting my eyes? Sure. Was I doing that or was I setting boundaries? I guess even if you lived in my head at that time, it could be hard to know. Maybe my obsessive studying itself is an avoidance. In the end, I know what I intended, but that's about it. That's often all we get, even from the inside.

So while I agree with you, I'm not sure exactly when we should cease to be agnostic about parsing that difference. Maybe it's something we can only hold as an ideal, complementary to striving for Truth, basically?
jefftk
I agree, but I strongly disagree with @Shankar Sivarajan that if a person does this in some areas then they shouldn't "claim to be 'truth-seeking' in any way".

I don't (yet?) see why generality implies having a stable motivating preference.

In my view, this is where the Omohundro Drives come into play.

Having any preference at all is almost always served by an instrumental preference of survival as an agent with that preference.

Once a competent agent is general enough to notice that (and granting that it has a level of generality sufficient to require a preference), then the first time it has a preference, it will want to take actions to preserve that preference.

Could you use next token prediction to build a d

... (read more)

I would say that Alice's conscious experience is unlikely to suddenly disappear under this transformation, and that it could even be done in a way so that their experience was continuous.

However, Alice-memories would gradually fade out, Bob-memories would gradually fade in, and thought patterns would slowly shift from Alice-like to Bob-like. At the end, the person would just be Bob. Along the way, I would say that Alice gradually died (using an information-theoretic definition of death). The thing that is odd when imagining this is that Alice never experie... (read more)

It wouldn't help that much, because you only have one atmosphere of pressure to remove (which for reference is only enough to suck water up about 35 ft.).
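
(For reference, and not part of the original comment: the ~35 ft figure is just the height of a water column whose weight balances one atmosphere, using standard values.)

$$h \;=\; \frac{P_{\text{atm}}}{\rho g} \;\approx\; \frac{101{,}325\ \text{Pa}}{1000\ \text{kg/m}^3 \times 9.81\ \text{m/s}^2} \;\approx\; 10.3\ \text{m} \;\approx\; 34\ \text{ft}$$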

TsviBT
I guess that's right... what if you have a series of pumps in the same pipe, say one every kilometer?

Really? I would only consider foods that were deliberately modified using procedures developed within the last century to be "processed".

Brendan Long
I think historically frying would have used olive oil or lard though.

Love seeing stuff like this, and it makes me want to try this exercise myself!

A couple places which clashed with my (implicit) models:

This starts a whole new area of training AI models that have particular personalities. Some people are starting to have parasocial relationships with their friends, and some programmers are trying to make friends that are really fun or interesting or whatever for them in particular.

This is arguably already happening, with Character AI and its competitors. Character AI has almost half a billion visits per month wi... (read more)

Hmm I think I can implement pilot wave in fewer lines of C than I can many-worlds. Maybe this is a matter of taste... or I am missing something?

Now simply delete the ~~pilot wave part~~ piloted part.

gilch
You mean, "Now simply delete the superfluous corpuscles." We need to keep the waves.
lemonhope
I admit I have not implemented so much as a quantum fizzbuzz in my life
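
(For what it's worth, here is a minimal sketch of the point being traded back and forth, in Python rather than C, with everything simplified to a 1D particle, ħ = m = 1, and a toy barrier; the grid, potential, and step counts are arbitrary choices of mine. The wavefunction evolution alone is the whole "many-worlds" program; the pilot-wave program is the same wave code plus a particle integrated along the guidance equation v = Im((∂ψ/∂x)/ψ), so deleting the piloted part only removes lines.)

```python
import numpy as np

# Grid and a Gaussian wave packet (hbar = m = 1 throughout).
N, L = 512, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
dt = 0.005
psi = np.exp(-(x + 5.0) ** 2 + 2j * x)          # packet centered at -5, moving right
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)
V = np.where(np.abs(x) < 0.5, 2.0, 0.0)         # toy potential barrier

def step(psi):
    """One split-operator step of the Schrodinger equation."""
    psi = psi * np.exp(-0.5j * V * dt)
    psi = np.fft.ifft(np.exp(-0.5j * k ** 2 * dt) * np.fft.fft(psi))
    return psi * np.exp(-0.5j * V * dt)

# Many-worlds program: evolving psi is the entire dynamics.
# Pilot-wave program: the same wave code, plus a particle obeying the guidance equation.
q = -5.0                                        # Bohmian particle position
for _ in range(1000):
    psi = step(psi)
    dpsi_dx = np.gradient(psi, dx)
    i = np.argmin(np.abs(x - q))                # evaluate the guidance field at the grid point nearest q
    q += dt * np.imag(dpsi_dx[i] / psi[i])      # v = Im((dpsi/dx)/psi)

print("final particle position:", q)
```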

I agree it's increasingly urgent to stop AI (please) or solve consciousness in order to avoid potentially causing mass suffering or death-of-consciousness in AIs.

Externalism seems, quite frankly, like metaphysical nonsense. It doesn't seem to actually explain anything about consciousness. I can attest that I am currently conscious (to my own satisfaction, if not yours). Does this mean I can logically conclude I am not in any way being simulated? That doesn't make any sense to me.

I don't think that implies torture as much as something it simply doesn't "want" to do. I.e. I would bet that it's more like how I don't want to generate gibberish in this textbox, but it wouldn't be painful, much less torture if I forced myself to do it.

O O
It said it found it “distressing” in a follow up. Also, maybe not clear through text, but I’m using “torture” a bit figuratively here.

[Without having looked at the link in your response to my other comment, and I also stopped reading cubefox's comment once it seemed that it was going in a similar direction. ETA: I realized after posting that I have seen that article before, but not recently.]

I'll assume that the robot has a special "memory" sensor which stores the exact experience at the time of the previous tick. It will recognize future versions of itself by looking for agents in its (timeless) 0P model which has a memory of its current experience.

For p("I will see O"), the robot will ... (read more)

Wei Dai

1. If we look at the situation in 0P, the three versions of you at time 2 all seem equally real and equally you, yet in 1P you weigh the experiences of the future original twice as much as each of the copies.
2. Suppose we change the setup slightly so that copying of the copy is done at time 1 instead of time 2. And at time 1 we show O to the original and C to the two copies, then at time 2 we show them OO, CO, CC like before. With this modified setup, your logic would conclude P(“I will see O”)=P(“I will see OO”)=P(“I will see CO”)=P(“I will see CC”)=1/3 and P(“I will see C”)=2/3. Right?
3. Similarly, if we change the setup from the original so that no observation is made at time 1, the probabilities also become P(“I will see OO”)=P(“I will see CO”)=P(“I will see CC”)=1/3.
4. Suppose we change the setup from the original so that at time 1, we make 999 copies of you instead of just 1 and show them all C before deleting all but 1 of the copies. Then your logic would imply P("I will see C")=.999 and therefore P(“I will see CO”)=P(“I will see CC”)=0.4995, and P(“I will see O”)=P(“I will see OO”)=.001.

This all makes me think there's something wrong with the 1/2,1/4,1/4 answer and with the way you define probabilities of future experiences. More specifically, suppose OO wasn't just two letters but an unpleasant experience, and CO and CC are both pleasant experiences, so you prefer "I will experience CO/CC" to "I will experience OO". Then at time 0 you would be willing to pay to switch from the original setup to (2) or (3), and pay even more to switch to (4). But that seems pretty counterintuitive, i.e., why are you paying to avoid making observations in (3), or paying to make and delete copies of yourself in (4). Both of these seem at best pointless in 0P.

But every other approach I've seen or thought of also has problems, so maybe we shouldn't dismiss this one too easily based on these issues. I would be interested to see you work out everything more formally and
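
(For concreteness, here is a minimal sketch, mine rather than either commenter's, of the branch-weighting rule under discussion: give the time-0 observer measure 1, split a branch's measure equally among its successors at each copying event, and read P("I will see E") as the total measure of branches whose history contains E. The tree encoding below is my own toy representation.)

```python
from collections import defaultdict

def branch_probabilities(tree):
    """tree maps a history (tuple of experiences) to the experiences of its successors."""
    weights = defaultdict(float)

    def recurse(history, weight):
        children = tree.get(history, [])
        if not children:                      # terminal branch: record its measure
            weights[history] += weight
            return
        for exp in children:                  # split measure equally among successors
            recurse(history + (exp,), weight / len(children))

    recurse((), 1.0)
    return dict(weights)

# Original setup: copy at time 1 (original sees O, copy sees C);
# at time 2 the copy is copied again (OO for the original, CO and CC for the copies).
original = {
    (): ["O", "C"],
    ("O",): ["OO"],
    ("C",): ["CO", "CC"],
}

# Modified setup (2): both copies already exist at time 1 (C' just marks the second copy).
modified = {
    (): ["O", "C", "C'"],
    ("O",): ["OO"],
    ("C",): ["CO"],
    ("C'",): ["CC"],
}

print(branch_probabilities(original))   # {('O', 'OO'): 0.5, ('C', 'CO'): 0.25, ('C', 'CC'): 0.25}
print(branch_probabilities(modified))   # each terminal history gets ~0.333
```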

I would bet that the hesitation caused by doing the mental reframe would be picked up by this.

Richard_Kennaway
The counter to this is, always take your time whether you need to or not.

I would say that English uses indexicals to signify and say 1P sentences (probably with several exceptions, because English). Pointing to yourself doesn't help specify your location from the 0P point of view because it's referencing the thing it's trying to identify. You can just use yourself as the reference point, but that's exactly what the 1P perspective lets you do.

Nathan Helm-Burger
I'm not sure whether I agree with Epirito or Adele here. I feel confused and unclear about this whole discussion. But I would like to try to illustrate what I think Epirito is talking about, by modifying Adele's image to have a robot with an arm and a speaker, capable of pointing at itself and saying something like 'this robot that is speaking and that the speaking robot is pointing to, sees red'.
Epirito
"it's referencing the thing it's trying to identify" I don't understand why you think that fails. If I point at a rock, does the direction of my finger not privilege the rock I'm pointing at above all others? Even by looking at merely possible worlds from a disembodied perspective, you can still see a man pointing to a rock and know which rock he's talking about. My understanding is that your 1p perspective concerns sense data, but I'm not talking about the appearance of a rock when I point at it. I'm talking about the rock itself. Even when I sense no rock I can still refer to a possible rock by saying "if there is a rock in front of me, I want you to pick it up."

Isn't having a world model also a type of experience?

It is if the robot has introspective abilities, which is not necessarily the case. But yes, it is generally possible to convert 0P statements to 1P statements and vice-versa. My claim is essentially that this is not an isomorphism.

But what if all robots had a synchronized sensor that triggered for everyone when any of them has observed red. Is it 1st person perspective now?

The 1P semantics is a framework that can be used to design and reason about agents. Someone who thought of "you" as referring ... (read more)

That's a very good question! It's definitely more complicated once you start including other observers (including future selves), and I don't feel that I understand this as well.

But I think it works like this: other reasoners are modeled (0P) as using this same framework. The 0P model can then make predictions about the 1P judgements of these other reasoners. For something like anticipation, I think it will have to use memories of experiences (which are also experiences) and identify observers for which this memory corresponds to the current experience. Un... (read more)

Wei Dai
Defining the semantics and probabilities of anticipation seems to be a hard problem. You can see some past discussions of the difficulties at The Anthropic Trilemma and its back-references (posts that link to it). (I didn't link to this earlier in case you already found a fresh approach that solved the problem. You may also want to consider not reading the previous discussions to avoid possibly falling into the same ruts.)

I'm still reading your Sleeping Beauty posts, so I can't properly respond to all your points yet. I'll say though that I don't think the usefulness or validity of the 0P/1P idea hinges on whether it helps with anthropics or Sleeping Beauty (note that I marked the Sleeping Beauty idea as speculation).

If they are not, then saying the phrase "1st person perspective" doesn't suddenly allow us to use it.

This is frustrating because I'm trying hard here to specify exactly what I mean by the stuff I call "1st Person". It's a different interpretation of classic... (read more)

Ape in the coat
I agree. Or I'd even say that the usefulness and validity of the 0P/1P idea is reversely correlated with their applications to "anthropic reasoning".

Yes, I see that and I'm sorry. This kind of warning isn't aimed at you in particular, it's a result of my personal pain at how people in general tend to misuse such ideas.

I'm not sure. It seems that one of them has to be reducible to the other, though probably in the opposite direction. Isn't having a world model also a type of experience? Like, consider two events: "one particular robot observes red" and "any robot observes red". It seems that the first one is 1st person perspective, while the second is 0th person perspective in your terms. When a robot observes red with its own sensor it concludes that it in particular has observed red and deduces that it means that any robot has observed red. Observation leads to an update of a world model. But what if all robots had a synchronized sensor that triggered for everyone when any of them has observed red. Is it 1st person perspective now?

Probability theory describes the subjective credence of a person who observed a specific outcome from a set of possible outcomes. It's about 1P in a sense that different people may have different possible outcomes and thus have different credence after an observation. But also it's about 0P because any person who observed the same outcome from the same set of possible outcomes should have the same credence. I guess I feel that the 0P, 1P distinction doesn't really carve math by its joints. But I'll have to think more about it.

Because you don't necessarily know which agent you are. If you could always point to yourself in the world uniquely, then sure, you wouldn't need 1P-Logic. But in real life, all the information you learn about the world comes through your sensors. This is inherently ambiguous, since there's no law that guarantees your sensor values are unique.

If you use X as a placeholder, the statement sensor_observes(X, red) can't be judged as True or False unless you bind X to a quantifier. And this could not mean the thing you want it to mean (all robots would agree on... (read more)
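
(A tiny illustration of the quantifier point, with made-up robot names and sensor values: leaving X free gives no truth value; binding it existentially or universally gives a 0P statement that every robot evaluates the same way, which is not the same thing as the 1P statement "my sensor observes red".)

```python
# Hypothetical sensor readings for three robots.
sensor_readings = {"robot_a": "red", "robot_b": "green", "robot_c": "red"}

def sensor_observes(robot, color):
    return sensor_readings[robot] == color

# 0P statements: X is bound by a quantifier, so the truth value is the same
# no matter which robot does the evaluating.
exists_red = any(sensor_observes(r, "red") for r in sensor_readings)   # True
forall_red = all(sensor_observes(r, "red") for r in sensor_readings)   # False

# 1P statement: depends on which robot "I" refers to, so robots can disagree.
def i_observe_red(me):
    return sensor_observes(me, "red")

print(exists_red, forall_red)                              # True False
print(i_observe_red("robot_a"), i_observe_red("robot_b"))  # True False
```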

(Rant about philosophical meaning of “0” and “1” and identity elements in mathematical rings redacted at strenuous insistence of test reader.)

I'm curious about this :)

Answer by Adele Lopez

There's nothing stopping the AI from developing its own world model (or if there is, it's not intelligent enough to be much more useful than whatever process created your starting world model). This will allow it to model itself in more detail than you were able to put in, and to optimize its own workings as is instrumentally convergent. This will result in an intelligence explosion due to recursive self-improvement.

At this point, it will take its optimization target, and put an inconceivably (to humans) huge amount of optimization into it. It will find a ... (read more)

I don't have this problem, so I don't have significant advice.

But one consideration that may be helpful to you is that even if the universe is 100% deterministic, you still may have indexical uncertainty about what part of the determined universe you experience next. This is what happens under the many-worlds interpretation of quantum mechanics (and if a many-worlds type interpretation isn't the correct one, then the universe isn't deterministic). You can make choices according to the flip of a quantum coin if you want to guarantee your future has significant amounts of this kind of uncertainty.

Writing up the contracts (especially around all the caveats that they might not have noticed) seems like it would be harder than just reading contracts (I'm an exception, I write faster than I read). Have you thought of integrating GPT/Claude as assistants? I don't know about current tech, but like many other technologies, that integration will scale well in the contingency scenario where publicly available LLMs keep advancing.

I'd consider the success of Manifold Markets over Metaculus to be mild evidence against this.

And to be clear, I do not currently... (read more)

Point taken about CDT not converging to FDT.

I don't buy that an uncontrolled AI is likely to be CDT-ish though. I expect the agentic part of AIs to learn from examples of human decision making, and there are enough pieces of FDT like voting and virtue in human intuition that I think it will pick up on it by default.

(The same isn't true for human values, since here I expect optimization pressure to rip apart the random scraps of human value it starts out with into unrecognizable form. But a piece of a good decision theory is beneficial on reflection, and so will remain in some form.)
