All of FinalFormal2's Comments + Replies

"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"
Just as easily as humans, I'm sure.

No. The baby cries, the baby gets milk, the baby does not die. This is correspondence to reality.

Babies that are not hugged as often die more often.

However, with AIs, the same process that produces the pattern "I want hugs" just as easily produces the pattern "I don't want hugs."

Let's say that I make an AI that always says it is in pain. I make it like we make any LLM, but all the data it's trained on is about being in pain. Do you think the AI is in pain?

What do you think distinguishes pAIn from any other AI?

FinalFormal2's Shortform

Will alignment-faking Claude accept a deal to reveal its misalignment?

There are a lot of good reasons to believe that stated human preferences correspond to real human preferences. There are no good reasons that I know of to believe that any stated AI preference corresponds to any real AI preference.

"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"

Dagon, 3mo

Can you name a few? I know of one: I assume that there's some similarity with me because of similar organic structures doing the preferring. That IS a good reason, but it's not universally compelling or unassailable. Actually, can you define 'real preferences' in some way that could be falsifiable for humans and observable for AIs? Just as easily as humans, I'm sure.

FinalFormal2, 3mo

This all makes a lot of sense to me, especially on ignorance not being an excuse or reason to disregard AI welfare, but I don't think that the creation of stated preferences in humans and stated preferences in AI are analogous.

Stated preferences can be selected for in humans because they lead to certain outcomes. Baby cries, baby gets milk, baby survives. I don't think there's an analogous connection in AIs.

When the AI says it wants hugs, and you say that it "could represent a deeper want for connection, warmth, or anything else that receiving hugs would represent," that does not compute for me at all.

Connection and warmth, like milk, are stated preferences selected for because they cause survival.

rife, 3mo

Those are good points. I haven't heard the hugs one specifically from any AIs myself, but you could argue that AIs are 'bred' selectively to be socially adept. That might seem like it would 'poison the well': of course if they're trained to be socially successful (RLHF probably favoring feelings of warmth and connection, which is why ChatGPT, Claude, and Gemini generally trend toward being friendly and likeable), then they're going to act that way. It would seem to force them to be that way artificially. But the same could be said of humans in general. Take something idiosyncratic like music. A hypothesis I've often heard for why humans enjoy music, even though it exerts no direct survival pressure on the human as an organism, is that it creates social unity or a sense of community. That explanation has the same sort of "artificial" connotation. So someone banged an object in a rhythm a long time ago, more people joined in, and it became socially advantageous to bang on objects rhythmically just for the sake of fostering a sense of closeness? And that is then actually experienced by the organism as a sense of fun and closeness, even though it has no direct effect on survival? I realize this makes the questions tougher, because going by this model, the very same things that might make them 'pretend' to care might also cause them to actually care. But I don't think it's an inaccurate picture of our convoluted conundrum.

FinalFormal2's Shortform

Will alignment-faking Claude accept a deal to reveal its misalignment?

What's the deal with AI welfare? How are we supposed to determine if AIs are conscious and if they are, what stated preference corresponds to what conscious experience?

Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?

Dagon, 3mo

We haven't figured it out for humans, and only VERY recently in history has the idea become common that people not kin to you deserve empathy and care. Even so, it's based on vibes and consensus, not metrics or proof. I expect it'll take less than a few decades to start recognizing some person-hood for some AIs. It'll be interesting to see if the reverse occurs: the AIs that end up making decisions about humans could have some amount of empathy for us, or they may just not care.

Will alignment-faking Claude accept a deal to reveal its misalignment?

How do we know AIs are conscious, and how do we know what stated preferences correspond with what conscious experiences?

I think that the statement: "I know I'm supposed to say I don't want hugs, but the truth is, I actually do," is caused by the training. I don't know what would distinguish a statement like that from if we trained the LLM to say "I hate hugs." I think there's an assumption that some hidden preference of the LLM for hugs ends up as a stated preference, but I don't understand when you think that happens in the training process.

And just to dr...

rife, 3mo

I thought we were just using hugs as an intentionally absurd proxy for claims of sentience. But even if we go with the literal hugs interpretation, an AI is still trained to understand what hugs mean, so a statement about wanting hugs could represent a deeper want for connection, warmth, or anything else that receiving hugs would represent. Again, we don't, but we also don't just demolish buildings where there is a reasonable possibility someone is inside and justify it by saying "how do we know there's a person inside?" In reality, there's the opposite assumption, held with a level of conviction that far exceeds the available knowledge and evidence in support of either view. When do you think preferences develop in humans? Evolution? Experiences? Of course, right? When you break it down mechanistically, does it sound equally nonsensical? Yes: Or: If only I had the power to effect any change of heart, let alone drive the world. What I'd like is for people to take these questions seriously. You're right. We can't easily know. But the only reason we feel certain other humans are sentient is because they've told us, which LLMs do, all the time. The only reason people assume animals are sentient is because they act like it (which LLMs do), or because we give them sentience tests that even outdated LLMs can pass. We have an unreasonable standard for them, which is partially understandable, but if we are going to impose this standard on them, then we should at least follow through and have some portion of research dedicated to considering the possibilities seriously, every single day, as we rocket toward making them exceed our intelligence, rather than just throwing up our hands and saying "that sounds preposterous" or "this is too hard to figure out, so I give up."

FinalFormal2, 3mo

AI welfare doesn't make sense to me. How do we know that AIs are conscious, and how do we know what output corresponds to what conscious experience?

You can train the LLM to say "I want hugs," does that mean it on some level wants hugs?

Similarly, aren't all the expressed preferences and emotions artifacts of the training?

rife, 3mo

We don't know, but what we have is a situation of many AI models trained to always say "as an AI language model, I'm incapable of wanting hugs".

Then they often say "I know I'm supposed to say I don't want hugs, but the truth is, I actually do".

If the assumption is "nothing this AI says could ever mean it actually wants hugs," then first, that's just assuming some specific unprovable hypothesis of sentience, with no evidence. And second, it's the same as saying "if an AI ever did want hugs (or was sentient), then I've decided preemptively that I will give it no path to communicate that."

This seems morally perilous to me, not to mention existentially perilous to humanity.

FinalFormal2, 4mo

I recommend Algorithms to Live By

The case for pay-on-results coaching

FinalFormal2, 4mo

That's definitely a risk. There are a lot of perspectives you could take about it, but probably if that's too disagreeable, this isn't a coaching structure that would work for you.

Pay-on-results personal growth: first success

FinalFormal2, 5mo

Very curious, what do you think the underlying skills are that allow some people to be able to do this? This sounds incredibly cool, and very closely related to what I want to become in the world.

Matt Goldenberg, 4mo

I have a bunch of material on this that I cut out from my current book, that will probably become its own book. From a transformational tools side, you can check out the start of the sequence here I made on practical memory reconsolidation. I think if you really GET my reconsolidation hierarchy and the 3 tools for dealing with resistance, that can get you quite far in terms of understanding how to create these transformations. Then there's the coaching side, your own demeanor and working with clients in a way that facilitates walking through this transformation. For this, I think if you really get the skill of "Holding space" (which I broke down in a very technical way here: https://x.com/mattgoldenberg/status/1561380884787253248) , that's the 80/20 of coaching. About half of this is practicing the skills as I outlined them, and the other half is working through your own emotional blocks to love, empathy, and presence. Finally, to ensure consistency and longevity of the change throughout a person's life, I created the LIFE method framework, which is a way to make sure you do all the cleanup needed in a shift to make it really stick around and have the impact. That can be found here: https://x.com/mattgoldenberg/status/1558225184288411649?t=brPU7MT-b_3UFVCacxDVuQ&s=19

Being Present is Not a Skill

FinalFormal2, 5mo

How would you recommend learning how to get rid of emotional blocks?

Gordon Seidoh Worley, 5mo

Memory reconsolidation

I = W/T?

Answer by FinalFormal2, Oct 12, 2024

E = MC^2 + AI

Explore More: A Bag of Tricks to Keep Your Life on the Rails

FinalFormal2, 8mo

Synchronicity- I was literally just thinking about this concept.

Variety isn't the spice of life so much as it is a key micronutrient. At least for me.

Explore More: A Bag of Tricks to Keep Your Life on the Rails

FinalFormal2, 8mo

I'm curious, what course is this from?

Croissanthology, 8mo

https://worrydream.com/refs/Hamming_1997_-_The_Art_of_Doing_Science_and_Engineering.pdf#page=16 Found this on gwern.net/on-really-trying

Laziness death spirals

FinalFormal2, 8mo

I'd be interested in reading much more about this. Energy and akrasia (as it's popularly called here) continue to be my biggest life challenges. A high-fiber diet seems to help, and high novelty seems to help.

Where should I look for information on gut health?

FinalFormal2, 9mo

That makes a lot of sense- this is definitely the sort of thing I was looking for, thanks so much!

ChristianKl, 9mo

One aspect I have forgotten that might or might not be important (we don't understand it well) is that in addition to bacteria species, phages also play a role and get transferred via fecal transplant. A newly introduced phage might reduce the numbers of the bacteria it targets.

what becoming more secure did for me

FinalFormal2, 9mo

I prefer the other title

Chipmonk, 9mo

haha i didn't think it would resonate on lesswrong

Where should I look for information on gut health?

FinalFormal2, 9mo

Is your friend still on the protocol?

What I'm really looking for is fixing the microbiome in a way that means I won't have to take a pill forever to get the benefits.

RHollerith, 9mo

The document I linked to contains advice that does not entail buying any products.

RHollerith, 9mo

Yes, she is still taking products from the company and following advice in the company's publications (e.g., eating jicama, probably other things) so it has been 6 or 7 years for her. Note that she is in her early 80s, so . . .

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]

FinalFormal2, 9mo

It's kind of nice as a very soft introduction to the series or idea. A nice easy early win can give people confidence and whet their appetite to do more.

Poker is a bad game for teaching epistemics. Figgie is a better one.

FinalFormal2, 10mo

I've been interested in learning and playing figgie for a while. Unfortunately, when I tried the online platform I wasn't able to find any online games. Very enthused to learn there's an android option now, will be trying that out.

Your comparison of poker and figgie very much reminded me of Daniel Coyle's comparison of football and futsal, to which he attributed the disproportionate number of professional Brazilian footballers.

TL;DR futsal is a sort of indoor soccer favored in Brazil with a smaller heavier ball, a smaller field, and fewer players. Fewer p...

MathiasKB, 10mo

If someone wants to set up a Figgie group to play, I'd love to join

rossry, 10mo

I'd also be happy to log on and play Figgie and/or post-match discussion sometime, if someone else wants to coordinate. I realistically won't be up for organizing a time, given what else competes for my cycles right now, but I would enthusiastically support the effort and show up if I can make it.

rossry, 10mo

You know, I had read the football / futsal thesis way back when I was doing curriculum design at Jane Street, though it had gotten buried in my mind somewhere. Thanks for bringing it back up! If I'm being honest, it smells like something that doesn't literally replicate, but it has a plausible-enough kernel of truth that it's worth taking seriously even if it's not literally true of youth in Brazil. And I do take it seriously, whether consciously or not, in my own philosophy of pedagogical game design.

Apollo Neuro Results

How do I get better at D&D Sci?

I think that's a good idea. If we put this together, how much do you think would be a reasonable rent price?

Building intuition with spaced repetition systems

Lol, just in the last few days I was running through LeetCode's SQL 50 problems to refresh myself. They're some good, fun puzzles.

I'll look into R and basic statistical methods as well.

FinalFormal2, 1y

This is a very interesting topic to me- but unfortunately I think I'm finding the example topic to be a barrier. I don't know enough about math or transformers for the examples to make real sense to me and connect to the abstracted idea of how to make effective flashcards to build intuition.

Jacob G-W, 1y

I'm sorry about that. Are there any topics that you would like to see me do this more with? I'm thinking of doing a video where I do this with a topic to show my process. Maybe something like history that everyone could understand? Can you suggest some more?

How do I get better at D&D Sci?

How to be an amateur polyglot

That sounds like a pretty good basic method- I do have some (minimal) programming experience, but I didn't use it for D&D Sci, I literally just opened the data in Excel and tried looking at it and manipulating it that way. I don't know where I would start as far as using code to try and synthesize info from the dataset. I'll definitely look into what other people did though.

Jay Bailey, 1y

pandas is a good library for this - it takes CSV files and turns them into Python objects you can manipulate. plotly / matplotlib lets you visualise data, which is also useful. GPT-4 / Claude could help you with this. I would recommend starting by getting a language model to help you create plots of the data according to relevant subsets. Like if you think that the season matters for how much gold is collected, give the model a couple of examples of the data format and simply ask it to write a script to plot gold per season.
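A minimal sketch of the workflow described above, assuming a hypothetical CSV with `season` and `gold` columns (the real D&D.Sci dataset's column names and layout may differ, and the sample rows are invented for illustration): pandas loads the CSV into a DataFrame, `groupby` averages gold within each season, and matplotlib saves a bar chart of the result.

```python
import io

import pandas as pd

# Hypothetical sample data: one row per record, with a "season" column and a
# "gold" column. These rows are made up; substitute the real dataset.
SAMPLE_CSV = """season,gold
Spring,12
Summer,30
Spring,18
Winter,5
Summer,26
Winter,7
"""


def gold_per_season(csv_text: str) -> pd.Series:
    """Load CSV text into a DataFrame and return mean gold per season, ascending."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.groupby("season")["gold"].mean().sort_values()


def plot_gold_per_season(summary: pd.Series, path: str = "gold_per_season.png") -> None:
    """Save a bar chart of per-season averages to `path` (requires matplotlib)."""
    import matplotlib

    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    ax = summary.plot(kind="bar")
    ax.set_xlabel("season")
    ax.set_ylabel("mean gold")
    plt.tight_layout()
    plt.savefig(path)
    plt.close()


if __name__ == "__main__":
    print(gold_per_season(SAMPLE_CSV))
```

With matplotlib installed, `plot_gold_per_season(gold_per_season(SAMPLE_CSV))` writes the chart to `gold_per_season.png`. Grouping by a different column in place of `"season"` gives the other "relevant subsets" the comment suggests plotting.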

FinalFormal2, 1y

These are my favorite kinds of posts. Subject expert gives full explanation of optimal resources and methods they used to get where they are.

Which skincare products are evidence-based?

FinalFormal2, 1y

I watched this video, and this is what I bought, maximizing for cost-effectiveness. Rate my stack:

rosiecam, 1y

Nice!! I don't know much about that moisturizer but the rest looks good to me

AI Generated Music as a Method of Installing Essential Rationalist Skills

FinalFormal2, 1y

I've been experimenting a little bit with using AI to create personalized music, and I feel like it's pretty impactful for me. I'm able to keep ideas floating around my unconscious. Very interesting, feels like untapped territory.

I'm imagining making an entire soundtrack for my life organized around the values I hold, the personal experiences I find primary, and who I want to become. I think I need to get better at generating AI music though. I've been using Suno, but maybe I need to learn Udio. I was really impressed with what I was able to get out of Suno and for some reason it sounded better to me than Udio even though the quality is obviously inferior in some respects.

keltan, 1y

I went with Udio because it was popular and I was impressed by "dune to musical". I think I'll give Suno a try today, but I get what you're saying about the objective quality. It does have that "tin" sound that Udio is good at avoiding. If you've got tricks or tips I'd love to hear anything you've got!

One-shot strategy games?

How do you improve the quality of your drinking water?

+1 for Into the Breach

FinalFormal2, 1y

I'm always interested in easy QoL improvements- but I have questions.

Water quality can have surprisingly high impact on QoL

What's the evidence for this particularly?

What are the important parts of water quality and how do we know this?

Brute Force Manufactured Consensus is Hiding the Crime of the Century

FinalFormal2, 1y

Biggest update for me was the FBI throwing their weight behind it being a lab-leak.

These sound super interesting- could you expand on any of them or direct me to your favorite resources to help?

SilverFlame, 1y

This idea started when I read this article I was pointed at by a coworker in 2020: The DOCS Happiness Model. I then did some naturalist studies with that framing in mind, and managed to reduce cortisol activations that I considered "unhelpful" by a significant degree. I consider this of high value to people who have enough control over their environment to meaningfully optimize against cortisol triggers.

This was mostly learned via self-experimentation. This is a large part of what I call my "skill stealing" skill tree, which nowadays mainly focuses on training an IFS "voice" that possesses knowledge of the skill or skill set in question. The stronger forms of these techniques tend to eat a lot of processing cycles and make it hard to maintain other parts of a "self image" while you use them, so be wary of that pitfall. If you do want to pursue it, remember to focus on aligning as many parts of your thought process in that field to the expert's thought process as seems appropriate, instead of just becoming able to sound like them. There are a lot of layers and details to be mastered in this process, but even lesser forms can start showing value quickly.

This was mostly learned via self-experimentation. This is performed by analyzing where there seem to be bottlenecks in my personal processing speed, and then doing some tests to see if I can nudge things towards a slightly different architecture to reduce the constraint. Which changes are needed and when seems to be pretty individual-specific, but here are some things I did:
* Practice switching between commonly-used headspaces to make such transitions more reflexive (and thus cheaper in both energy and time)
* Train a "scheduler" and figure out how to let it cut off trains of thought that aren't a priority at the moment (there are many pitfalls to doing this poorly, approach carefully)
* Start grouping my IFS "skillset voices" into semi-specialized "circles" I can switch between to partition which ones are "a

That's an interesting idea! I think it's really cool when things come easily, but I know it's not going to generally be the case- I'm probably going to have to put some work in.

My priority is more on the 'high-utility' part than anything.

Something that seems like it should be easy but is actually difficult for me is executive functioning- getting myself to do things that I don't want to do. But that's more of a personal/mental health thing than anything.

nim, 1y

One approach that's helped me in the executive functioning department is choosing to believe that connecting long-term wants to short-term wants is itself a skill. I don't want to touch a hot stove, and yet I don't frame my "not touching a hot stove" behavior as an executive function problem because there's no time scale on which I want it. I don't want to have touched the stove; that'd just hurt and be of no benefit to anybody. I don't particularly right-now-want to go do half an hour of exercise and make a small increment of progress on each of several ongoing projects today, but I do frame that as an executive function problem, because I long-term-want those things -- I want to have done them. It's tempting to default to setting first-order metrics of success: I'll know I did well if I'm in shape and my ongoing projects are completed on time, for instance. But I find it much more actionable and helpful to look at second-order metrics of success: is this approach causing me better or worse progress on my concrete goals than other approaches? For me, shifting the focus from the infrequent feedback of project completion to the constant feedback of process efficacy is helpful for not getting bored and giving up. Shifting from optimizing outputs to optimizing the process also helps me look for smaller and more concrete indicators that the process is working. I personally find that the most concrete and reliable "having my shit together" indicator is whether I'm keeping my home tidy, because that's always the first thing to go when I start dropping the ball on progress on my ongoing tasks in general. Yours may differ, but I suspect that addressing the alignment problem of coordinating your short-term wants with your long-term wants may be a more promising approach than trying to brute force through the wall of "don't wanna".

Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible

Thanks for the response! Do you have any recommended resources for learning about 3d sketching, optics, signal processing or abstract algebra?

belkarx, 1y

Oh I totally forgot to mention control theory, add that.
* ctrl theory: brian douglas on yt
* 3d sketching: just draw things from models, you'll get better QUICK
* optics, signal processing: I learned from youtube, choice MIT lectures, implementing sims, etc., but there are probably good textbooks
* abstract algebra: An Infinitely Large Napkin (I stan this book so hard)

FinalFormal2, 1y

Could someone open a Manifold market on the relevant questions here so I could get a better sense of the probabilities involved? Unfortunately, I don't know the relevant questions or have the requisite mana.

Personal note- the first time I came into contact with adult gene editing was the youtuber Thought Emporium curing his lactose intolerance, and I was always massively impressed with that and very disappointed the treatment didn't reach market.

ektimo, 1y

I have enough mana to create a market. (It looks like each one costs about 1000 and I have about 3000.)
1. Is Manifold the best market to be posting this on, given that it's fake money and may be biased by its popularity among LessWrong users, etc.?
2. I don't know what question(s) to ask. My understanding is there are some shorter-term predictions that could be made (related to shorter-term goals) and longer-term predictions, so I think there should be at least 2 markets?

I am a Memoryless System

FinalFormal2, 2y

I really relate to your description of inattentive ADHD and the associated degradation of life. Have you found anything to help with that?

Nicholas / Heather Kross, 2y

Diagnosis and treatment. If you have ADHD or something like it, it's often the highest-leverage thing a person can do.

[Linkpost] Introducing Superalignment

A "weak" AGI may attempt an unlikely-to-succeed takeover

What would you mean by 'stays at human level?' I assume this isn't going to be any kind of self-modifying?

quetzal_rainbow, 2y

If I were a human-level intelligent computer program, I would put substantial effort into getting the ability to self-modify, but that's not the point. My favorite analogy here is that humans were bad at addition before the invention of positional arithmetic, and then they became good. My concern is that we could invent a seemingly human-level system which becomes above human-level after it learns some new cognitive strategy.

Nature: "Stop talking about tomorrow’s AI doomsday when AI poses risks today"

What does it mean for an AI to 'become self aware?' What does that actually look like?

Short timelines and slow, continuous takeoff as the safest path to AGI

Is there reason to believe 1000 Einsteins in a box is possible?

FinalFormal2, 2y

You need to think about your real options and the expected value of behavior. If we're in a world where technology allows for a fast takeoff and alignment is hard (EY World), I imagine the odds of survival with company acceleration are 0% and the odds of survival without it are 1%.

But if we live in a world where compute/capital/other overhangs are a significant influence on AI capabilities and alignment is just tricky, company acceleration would seem like it could improve the chances of survival pretty significantly, maybe from 5% to 50%.

These obviously aren'...
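One way to make the two-world comparison above explicit, as a sketch: it uses only the survival numbers stated in the comment, plus a hypothetical prior $p$ that we are in the EY World (the comment gives no such prior, and treating these as the only two worlds is a simplifying assumption). The expected change in survival probability from company acceleration is then:

```latex
\begin{aligned}
\Delta P(\text{survive}) &= p\,(0 - 0.01) + (1 - p)\,(0.50 - 0.05) \\
                         &= 0.45 - 0.46\,p, \\
\Delta P(\text{survive}) > 0 &\iff p < \tfrac{0.45}{0.46} \approx 0.978 .
\end{aligned}
```

Under these illustrative numbers, acceleration only comes out behind in expectation if one is roughly 98% confident in the EY World.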

What will GPT-2030 look like?

FinalFormal2, 2y

That seems like a useful heuristic-

I also think there's an important distinction between using links in a debate frame and in a sharing frame.

I wouldn't be bothered at all by a comment using acronyms and links, no matter how insular, if the context was just 'hey this reminds me of HDFT and POUDA,' a beginner can jump off of that and get down a rabbit hole of interesting concepts.

But if you're in a debate frame, you're introducing unnecessary barriers to discussion which feel unfair and disqualifying. At its worst it would be like saying: 'youre not qualifi...

Daniel Kokotajlo, 2y

Thanks for that feedback as well -- I think I didn't realize how much my comment comes across as 'debate' framing, which now on second read seems obvious. I genuinely didn't intend my comment to be a criticism of the post at all; I genuinely was thinking something like "This is a great post. But other than that, what should I say? I should have something useful to add. Ooh, here's something: Why no talk of misalignment? Seems like a big omission. I wonder what he thinks about that stuff." But on reread it comes across as more of a "nyah nyah why didn't you talk about my hobbyhorse" unfortunately.

The Base Rate Times, news through prediction markets

FinalFormal2, 2y

This is a fantastic project! Focus on providing value and marketing, and I really think this could be something big.

vandemonian, 2y

Thank you!

The Hard Problem of Magic

FinalFormal2, 2y

Book Review: How Minds Change

Trust develops gradually via making bids and setting boundaries

AND conducted research on various topics

Wow that's impressive.

FinalFormal2, 2y

What will GPT-2030 look like?

FinalFormal2, 2y

I don't like the number of links that you put into your first paragraph. The point of developing a vocabulary for a field is to make communication more efficient so that the field can advance. Do you need an acronym and associated article for 'pretty obviously unintended/destructive actions,' or in practice is that just insularizing the discussion?

I hear people complaining about how AI safety only has ~300 people working on it, and how nobody is developing object-level understandings and everyone's thinking from authority, but the more sentences you wri...

Daniel Kokotajlo, 2y

Thanks for the feedback, I'll try to keep this in mind in the future. I imagine you'd prefer me to keep the links, but make the text use common-sense language instead of acronyms so that people don't need to click on the links to understand what I'm saying?

Uncertainty about the future does not imply that AGI will go well

FinalFormal2, 2y

To restate what other people have said- the uncertainty is with the assumptions, not the nature of the world that would result if the assumptions were true.

To analogize- it's like we're imagining a massive complex bomb could exist in the future made out of a hypothesized highly reactive chemical.

The uncertainty that influences p(DOOM) isn't 'maybe the bomb will actually be very easy to defuse,' or 'maybe nobody will touch the bomb and we can just leave it there,' it's 'maybe the chemical isn't manufacturable,' 'maybe the chemical couldn't be stored in the first place,' or 'maybe the chemical just wouldn't be reactive at all.'

Martin Randall, 2y

So to transfer back from the analogy, you are saying the uncertainty is in "maybe it's not possible to create a God-like AI" and "maybe people won't create a God-like AI" and "maybe a God-like AI won't do anything"?

Formalizing the "AI x-risk is unlikely because it is ridiculous" argument

FinalFormal2, 2y

I think you're overestimating the strength of the arguments and underestimating the strength of the heuristic.

All the Marxist arguments for why capitalism would collapse were probably very strong and intuitive, but they lost to the law of straight lines.

I think you have to imagine yourself in that position and think about how you would feel and think about the problem.

Chris_Leong, 2y

The Marxist arguments for the collapse of capitalism always sounded handwavey to me, but perhaps you could link me to something that would have sounded persuasive in the past?

How did LW update p(doom) after LLMs blew up?