All of Nate Showell's Comments + Replies

Rationality Quotes - Fall 2024

How does this model handle horizontal gene transfer? And what about asexually reproducing species? In those cases, the dividing lines between species are less sharply defined.

2Yudhister Kumar6mo

I'm less interested in what existing groups of things we call "species" and more interested in what the platonic ideal of a species is & how we can use it as an intuition pump. This is also why I restrict "species" in the blogpost to "macrofauna species", which have less horizontal gene transfer & asexual reproduction.

Another argument against utility-centric alignment paradigms

The ideas of the Cavern are the Ideas of every Man in particular; we every one of us have our own particular Den, which refracts and corrupts the Light of Nature, because of the differences of Impressions as they happen in a Mind prejudiced or prepossessed.

Francis Bacon, Novum Organum Scientiarum, Section II, Aphorism V

Alexander Gietelink Oldenziel's Shortform

The reflective oracle model doesn't have all the properties I'm looking for -- it still has the problem of treating utility as the optimization target rather than as a functional component of an iterative behavior reinforcement process. It also treats the utilities of different world-states as known ahead of time, rather than as the result of a search process, and assumes that computation is cost-free. To get a fully embedded theory of motivation, I expect that you would need something fundamentally different from classical game theory. For example, it pro... (read more)

2Noosphere896mo

Re treating utility as the optimization target: I think this isn't, properly speaking, an embedded agency problem, but rather an empirical question of what the first AIs that automate everything will look like algorithmically. There are algorithms that can be embedded in reality and that do optimize the utility/reward, like MCTS, and TurnTrout limits the post to the model-free policy gradient case, like PPO and REINFORCE. TurnTrout is correct to point out that not all RL algorithms optimize for the reward, and that reward isn't what the agent optimizes for by definition, but I think the post is too limited in describing when RL does optimize for the utility/reward. So I think the biggest difference between @TurnTrout and people like @gwern et al. is whether model-based RL that does plan, or model-free policy gradient RL, comes to dominate AI progress over the next decade. I agree that treating the utilities of different world-states as known and computation as free makes it a very unrealistic model for human beings. Something like the reflective oracle model would be possible only if we warped the laws of physics severely enough that we didn't have to care about the cost of computation at all, which would then let us go from treating utilities as unknown to known in one step. This is an actual reason why I don't expect the reflective oracle model to transfer to reality at all.

Another argument against utility-centric alignment paradigms

Why are you a realist about the Solomonoff prior instead of treating it as a purely theoretical construct?

Nate Showell6mo30

A theory of embedded world-modeling would be an improvement over current predictive models of advanced AI behavior, but it wouldn't be the whole story. Game theory makes dualistic assumptions too (e.g., by treating the decision process as not having side effects), so we would also have to rewrite it into an embedded model of motivation.

Cartesian frames are one of the few lines of agent foundations research in the past few years that seem promising, due to allowing for greater flexibility in defining agent-environment boundaries. Preferably, we would ... (read more)

2Noosphere896mo

It turns out that in an idealized model of intelligent AI, we can remove the dualistic assumptions of game theory by instead positing a reflective oracle. The reflective oracle is allowed randomness in the territory (it is not just uncertainty in the map) to prevent paradoxes, and in particular the reflective oracle's randomized answers are exactly the Nash equilibria of game theory, because there is a one-to-one correspondence between reflective oracles and Nash equilibria. Of course, whether it can transfer to our reality at all is pretty sketchy at best, but at least there is a solution: https://arxiv.org/abs/1508.04145
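
To make the role of randomization concrete, here is a minimal Python sketch. It does not implement a reflective oracle; it only illustrates the standard game-theory fact the comment leans on, that some games have no equilibrium in pure strategies and need randomized play. The function names and the choice of matching pennies as the example are assumptions made for illustration, not anything taken from the linked paper.

```python
from fractions import Fraction


def has_pure_equilibrium(a, b):
    """Return True if some pure strategy pair is a Nash equilibrium.

    a[i][j] is the row player's payoff and b[i][j] the column player's
    payoff when row plays i and column plays j (2x2 games only).
    """
    for i in range(2):
        for j in range(2):
            row_is_best_reply = a[i][j] >= a[1 - i][j]
            col_is_best_reply = b[i][j] >= b[i][1 - j]
            if row_is_best_reply and col_is_best_reply:
                return True
    return False


def mixed_equilibrium_2x2(a, b):
    """Fully mixed equilibrium of a 2x2 game via indifference conditions.

    Assumes an interior equilibrium exists, so both denominators are
    nonzero. Returns (p, q): the probability that row plays row 0 and
    the probability that column plays column 0.
    """
    # p makes the column player indifferent between the two columns:
    # p*b00 + (1-p)*b10 == p*b01 + (1-p)*b11
    p = Fraction(b[1][1] - b[1][0], b[0][0] - b[0][1] - b[1][0] + b[1][1])
    # q makes the row player indifferent between the two rows:
    # q*a00 + (1-q)*a01 == q*a10 + (1-q)*a11
    q = Fraction(a[1][1] - a[0][1], a[0][0] - a[0][1] - a[1][0] + a[1][1])
    return p, q


# Matching pennies: row wins on a match, column wins on a mismatch.
MATCHING_PENNIES_ROW = [[1, -1], [-1, 1]]
MATCHING_PENNIES_COL = [[-1, 1], [1, -1]]

if __name__ == "__main__":
    print(has_pure_equilibrium(MATCHING_PENNIES_ROW, MATCHING_PENNIES_COL))
    print(mixed_equilibrium_2x2(MATCHING_PENNIES_ROW, MATCHING_PENNIES_COL))
```

In matching pennies every deterministic choice can be exploited by the other player, so the only stable answer is a coin flip. That is the kind of paradox-avoiding randomness the comment attributes to the oracle's answers.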

Another argument against utility-centric alignment paradigms

Nate Showell7mo50

And this is where the fundamental AGI-doom arguments – all these coherence theorems, utility-maximization frameworks, et cetera – come in. At their core, they're claims that any "artificial generally intelligent system capable of autonomously optimizing the world the way humans can" would necessarily be well-approximated as a game-theoretic agent. Which, in turn, means that any system that has the set of capabilities the AI researchers ultimately want their AI models to have, would inevitably have a set of potentially omnicidal failure modes.

This is my cru... (read more)

3Thane Ruthenis7mo

I agree that the agent-foundations research has been somewhat misaimed from the start, but I buy this explanation of John's regarding where it went wrong and how to fix it. Basically, what we need to figure out is a theory of embedded world-modeling, which would capture the aspect of reality where the universe naturally decomposes into hierarchically arranged sparsely interacting subsystems. Our agent would then be a perfect game-theoretic agent, but defined over that abstract (and lazy) world-model, rather than over the world directly. This would take care of agents needing to be "bigger" than the universe, counterfactuals, the "outside-view" problem, the realizability and the self-reference problems, the problem of hypothesis spaces, and basically everything else that's problematic about embedded agency.

Wei Dai's Shortform

Nate Showell7mo111

Philosophy is frequently (probably most of the time) done in order to signal group membership rather than as an attempt to accurately model the world. Just look at political philosophy or philosophy of religion. Most of the observations you note can be explained by philosophers operating at simulacrum level 3 instead of level 1.

5Wei Dai7mo

"Signal group membership" may be true of the fields you mentioned (political philosophy and philosophy of religion), but seems false of many other fields such as philosophy of math, philosophy of mind, decision theory, anthropic reasoning. Hard to see what group membership someone is signaling by supporting one solution to Sleeping Beauty vs another, for example.

5Joey KL7mo

I don’t think this is accurate. I think most philosophy is done under motivated reasoning, but it is not straightforwardly about signaling group membership.

Nate Showell's Shortform

Nate Showell9mo10

Bug report: when I'm writing an in-line comment on a quoted block of a post, and then select text within my comment to add formatting, the formatting menu is displayed underneath the box where I'm writing the comment. For example, this prevents me from inserting links into in-line comments.

Scalable oversight as a quantitative rather than qualitative problem

Nate Showell9mo40

In particular, if the sample efficiency of RL increases with large models, it might turn out that the optimal strategy for RLing early transformative models is to produce many fewer and much more expensive labels than people use when training current systems; I think people often neglect this possibility when thinking about the future of scalable oversight.

This paper found higher sample efficiency for larger reinforcement learning models (see Fig. 5 and section 5.5).

2Buck9mo

Thanks! That's a multi-agent setup but still handy.

How are you preparing for the possibility of an AI bust?

Nate Showell9mo20

I picked the dotcom bust as an example precisely because it was temporary. The scenarios I'm asking about are ones in which a drop in investment occurs and timelines turn out to be longer than most people expect, but where TAI is still developed eventually. I asked my question because I wanted to know how people would adjust to timelines lengthening.

Matt Goldenberg's Short Form Feed

Nate Showell10mo10

Then what do you mean by "forces beyond yourself?" In your original shortform it sounded to me like you meant a movement, an ideology, a religion, or a charismatic leader. Creative inspiration and ideas that you're excited about aren't from "beyond yourself" unless you believe in a supernatural explanation, so what does the term actually refer to? I would appreciate some concrete examples.

2Matt Goldenberg10mo

One way that I think about "forces beyond yourself" is as pointing to what it feels like to operate from a right-hemisphere-dominant mode, as defined by Iain McGilchrist. The language is deliberately designed to evoke that mode - so while I'll get more specific here, know that to experience the thing I'm talking about you need to let go of the mind that wants this type of explanation. When I'm talking about "Higher Forces" I'm talking about states of being that feel like something is moving through you - you're not a head controlling a body but rather you're first connecting to, then channeling, then becoming part of a larger universal force. In my coaching work, I like to use Phil Stutz's idea of "Higher forces" like Infinite Love, Forward Motion, Self-Expression, etc., as they're particularly suited for the modern Western Mind. Here's how Stutz defines the higher force of Self-Expression on his website: "The Higher Force You’re Invoking: Self-Expression The force of Self-Expression allows us to reveal ourselves in a truthful, genuine way—without caring about others' approval. It speaks through us with unusual clarity and authority, but it also expresses itself nonverbally, like when an athlete is "in the zone." In adults, this force gets buried in the Shadow. Inner Authority, by connecting you to the Shadow, enables you to resurrect the force and have it flow through you." Of course, religions also have names for these types of special states, calling them Muses, Jhanas, Direct Connection to God. All of these states (while I can and do teach techniques, steps, and systems to invoke them) ultimately can only be accessed through surrender to the moment, faith in what's there, and letting go of a need for knowing.

Matt Goldenberg's Short Form Feed

Nate Showell10mo0-3

There are more than two options for how to choose a lifestyle. Just because the 2000s productivity books had an unrealistic model of motivation doesn't mean that you have to deceive yourself into believing in gods and souls and hand over control of your life to other people.

3Matt Goldenberg10mo

It's precisely when handing your life to forces beyond yourself (not Gods, that's just handing your life over to someone else) that you can avoid giving your life over to others/society. Souls is metaphorical of course, not some essential unchanging part of yourself - just a thing that actually matters, that moves you.

Two easy things that maybe Just Work to improve AI discourse

Nate Showell10mo10

That's not as bad, since it doesn't have the rapid back-and-forth reward loop of most Twitter use.

Two easy things that maybe Just Work to improve AI discourse

Nate Showell10mo10

The time expenditure isn't the crux for me; the effects of Twitter on its users' habits of thinking are the crux. Those effects also apply to people who aren't alignment researchers. For those people, trading away epistemic rationality for Twitter influence is still very unlikely to be worth it.

Two easy things that maybe Just Work to improve AI discourse

Nate Showell10mo4220

I strongly recommend against engaging with Twitter at all. The LessWrong community has been significantly underestimating the extent to which it damages the quality of its users' thinking. Twitter pulls its users into a pattern of seeking social approval in a fast-paced loop. Tweets shape their regular readers' thoughts into becoming more tweet-like: short, vague, lacking in context, status-driven, reactive, and conflict-theoretic. AI alignment researchers, more than perhaps anyone else right now, need to preserve their ability to engage in high-quality thinking. For them especially, spending time on Twitter isn't worth the risk of damaging their ability to think clearly.

4Ben Pace10mo

I think this is personally right for me. I do not twit, and I have fully blocked the platform for (I think) more than half of the last two years. I sometimes go there and seek out the thoughts of people I respect. When I do, I commonly find them arguing against positions they're uninterested in and with people who seem to pick their positions for political reasons (rather than their assessment of what's true), and who bring low-quality arguments and discussion norms. It's not where I want to spend my time thinking. I think some people go there to just have fun and make friends, which is quite a different thing.

2ryan_b10mo

While I do not use the platform myself, what do you think of people doing their thinking and writing offline, and then just using it as a method of transmission? I think this is made even easier by the express strategic decision to create an account for AI-specific engagement. For example, when I look at tweets at all it is largely as links to completed threads or off-twitter blogs/articles/papers.

8Severin T. Seehrich10mo

Interesting arguments going on on the e/acc Twitter side of this debate: https://x.com/khoomeik/status/1799966607583899734

3Bird Concept10mo

I don't think the numbers really check out on your claim. Only a small proportion of people reading this are alignment researchers. And of the remaining folks, many are probably on Twitter anyway, or otherwise have some similarly slack part of their daily schedule filled with sort of random, low-opportunity-cost stuff. Historically there sadly haven't been scalable ways for the average LW lurker to contribute to safety progress; now there might be a little one.

mako yass10mo2415

I think yall will be okay if you make sure your twitter account isn't your primary social existence, and you don't have to play twitter the usual way. Write longform stuff. Retweet old stuff. Be reasonable and conciliatory while your opponents are being unreasonable and nasty, that's how you actually win.

Remember that the people who've fallen in deep and contracted twitter narcissism are actually insane. It's not an adaptive behavior, they're there to lose. Every day they're embarrassing themselves and alienating people and all you have to do is hang around, occasionally point it out, and be the reasonable alternative.

7Thomas Kwa10mo

I don't anticipate being personally affected by this much if I start using Twitter.

4trevor10mo

Can you expand the list, go into further detail, or list a source that goes into further detail?

1the gears to ascension10mo

agreed, and this is why I don't use it; however, probably not so much a thing that must be avoided at nearly all costs for policy people. For them, all I know how to suggest is "use discretion".

The case for stopping AI safety research

Nate Showell11mo60

AI safety research is speeding up capabilities. I hope this is somewhat obvious to most.

This contradicts the Bitter Lesson, though. Current AI safety research doesn't contribute to increased scaling, either through hardware advances or through algorithmic increases in efficiency. To the extent that it increases the usability of AI for mundane tasks, current safety research does so in a way that doesn't involve making models larger. Fears of capabilities externalities from alignment research are unfounded as long as the scaling hypothesis continues to hold.

1RussellThor11mo

Doesn't the whole concept of takeoff contradict the Bitter Lesson according to some uses of it? That is, our present hardware could be much more capable if we had the right software.

1Yonatan Cale11mo

Scaling matters, but it's not all that matters. For example, RLHF

William_S's Shortform

Nate Showell1y6-8

The lack of leaks could just mean that there's nothing interesting to leak. Maybe William and others left OpenAI over run-of-the-mill office politics and there's nothing exceptional going on related to AI.

gwern1y*285

Rest assured, there is plenty that could leak at OA... (And might were there not NDAs, which of course is much of the point of having them.)

For a past example, note that no one knew that Sam Altman had been fired from YC CEO for similar reasons as OA CEO, until the extreme aggravating factor of the OA coup, 5 years later. That was certainly more than 'run of the mill office politics', I'm sure you'll agree, but if that could be kept secret, surely lesser things now could be kept secret well past 2029?

Jackson Silver1y137

At least one of them has explicitly indicated they left because of AI safety concerns, and this thread seems to be insinuating some concern - Ilya Sutskever's conspicuous silence has become a meme, and Altman recently expressed that he is uncertain of Ilya's employment status. There still hasn't been any explanation for the boardroom drama last year.

If it was indeed run-of-the-mill office politics and all was well, then something to the effect of "our departures were unrelated, don't be so anxious about the world ending, we didn't see anything alarming at ... (read more)

David Udell's Shortform

The concept of "the meaning of life" still seems like a category error to me. It's an attempt to apply a system of categorization used for tools, one in which they are categorized by the purpose for which they are used, to something that isn't a tool: a human life. It's a holdover from theistic worldviews in which God created humans for some unknown purpose.

The lesson I draw instead from the knowledge-uploading thought experiment -- where having knowledge instantly zapped into your head seems less worthwhile than acquiring it more slowly yourself -- is th... (read more)

On green

Nate Showell1y32

Spoilers for Fullmetal Alchemist: Brotherhood:

Father is a good example of a character whose central flaw is his lack of green. Father was originally created as a fragment of Truth, but he never tries to understand the implications of that origin. Instead, he only ever sees God as something to be conquered, the holder of a power he can usurp. While the Elric brothers gain some understanding of "all is one, one is all" during their survival training, Father never does -- he never stops seeing himself as a fragile cloud of gas inside a flask, obsessivel... (read more)

Ratios's Shortform

0th Person and 1st Person Logic

Mostly the first reason. The "made of atoms that can be used for something else" piece of the standard AI x-risk argument also applies to suffering conscious beings, so an AI would be unlikely to keep them around if the standard AI x-risk argument ends up being true.

Nate Showell1y1-10

It's worth noting that no reference to preferences has yet been made. That's interesting because it suggests that there are both 0P-preferences and 1P-preferences. That intuitively makes sense, since I do care about both the actual state of the world, and what kind of experiences I'm having.

Believing in 0P-preferences seems to be a map-territory confusion, an instance of the Tyranny of the Intentional Object. The robot can't observe the grid in a way that isn't mediated by its sensors. There's no way for 0P-statements to enter into the robot's decision loo... (read more)

1cubefox1y

It would be more precise to say the robot would prefer to get evidence which raises its degree of belief that a square of the grid is red.
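
As a small worked example of evidence raising a degree of belief, here is a Python sketch of a single Bayesian update. The prior and the sensor reliabilities are invented numbers, and `posterior_red` is a hypothetical helper; nothing here comes from the original post's robot setup beyond the idea of a sensor reading about a red square.

```python
from fractions import Fraction


def posterior_red(prior_red, p_reading_if_red, p_reading_if_not_red):
    """Posterior probability that the square is red after one "red" reading.

    Applies Bayes' rule:
    P(red | reading) = P(reading | red) * P(red) / P(reading).
    """
    numerator = prior_red * p_reading_if_red
    evidence = numerator + (1 - prior_red) * p_reading_if_not_red
    return numerator / evidence


# Assumed numbers: the robot starts at 1/4 credence that the square is red,
# and its sensor reports "red" 9/10 of the time when the square is red
# and 1/10 of the time when it is not.
prior = Fraction(1, 4)
posterior = posterior_red(prior, Fraction(9, 10), Fraction(1, 10))

if __name__ == "__main__":
    print(prior, posterior)
```

Under these assumed numbers the reading moves the robot's credence from 1/4 to 3/4. The update runs entirely on sensor-mediated evidence, never on direct access to the grid.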

shortplav

Nate Showell1y30

What's your model of inflation in an AI takeoff scenario? I don't know enough about macroeconomics to have a good model of what AI takeoff would do to inflation, but it seems like it would do something.

3niplav1y

Oh, yeah, I completely forgot inflation. Oops. If I make another version I'll add it.

Richard_Kennaway's Shortform

Richard_Kennaway's Shortform

You're underestimating how hard it is to fire people from government jobs, especially when those jobs are unionized. And even if there are strong economic incentives to replace teachers with AI, that still doesn't address the ease of circumvention. There's no surer way to make teenagers interested in a topic than to tell them that learning about it is forbidden.

Choosing My Quest (Part 2 of "The Sense Of Physical Necessity")

All official teaching materials would be generated by a similar process. At about the same time, the teaching profession as we know it today ceases to exist. "Teachers" become merely administrators of the teaching system. No original documents from before AI are permitted for children to access in school.

This sequence of steps looks implausible to me. Teachers would have a vested interest in preventing it, since their jobs would be on the line. A requirement for all teaching materials to be AI-generated would also be trivially easy to circumvent, either by... (read more)

2Richard_Kennaway1y

That will only put a brake on how fast the frog is boiled. Artists have a vested interest against the use of AI art, but today, hardly anyone else thinks twice about putting Midjourney images all through their postings, including on LessWrong. I'll be interested to see how that plays out in the commercial art industry.

Why do you ordinarily not allow discussion of Buddhism on your posts?

Also, if anyone reading this does a naturalist study on a concept from Buddhist philosophy, I'd like to hear how it goes.

8LoganStrohl1y

I ordinarily do not allow discussions of Buddhism on my posts because I hate moderating them. I haven't worked out what exactly it is about Buddhism, but it seems to cause things to go wonky in a way that's sort of similar to politics. Also, my way of thinking and writing and doing things in general seems to bring out a lot of people who want to talk about Buddhism, and I want my work discussed mostly on its own terms, without it being immediately embroiled in whatever thing it is that tends to happen when people start talking about Buddhism.

Nate Showell's Shortform

Phallocentricity in GPT-J's bizarre stratified ontology

An edgy writing style is an epistemic red flag. A writing style designed to provoke a strong, usually negative, emotional response from the reader can be used to disguise the thinness of the substance behind the author's arguments. Instead of carefully considering and evaluating the author's arguments, the reader gets distracted by the disruption to their emotional state and reacts to the text in a way that more closely resembles a trauma response, with all the negative effects on their reasoning capabilities that such a response entails. Some examples of authors who do this: Friedrich Nietzsche, Grant Morrison, and The Last Psychiatrist.

1StartAtTheEnd1y

It's a natural tendency toward taunting, which is meant to motivate the reader to attack the author, who is frustrated at the lack of engagement. The more sure you are of yourself, the more provocative you tend to be, especially if you're eager to put your ideas to the test. A thing which often follows edginess/confidence, and the two may even be a cause of each other, is mania. Even hypomanic moods have a strong effect on one's behaviour. I believe this is what happened to Kanye West. If you read Nietzsche's Zarathustra, you might find that it seems to contain a lot of mood swings, and it was written in just 10 days as far as I know (and periods of high productivity are indeed a characteristic of mania). I think it makes for great reading, and while such people have a higher risk of being wrong, I also think they have more interesting ideas. But I will admit that I'm a little biased on this topic, as I've made myself a little edgy (confidence has a positive effect on mood).

3Carl Feynman1y

Allow me to quote from Lem’s novel “Golem XIV”, which is about a superhuman AI named Golem: May not this method also be employed by human writers?

3Ben Pace1y

One thing to do here is to re-write their arguments in your own (ideally more neutral) language, and see whether it still seems as strong.

Nate Showell1y40

OK, so maybe this is a cool new way to look at certain aspects of GPT ontology... but why this primordial ontological role for the penis?

"Penis" probably has more synonyms than any other term in GPT-J's training data.

2mwatkins1y

Quite possibly it does, but I doubt very many of these synonyms are tokens.

Dreams of AI alignment: The danger of suggestive names

Nate Showell1y3210

I particularly wish people would taboo the word "optimize" more often. Referring to a process as "optimization" papers over questions like:

What feedback loop produces the increase or decrease in some quantity that is described as "optimization?" What steps does the loop have?
In what contexts does the feedback loop occur?
How might the effects of the feedback loop change between iterations? Does it always have the same effect on the quantity?
What secondary effects does the feedback loop have?

There's a lot hiding behind the term "optimization," and I think a ... (read more)

A sketch of acausal trade in practice

The "pure" case of complete causal separation, as with civilizations in separate regions of a multiverse, is an edge case of acausal trade that doesn't reflect what the vast majority of real-world examples look like. You don't need to speculate about galactic-scale civilizations to see what acausal trade looks like in practice: ordinary trade can already be modeled as acausal trade, as can coordination between ancestors and descendants. Economic and moral reasoning already have elements of superrationality to the extent that they rely on concepts such as i... (read more)

Decaeneus's Shortform