Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, and then condition on that, to forecast what seems most likely to happen in the next year, and so on, until you reach either human disempowerment or an end of the acute risk period.
This post was my attempt at the time.
I spent maybe 5 hours on this, and there's lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it's too fast, for instance). But nevertheless I'm posting it now, with only a few edits and elaborations, since I'm probably not going to do a full rewrite soon.
2024
An optional feature that I think LessWrong should have: shortform posts that get more than some amount of karma get automatically converted into personal blog posts, including all the comments.
It should have a note at the top "originally published in shortform", with a link to the shortform comment. (All the copied comments should have a similar note).
I think it's reasonable for the conversion to be at the original author's discretion rather than an automatic process.
Whether or not it would happen by default, this would be the single most useful LW feature for me. I'm often really unsure whether a post will get enough attention to be worth making it a longform, and sometimes even post shortforms like "comment if you want this to be a longform".
Disagreed, insofar as by "automatically converted" you mean "the shortform author has no recourse against this".
No. That's why I said the feature should be optional. You can make a general default setting for your shortform, plus there should be a toggle (hidden in the three-dots menu?) to turn this on and off on a post-by-post basis.
Just ask an LLM. The author can always edit it, after all.
My suggestion for how such a feature could be done would be to copy the comment into a draft post, add an LLM-suggested title (and tags?), and alert the author for an opt-in, who may delete or post it.
If it is sufficiently well received and people approve a lot of them, then one can explore opt-out auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".
I think that, in almost full generality, we should taboo the term "values". It's usually ambiguous between a bunch of distinct meanings.
I at least partly buy this, but I want to play devil's advocate.
Let's suppose there's a single underlying thing which ~everyone is gesturing at when talking about (humans') "values". How could a common underlying notion of "values" be compatible with our observation that people talk about all the very distinct things you listed, when you start asking questions about their "values"?
An analogy: in political science, people talk about "power". Right up top, wikipedia defines "power" in the political science sense as:
In political science, power is the social production of an effect that determines the capacities, actions, beliefs, or conduct of actors.
A minute's thought will probably convince you that this supposed definition does not match the way anybody actually uses the term; for starters, actual usage is narrower. That definition probably doesn't even match the way the term is used by the person who came up with that definition.
That's the thing I want to emphasize here: if you ask people to define a term, the definitions they give ~never match their own actual usage of the term, with the important exception of mathematics.
... but that doesn't imply that there's no single underlyin...
New post: Some things I think about Double Crux and related topics
I've spent a lot of my discretionary time working on the broad problem of developing tools for bridging deep disagreements and transferring tacit knowledge. I'm also probably the person who has spent the most time explicitly thinking about and working with CFAR's Double Crux framework. It seems good for at least some of my high level thoughts to be written up some place, even if I'm not going to go into detail about, defend, or substantiate, most of them.
The following are my own beliefs and do not necessarily represent CFAR, or anyone else.
I, of course, reserve the right to change my mind.
[Throughout I use "Double Crux" to refer to the Double Crux technique, the Double Crux class, or a Double Crux conversation, and I use "double crux" to refer to a proposition that is a shared crux for two people in a conversation.]
Here are some things I currently believe:
(General)
People rarely change their mind when they feel like you have trapped them in some inconsistency [...] In general (but not universally) it is more productive to adopt a collaborative attitude of sincerely trying to help a person articulate, clarify, and substantiate [bolding mine—ZMD]
"People" in general rarely change their mind when they feel like you have trapped them in some inconsistency, but people using the double-crux method in the first place are going to be aspiring rationalists, right? Trapping someone in an inconsistency (if it's a real inconsistency and not a false perception of one) is collaborative: the thing they were thinking was flawed, and you helped them see the flaw! That's a good thing! (As it is written of the fifth virtue, "Do not believe you do others a favor if you accept their arguments; the favor is to you.")
Obviously, I agree that people should try to understand their interlocutors. (If you performatively try to find fault in something you don't understand, then apparent "faults" you find are likely to be your own misunderstandings rather than actual faults.) But if someone spots an actual inconsistency in my ideas, I want them to tell me right away. Pe
Old post: RAND needed the "say oops" skill
[Epistemic status: a middling argument]
A few months ago, I wrote about how RAND and the "Defense Intellectuals" of the Cold War represent another precious datapoint of "very smart people, trying to prevent the destruction of the world, in a civilization that they acknowledge to be inadequate to dealing sanely with x-risk."
Since then I've spent some time doing additional research into what cognitive errors and mistakes those consultants, military officials, and politicians made that endangered the world. The idea being that if we could diagnose which specific irrationalities they were subject to, this would suggest errors that might also be relevant to contemporary x-risk mitigators, and might point out some specific areas where development of rationality training is needed.
However, this proved somewhat less fruitful than I was hoping, and I’ve put it aside for the time being. I might come back to it in the coming months.
It does seem worth sharing at least one relevant anecdote and analysis from Daniel Ellsberg's excellent book, The Doomsday Machine, given that I've already written it up.
The missile gap
In the late nineteen-fi...
This was quite valuable to me, and I think I would be excited about seeing it as a top-level post.
New post: What is mental energy?
[Note: I've started a research side project on this question, and it is already obvious to me that this ontology is importantly wrong.]
There's a common phenomenology of "mental energy". For instance, if I spend a couple of hours thinking hard (maybe doing math), I find it harder to do more mental work afterwards. My thinking may be slower and less productive. And I feel tired, or drained (mentally, instead of physically).
Mental energy is one of the primary resources that one has to allocate, in doing productive work. In almost all cases, humans have less mental energy than they have time, and therefore effective productivity is a matter of energy management, more than time management. If we want to maximize personal effectiveness, mental energy seems like an extremely important domain to understand. So what is it?
The naive story is that mental energy is an actual energy resource that one expends and then needs to recoup. That is, when one is doing cognitive work, they are burning calories, depleting their body's energy stores. As they use energy, they have less fuel to burn.
My current understanding is that this story is not physiologically realistic. T...
New post: Some notes on Von Neumann, as a human being
I recently read Prisoner's Dilemma, which is half an introduction to very elementary game theory, and half a biography of John Von Neumann, and watched this old PBS documentary about the man.
I'm glad I did. Von Neumann has legendary status in my circles, as the smartest person ever to live. [1] Many times I've written the words "Von Neumann Level Intelligence" in an AI strategy document, or speculated about how many coordinated Von Neumanns it would take to take over the world. (For reference, I now think that 10 is far too low, mostly because he didn't seem to have the entrepreneurial or managerial dispositions.)
Learning a little bit more about him was humanizing. Yes, he was the smartest person ever to live, but he was also an actual human being, with actual human traits.
Watching this first clip, I noticed that I was surprised by a number of things.
TL;DR: I’m offering to help people productively have difficult conversations and resolve disagreements, for free. Feel free to email me if and when that seems helpful. elitrye [at] gmail.com
Over the past 4-ish years, I've had a side project of learning, developing, and iterating on methods for resolving tricky disagreements and failures to communicate. A lot of this has been in the Double Crux frame, but I've also been exploring a number of other frameworks (including NVC, Convergent Facilitation, Circling-inspired stuff, intuition extraction, and some home-grown methods).
As part of that, I've had a standing offer to facilitate / mediate tricky conversations for folks in the CFAR and MIRI spheres (testimonials below). Facilitating "real disagreements" allows me to get feedback on my current conversational frameworks and techniques. When I encounter blockers that I don't know how to deal with, I can go back to the drawing board to model those problems and the interventions that would solve them, and iterate from there, developing new methods.
I generally like doing this kind of conversational facilitation and am open to do...
[I wrote a much longer and more detailed comment, and then decided that I wanted to think more about it. In lieu of posting nothing, here's a short version.]
I mean I did very little facilitation one way or the other at that event, so I think my counterfactual impact was pretty minimal.
In terms of my value added, I think that one was in the bottom 5th percentile?
In terms of how useful that tiny amount of facilitation was, maybe 15th to 20th percentile? (This is a little weird, because quantity and quality are related. More active facilitation has a quality span: active (read: a lot of) facilitation can be much more helpful when it is good and much more disruptive / annoying / harmful when it is bad, compared to less active backstop facilitation.)
Overall, the conversation served the goals of the participants and had a median outcome for that kind of conversation, which is maybe 30th percentile, but there is a long right tail of positive outcomes (and maybe I am messing up how to think about percentile scores with skewed distributions).
The outcome that occurred ("had an interesting conversation, and had some new thoughts / clarifications") is good, but also far below the sort of outcome that I'm usually aiming for (but often missing): substantive, permanent (epistemic!) change to the way that one or both of the people orient on this topic.
That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better
The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world, is that startups are an engine (maybe even the engine) that drives progress towards a future that's better than the past, by creating new products that add value to people's lives.
I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda.
That it's been 8 years since Match bought and ruined OkCupid, and that no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC, is a huge black mark against the whole SV ideology of technology changing the world for the better.
Finding a partner is such a huge, real, pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet earth, instead of doing something else, that (arguably) would have been more p...
Basically: I don't blame founders or companies for following their incentive gradients, I blame individuals/society for being unwilling to assign reasonable prices to important goods.
I think the bad-ness of dating apps is downstream of poor norms around impact attribution for matches made. Even though relationships and marriages are extremely valuable, individual people are not in the habit of paying that to anyone.
Like, $100k or a year's salary seems like a very cheap value to assign to your life partner. If dating apps could rely on that size of payment when they succeed, then I think there could be enough funding for something at least a good small business. But I've never heard of anyone actually paying anywhere near that. (myself included - though I paid a retroactive $1k payment to the person who organized the conference I met my wife at)
I think keeper.ai tries to solve this with large bounties on dating/marriages, it's one of the things I wish we pushed for more on Manifold Love. It seems possible to build one for the niche of "the ea/rat community"; Manifold Love, the checkboxes thing, dating docs got pretty good adoption for not that much execution.
(Also: be the change! I think building out OKC is one of the easiest "hello world" software projects one could imagine, Claude could definitely make a passable version in a day. Then you'll discover a bunch of hard stuff around getting users, but it sure could be a good exercise.)
I mean, it's obviously very dependent on your personal finance situation, but I'm using $100k as an order-of-magnitude proxy for "about a year's salary". I think it's very coherent to give up a year of marginal salary in exchange for finding the love of your life, rather than like $10k or ~1 month's salary.
Of course, the world is full of mispricings, and currently you can save a life for something like $5k. I think these are both good trades to make, and most people should have a portfolio that consists of both "life partners" and "impact from lives saved" and crucially not put all their investment into just one or the other.
It's possible no one tried literally "recreate OkC", but I think dating startups are very oversubscribed by founders, relative to interest from VCs [1] [2] [3] (and I think VCs are mostly correct that they won't make money [4] [5]).
(Edit: I want to note that those are things I found after a bit of googling to see if my sense of the consensus was borne out; they are meant in the spirit of "several samples of weak evidence")
I don't particularly believe you that OkC solves dating for a significant fraction of people. IIRC, a previous time we talked about this, @romeostevensit suggested you had not sufficiently internalised the OkCupid blog findings about how much people prioritised physical attraction.
You mention manifold.love, but also mention it's in maintenance mode – I think because the type of business you want people to build does not in fact work.
I think it's fine to lament our lack of good mechanisms for public good provision, and claim our society is failing at that. But I think you're trying to draw an update that's something like "tech startups should be doing an unbiased search through viable valuable business, but they're clearly not", or maybe, "tech startups are suppose...
I agree that more people should be starting revenue-funded/bootstrapped businesses (including ones enabled by software/technology).
The meme is that if you're starting a tech company, it's going to be a VC-funded startup. This is, I think, a meme put out by VCs themselves, including Paul Graham/YCombinator, and it conflates new software projects and businesses generally with a specific kind of business model called the "tech startup".
Not every project worth doing should be a business (some should be hobbies or donation-funded) and not every business worth doing should be a VC-funded startup (some should be bootstrapped and grow from sales revenue.)
The VC startup business model requires rapid growth and expects 30x returns over a roughly 5-10 year time horizon. That simply doesn't include every project worth doing. Some businesses are viable but are not likely to grow that much or that fast; some projects shouldn't be expected to be profitable at all and need philanthropic support.
I think the narrative that "tech startups are where innovation happens" is...badly incomplete, but still a hell of a lot more correct than "tech startups are net destructive". ...
I worked at Manifold but not on Love. My impression from watching and talking to my coworkers was that it was a fun side idea that they felt like launching and seeing if it happened to take off, and when it didn't they got bored and moved on. Manifold also had a very quirky take on it due to the ideology of trying to use prediction markets as much as possible and making everything very public. I would advise against taking it seriously as evidence that an OKC-like product is a bad idea or a bad business.
Shreeda Segan is working on building it, as a cashflow business. They need $10K to get to the MVP. https://manifund.org/projects/hire-a-dev-to-finish-and-launch-our-dating-site
(Reasonably personal)
I spend a lot of time trying to build skills, because I want to be awesome. But there is something off about that.
I think I should just go after things that I want, and solve the problems that come up on the way. The idea of building skills sort of implies that if I don't have some foundation or some skill, I'll be blocked, and won't be able to solve some thing in the way of my goals.
But that doesn't actually sound right. Like it seems like the main important thing for people who do incredible things is their ability to do problem solving on the things that come up, and not the skills that they had previously built up in a "skill bank".
Raw problem solving is the real thing and skills are cruft. (Or maybe not cruft per se, but more like a side effect. The compiled residue of previous problem solving. Or like a code base from a previous project that you might repurpose.)
Part of the problem with this is that I don't know what I want for my own sake, though. I want to be awesome, which in my conception, means being able to do things.
I note that wanting "to be able to do things" is a leaky sort of motivation: because the...
Thesis: I now think that utility functions might be a pretty bad abstraction for thinking about the behavior of agents in general including highly capable agents.
[Epistemic status: half-baked, elucidating an intuition. Possibly what I’m saying here is just wrong, and someone will helpfully explain why.]
Over the past years, in thinking about agency and AI, I’ve taken the concept of a “utility function” for granted as the natural way to express an entity's goals or preferences.
Of course, we know that humans don't have well-defined utility functions (they're inconsistent, and subject to all kinds of framing effects), but that's only because humans are irrational. To the extent that a thing acts like an agent, its behavior corresponds to some utility function. That utility function might not be explicitly represented, but if an agent is rational, there's some utility function that reflects its preferences.
Given this, I might be inclined to scoff at people who scoff at “blindly maximizing” AGIs. “They just don’t get it”, I might think. “T...
New post: The Basic Double Crux Pattern
[This is a draft, to be posted on LessWrong soon.]
I’ve spent a lot of time developing tools and frameworks for bridging "intractable" disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work on it.
People often express to me something to the effect of, "The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you've understood, operationalizing, etc. The 'Double Crux' framework itself is not very important."
I half agree with that sentiment. I do think that those low level cognitive and conversational patterns are the most important thing, and at Double Crux trainings that I have run, most of the time is spent focusing on specific exercises to instill those low level TAPs.
However, I don’t think that the only value of the Double Crux schema is in training those low level habits. Double cruxes are extremely powerful machines that allow one to identify, if not the most efficient conversational path, a very high efficiency conversationa...
Eliezer claims that dath ilani never give in to threats. But I'm not sure I buy it.
The only reason people will make threats against you, the argument goes, is if those people expect that you might give in. If you have an iron-clad policy against acting in response to threats made against you, then there's no point in making or enforcing the threats in the first place. There's no reason for the threatener to bother, so they don't. Which means in some sufficiently long run, refusing to submit to threats means you're not subject to threats.
This seems a bit fishy to me. I have a lingering suspicion that this argument doesn't apply, or at least doesn't apply universally, in the real world.
I'm thinking here mainly of a prototypical case of an isolated farmer family (like the early farming families of the Greek peninsula, not absorbed into a polis), being accosted by some roving bandits, such as the soldiers of the local government. The bandits say "give us half your harvest, or we'll just kill you."
The argument above depends on a claim about the cost of executing on a threat. "There's no reason to bother" implies that the threatener has a preference not to bother, if they know that the t...
Eliezer, this is what you get for not writing up the planecrash threat lecture thread. We'll keep bothering you with things like this until you give in to our threats and write it.
What you've hit upon is "BATNA," or "Best alternative to a negotiated agreement." Because the robbers can get what they want by just killing the farmers, the dath ilani will give in, and from what I understand, Yudkowsky therefore doesn't classify the original request (give me half your wheat or die) as a threat.
This may not be crazy; it reminds me of the Ancient Greek social mores around hospitality, which seem insanely generous to a modern reader but I guess make sense if the equilibrium number of roving ~~bandits~~ honored guests is kept low by some other force.
Old post: A mechanistic description of status
[This is an essay that I've had bopping around in my head for a long time. I'm not sure if this says anything usefully new, but it might click with some folks. If you haven't read Social Status: Down the Rabbit Hole on Kevin Simler's excellent blog, Melting Asphalt, read that first. I think this is pretty bad and needs to be rewritten and maybe expanded substantially, but this blog is called "musings and rough drafts."]
In this post, I’m going to outline how I think about status. In particular, I want to give a mechanistic account of how status necessarily arises, given some set of axioms, in much the same way one can show that evolution by natural selection must necessarily occur given the axioms of 1) inheritance of traits 2) variance in reproductive success based on variance in traits and 3) mutation.
(I am not claiming any particular skill at navigating status relationships, any more than a student of sports-biology is necessarily a skilled basketball player.)
By “status” I mean prestige-status.
Axiom 1: People have goals.
That is, for any given human, there are some things that they want. This can include just about anything. You might wan...
I've offered to be a point person for folks who believe that they were severely impacted by Leverage 1.0, and have related information, but who might be unwilling to share that info, for any of a number of reasons.
In short,
So it seems like one way that the world could go is:
I could imagine China building a competent domestic chip industry. China seems more determined to do that than the US is.
Though notably, China is not on track to do that currently. It's not anywhere close to its goal of producing 70% of its chips by 2025.
And if the US was serious about building a domestic cutting-edge chip industry again, could it? I basically don't think that American work culture can keep up with Taiwanese/TSMC work culture, in this super-competitive industry.
TSMC is building fabs in the US, but from what I hear, they're not going well.
(While TSMC is a Taiwanese company, having a large fraction of TSMC fabs in the US would preempt the scenario above. TSMC fabs in the US count as "a domestic US chip industry.")
Building and running leading node fabs is just a really really hard thing to do.
I guess the most likely scenario is the continuation of the status quo, where China and the US continue to both awkwardly depend on TSMC's chips for crucial military and economic AI tech.
Something that I've been thinking about lately is the possibility of an agent's values being partially encoded by the constraints of that agent's natural environment, or arising from the interaction between the agent and environment.
That is, an agent's environment puts constraints on the agent. From one perspective, removing those constraints is always good, because it lets the agent get more of what it wants. But sometimes, from a different perspective, we might feel that with those constraints removed, the agent Goodharts or wireheads, or otherwise fails to actualize its "true" values.
The Generator freed from the oppression of the Discriminator
As a metaphor: if I'm one half of a GAN, let's say the generator, then in one sense my "values" are fooling the discriminator, and if you make me relatively more powerful than my discriminator, and I dominate it...I'm loving it, and also no longer making good images.
But you might also say, "No, wait. That is a super-stimulus, and actually what you value is making good images, but half of that value was encoded in your partner."
This second perspective seems a little stupid to me. A little too Aristotelian. I mean if we're going to take that ...
[Real short post. Random. Complete speculation.]
Childhood lead exposure reduces one’s IQ, and also causes one to be more impulsive and aggressive.
I always assumed that the impulsiveness was due, basically, to your executive function machinery working less well. So you have less self control.
But maybe the reason for the IQ-impulsiveness connection is that if you have a lower IQ, all of your subagents / subprocesses are less smart. Because they're worse at planning and modeling the world, the only ways they know how to get their needs met are very direct, very simple action-plans / strategies. It's not so much that you're better at controlling your anger, as that the part of you that would be angry is less so, because it has other ways of getting its needs met.
New post: Metacognitive space
[Part of my Psychological Principles of Personal Productivity, which I am writing mostly in my Roam, now.]
Metacognitive space is a term of art that refers to a particular first person state / experience. In particular it refers to my propensity to be reflective about my urges and deliberate about the use of my resources.
I think it might literally be having the broader context of my life, including my goals and values, and my personal resource constraints loaded up in peripheral awareness.
Metacognitive space allows me to notice aversions and flinches, and take them as object, so that I can respond to them with Focusing or dialogue, instead of being swept around by them. Similarly, it seems, in practice, to reduce my propensity to act on immediate urges and temptations.
[Having MCS is the opposite of being [[{Urge-y-ness | reactivity | compulsiveness}]]?]
It allows me to "absorb" and respond to happenings in my environment, including problems and opportunities, taking considered action instead of the semi-automatic first response that occurred to me. [That sentence there feels a little fake, or maybe about something else, or may...
In this interview, Eliezer says the following:
...I think if you push anything [referring to AI systems] far enough, especially on anything remotely like the current paradigms, like if you make it capable enough, the way it gets that capable is by starting to be general.
And at the same sort of point where it starts to be general, it will start to have its own internal preferences, because that is how you get to be general. You don't become creative and able to solve lots and lots of problems without something inside you that organizes your problem solvi
Does anyone know of a good technical overview of why it seems hard to get Whole Brain Emulations before we get neuromorphic AGI?
I think maybe I read a PDF that made this case years ago, but I don't know where.
There’s a psychological variable that seems to be able to change on different timescales, in me, at least. I want to gesture at it, and see if anyone can give me pointers to related resources.
[Hopefully this is super basic.]
There's a set of states that I occasionally fall into that include what I call "reactive" (meaning that I respond compulsively to the things around me), and what I call "urgy" (meaning that I feel a sort of "graspy" desire for some kind of immediate gratification).
These states all have...
I've decided that I want to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with them.
This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.
There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have the problem of civilizational sanity.
There are a number ...
New (short) post: Desires vs. Reflexes
[Epistemic status: a quick thought that I had a minute ago.]
There are goals / desires (I want to have sex, I want to stop working, I want to eat ice cream) and there are reflexes (anger, “wasted motions”, complaining about a problem, etc.).
If you try and squash goals / desires, they will often (not always?) resurface around the side, or find some way to get met. (Why not always? What is the difference between those that do and those that don't?) You need to bargain with them, or design outlet poli...
Totally an experiment, I'm trying out posting my raw notes from a personal review / theorizing session, in my short form. I'd be glad to hear people's thoughts.
This is written for me, straight out of my personal Roam repository. The formatting is a little messed up because LessWrong's bullets don't support indefinite levels of nesting.
This one is about Urge-y-ness / reactivity / compulsiveness
New post: Some musings about exercise and time discount rates
[Epistemic status: a half-thought, which I started on earlier today, and which might or might not be a full thought by the time I finish writing this post.]
I’ve long counted exercise as an important component of my overall productivity and functionality. But over the past months my exercise habit has slipped some, without apparent detriment to my focus or productivity. But this week, after coming back from a workshop, my focus and productivity haven’t really booted up.
Her...
New post: Capability testing as a pseudo fire alarm
[epistemic status: a thought I had]
It seems like it would be useful to have very fine-grained measures of how smart / capable a general reasoner is, because this would allow an AGI project to carefully avoid creating a system smart enough to pose an existential risk.
I’m imagining slowly feeding a system more training data (or, alternatively, iteratively training a system with slightly more compute), and regularly checking its capability. When the system reaches “chimpanzee level” (whatever that means), you...
In There’s No Fire Alarm for Artificial General Intelligence Eliezer argues:
A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire; after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, you know you won’t lose face if you proceed to exit the building.
If I have a predetermined set of tests, this could serve as a fire alarm, but only if you've successfully built a consensus that it is one. This is hard, and the consensus would need to be quite strong. To avoid ambiguity, the test itself would need to be demonstrably resistant to being clever Hans'ed. Otherwise it would be just another milestone.
Sometimes people talk about advanced AIs "boiling the oceans". My impression is that there's some specific model for why that is a plausible outcome (something about energy and heat dissipation?), and it's not just a random "big change."
What is that model? Are there existing citations for the idea, including LessWrong posts?
Roughly, Earth's average temperature is:

$$T = \left(\frac{j}{\sigma}\right)^{1/4}$$

where $j$ is the dissipated power per unit area and $\sigma$ is the Stefan-Boltzmann constant.

We can estimate $j$ as

$$j = \frac{S(1 - A)}{4}$$

where $S$ is the solar constant, 1361 W/m^2, and $A$ is Earth's albedo, 0.31. (We take all incoming power and divide it by Earth's surface area.)

After substituting the variables, we get an Earth temperature of 254K (-19C), because we ignore the greenhouse effect here.

How much does humanity's power consumption contribute to direct warming? In 2023, Earth's energy consumption was 620 exajoules (source: first link in Google), which is about 19TW. A modified rough estimate of Earth's temperature is:

$$T = \left(\frac{j + P_{\text{human}}/A_{\text{Earth}}}{\sigma}\right)^{1/4}$$

Human power production per square meter is about 0.04 W/m^2, which gives approximately zero effect of direct heating on Earth's temperature. But what happens if we, say, increase power consumption by a factor of 1000? Earth's temperature rises to about 264K, an increase of roughly 10K (again, ignoring the greenhouse effect). But qualitatively, increasing power consumption 1000x is likely to screw the biosphere really hard, if we count the increasing amount of water vapor, CO2 released from water, and methane from melting permafrost.
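A quick numeric sanity check of the above in Python (a minimal sketch; the constants and the 19TW figure come from the estimate above, while the ~5.1e14 m^2 Earth surface area is an assumption I'm adding):

```python
# Rough check of the no-greenhouse Stefan-Boltzmann estimate above.
SIGMA = 5.67e-8        # Stefan-Boltzmann constant, W/m^2/K^4
S = 1361.0             # solar constant, W/m^2
ALBEDO = 0.31          # Earth albedo
EARTH_AREA = 5.1e14    # Earth's surface area, m^2 (added assumption)
P_HUMAN = 19e12        # ~19 TW of human power consumption, from above

j_solar = S * (1 - ALBEDO) / 4            # absorbed solar flux, ~235 W/m^2
t_baseline = (j_solar / SIGMA) ** 0.25    # ~254 K with no greenhouse effect

for factor in (1, 1000):
    j_total = j_solar + factor * P_HUMAN / EARTH_AREA
    t = (j_total / SIGMA) ** 0.25
    print(f"x{factor} power: {t:.0f} K ({t - t_baseline:+.1f} K vs. baseline)")
```

This reproduces the numbers above to within a kelvin or so.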
How is it realistic to...
How do you use a correlation coefficient to do a Bayesian update?
For instance, the Wikipedia page on the Heritability of IQ reads:
"The mean correlation of IQ scores between monozygotic twins was 0.86, between siblings 0.47, between half-siblings 0.31, and between cousins 0.15."
I'd like to get an intuitive sense of what those quantities actually mean, "how big" they are, how impressed I should be with them.
I imagine I would do that by working out a series of examples. Examples like...
If I know that Alice has an IQ of 120, what does that tell me about th...
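One way I might start cashing this out, as a sketch: assume the pair of scores is bivariate normal with mean 100 and SD 15 (an idealization I'm adding). Then conditioning on one score gives a normal posterior for the relative's score, with the mean pulled toward the observation by a factor of ρ and the SD shrunk by √(1 − ρ²):

```python
import math

def relative_iq_posterior(observed_iq, rho, mean=100.0, sd=15.0):
    """Posterior for a relative's IQ under an (assumed) bivariate-normal model."""
    post_mean = mean + rho * (observed_iq - mean)   # regress toward the mean by factor rho
    post_sd = sd * math.sqrt(1 - rho ** 2)          # leftover uncertainty
    return post_mean, post_sd

# Using the correlations quoted above, and Alice's IQ of 120:
for label, rho in [("MZ twin", 0.86), ("sibling", 0.47), ("half-sibling", 0.31), ("cousin", 0.15)]:
    m, s = relative_iq_posterior(120, rho)
    print(f"{label:12s} rho={rho:.2f} -> expected IQ {m:.1f}, residual SD {s:.1f}")
```

On this model, even the sibling correlation of 0.47 only moves my expectation about 9 of the 20 points toward Alice's score, and leaves most of the prior spread (about 13 of the 15 IQ points) intact, which is one handle on "how big" these numbers are.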
I remember reading a thread on Facebook where Eliezer and Robin Hanson were discussing the implications of AlphaGo (or AlphaZero) for the content of the AI foom debate, and Robin made an analogy to linear regression as one thing that machines can do better than humans, but which doesn't make them super-human.
Does anyone remember what I'm talking about?
Question: Have Moral Mazes been getting worse over time?
Could the growth of Moral Mazes be the cause of cost disease?
I was thinking about how I could answer this question. I think that the thing that I need is a good quantitative measure of how "mazy" an organization is.
I considered the metric of "how much output for each input", but 1) that metric is just cost disease itself, so it doesn't help us distinguish the mazy cause from other possible causes, and 2) if you're good enough at rent seeking, maybe you can get high revenue despite your poor production.
What metric could we use?
This is my current take about where we're at in the world:
Deep learning, scaled up, might be basically enough to get AGI. There might be some additional conceptual work necessary, but the main difference between 2020 and the year in which we have transformative AI is that in that year, the models are much bigger.
If this is the case, then the most urgent problem is strong AI alignment + wise deployment of strong AI.
We'll know if this is the case in the next 10 years or so, because either we'll continue to see incredible gains from increasingly bigger Deep L...
I was thinking lately about how there are some different classes of models of psychological change, and I thought I would outline them and see where that leads me.
It turns out it led me into a question about where and when Parts-based vs. Association-based models are applicable.
Some examples:
This is the frame that I make the most use of, in my personal practice. It assumes that all behavior is the result of some goal directed subproce...
Can someone affiliated with a university, etc. get me a PDF of this paper?
https://psycnet.apa.org/buy/1929-00104-001
It is on Scihub, but that version is missing a few pages in which they describe the methodology.
[I hope this isn't an abuse of LessWrong.]
In this case it seems fine to add the image, but I feel disconcerted that mods have the ability to edit my posts.
I guess it makes sense that the LessWrong team would have the technical ability to do that. But editing a user's post, without their specifically asking, feels like a pretty big breach of... not exactly trust, but something like that. It means I don't have fundamental control over what is written under my name.
That is to say, I personally request that you never edit my posts without asking (which you did, in this case) and waiting for my response. Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.
Doing actual mini-RCTs can be pretty simple. You only need 3 things:
1. A spreadsheet
2. A digital coin for randomization
3. A way to measure the variable that you care about
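For example, here's a minimal sketch of the workflow in Python (the "intervention vs. control" coin flip and the made-up 1-10 focus ratings are just illustrative placeholders; the spreadsheet is where the real log would live):

```python
import random
import statistics

def todays_assignment() -> str:
    """The digital coin: run this each morning to decide whether to apply the intervention."""
    return "intervention" if random.random() < 0.5 else "control"

def estimated_effect(rows):
    """rows: (assignment, outcome) pairs copied back out of the spreadsheet."""
    treated = [y for a, y in rows if a == "intervention"]
    control = [y for a, y in rows if a == "control"]
    return statistics.mean(treated) - statistics.mean(control)

# Made-up log of (assignment, self-rated focus on a 1-10 scale):
log = [("intervention", 7), ("control", 5), ("intervention", 6),
       ("control", 6), ("intervention", 8), ("control", 4)]

print(todays_assignment())                         # e.g. "control"
print("estimated effect:", estimated_effect(log))  # 2.0 on this toy data
```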
I think one of the most practically powerful "techniques" of rationality is doing simple empirical experiments like this. You want to get something? You don't know how to get it? Try out some ideas and check which ones work!
There are other applications of empiricism that are not as formal, and sometimes faster. Those are also awesome. But at the very least, I've found that doing ...
Is there a LessWrong article that unifies physical determinism and choice / "free will"? Something about thinking of yourself as the algorithm computed on this brain?
Is there any particular reason why I should assign more credibility to Moral Mazes / Robert Jackall than I would to the work of any other sociologist?
(My prior on sociologists is that they sometimes produce useful frameworks, but generally rely on subjective hard-to-verify and especially theory-laden methodology, and are very often straightforwardly ideologically motivated.)
I imagine that someone else could write a different book, based on the same kind of anthropological research, that highlights different features of the corporate world, to tell the oppo...
My understanding is that there was a 10 year period starting around 1868, in which South Carolina's legislature was mostly black, and when the universities were integrated (causing most white students to leave), before the Dixiecrats regained power.
I would like to find a relatively non-partisan account of this period.
Anyone have suggestions?
Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.
In retrospect, I should have been surprised that "Rohin" kept talking about what Eliezer says in the Sequences. I wouldn't have guessed that Rohin was that "culturally rationalist" or that he would be that interested in what Eliezer wrote in the sequences. And indeed, I was updating that Rohi...
I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn, and then using interpretability tools to study the abstractions that the AI uncovers.
I thought he specifically mentioned "using AI as a microscope."
Is that a real post, or am I misremembering this one?
Are there any hidden risks to buying or owning a car that someone who's never been a car owner might neglect?
I'm considering buying a very old (ie from the 1990s), very cheap (under $1000, ideally) minivan, as an experiment.
That's inexpensive enough that I'm not that worried about it completely breaking down on me. I'm willing to just eat the monetary cost for the information value.
However, maybe there are other costs or other risks that I'm not tracking, that make this a worse idea.
Things like
- Some ways that a car can break make it dangerous, instead of ...
Is there a standard article on what "the critical risk period" is?
I thought I remembered an arbital post, but I can't seem to find it.
I remember reading a Zvi Mowshowitz post in which he says something like "if you have concluded that the most ethical thing to do is to destroy the world, you've made a mistake in your reasoning somewhere."
I spent some time searching around his blog for that post, but couldn't find it. Does anyone know what I'm talking about?
Anyone have a link to the sequence post where someone posits that AIs wouldn't do art and science from a drive to compress information, but rather would create and then reveal cryptographic strings (or something)?
Follow up to, and a continuation of the line of thinking from: Some classes of models of psychology and psychological change
Related to: The universe of possible interventions on human behavior (from 2017)
This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more lightweight, and faster to use (is that right?) than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before...
I'm mostly going to use this to crosspost links to my blog for less polished thoughts, Musings and Rough Drafts.