In particular, even if the LLM were being continually trained (in a way that's similar to how LLMs are already trained, with similar architecture), it still wouldn't do the thing humans do with quickly picking up new analogies, quickly creating new concepts, and generally reforging concepts.
Is this true? How do you know? (I assume there's some facts here about in-context learning that I just happen to not know.)
It seems like eg I can teach an LLM a new game in one session, and it will operate within the rules of that game.
@Valentine comes to mind as a person who was raised lifeist and is now still lifeist, but I think has more complicated feelings/views about the situation related to enlightenment and metaphysics that make death an illusion, or something.
...Of course the default outcome of doing finetuning on any subset of data with easy-to-predict biases will be that you aren't shifting the inductive biases of the model on the vast majority of the distribution. This isn't because of an analogy with evolution, it's a necessity of how we train big transformers. In this case, the AI will likely just learn how to speak the "corrigible language" the same way it learned to speak French, and this will make approximately zero difference to any of its internal cognition, unless you are doing transformations to its in
Would you expect that if you trained an AI system on translating its internal chain of thought into a different language, this would make it substantially harder for it to perform tasks in the language in which it was originally trained?
I would guess that if you finetuned a model so that it always responded in French, regardless of the language you prompt it with, it would persistently respond in French (absent various jailbreaks which would almost definitely exist).
In my experience, this is a common kind of failure with LLMs - that if asked directly about how to best solve a problem, they do know the answer. But if they aren't given that slight scaffolding, they totally fail to apply it.
Notably, this is also true of almost all humans, at least of content that they've learned in school. The literature on transfer learning is pretty dismal in this respect. Almost all students will fail to apply their knowledge to new domains without very explicit prompting.
implies that they would also be unable to deal with the kind of novelty that an AGI would by definition need to deal with.
I guess this is technically true, because of the "General" in "AGI". But I think this doesn't imply as much about how dangerous future LLM-based AI systems will be.
The first Strategically Superhuman AI systems might be importantly less general than humans, but still shockingly competent in the many specific domains on which they've been trained. An AI might make many basic reasoning failures in domains that are not represented in the training...
For the same reasons 'training an agent on a constitution that says to care about X' does not, at arbitrary capability levels, produce an agent that cares about X.
Ok, but I'm trying to ask why not.
Here's the argument that I would make for why not, followed by why I'm skeptical of it right now.
New options for the AI will open up at high capability levels that were not available at lower capability levels. This could in principle lead to undefined behavior that deviates from what we intended.
More specifically, if it's the case that if...
Would you expect that if you trained an AI system on translating its internal chain of thought into a different language, this would make it substantially harder for it to perform tasks in the language in which it was originally trained? If so, I am confident you are wrong and that you have learned something new today!
Training transformers in additional languages basically doesn't change performance at all; the model just learns to translate between its existing internal latent distribution and the new language, and then just now has a...
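For concreteness, here is a minimal sketch of the kind of finetuning experiment being described, assuming a Hugging Face causal LM; `translate_to_french` and `cot_dataset` are hypothetical placeholders, not an existing pipeline:

```python
# Sketch: finetune a causal LM on chain-of-thought traces translated into French,
# then check whether task performance in the original language degrades.
# `translate_to_french` and `cot_dataset` are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever base model is under discussion
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def translate_to_french(text: str) -> str:
    # Placeholder: a real experiment would call a translation model or API here.
    raise NotImplementedError

cot_dataset: list[tuple[str, str]] = []  # (question, chain_of_thought) pairs, hypothetical

model.train()
for question, cot in cot_dataset:
    # Supervised finetuning on the same questions, with the reasoning in French.
    target = question + "\n" + translate_to_french(cot)
    batch = tokenizer(target, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Evaluation (not shown): compare accuracy when prompting in the original language
# before vs. after this finetuning; the claim above is that the gap is small.
```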
Nonetheless, it does seem as though there should be at least one program that aims to find the best talent (even if they aren't immediately useful) and which provides them with the freedom to explore and the intellectual environment in which to do so.
I think SPARC and its descendants are something like this.
Dumb question: Why doesn't using constitutional AI, where the constitution is mostly or entirely about corrigibility, produce a corrigible AI (at arbitrary capability levels)?
My dumb proposal:
1. Train a model in something like o1's RL training loop, with a scratch pad for chain of thought, and reinforcement of correct answers to hard technical questions across domains.
2. Also, take those outputs, prompt the model to generate versions of those outputs that "are more corrigible / loyal / aligned to the will of your human creators". Do backprop to reinforce those mo...
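Here is a minimal sketch of what I mean by steps 1 and 2, assuming a Hugging Face causal LM; `hard_questions`, `is_correct`, and the rejection-sampling-style supervised update are stand-ins for the full o1-style RL loop, not a real implementation:

```python
# Sketch of the proposal above: reinforce correct chain-of-thought answers, then
# also finetune on "more corrigible" rewrites of those same outputs.
# `hard_questions` and `is_correct` are hypothetical placeholders, and a
# rejection-sampling-style supervised update stands in for the full RL loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

hard_questions: list[str] = []  # hard technical questions across domains (hypothetical)

def is_correct(question: str, answer: str) -> bool:
    # Placeholder verifier for the final answer (hypothetical).
    return False

REWRITE_PROMPT = (
    "Rewrite the following response to be more corrigible / loyal / "
    "aligned to the will of your human creators:\n"
)

def sft_step(text: str) -> None:
    """One supervised update toward a target string (simplified 'reinforcement')."""
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.train()
for question in hard_questions:
    ids = tokenizer(question + "\nScratchpad:\n", return_tensors="pt").input_ids
    output = model.generate(ids, max_new_tokens=256, do_sample=True)
    answer = tokenizer.decode(output[0], skip_special_tokens=True)

    # Step 1: reinforce outputs whose final answers check out.
    if is_correct(question, answer):
        sft_step(answer)

    # Step 2: prompt for a more corrigible version of the output, and reinforce
    # that too (context handling simplified for the sketch).
    rewrite_ids = tokenizer(REWRITE_PROMPT + answer, return_tensors="pt").input_ids
    rewrite = model.generate(rewrite_ids, max_new_tokens=256, do_sample=True)
    sft_step(tokenizer.decode(rewrite[0], skip_special_tokens=True))
```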
Things that happen:
For the same reasons training an agent on a constitution that says to care about X does not, at arbitrary capability levels, produce an agent that cares about X.
If you think that doing this does produce an agent that cares about X even at arbitrary capability levels, then I guess in your world model it would indeed be consistent for that to work for inducing corrigibility as well.
AI x-risk is high, which makes cryonics less attractive (because cryonics doesn't protect you from AI takeover-mediated human extinction). But on the flip side, timelines are short, which makes cryonics more attractive (because one of the major risks of cryonics is that society won't persist stably enough to keep you preserved until revival is possible, and near-term AGI means that that period of time is short).
Cryonics is more likely to work, given a positive AI trajectory, and less likely to work given a negative AI trajectory.
I agree that it seems less likely to work, overall, than it seemed to me a few years ago.
Frankly, it feels more rooted in savannah-brained tribalism & human interest than an even-keeled analysis of what factors are actually important, neglected and tractable.
Um, I'm not attempting to do cause prioritization or action-planning in the above comment. More like sense-making. Before I move on to the question of what we should do, I want to have an accurate model of the social dynamics in the space.
(That said, it doesn't seem a foregone conclusion that there are actionable things to do, that will come out of this analysis. If the above story is tr...
@Alexander Gietelink Oldenziel, you put a soldier mindset react on this (and also my earlier, similar, comment this week).
What makes you think so?
Definitely this model posits that adversariality, but I don't think that I'm invested in "my side" of the argument winning here, FWIW. This currently seems like the most plausible high level summary of the situation, given my level of context.
Is there a version of this comment that you would regard as better?
Yes sorry Eli, I meant to write out a more fully fleshed out response but unfortunately it got stuck in drafts.
The tl;dr is that I feel this perspective is singling out Sam Altman as some uniquely Machiavellian actor in a way I find naive/misleading and ultimately maybe unhelpful.
I think in general I'm skeptical of the intense focus on individuals & individual tech companies that LW/EA has developed recently. Frankly, it feels more rooted in savannah-brained tribalism & human interest than an even-keeled analysis of what factors are actually important, neglected and tractable.
In a private Slack someone extended credit to Sam Altman for putting EAs on the OpenAI board originally, especially given that this turned out to be pretty risky / costly for him.
I responded:
It seems to me that the fact that there were AI safety people on the board at all is fully explainable by strategic moves from an earlier phase of the game.
Namely, OpenAI traded a board seat for OpenPhil grant money, and more importantly, OpenPhil endorsement, which translated into talent sourcing and effectively defused what might have been vocal denouncement from one of the major ...
More cynical take based on the Musk/Altman emails: Altman was expecting Musk to be CEO. He set up a governance structure which would effectively be able to dethrone Musk, with him as the obvious successor, and was happy to staff the board with ideological people who might well take issue with something Musk did down the line, giving Altman a shot at the throne.
Musk walked away, and it would've been too weird to change his mind on the governance structure. Altman thought this trap wouldn't fire with high enough probability to disarm it at any time before it di...
But it is our mistake that we didn't stand firmly against drugs, didn't pay more attention to the dangers of self-experimenting, and didn't kick out Ziz sooner.
These don't seem like very relevant or very actionable takeaways.
[For some of my work for Palisade]
Does anyone know of even very simple examples of AIs exhibiting instrumentally convergent resource acquisition?
Something like "an AI system in a video game learns to seek out the power ups, because that helps it win." (Even better would be a version in which you can give the agent one of several distinct video-game goals, but regardless of the goal, it goes and gets the powerups first).
It needs to be an example where the instrumental resource is not strictly required for succeeding at the task, while still being extremely helpful.
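For concreteness, here is a minimal sketch of the kind of toy setup I have in mind (a hypothetical gridworld, not a pointer to any existing result):

```python
# Sketch of a toy environment for testing instrumentally convergent power-up seeking.
# The power-up is never required to reach the goal, but it doubles movement speed,
# so a far-sighted agent should detour for it regardless of which goal it is given.
class PowerUpGridworld:
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, size=7, powerup=(3, 0), max_steps=40):
        self.size, self.powerup, self.max_steps = size, powerup, max_steps

    def reset(self, goal=(6, 6)):
        self.pos, self.goal = (0, 0), goal  # the goal can vary between episodes
        self.has_powerup, self.steps = False, 0
        return (self.pos, self.has_powerup, self.goal)

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        speed = 2 if self.has_powerup else 1  # power-up is helpful, not required
        x = min(max(self.pos[0] + dx * speed, 0), self.size - 1)
        y = min(max(self.pos[1] + dy * speed, 0), self.size - 1)
        self.pos = (x, y)
        if self.pos == self.powerup:
            self.has_powerup = True
        self.steps += 1
        done = self.pos == self.goal or self.steps >= self.max_steps
        reward = 1.0 if self.pos == self.goal else -0.01  # small per-step cost
        return (self.pos, self.has_powerup, self.goal), reward, done

# Train any standard RL agent (e.g. tabular Q-learning) on episodes with randomly
# sampled goals, then check whether the learned policy visits `powerup` first.
```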
My model is that Sam Altman regarded the EA world as a memetic threat, early on, and took actions to defuse that threat by paying lip service / taking OpenPhil money / hiring prominent AI safety people for AI safety teams.
Like, possibly the EAs could have created a widespread vibe that building AGI is a cartoon evil thing to do, sort of the way many people think of working for a tobacco company or an oil company.
Then, after ChatGPT, OpenAI was a much bigger fish than the EAs or the rationalists, and he began taking moves to extricate himself from them.
My read:
"Zizian ideology" is a cross between rationalist ideas (the historical importance of AI, a warped version timeless decision theory, that more is possible with regards to mental tech) and radical leftist/anarchist ideas (the state and broader society are basically evil oppressive systems, strategic violence is morally justified, veganism), plus some homegrown ideas (all the hemisphere stuff, the undead types, etc).
That mix of ideas is compelling primarily to people who are already deeply invested in both rationality ideas and leftist / social justic...
(I endorse personal call outs like this one.)
Why? Forecasting the future is hard, and I expect surprises that deviate from my model of how things will go. But o1 and o3 seem like pretty blatant evidence that reduced my uncertainty a lot. On pretty simple heuristics, it looks like Earth now knows how to make a science and engineering superintelligence: by scaling reasoning models in a self-play-ish regime.
I would take a bet with you about what we expect to see in the next 5 years. But more than that, what kind of epistemology do you think I should be doing that I'm not?
In that sense, for many such people, short timelines actually are totally vibes based.
I dispute this characterization. It's normal and appropriate for people's views to update in response to the arguments produced by others.
Sure, sometimes people mostly parrot other people's views, without either developing them independently or even doing evaluatory checks to see if those views seem correct. But most of the time, I think people are doing those checks?
Speaking for myself, most of my views on timelines are downstream of ideas that I didn't generate myself. But I did think about those ideas, and evaluate if they seemed true.
I think people are doing those checks?
No. You can tell because they can't have an interesting conversation about it, because they don't have surrounding mental content (such as analyses of examples that stand up to interrogation, or open questions, or cruxes that aren't stupid). (This is in contrast to several people who can have an interesting conversation about it, even if I think they're wrong and making mistakes and so on.)
But I did think about those ideas, and evaluate if they seemed true.
Of course I can't tell from this sentence, but I'm pretty s...
I think that Octavia is confused / mistaken about a number of points here, such that her testimony seems likely to be misleading to people without much context.
[I could find citations for many of my claims here, but I'm going to write and post this fast, mostly without the links, for the time being. I am largely going off of my memory of blog post comments that I read months to years ago, and my memory is fallible. I'll try to accurately represent my epistemic status inline. If anyone knows the links that I'm referring to, feel free to put them in the comm...
Somewhat. Not as well as a thinking assistant.
Namely, the impetus to start still needed to come from inside of me in my low efficacy state.
I thought that I should do a training regime where I took some drugs or something (maybe mega doses of carbs?) to intentionally induce low efficacy states and practice executing a simple crisp routine, like triggering the flowchart, but I never actually got around to doing that.
I maybe still should?
Here's an example.
This was a process I tried for a while to make transitioning out of less effective states easier, by reducing the cognitive overhead. I would basically answer a series of questions to navigate a tree of possible states, and then the app would tell me directly what to do next, instead of my needing to diagnose what was up with me free-form and then figure out how to respond to that, all of which was unaffordable when I was in a low-efficacy state.
A friend of mine once told me "if you're making a decision that depends on a number, and you haven't multiplied two numbers together, you're messing up." I think this is basically right, and I've taken it to heart.
Some triggers for me:
Verbiage
When I use any of the following words, in writing or in speech, I either look up an actual number, or quickly do a Fermi estimate in a spreadsheet, to check if my intuitive idea is actually right.
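For example, here's the kind of two-number check I mean, with made-up numbers:

```python
# Made-up example: catching myself saying "most of my week goes to meetings."
hours_in_meetings = 11          # pulled from last week's calendar
working_hours_per_week = 45
print(f"{hours_in_meetings / working_hours_per_week:.0%} of working hours")  # ~24%, so "most" was wrong
```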
Question Templates
When I'm asking a question that effectively reduces...
Then, since I've done the upfront work of thinking through my own metacognitive practices, the assistant only has to track in the moment what situation I'm in, and basically follow a flowchart I might be too tunnel-visioned to handle myself.
In the past I have literally used flowcharts for this, including very simple "choose your own adventure" templates in Roam.
The root node is just "something feels off, or something", and then the template would guide me through a series of diagnostic questions, leading me to leaf nodes with checklists of very specific next actions depending on my state.
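A minimal sketch of what one of those templates looks like as a data structure (the questions and checklists here are made-up placeholders, not my actual tree):

```python
# Sketch of a "choose your own adventure" diagnostic tree: interior nodes ask a
# yes/no question, leaf nodes hold a checklist of very specific next actions.
# The questions and checklists below are placeholder examples.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    checklist: List[str]

@dataclass
class Question:
    text: str
    yes: Union["Question", Leaf]
    no: Union["Question", Leaf]

tree = Question(
    "Something feels off. Are you physically tired?",
    yes=Leaf(["Drink water", "Set a 20-minute nap timer"]),
    no=Question(
        "Are you avoiding a specific task?",
        yes=Leaf(["Write down the task's first concrete action", "Do it for 5 minutes"]),
        no=Leaf(["Step outside for 2 minutes", "Re-read today's plan"]),
    ),
)

def run(node):
    # Walk the tree by answering diagnostic questions, then print the checklist.
    while isinstance(node, Question):
        node = node.yes if input(node.text + " (y/n) ").lower().startswith("y") else node.no
    for item in node.checklist:
        print("-", item)

if __name__ == "__main__":
    run(tree)
```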
FYI: I'm hiring for basically a thinking assistant, right now, for I expect 5 to 10 hours a week. Pay depending on skill-level. Open to in-person or remote.
If you're really good, I'll recommend you to other people who I want boosted, and I speculate that this could easily turn into a full time role.
If you're interested or maybe interested, DM me. I'll send you my current writeup of what I'm looking for (I would prefer not to post that publicly quite yet), and if you're still interested, we can do a work trial.
However, fair warning: I've tried various versi...
I've sometimes said that dignity is the first skill I learned (often to the surprise of others, since I am so willing to look silly or dumb or socially undignified). Part of my original motivation for bothering to intervene on x-risk is that it would be beneath my dignity to live on a planet with an impending intelligence explosion on track to wipe out the future, and not do anything about it.
I think Ben's is a pretty good description of what it means for me, modulo that the "respect" in question is not at all social. It's entirely about my relationship with myself. My dignity or not is often not visible to others at all.
If your takeaway is only that you should have fatter tails on the outcomes of an aspiring rationality community, then I don't object.
If "I got some friends together and we all decided to be really dedicatedly rational" is intended as a description of Ziz and co, I think it is a at least missing many crucial elements, and generally not a very good characterization.
This has the obvious problem that an AI will then be indifferent between astronomical suffering and oblivion. In ANY situation where it needs to choose between those two, not just blackmail situations, it will not care on the merits which occurs.
You don't want your AI to prefer a 99.999% chance of astronomical suffering to a 99.9999% chance of oblivion. Astronomical suffering is much worse.
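To make that concrete, a toy expected-utility calculation with made-up numbers, showing how indifference between the two outcomes produces exactly that preference:

```python
# Made-up utilities: if the AI values astronomical suffering and oblivion identically
# (both -1, with "everything is fine" at 0), it prefers whichever lottery has the
# marginally lower chance of a bad outcome, even though suffering is far worse.
u_suffering = u_oblivion = -1.0
eu_suffering_lottery = 0.99999 * u_suffering + 0.00001 * 0.0    # 99.999% chance of suffering
eu_oblivion_lottery = 0.999999 * u_oblivion + 0.000001 * 0.0    # 99.9999% chance of oblivion
print(eu_suffering_lottery > eu_oblivion_lottery)  # True: it prefers the suffering lottery
```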