This is a special post for quick takes by Eli Tyre. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

I'm mostly going to use this to crosspost links to my blog for less polished thoughts, Musings and Rough Drafts.

Eli's shortform feed

Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, and then condition on that, to forecast what seems most likely to happen in the next year, and so on, until you reach either human disempowerment or an end of the acute risk period.

This post was my attempt at the time.

I spent maybe 5 hours on this, and there's lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it's too fast, for instance). But nevertheless I'm posting it now, with only a few edits and elaborations, since I'm probably not going to do a full rewrite soon.

2024

  • A model is released that is better than GPT-4. It succeeds on some new benchmarks. Subjectively, the jump in capabilities feels smaller than that between RLHF’d GPT-3 and RLHF’d GPT-4. It doesn’t feel as shocking as ChatGPT and GPT-4 did, for either x-risk focused folks, or for the broader public. Mostly it feels like “a somewhat better langua
... (read more)
6Adele Lopez
Love seeing stuff like this, and it makes me want to try this exercise myself! A couple places which clashed with my (implicit) models: This is arguably already happening, with Character AI and its competitors. Character AI has almost half a billion visits per month with an average visit time of 22 minutes. They aren't quite assistants the way you're envisioning; the sole purpose (for the vast majority of users) seems to be the parasocial aspect. I predict that the average person will like this (at least with the most successful such bots), similar to how e.g. Logan Paul uses his popularity to promote his Maverick Clothing brand, which his viewers proudly wear. A fun, engaging, and charismatic such bot will be able to direct its users towards arbitrary brands while also making the user feel cool and special for choosing that brand.
4Raemon
lol at the approval/agreement ratio here. It does seem like this is a post that surely gets something wrong.

I think that, in almost full generality, we should taboo the term "values". It's usually ambiguous between a bunch of distinct meanings.

  • The ideals that, when someone contemplates them, evoke strong feelings (of awe, motivation, excitement, exultation, joy, etc.)
  • The incentives of an agent in a formalized game with quantified payoffs.
  • A utility function - one's hypothetical ordering over worlds, world-trajectories, etc., that results from comparing each pair and evaluating which one is better.
  • A person's revealed preferences.
  • The experiences and activities that a person likes for their own sake.
  • A person's vision of an ideal world. (Which, I claim, often reduces to "an imagined world that's aesthetically appealing.")
  • The goals that are at the root of a chain or tree of instrumental goals.
    • [This often comes with an implicit or explicit implication that most of human behavior has that chain/tree structure, as opposed to being, for instance, mostly hardcoded adaptations, or a chain/tree of goals that grounds out in a mess of hardcoded adaptations instead of anything goal-like.]
  • The goals/narratives that give meaning to someone's life.
    • [It can be the case that almost all one's meaning can come through a particula
... (read more)

I at least partly buy this, but I want to play devil's advocate.

Let's suppose there's a single underlying thing which ~everyone is gesturing at when talking about (humans') "values". How could a common underlying notion of "values" be compatible with our observation that people talk about all the very distinct things you listed, when you start asking questions about their "values"?

An analogy: in political science, people talk about "power". Right up top, Wikipedia defines "power" in the political science sense as:

In political science, power is the social production of an effect that determines the capacities, actions, beliefs, or conduct of actors.

A minute's thought will probably convince you that this supposed definition does not match the way anybody actually uses the term; for starters, actual usage is narrower. That definition probably doesn't even match the way the term is used by the person who came up with that definition.

That's the thing I want to emphasize here: if you ask people to define a term, the definitions they give ~never match their own actual usage of the term, with the important exception of mathematics.

... but that doesn't imply that there's no single underlyin... (read more)

4Steven Byrnes
I’m kinda confused by this example. Let’s say the person exhibits three behaviors:

  • (1): They make broad abstract “value claims” like “I follow Biblical values”.
  • (2): They make narrow specific “value claims” like “It’s wrong to allow immigrants to undermine our communities”.
  • (3): They do object-level things that can be taken to indicate “values”, like cheating on their spouse

From my perspective, I feel like you’re taking a stand and saying that the real definition of “values” is (2), and is not (1). (Not sure what you think of (3).) But isn’t that adjacent to just declaring that some things on Eli’s list are the real “values” and others are not?

In particular, at some point you have to draw a distinction between values and desires, right? I feel like you’re using the word “value claims” to take that distinction for granted, or something. (For the record, I have sometimes complained about alignment researchers using the word “values” when they’re actually talking about “desires”.)

I agree that it’s possible to use the suite of disparate intuitions surrounding some word as a kind of anthropological evidence that informs an effort to formalize or understand something-or-other. And that, if you’re doing that, you can’t taboo that word. But that’s not what people are doing with words 99+% of the time. They’re using words to (try to) communicate substantive claims. And in that case you should totally beware of words like “values” that have unusually large clouds of conflicting associations, and liberally taboo or define them.

Relatedly, if a writer uses the word “values” without further specifying what they mean, they’re not just invoking lots of object-level situations that seem to somehow relate to “values”; they’re also invoking any or all of those conflicting definitions of the word “values”, i.e. the things on Eli’s list, the definitions that you’re saying are wrong or misleading.

In the power example, the physics definition (energy over time) and the
2cubefox
I agree. Some interpretations of "values" you didn't explicitly list, but I think are important:

  • What someone wants to be true (analogous to what someone believes to be true)
  • What someone would want to be true if they knew what it would be like if it were true
  • What someone believes would be good if it were true

These are distinct, because either could clearly differ from the others. So the term "value" is actually ambiguous, not just vague. Talking about "values" is usually unnecessarily unclear, similar to talking about "utilities" in utility theory.
7Shankar Sivarajan
A few of the "distinct meanings" you list are very different from the others, but many of those are pretty similar. "Values" is a pretty broad term, including everything on the "ought" side of the is–ought divide, less "high-minded or noble" preferences, and one's "ranking over possible worlds", and that's fine: it seems like a useful (and coherent!) concept to have a word for. You can be more specific with adjectives if context doesn't adequately clarify what you mean. Seeing through heaven's eyes or not, I see no meaningful difference between the statements "I would like to sleep with that pretty girl" and "worlds in which I sleep with that pretty girl are better than the ones in which I don't, ceteris paribus." I agree this is the key difference: yes, I conflate these two meanings[1], and like the term "values" because it allows me to avoid awkward constructions like the latter when describing one's motivations.

  1. ^ I actually don't see two different meanings, but for the sake of argument, let's grant that they exist.
2cubefox
Well, can. Problem is that people on LessWrong actually do use the term (in my opinion) pretty excessively, in contrast to, say, philosophers or psychologists. This is no problem in concrete cases like in your example, but on LessWrong the discussion about "values" is usually abstract. The fact that people could be more specific didn't so far imply that they are.
2quetzal_rainbow
My honest opinion is that this makes discussion worse, and that you can do better by distinguishing values as objects that have value from the mechanism by which value gets assigned.

New post: Some things I think about Double Crux and related topics

I've spent a lot of my discretionary time working on the broad problem of developing tools for bridging deep disagreements and transferring tacit knowledge. I'm also probably the person who has spent the most time explicitly thinking about and working with CFAR's Double Crux framework. It seems good for at least some of my high level thoughts to be written up some place, even if I'm not going to go into detail about, defend, or substantiate, most of them.

The following are my own beliefs and do not necessarily represent CFAR, or anyone else.

I, of course, reserve the right to change my mind.

[Throughout I use "Double Crux" to refer to the Double Crux technique, the Double Crux class, or a Double Crux conversation, and I use "double crux" to refer to a proposition that is a shared crux for two people in a conversation.]

Here are some things I currently believe:

(General)

  1. Double Crux is one (highly important) tool/framework among many. I want to distinguish between the overall art of untangling and resolving deep disagreements and the Double Crux tool in particular. The Double Crux framework is maybe the most
... (read more)

People rarely change their mind when they feel like you have trapped them in some inconsistency [...] In general (but not universally) it is more productive to adopt a collaborative attitude of sincerely trying to help a person articulate, clarify, and substantiate [bolding mine—ZMD]

"People" in general rarely change their mind when they feel like you have trapped them in some inconsistency, but people using the double-crux method in the first place are going to be aspiring rationalists, right? Trapping someone in an inconsistency (if it's a real inconsistency and not a false perception of one) is collaborative: the thing they were thinking was flawed, and you helped them see the flaw! That's a good thing! (As it is written of the fifth virtue, "Do not believe you do others a favor if you accept their arguments; the favor is to you.")

Obviously, I agree that people should try to understand their interlocutors. (If you performatively try to find fault in something you don't understand, then apparent "faults" you find are likely to be your own misunderstandings rather than actual faults.) But if someone spots an actual inconsistency in my ideas, I want them to tell me right away. Pe

... (read more)
1Slider
I would think that inconsistencies are easier to appreciate when they are in the central machinery. A rationalist might have more load bearing on their beliefs so most beliefs are central to at least something but I think a centrality/point-of-communication check is more upside than downside to keep. Also cognitive time spent looking for inconsistencies could be better spent on more constructive activities. Then there is the whole class of heuristics which don't even claim to be consistent. So the ability to pass by an inconsistency without hanging onto it will see use.
2ChristianKl
How about doing this a few times on video? Watching the video might not be as effective as the one-on-one teaching but I would expect that watching a few 1-on-1 explanations would be a good way to learn about the process. From a learning perspective it also helps a lot for reflecting on the technique. The early NLP folks spent a lot of time analysing tapes of people performing techniques to better understand the techniques.
2Eli Tyre
I in fact recorded a test session of attempting to teach this via Zoom last weekend. This was the first time I tried a test session via Zoom, however, and there were a lot of kinks to work out, so I probably won't publish that version in particular. But yeah, I'm interested in making video recordings of some of this stuff and putting them up online.
2Chris_Leong
Thanks for mentioning conjunctive cruxes. That was always my biggest objection to this technique. At least when I went through CFAR, the training completely ignored this possibility. It was clear that it often worked anyway, but the impression that I got was that it was the general frame which was important more than the precise methodology which at that time still seemed in need of refinement.
2DanielFilan
FYI the numbering in the (General) section is pretty off.
3Eli Tyre
What do you mean? All the numbers are in order. Are you objecting to the nested numbers?
2DanielFilan
To me, it looks like the numbers in the General section go 1, 4, 5, 5, 6, 7, 8, 9, 3, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 2, 3, 3, 4, 2, 3, 4 (ignoring the nested numbers).
2DanielFilan
(this appears to be a problem where it displays differently on different browser/OS pairs)

Old post: RAND needed the "say oops" skill

[Epistemic status: a middling argument]

A few months ago, I wrote about how RAND, and the “Defense Intellectuals” of the Cold War represent another precious datapoint of “very smart people, trying to prevent the destruction of the world, in a civilization that they acknowledge to be inadequate to dealing sanely with x-risk.”

Since then I spent some time doing additional research into what cognitive errors and mistakes those consultants, military officials, and politicians made that endangered the world. The idea was that if we could diagnose which specific irrationalities they were subject to, this would suggest errors that might also be relevant to contemporary x-risk mitigators, and might point out some specific areas where development of rationality training is needed.

However, this proved somewhat less fruitful than I was hoping, and I’ve put it aside for the time being. I might come back to it in the coming months.

It does seem worth sharing at least one relevant anecdote, from Daniel Ellsberg’s excellent book, The Doomsday Machine, and analysis, given that I’ve already written it up.

The missile gap

In the late nineteen-fi... (read more)

This was quite valuable to me, and I think I would be excited about seeing it as a top-level post.

3Eli Tyre
Can you say more about what you got from it?
4billzito
I can't speak for habryka, but I think your post did a great job of laying out the need for "say oops" in detail. I read The Doomsday Machine and felt this point very strongly while reading it, but this was a great reminder to me of its importance. I think "say oops" is one of the most important skills for actually working on the right thing, and that, in my opinion, very few people have this skill even within the rationality community.
4Adam Scholl
There feel to me like two relevant questions here, which seem conflated in this analysis:

1) At what point did the USSR gain the ability to launch a comprehensively-destructive, undetectable-in-advance nuclear strike on the US? That is, at what point would a first strike have been achievable and effective?

2) At what point did the USSR gain the ability to launch such a first strike using ICBMs in particular?

By 1960 the USSR had 1,605 nuclear warheads; there may have been few ICBMs among them, but there are other ways to deliver warheads than shooting them across continents. Planes fail the "undetectable" criterion, but ocean-adjacent cities can be blown up by small boats, and by 1960 the USSR had submarines equipped with six "short"-range (650 km and 1,300 km) ballistic missiles. By 1967 they were producing subs like this, each of which was armed with 16 missiles with ranges of 2,800-4,600 km.

All of which is to say that from what I understand, RAND's fears were only a few years premature.

New post: What is mental energy?

[Note: I’ve started a research side project on this question, and it is already obvious to me that this ontology is importantly wrong.]

There’s a common phenomenology of “mental energy”. For instance, if I spend a couple of hours thinking hard (maybe doing math), I find it harder to do more mental work afterwards. My thinking may be slower and less productive. And I feel tired, or drained, (mentally, instead of physically).

Mental energy is one of the primary resources that one has to allocate, in doing productive work. In almost all cases, humans have less mental energy than they have time, and therefore effective productivity is a matter of energy management, more than time management. If we want to maximize personal effectiveness, mental energy seems like an extremely important domain to understand. So what is it?

The naive story is that mental energy is an actual energy resource that one expends and then needs to recoup. That is, when one is doing cognitive work, they are burning calories, depleting their body's energy stores. As they use energy, they have less fuel to burn.

My current understanding is that this story is not physiologically realistic. T... (read more)

6gilch
On Hypothesis 3, the brain may build up waste as a byproduct of its metabolism when it's working harder than normal, just as muscles do. Cleaning up this buildup seems to be one of the functions of sleep. Even brainless animals like jellyfish sleep. They do have neurons though.
5Gordon Seidoh Worley
I also think it's reasonable to think that multiple things may be going on that result in a theory of mental energy. For example, hypotheses 1 and 2 could both be true and result in different causes of similar behavior. I bring this up because I think of those as two different things in my experience: being "full up" and needing to allow time for memory consolidation, where I can still force my attention but it just doesn't take in new information, vs. being unable to force the direction of attention generally.
3Eli Tyre
Yeah. I think you're on to something here. My current read is that "mental energy" is at least 3 things. Can you elaborate on what "knowledge saturation" feels like for you?
2Gordon Seidoh Worley
Sure. It feels like my head is "full", although the felt sense is more like my head has gone from being porous and sponge-like to hard and concrete-like. When I try to read or listen to something I can feel it "bounce off" in that I can't hold the thought in memory beyond forcing it to stay in short term memory.
3Matt Goldenberg
Isn't it possible that there's some other biological sink that is time delayed from caloric energy? Like say, a very specific part of your brain needs a very specific protein, and only holds enough of that protein for 4 hours? And it can take hours to build that protein back up. This seems to me to be at least somewhat likely.
2Ruby
Someone smart once made a case like this to me in support of a specific substance (can't remember which) as a nootropic, though I'm a bit skeptical.
2eigen
I think about this a lot. I'm currently dangling with the fourth Hypothesis, which seems more correct to me and one where I can actually do something to ameliorate the trade-off implied by it. In this comment, I talk about what it means to me and how I can do something about it, which, in summary, is to use Anki a lot and change subjects when working memory gets overloaded. It's important to note that mathematics is sort-of different from other subjects, since concepts build on each other and you need to keep up with what all of them mean and entail, so we may be bound to reach an overload faster in that sense.

A few notes about your other hypotheses:

Hypothesis 1c: It's because we're not used to it. Some things come easier than others; some things are more closely similar to what we have been doing for 60000 years (math is not one of them). So we flinch from that which we are not used to. Although, adaptation is easy and the major hurdle is only at the beginning. It may also mean that the reward system is different. It is difficult to see in a piece of mathematics, as we explore it, how fulfilling it is when we know that we may not be getting anywhere. So the inherent reward is missing or has to be more artificially created.

Hypothesis 1d: This seems correct to me. Consider the following: “This statement is false”. Thinking about it for a few minutes (or iterations of that statement) is quickly bound to make us flinch away in just a few seconds. How many other things take this form? I bet there are many. Instead of working to trust System 2, is there a way to train System 1? It seems more apt to me, like training tactics in chess or to make rapid calculations.

Thank you for the good post, I'd really like to further know more about your findings.
2Viliam
Seems to me that mental energy is lost by frustration. If what you are doing is fun, you can do it for a long time; if it frustrates you at every moment, you will get "tired" soon. The exact mechanism... I guess is that some part of the brain takes frustration as evidence that this is not the right thing to do, and suggests doing something else. (Would correspond to "1b" in your model?)
2AprilSR
I’ve definitely experienced mental exhaustion from video games before - particularly when trying to do an especially difficult task.

New post: Some notes on Von Neumann, as a human being

I recently read Prisoner’s Dilemma, which is half an introduction to very elementary game theory, and half a biography of John Von Neumann, and watched this old PBS documentary about the man.

I’m glad I did. Von Neumann has legendary status in my circles, as the smartest person ever to live. [1] Many times I’ve written the words “Von Neumann Level Intelligence” in an AI strategy document, or speculated about how many coordinated Von Neumanns it would take to take over the world. (For reference, I now think that 10 is far too low, mostly because he didn’t seem to have the entrepreneurial or managerial dispositions.)

Learning a little bit more about him was humanizing. Yes, he was the smartest person ever to live, but he was also an actual human being, with actual human traits.

Watching this first clip, I noticed that I was surprised by a number of things.

  1. That VN had an accent. I had known that he was Hungarian, but somehow it had never quite propagated that he would speak with a Hungarian accent.
  2. That he was middling height (somewhat shorter than the presenter he’s talking to).
  3. The thing he is saying is the sort of thing that I would
... (read more)
3Viliam
Thank you, this is very interesting! Seems to me the most important lesson here is "even if you are John von Neumann, you can't take over the world alone." First, because no matter how smart you are, you will have blind spots. Second, because your time is still limited to 24 hours a day; even if you'd decide to focus on things you have been neglecting until now, you would have to start neglecting the things you have been focusing on until now. Being better at poker (converting your smartness to money more directly), living healthier and therefore on average longer, developing social skills, and being strategic in gaining power... would perhaps come at a cost of not having invented half of the stuff. When you are John von Neumann, your time has insane opportunity costs.
1Liam Donovan
Is there any information on how Von Neumann came to believe Catholicism was the correct religion for Pascal's Wager purposes? "My wife is Catholic" doesn't seem like very strong evidence...
3Eli Tyre
I don't know why Catholicism. I note that it does seem to be the religion of choice for former atheists, or at least for rationalists. I know of several rationalists that converted to Catholicism, but none that have converted to any other religion.

TL;DR: I’m offering to help people productively have difficult conversations and resolve disagreements, for free. Feel free to email me if and when that seems helpful. elitrye [at] gmail.com

Facilitation

Over the past 4-ish years, I’ve had a side project of learning, developing, and iterating on methods for resolving tricky disagreements, and failures to communicate. A lot of this has been in the Double Crux frame, but I’ve also been exploring a number of other frameworks (including NVC, Convergent Facilitation, Circling-inspired stuff, intuition extraction, and some home-grown methods).

As part of that, I’ve had a standing offer to facilitate / mediate tricky conversations for folks in the CFAR and MIRI spheres (testimonials below). Facilitating “real disagreements” allows me to get feedback on my current conversational frameworks and techniques. When I encounter blockers that I don’t know how to deal with, I can go back to the drawing board to model those problems and interventions that would solve them, and iterate from there, developing new methods.

I generally like doing this kind of conversational facilitation and am open to do... (read more)

8riceissa
I am curious how good you think the conversation/facilitation was in the AI takeoff double crux between Oliver Habryka and Buck Shlegeris. I am looking for something like "the quality of facilitation at that event was X percentile among all the conversation facilitation I have done".

[I wrote a much longer and more detailed comment, and then decided that I wanted to think more about it. In lieu of posting nothing, here's a short version.]

I mean I did very little facilitation one way or the other at that event, so I think my counterfactual impact was pretty minimal.

In terms of my value added, I think that one was in the bottom 5th percentile?

In terms of how useful that tiny amount of facilitation was, maybe 15th to 20th percentile? (This is a little weird, because quantity and quality are related. More active facilitation has a quality span: active (read: a lot of) facilitation can be much more helpful when it is good and much more disruptive / annoying / harmful, when it is bad, compared to less active backstop facilitation.)

Overall, the conversation served the goals of the participants and had a median outcome for that kind of conversation, which is maybe 30th percentile, but there is a long right tail of positive outcomes (and maybe I am messing up how to think about percentile scores with skewed distributions).

The outcome that occurred ("had an interesting conversation, and had some new thoughts / clarifications") is good but also far below the sort of outcome that I'm usually aiming for (but often missing), of substantive, permanent (epistemic!) change to the way that one or both of the people orient on this topic.
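
On the aside above about percentile scores and skewed distributions, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the lognormal shape, its parameters, and the sample size are made up and are not data about real conversations. It shows that in a distribution with a long right tail, the median outcome is still the 50th percentile by rank, but it sits well below the mean, which may be why a "median" outcome can feel like a lower percentile.

```python
import random
import statistics

# Hypothetical "conversation outcome" scores with a long right tail.
# The lognormal shape and its parameters are illustrative assumptions.
random.seed(0)
outcomes = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]

median = statistics.median(outcomes)
mean = statistics.fmean(outcomes)


def percentile_rank(value, data):
    """Percentage of data points less than or equal to value."""
    return 100.0 * sum(x <= value for x in data) / len(data)


median_rank = percentile_rank(median, outcomes)
mean_rank = percentile_rank(mean, outcomes)

print(f"median outcome: {median:.2f} (percentile rank {median_rank:.0f})")
print(f"mean outcome:   {mean:.2f} (percentile rank {mean_rank:.0f})")
```

Under these assumptions the mean lands noticeably above the 50th percentile, so an outcome that is median by rank is below average by value. A different choice of distribution would move the exact numbers, but any long right tail produces the same qualitative gap.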

2habryka
Looks like you dropped a sentence.
2Eli Tyre
Fixed.
1m_arj
Could you recommend the best book about this topic?
3Eli Tyre
Nope? I've gotten very little out of books in this area. It is a little afield, but I strongly recommend the basic NVC book: Nonviolent Communication: A Language for Life. I recommend that at minimum, everyone read at least the first two chapters, which are something like 8 pages long, and have the most content in the book. (The rest of the book is good too, but it is mostly examples.) Also, people I trust have gotten value out of How to Have Impossible Conversations. This is still on my reading stack though (for this month, I hope), so I don't personally recommend it. My expectation, from not having read it yet, is that it will cover the basics pretty well.

(Reasonably personal)

I spend a lot of time trying to build skills, because I want to be awesome. But there is something off about that.

I think I should just go after things that I want, and solve the problems that come up on the way. The idea of building skills sort of implies that if I don't have some foundation or some skill, I'll be blocked, and won't be able to solve some thing in the way of my goals.

But that doesn't actually sound right. Like it seems like the main important thing for people who do incredible things is their ability to do problem solving on the things that come up, and not the skills that they had previously built up in a "skill bank".

Raw problem solving is the real thing and skills are cruft. (Or maybe not cruft per se, but more like a side effect. The compiled residue of previous problem solving. Or like a code base from a previous project that you might repurpose.)

Part of the problem with this is that I don't know what I want for my own sake, though. I want to be awesome, which in my conception, means being able to do things.

I note that wanting "to be able to do things" is a leaky sort of motivation: because the... (read more)

3Marcello
Your seemingly target-less skill-building motive isn't necessarily irrational or non-awesome. My steel-man is that you're in a hibernation period, in which you're waiting for the best opportunity of some sort (romantic, or business, or career, or other) to show up so you can execute on it. Picking a goal to focus on really hard now might well be the wrong thing to do; you might miss a golden opportunity if your nose is at the grindstone. In such a situation a good strategy would, in fact, be to spend some time cultivating skills, and some time in existential confusion (which is what I think not knowing which broad opportunities you want to pursue feels like from the inside). The other point I'd like to make is that I expect building specific skills actually is a way to increase general problem solving ability; they're not at odds. It's not that super specific skills are extremely likely to be useful directly, but that the act of constructing a skill is itself trainable and a significant part of general problem solving ability for sufficiently large problems. Also, there's lots of cross-fertilization of analogies between skills; skills aren't quite as discrete as you're thinking.
3Dagon
Skills and problem-solving are deeply related. The basics of most skills are mechanical and knowledge-based, with some generalization creeping in on your 3rd or 4th skill in terms of how to learn and seeing non-obvious crossover. Intermediate (say, after the first 500 to a few thousand hours) use of skills requires application of problem-solving within the basic capabilities of that skill. Again, you get good practice within a skill, and better across a few skills. Advanced application in many skills is MOSTLY problem-solving. How to apply your well-indexed-and-integrated knowledge to novel situations, and how to combine that knowledge across domains. I don't know of any shortcuts, though - it takes those thousands of hours to get enough knowledge and basic techniques embedded in your brain that you can intuit what avenues to more deeply explore in new applications. There is a huge amount of human variance - some people pick up some domains ludicrously easily. This is a blessing and a curse, as it causes great frustration when they hit a domain that they have to really work at. Others have to work at everything, and never get their Nobel, but still contribute a whole lot of less-transformational "just work" within the domains they work at.
2Viliam
Seems to me there is some risk either way. If you keep developing skills without applying them to a specific goal, it can be a form of procrastination (an insidious one, because it feels so virtuous). There are many skills you could develop, and life is short. On the other hand, as you said, if you go right after your goal, you may find an obstacle you can't overcome... or even worse, an obstacle you can't even properly analyze, so the problem is not merely that you don't have the necessary skill, but that you even have no idea which skill you miss (so if you try to develop the skills as needed, you may waste time developing the wrong skills, because you misunderstood the nature of the problem). It could be both. And perhaps you notice the problem-specific skills more, because those are rare. But I also kinda agree that the attitude is more important, and skills often can be acquired when needed. So... dunno, maybe there are two kinds of skills? Like, the skills with obvious application, such as "learn to play a piano"; and the world-modelling skills, such as "understand whether playing a piano would realistically help you accomplish your goals"? You can acquire the former when needed, but you need the latter in advance, to remove your blind spots? Or perhaps some skills such as "understand math" are useful in many kinds of situations and take a lot of time to learn, so you probably want to develop these in advance? (Also, if you don't know yet what to do, it probably helps to get power: learn math, develop social skills, make money... When you later make up your mind, you will likely find some of this useful.) And maybe you need the world-modelling skills before you make specific goals, because how could your goal be to learn to play the piano, if you don't know the piano exists? You could have a more general goal, such as "become famous at something", but if you don't know that the piano exists, maybe you wouldn't even look in this direction. Could this also be abo
2Matt Goldenberg
I've gone through something very similar. Based on your language here, it feels to me like you're in the contemplation stage along the stages of change.   So the very first thing I'd say is to not feel the desire to jump ahead and "get started on a goal right now." That's jumping ahead in the stages of change, and will likely create a relapse.  I will predict that there's a 50% chance that if you continue thinking about this without "forcing it", you'll have started in on a goal (action stage) within 3 months. Secondly, unlike some of the other responses here, I think your analysis is fairly accurate.  I've certainly found that picking up gears when I need them for my goals is better than learning them ahead of time. Now, in terms of "how to actually do it."  I'm pretty convinced that the key to getting yourself to do stuff is "Creative Tension" - creating a clear internal tension between the end state that feels good and the current state that doesn't feel as good. There are 4 ways I know to go about generating internal tension: 1. Develop a strong sense of self, and create tension between the world where you're fully expressing that self and the world where you're not. 2. Develop a strong sense of taste, and create tension between the beautiful things that could exist and what exists now. 3. Develop a strong pain, and create tension between the world where you have that pain and the world where you've solved it. 4. Develop a strong vision, and create tension between the world as it is now and the world as it would be in your vision. One especially useful trick that worked for me coming from the "just develop myself into someone awesome" place was tying the vision of the awesome person I could be with the vision of what I'd achieved - that is, in my vision of the future, including a vision of the awesome person I had to become in order to reach that future.  I then would deliberately contrast where I was now with that compelling vision/self/taste with w

I’m no longer sure that I buy Dutch book arguments, in full generality, and this makes me skeptical of the "utility function" abstraction

Thesis: I now think that utility functions might be a pretty bad abstraction for thinking about the behavior of agents in general, including highly capable agents.

[Epistemic status: half-baked, elucidating an intuition. Possibly what I’m saying here is just wrong, and someone will helpfully explain why.]

Over the past years, in thinking about agency and AI, I’ve taken the concept of a “utility function” for granted as the natural way to express an entity's goals or preferences. 

Of course, we know that humans don’t have well defined utility functions (they’re inconsistent, and subject to all kinds of framing effects), but that’s only because humans are irrational. To the extent that a thing acts like an agent, its behavior corresponds to some utility function. That utility function might not be explicitly represented, but if an agent is rational, there’s some utility function that reflects its preferences.

Given this, I might be inclined to scoff at people who scoff at “blindly maximizing” AGIs. “They just don’t get it”, I might think. “T... (read more)

4Gordon Seidoh Worley
I've long been somewhat skeptical that utility functions are the right abstraction. My argument is also rather handwavy, being something like "this is the wrong abstraction for how agents actually function, so even if you can always construct a utility function and say some interesting things about its properties, it doesn't tell you the thing you need to know to understand and predict how an agent will behave". In my mind I liken it to the state of trying to code in functional programming languages on modern computers: you can do it, but you're also fighting an uphill battle against the way the computer is physically implemented, so don't be surprised if things get confusing. And much like in the utility function case, people still program in functional languages because of the benefits they confer. I think the same is true of utility functions: they confer some big benefits when trying to reason about certain problems, so we accept the tradeoffs of using them. I think that's fine so long as we have a morphism to other abstractions that will work better for understanding the things that utility functions obscure.
2JBlack
Utility functions are especially problematic in modeling behaviour for agents with bounded rationality, or those where there are costs of reasoning. These include every physically realizable agent. For modelling human behaviour, even considering the ideals of what we would like human behaviour to achieve, there are even worse problems. We can hope that there is some utility function consistent with the behaviour we're modelling and just ignore cases where there isn't, but that doesn't seem satisfactory either.
2Pattern
'Or you will leave money on the table.' You rotated 'different' and 'between'. (Or a series of rotations isomorphic to such.)

New post: The Basic Double Crux Pattern

[This is a draft, to be posted on LessWrong soon.]

I’ve spent a lot of time developing tools and frameworks for bridging "intractable" disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work on it.

People often express to me something to the effect of, “The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you’ve understood, operationalizing, etc. The ‘Double Crux’ framework itself is not very important.”

I half agree with that sentiment. I do think that those low level cognitive and conversational patterns are the most important thing, and at Double Crux trainings that I have run, most of the time is spent focusing on specific exercises to instill those low level TAPs.

However, I don’t think that the only value of the Double Crux schema is in training those low level habits. Double cruxes are extremely powerful machines that allow one to identify, if not the most efficient conversational path, a very high efficiency conversationa... (read more)

Eliezer claims that dath ilani never give in to threats. But I'm not sure I buy it.

The only reason people will make threats against you, the argument goes, is if those people expect that you might give in. If you have an iron-clad policy against acting in response to threats made against you, then there's no point in making or enforcing the threats in the first place. There's no reason for the threatener to bother, so they don't. Which means in some sufficiently long run, refusing to submit to threats means you're not subject to threats.
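The argument as stated can be put in expected-value terms. Below is a minimal sketch in Python; the function names and all of the numbers are hypothetical illustrations, and it bakes in the assumption that enforcing a threat is costly to the threatener (a positive `cost_to_enforce`), which is exactly the premise the argument leans on.

```python
def threat_expected_value(
    p_give_in: float,
    gain_if_target_yields: float,
    cost_to_make_threat: float,
    cost_to_enforce: float,
) -> float:
    """Expected payoff to a threatener who always enforces after a refusal.

    All parameters are hypothetical illustration values, not estimates.
    Assumes enforcement is a pure cost to the threatener.
    """
    if not 0.0 <= p_give_in <= 1.0:
        raise ValueError("p_give_in must be a probability in [0, 1]")
    payoff_when_target_yields = p_give_in * gain_if_target_yields
    expected_enforcement_cost = (1.0 - p_give_in) * cost_to_enforce
    return payoff_when_target_yields - cost_to_make_threat - expected_enforcement_cost


def worth_threatening(
    p_give_in: float,
    gain_if_target_yields: float,
    cost_to_make_threat: float,
    cost_to_enforce: float,
) -> bool:
    """True if issuing the threat has positive expected value."""
    return threat_expected_value(
        p_give_in, gain_if_target_yields, cost_to_make_threat, cost_to_enforce
    ) > 0.0


if __name__ == "__main__":
    # A target with an iron-clad refusal policy (p_give_in = 0).
    print(worth_threatening(0.0, 10.0, 1.0, 2.0))
    # A target who gives in half the time.
    print(worth_threatening(0.5, 10.0, 1.0, 2.0))
```

Under these assumptions, a target who never gives in makes every threat a net loss for the threatener, while a target who sometimes gives in can make threatening profitable. The conclusion is only as strong as the assumption that `cost_to_enforce` is positive.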

This seems a bit fishy to me. I have a lingering suspicion that this argument doesn't apply, or at least doesn't apply universally, in the real world.

I'm thinking here mainly of a prototypical case of an isolated farmer family (like the early farming families of the Greek peninsula, not absorbed into a polis), being accosted by some roving bandits, such as the soldiers of the local government. The bandits say "give us half your harvest, or we'll just kill you."

The argument above depends on a claim about the cost of executing on a threat. "There's no reason to bother" implies that the threatener has a preference not to bother, if they know that the t... (read more)

aphyer161

Eliezer, this is what you get for not writing up the planecrash threat lecture thread.  We'll keep bothering you with things like this until you give in to our threats and write it.

What you’ve hit upon is “BATNA,” or “Best alternative to a negotiated agreement.” Because the robbers can get what they want by just killing the farmers, the dath ilani will give in, and from what I understand, Yudkowsky therefore doesn’t classify the original request (give me half your wheat or die) as a threat.

This may not be crazy- it reminds me of the Ancient Greek social mores around hospitality, which seem insanely generous to a modern reader but I guess make sense if the equilibrium number of roving <s>bandits</s> honored guests is kept low by some other force

2Eli Tyre
This seems like it weakens the "don't give in to threats" policy substantially, because it makes it much harder to tell what's a threat-in-the-technical-sense, and the incentives push toward exaggeration and dishonesty about what is or isn't a threat-in-the-technical-sense. The bandits should always act as if they're willing to kill the farmers and take their stuff, even if they're bluffing about their willingness to do violence. The farmers need to estimate whether the bandits are bluffing, and either call the bluff, or submit to the demand-which-is-not-technically-a-threat. That policy has notably more complexity than just "don't give in to threats."
2kave
What is the "don't give in to threats" policy that this is more complex than? In particular, what are 'threats'?
1Eli Tyre
"Anytime someone credibly demands that you do X, otherwise they'll do Y to you, you should not do X." This is a simple reading of the "don't give in to threats" policy.
2kave
What are the semantics of "otherwise"? Are they more like:
  • X otherwise Y ↦ X → ¬Y, or
  • X otherwise Y ↦ X ↔ ¬Y
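To see where these two readings actually come apart, here is a small truth-table sketch in Python. The function names are made up for illustration; X stands for "you comply with the demand" and Y for "the threatener does the bad thing to you".

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def reading_conditional(x: bool, y: bool) -> bool:
    """First reading: X -> not Y (complying guarantees no harm)."""
    return implies(x, not y)


def reading_biconditional(x: bool, y: bool) -> bool:
    """Second reading: X <-> not Y (harm happens exactly when you refuse)."""
    return x == (not y)


def truth_table() -> list[tuple[bool, bool, bool, bool]]:
    """Rows of (x, y, conditional reading, biconditional reading)."""
    return [
        (x, y, reading_conditional(x, y), reading_biconditional(x, y))
        for x, y in product([True, False], repeat=2)
    ]


def differing_cases() -> list[tuple[bool, bool]]:
    """The (x, y) assignments on which the two readings disagree."""
    return [(x, y) for x, y, cond, bicond in truth_table() if cond != bicond]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
    print("readings differ on:", differing_cases())
```

The two readings disagree only on the case where you refuse and nothing happens to you. The conditional reading allows that outcome, while the biconditional reading rules it out, so only the second commits the threatener to actually enforcing.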
2kave
Presumably you also want the policy to include that you don't want "Y" and weren't going to do "X" anyway?
2Eli Tyre
Yes, to the first part, probably yes to the second part.
1Hastings
With a grain of salt: there’s a sort of quiet assumption that should be louder about the dath ilan fiction, which is that it’s about a world where a bunch of theorems like “as systems of agents get sufficiently intelligent, they gain the ability to coordinate in prisoner’s dilemma like problems” have proofs. You could similarly write fiction set in a world where P=NP has a proof and all of cryptography collapses. I’m not sure whether EY would guess that sufficiently intelligent agents actually coordinate, just like I could write the P=NP fiction while being pretty sure that P≠NP.
2Eli Tyre
Huh, the idea that Greek guest-friendship was an adaptation to warriors who would otherwise kill you and take your stuff is something that I had never considered before. Isn't it generally depicted as a relationship between nobles who, presumably, would be able to repel roving bandits?
4Vladimir_Nesov
Threateners similarly can employ bindings, always enforcing regardless of local cost. A binding has an overall cost from following it in all relevant situations, costs in individual situations are what goes into estimating this overall cost, but individually they are not decision relevant, when deciding whether to commit to a global binding. In this case opposing commitments effectively result in global enmity (threateners always enforce, targets never give in to threats), so if targets are collectively stronger than threateners, then threateners lose. But this collective strength (for the winning side) or vulnerability (for the losing side) is only channeled through targets or threateners who join their respective binding. If few people join, the faction is weak and loses.
2RobertM
But threateners don't want to follow that policy, since in the resulting equilibrium they're wasting a lot of their own resources.
2Vladimir_Nesov
The equilibrium depends on which faction is stronger. Threateners who don't always enforce and targets who don't always ignore threats are not parts of this game, so it's not even about relative positions of threateners and targets, only those that commit are relevant. If the threateners win, targets start mostly giving in to threats, and so for threateners the cost of binding becomes low overall.
2RobertM
I'm talking about the equilibrium where targets are following their "don't give in to threats" policy.  Threateners don't want to follow a policy of always executing threats in that world - really, they'd probably prefer to never make any threats in that world, since it's strictly negative EV for them.
2Vladimir_Nesov
If the unyielding targets faction is stronger, the equilibrium is bad for committed enforcers. If the committed enforcer faction is stronger, the equilibrium doesn't retain high cost of enforcement, and in that world the targets similarly wouldn't prefer to be unyielding. I think the toy model where that fails leaves the winning enforcers with no pie, but that depends on enforcers not making use of their victory to set up systems for keeping targets relatively defenseless, taking the pie even without their consent. This would no longer be the same game ("it's not a threat"), but it's not a losing equilibrium for committed enforcers of the preceding game either.
3Multicore
This distinction of which demands are or aren't decision-theoretic threats that rational agents shouldn't give in to is a major theme of the last ~quarter of Planecrash (enormous spoilers in the spoiler text). This theme is brought up many times, but there's not one comprehensive explanation to link to. (The parable of the little bird is the closest I can think of.)
2RHollerith
The assertion IIUC is not that it never makes sense for anyone to give in to a threat -- that would clearly be an untrue assertion -- but rather that it is possible for a society to reach a level of internal coordination where it starts to make sense to adopt a categorical policy of never giving in to a threat. That would mean for example that any society member that wants to live in dath ilan's equivalent of an isolated farm would probably need to formally and publicly relinquish their citizenship to maintain dath ilan's reputation for never giving in to a threat. Or dath ilan would make it very clear that they must not give in to any threats, and if they do and dath ilan finds out, then dath ilan will be the one that slaughters the whole family. The latter policy is a lot like how men's prisons work at least in the US whereby the inmates are organized into groups (usually based on race or gang affiliation) and if anyone even hints (where others can hear) that you might give in to sexual extortion, you need to respond with violence because if you don't, your own group (the main purpose of which is mutual protection from the members of the other groups) will beat you up. That got a little grim. Should I add a trigger warning? Should I hide the grim parts behind a spoiler tag thingie?
2quetzal_rainbow
Bandits have an obvious cost: if they kill all farmers, from whom are they going to take stuff?
2Eli Tyre
That's not a cost. At worst, all the farmers will relentlessly fight to the death; in that case the bandits get one year of food and have to figure something else out next year. That outcome strictly dominates not stealing any food this year, and needing to figure something else out both this year and next year.
-13David Hornbein

Old post: A mechanistic description of status

[This is an essay that I’ve had bopping around in my head for a long time. I’m not sure if this says anything usefully new, but it might click with some folks. If you haven’t read Social Status: Down the Rabbit Hole on Kevin Simler’s excellent blog, Melting Asphalt, read that first. I think this is pretty bad and needs to be rewritten and maybe expanded substantially, but this blog is called “musings and rough drafts.”]

In this post, I’m going to outline how I think about status. In particular, I want to give a mechanistic account of how status necessarily arises, given some set of axioms, in much the same way one can show that evolution by natural selection must necessarily occur given the axioms of 1) inheritance of traits 2) variance in reproductive success based on variance in traits and 3) mutation.

(I am not claiming any particular skill at navigating status relationships, any more than a student of sports-biology is necessarily a skilled basketball player.)

By “status” I mean prestige-status.

Axiom 1: People have goals.

That is, for any given human, there are some things that they want. This can include just about anything. You might wan... (read more)

4Kaj_Sotala
Related: The red paperclip theory of status describes status as a form of optimization power, specifically one that can be used to influence a group.
4Raemon
(it says "more stuff here" but links to your overall blog, not sure if that was meant to be a link to a specific post)

I've offered to be a point person for folks who believe that they were severely impacted by Leverage 1.0, and have related information, but who might be unwilling to share that info, for any of a number of reasons. 

In short,

  • If someone wants to tell me private meta-level information (such as "I don't want to talk about my experience publicly because X"), so that I can pass it along in an anonymized way to someone else (including Geoff, Matt Fallshaw, Oliver Habryka, or others) - I'm up for doing that.
    • In this case, I'm willing to keep info non-public (ie not publish it on the internet), and anonymized, but am reluctant to keep it secret (ie pretend that I don't have any information bearing on the topic).
      • For instance, let's say someone tells me that they are afraid to publish their account due to a fear of being sued.
      • If later, as a part of this whole process, some third party asks "is there anyone who isn't speaking out of a fear of legal repercussions?", I would respond "yes, without going into the details, one of the people that I spoke to said that", unless my saying that would uniquely identify the person I spoke to.
      • If someone asked me point-blank "is it Y-person who is afraid o
... (read more)

So it seems like one way that the world could go is:

  • China develops a domestic semiconductor fab industry that's not at the cutting edge, but close, so that it's less dependent on Taiwan's TSMC
     
  • China invades Taiwan, destroying TSMC, ending up with a compute advantage over the US, which translates into a military advantage (which might or might not actually be leveraged in a hot war).

I could imagine China building a competent domestic chip industry. China seems more determined to do that than the US is.

Though notably, China is not on track to do that currently. It's not anywhere close to its goal of producing 70% of its chips by 2025.

And if the US was serious about building a domestic cutting-edge chip industry again, could it? I basically don't think that American work culture can keep up with Taiwanese/TSMC work culture, in this super-competitive industry.

TSMC is building fabs in the US, but from what I hear, they're not going well.

(While TSMC is a Taiwanese company, having a large fraction of TSMC fabs in the US would preempt the scenario above. TSMC fabs in the US count as "a domestic US chip industry.")

Building and running leading node fabs is just a really really hard thing to do.

I guess the most likely scenario is the continuation of the status quo where China and the US continue to both awkwardly depend on TSMC's chips for crucial military and economic AI tech.

5O O
Hold on. The TSMC Arizona fab is actually ahead of schedule. They were simply waiting for funds. I believe TSMC’s edge is largely cheap labor. https://www.tweaktown.com/news/97293/tsmc-to-begin-pilot-program-at-its-arizona-usa-fab-plant-for-mass-production-by-end-of-2024/index.html
2Eli Tyre
I'm not that confident about how the Arizona fab is going. I've mostly heard secondhand accounts. I'm very confident that TSMC's edge is more than cheap labor. It would be basically impossible for another country, even one with low median wages, to replicate TSMC. Singapore and China have both tried, and can't compete. At this point in time, TSMC has a basically insurmountable human capital and institutional capital advantage, that enables it to produce leading node chips that no other company in the world can produce. Samsung will catch up, sure. But by the time they catch up to TSMC's 2024 state of the art, TSMC will have moved on to the next node. My understanding is that, short of TSMC being destroyed by war with mainland China, or some similar disaster, it's not feasible for any company to catch up with TSMC within the next 10 years, at least.
1O O
So, from their site "TSMC Arizona’s first fab is on track to begin production leveraging 4nm technology in first half of 2025." You are probably thinking of their other Arizona fabs. Those are indeed delayed. However, they cite "funding" as the issue.[1] Based on how quickly TSMC changed its tune on delays once they got CHIPS funding, I think it's largely artificial, and a means to extract CHIPS money.  They have cumulative investments over the years, but based on accounts of Americans who have worked there, they don't sound extremely advanced. Instead they sound very hard-working, which gives them a strong ability to execute. Also, I still think these delays are somewhat artificial. There are natsec concerns for Taiwan to let TSMC diversify, and TSMC seems to think it can wring a lot of money out of the US by holding up construction. They are, after all, a monopoly. Is Samsung 5 generations behind? I know that nanometers don't really mean anything anymore, but TSMC and Samsung's 4 nm don't seem 10 years apart based on the tidbits I get online.  1. ^ Liu said construction on the shell of the factory had begun, but the Taiwanese chipmaking titan needed to review “how much incentives … the US government can provide.”
2Eli Tyre
I'm not claiming they're 10 years behind. My understanding from talking with people is that Samsung is around 2 to 3 years behind TSMC. My claim is that Samsung and TSMC are advancing at ~the same rate, so Samsung can't close that 2 to 3 year gap.
1O O
Oh yeah I agree. Misread that. Still, maybe not so confident. Market leaders often don’t last. Competition always catches up.
3davekasten
As you note, TSMC is building fabs in the US (and Europe) to reduce this risk. I also think that it's worth noting that, at least in the short run, if the US didn't have shipments of new chips and was at war, the US government would just use wartime powers to take existing GPUs from whichever companies they felt weren't using them optimally for war and give them to the companies (or US Govt labs) that are.   Plus, are you really gonna bet that the intelligence community and DoD and DoE don't have a HUUUUGE stack of H100s? I sure wouldn't take that action.
4Eli Tyre
What, just sitting in a warehouse?  I would bet that the government's supply of GPUs is notably smaller than that of Google and Microsoft. 
1davekasten
I meant more "already in a data center," though probably some in a warehouse, too. I roll to disbelieve that the people who read Hacker News in Ft. Meade, MD and have giant budgets aren't making some of the same decisions that people who read Hacker News in Palo Alto, CA and Redmond, WA would.   
2Eli Tyre
I don't think the budgets are comparable. I read recently that Intel's R&D budget in the 2010s was 3x bigger than all of DARPA.
2davekasten
No clue if true, but even if true, DARPA is not at all comparable to Intel. It's an entity set up for very different purposes and engaging in very different patterns of capital investment. Also, it's very unclear to me why R&D is the relevant bucket. Presumably buying GPUs is either capex or, if rented, is recognized under a different opex bucket (for secure cloud services) than R&D? My claim isn't that the USG is like running its own research and fabs at equivalent levels of capability to Intel or TSMC. It's just that if a war starts, it has access to plenty of GPUs through its own capacity and its ability to mandate borrowing of hardware at scale from the private sector.
0ChristianKl
When I look at the current US government, it does not seem to be able to just take whatever it wants from big companies with powerful lobbyists.
2O O
Wartime powers let governments do whatever they want, essentially. Even recently Biden has flexed the Defense Production Act. https://www.defense.gov/News/Feature-Stories/story/article/2128446/during-wwii-industries-transitioned-from-peacetime-to-wartime-production/
4ChristianKl
Did he do it in a way that hurt the bottom line of any powerful US company? No, I don't think so. While the same powers that existed in WWII still exist on paper today, the US government is much less capable of taking action.
2Seth Herd
We're not at war. If we were in a war with real stakes, I'd expect to see those powers used much more aggressively.
1O O
This makes no sense. Wars are typically existential. In a hot war with another state, why would the government not take all of the industrial capacity that is more useful for making weapons and use it to make weapons? It’s well documented that governments can repurpose unnecessary parts of industry (say training Grok or an open source chatbot) into whatever else. Biden used them for largely irrelevant reasons. This indicates that with an actual war, usage would be wider and more extensive.
2ozziegooen
I'd flag that I think it's very possible TSMC will be very much hurt/destroyed if China is in control. There's been a bit of discussion of this. I'd suspect China might fix this after some years, but would expect it would be tough for a while.  https://news.ycombinator.com/item?id=40426843
2Eli Tyre
You mean if they're in control of Taiwan? Yes, the US would destroy it on the way out.
2ozziegooen
Yea

Something that I've been thinking about lately is the possibility of an agent's values being partially encoded by the constraints of that agent's natural environment, or arising from the interaction between the agent and environment.

That is, an agent's environment puts constraints on the agent. From one perspective removing those constraints is always good, because it lets the agent get more of what it wants. But sometimes from a different perspective, we might feel that with those constraints removed, the agent Goodharts or wire-heads, or otherwise fails to actualize its "true" values.

The Generator freed from the oppression of the Discriminator

As a metaphor: if I'm one half of a GAN, let's say the generator, then in one sense my "values" are fooling the discriminator, and if you make me relatively more powerful than my discriminator, and I dominate it...I'm loving it, and also no longer making good images.
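For readers who want the metaphor pinned down, the standard GAN training setup is usually written as a two-player minimax game. This is the textbook formulation rather than anything specific to this post, and the symbols ($G$, $D$, $p_{\text{data}}$, $p_z$) are the conventional ones.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The generator only ever touches $V$ through $D(G(z))$; real images never enter its term directly. So if $D$ is weak, $G$ can drive its loss down with outputs that fool $D$ without resembling the data at all, which is the sense in which dominating the discriminator and making good images come apart.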

But you might also say, "No, wait. That is a super-stimulus, and actually what you value is making good images, but half of that value was encoded in your partner."

This second perspective seems a little stupid to me. A little too Aristotelian. I mean if we're going to take that ... (read more)

2Eli Tyre
Side note, which is not my main point: I think this also has something to do with what meditation and psychedelics do to people, which was recently up for discussion on Duncan's Facebook. I bet that meditation is actually a way to repair psychblocks and trauma and what-not. But if you do that enough, and you remove all the psych constraints...a person might sort of become so relaxed that they become less and less of an agent. I'm a lot less sure of this part.

[Real short post. Random. Complete speculation.]

Childhood lead exposure reduces one’s IQ, and also causes one to be more impulsive and aggressive.

I always assumed that the impulsiveness was due, basically, to your executive function machinery working less well. So you have less self control.

But maybe the reason for the IQ-impulsiveness connection is that if you have a lower IQ, all of your subagents/subprocesses are less smart. Because they’re worse at planning and modeling the world, the only ways they know to get their needs met are very direct, very simple action-plans/strategies. Conversely, with a higher IQ, it’s not so much that you’re better at controlling your anger as that the part of you that would be angry is less so, because it has other ways of getting its needs met.

7jimrandomh
A slightly different spin on this model: it's not about the types of strategies people generate, but the number. If you think about something and only come up with one strategy, you'll do it without hesitation; if you generate three strategies, you'll pause to think about which is the right one. So people who can't come up with as many strategies are impulsive.
1Eli Tyre
This seems like it might be testable. If you force impulsive folk to wait and think, do they generate more ideas for how to proceed?
1David Scott Krueger (formerly: capybaralet)
This reminded me of the argument that superintelligent agents will be very good at coordinating and just divvy up the multiverse and be done with it. It would be interesting to do an experimental study of how the intelligence profile of a population influences the level of cooperation between them.
2Eli Tyre
I think that's what the book referenced here is about.

new post: Metacognitive space


[Part of my Psychological Principles of Personal Productivity, which I am writing mostly in my Roam, now.]

Metacognitive space is a term of art that refers to a particular first person state / experience. In particular it refers to my propensity to be reflective about my urges and deliberate about the use of my resources.

I think it might literally be having the broader context of my life, including my goals and values, and my personal resource constraints loaded up in peripheral awareness.

Metacognitive space allows me to notice aversions and flinches, and take them as object, so that I can respond to them with Focusing or dialogue, instead of being swept around by them. Similarly, it seems, in practice, to reduce my propensity to act on immediate urges and temptations.

[Having MCS is the opposite of being [[{Urge-y-ness | reactivity | compulsiveness}]]?]

It allows me to “absorb” and respond to happenings in my environment, including problems and opportunities, taking considered action instead of the semi-automatic, first-response-that-occurred-to-me action. [That sentence there feels a little fake, or maybe about something else, or may... (read more)

In this interview, Eliezer says the following:

I think if you push anything [referring to AI systems] far enough, especially on anything remotely like the current paradigms, like if you make it capable enough, the way it gets that capable is by starting to be general. 

And at the same sort of point where it starts to be general, it will start to have its own internal preferences, because that is how you get to be general. You don't become creative and able to solve lots and lots of problems without something inside you that organizes your problem solvi

... (read more)
4Adele Lopez
In my view, this is where the Omohundro Drives come into play. Having any preference at all is almost always served by an instrumental preference of survival as an agent with that preference. Once a competent agent is general enough to notice that (and granting that it has a level of generality sufficient to require a preference), then the first time it has a preference, it will want to take actions to preserve that preference. This seems possible to me. Humans have plenty of text in which we generate new abstractions/hypotheses, and so effective next-token prediction would necessitate forming a model of that process. Once the AI has human-level ability to create new abstractions, it could then simulate experiments (via e.g. its ability to predict python code outputs) and cross-examine the results with its own knowledge to adjust them and pick out the best ones.
4bideup
Sorry, what's the difference between these two positions? Is the second one meant to be a more extreme version of the first?
2Eli Tyre
Yes.
2Steven Byrnes
In Section 1 of this post I make an argument kinda similar to the one you’re attributing to Eliezer. That might or might not help you, I dunno, just wanted to share.

Does anyone know of a good technical overview of why it seems hard to get Whole Brain Emulations before we get neuromorphic AGI?

I think maybe I read a PDF that made this case years ago, but I don't know where.

4Steven Byrnes
I haven't seen such a document but I'd be interested to read it too. I made an argument to that effect here: https://www.lesswrong.com/posts/PTkd8nazvH9HQpwP8/building-brain-inspired-agi-is-infinitely-easier-than (Well, a related argument anyway. WBE is about scanning and simulating the brain rather than understanding it, but I would make a similar argument using "hard-to-scan" and/or "hard-to-simulate" things the brain does, rather than "hard-to-understand" things the brain does, which is what I was nominally blogging about. There's a lot of overlap between those anyway; the examples I put in mostly work for both.)
2Eli Tyre
Great. This post is exactly the sort of thing that I was thinking about.

There’s a psychological variable that seems to be able to change on different timescales, in me, at least. I want to gesture at it, and see if anyone can give me pointers to related resources.

[Hopefully this is super basic.]

There is a set of states that I occasionally fall into that include what I call “reactive” (meaning that I respond compulsively to the things around me), and what I call “urgy” (meaning that I feel a sort of “graspy” desire for some kind of immediate gratification).

These states all have... (read more)

2Matt Goldenberg
I remembered there was a set of audios from Eben Pagan that really helped me before I turned them into the 9 breaths technique. Just emailed them to you. They go a bit more into depth and you may find them useful.
2Matt Goldenberg
I don't know if this is what you're looking for, but I've heard the variable you're pointing at referred to as your level of groundedness, centeredness, and stillness in the self-help space.   There are all sorts of meditations, visualizations, and exercises aimed to make you more grounded/centered/still and a quick google search pulls up a bunch. One I teach is called the 9 breaths technique. Here's another.

new (boring) post on controlled actions.

1rk
This link (and the one for "Why do we fear the twinge of starting?") is broken (I think it's an admin view?). (Correct link)
1Eli Tyre
They should both be fixed now. Thanks!
6Raemon
Thanks! I just read through a few of your most recent posts and found them all real useful.
5Eli Tyre
Cool! I'd be glad to hear more. I don't have much of a sense of which things I write are useful or how.
2Hazard
Relating to the "Perception of Progress" bit at the end. I can confirm for a handful of physical skills I practice there can be a big disconnect between Perception of Progress and Progress from a given session. Sometimes this looks like working on a piece of sleight of hand, it feeling weird and awkward, and the next day suddenly I'm a lot better at it, much more than I was at any point in the previous day's practice. I've got a hazy memory of a breakdancer blogging about how a particular shade of "no progress fumbling" can be a signal that a certain amount of "unlearning" is happening, though I can't find the source to vet it.

I’ve decided that I want to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with them.

This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.

There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have the problem of civilizational sanity.

Preparadigmatic science

There are a number ... (read more)

New (short) post: Desires vs. Reflexes

[Epistemic status: a quick thought that I had a minute ago.]

There are goals / desires (I want to have sex, I want to stop working, I want to eat ice cream) and there are reflexes (anger, “wasted motions”, complaining about a problem, etc.).

If you try and squash goals / desires, they will often (not always?) resurface around the side, or find some way to get met. (Why not always? What is the difference between those that do and those that don’t?) You need to bargain with them, or design outlet poli... (read more)

3eigen
I'm interested in knowing more about the meditation aspect and how it relates to productivity!
2Matt Goldenberg
I'm currently running a pilot program that takes a very similar psychological slant on productivity and procrastination, and planning to write a sequence starting in the next week or so. It covers a lot of the same subjects, including habits, ambiguity or overwhelm aversion, coercion aversion, and creating good relationships with parts. Maybe we should chat!

Totally an experiment, I'm trying out posting my raw notes from a personal review / theorizing session, in my short form. I'd be glad to hear people's thoughts.

This is written for me, straight out of my personal Roam repository. The formatting is a little messed up because LessWrong's bullets don't support indefinite levels of nesting.

This one is about Urge-y-ness / reactivity / compulsiveness

  • I don't know if I'm naming this right. I think I might be lumping categories together.
  • Let's start with what I know:
    • There are th
... (read more)

New post: Some musings about exercise and time discount rates

[Epistemic status: a half-thought, which I started on earlier today, and which might or might not be a full thought by the time I finish writing this post.]

I’ve long counted exercise as an important component of my overall productivity and functionality. But over the past months my exercise habit has slipped some, without apparent detriment to my focus or productivity. But this week, after coming back from a workshop, my focus and productivity haven’t really booted up.

Her... (read more)

2Viliam
Alternative hypothesis: maybe what expands your time horizon is not exercise and meditation per se, but the fact that you are doing several different things (work, meditation, exercise), instead of doing the same thing over and over again (work). It probably also helps that the different activities use different muscles, so that they feel completely different. This hypothesis predicts that a combination of e.g. work, walking, and painting, could provide similar benefits compared to work only.
2Eli Tyre
Well, my working is often pretty varied, while my "being distracted" is pretty monotonous (watching youtube clips), so I don't think it is this one.

New post: Capability testing as a pseudo fire alarm

[epistemic status: a thought I had]

It seems like it would be useful to have very fine-grained measures of how smart / capable a general reasoner is, because this would allow an AGI project to carefully avoid creating a system smart enough to pose an existential risk.

I’m imagining slowly feeding a system more training data (or, alternatively, iteratively training a system with slightly more compute), and regularly checking its capability. When the system reaches “chimpanzee level” (whatever that means), you... (read more)

In There’s No Fire Alarm for Artificial General Intelligence Eliezer argues:

A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire; after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, you know you won’t lose face if you proceed to exit the building.

If I have a predetermined set of tests, this could serve as a fire alarm, but only if you've successfully built a consensus that it is one. This is hard, and the consensus would need to be quite strong. To avoid ambiguity, the test itself would need to be demonstrably resistant to being clever Hans'ed. Otherwise it would be just another milestone.

3Eli Tyre
I very much agree.

Sometimes people talk about advanced AIs "boiling the oceans". My impression is that there's some specific model for why that is a plausible outcome (something about energy and heat dissipation?), and it's not just a random "big change."

What is that model? Are there existing citations for the idea, including LessWrong posts?

Roughly, Earth's average temperature is:

$$T = \left(\frac{j}{\sigma}\right)^{1/4}$$

where $j$ is the dissipated power per unit area and $\sigma$ is the Stefan-Boltzmann constant.

We can estimate $j$ as

$$j = \frac{S\,(1 - \alpha)}{4}$$

where $S$ is the solar constant, 1361 W/m^2, and $\alpha = 0.31$ is Earth's albedo. We take all absorbed incoming power, $S\,(1 - \alpha)\,\pi R^2$, and divide it by Earth's surface area, $4 \pi R^2$.

After substituting, we get an Earth temperature of 254 K (-19 C), because we ignore the greenhouse effect here.

How much does humanity's power consumption contribute to direct warming? In 2023 Earth's energy consumption was 620 exajoules (source: first link in Google), which is 19 TW. The modified rough estimate of Earth's temperature is:

$$T = \left(\frac{j + j_{\text{human}}}{\sigma}\right)^{1/4}$$

Human power production per square meter, $j_{\text{human}}$, is, like, 0.04 W/m^2, which gives approximately zero effect of direct heating on Earth's temperature. But what happens if we, say, increase power by a factor of 1000? We get an increase of Earth's temperature to 264 K, by 10 K; again, we are ignoring the greenhouse effect. But qualitatively, increasing power consumption x1000 is likely to screw the biosphere really hard, if we count the increasing amount of water vapor, CO2 from water, and methane from melting permafrost.
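The estimate above can be checked numerically. This is a minimal sketch of the same zero-dimensional balance, not a climate model: the Earth surface area and the seconds-per-year figure are approximate values assumed here (they are not given in the comment), and the function name is made up for illustration.

```python
# Zero-dimensional radiative balance, greenhouse effect ignored (as in the comment).
SIGMA = 5.670374e-8         # Stefan-Boltzmann constant, W / (m^2 K^4)
SOLAR_CONSTANT = 1361.0     # W/m^2, from the comment
ALBEDO = 0.31               # from the comment
EARTH_AREA = 5.1e14         # m^2, approximate surface area (assumption)
SECONDS_PER_YEAR = 3.156e7  # approximate (assumption)


def equilibrium_temperature(extra_flux=0.0):
    """Temperature (K) at which the surface radiates absorbed sunlight plus extra_flux (W/m^2)."""
    absorbed = SOLAR_CONSTANT * (1.0 - ALBEDO) / 4.0
    return ((absorbed + extra_flux) / SIGMA) ** 0.25


# 620 EJ per year, converted to average power and then to power per unit area.
human_power = 620e18 / SECONDS_PER_YEAR  # W
human_flux = human_power / EARTH_AREA    # W/m^2

baseline = equilibrium_temperature()
today = equilibrium_temperature(human_flux)
scaled = equilibrium_temperature(1000 * human_flux)

print(f"baseline: {baseline:.1f} K")
print(f"with current human power: +{today - baseline:.3f} K")
print(f"with 1000x human power: +{scaled - baseline:.1f} K")
```

With these inputs the baseline comes out near 254 K, today's direct heating is a small fraction of a kelvin, and the x1000 case is roughly 10 K warmer, which agrees with the figures quoted above to within rounding.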

How is it realistic to... (read more)

5Thomas Kwa
The power density of nanotech is extremely high (10 kW/kg), so it only takes 16 kilograms of active nanotech per person * 10 billion people to generate enough waste heat to melt the polar ice caps. Literally boiling the oceans should only be a couple more orders of magnitude, so it's well within possible energy demand if the AIs can generate enough energy. But I think it's unlikely they would want to. Source: http://www.imm.org/Reports/rep054.pdf
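As a quick sanity check on the scale involved, the multiplication in this comment can be written out. This only computes the total waste heat and compares it with the ~19 TW consumption figure quoted earlier in the thread; whether that amount of heat would melt the ice caps is the commenter's claim and is not tested here.

```python
POWER_DENSITY_W_PER_KG = 10_000  # 10 kW/kg, from the comment
NANOTECH_KG_PER_PERSON = 16      # from the comment
POPULATION = 10_000_000_000      # 10 billion people, from the comment
CURRENT_HUMAN_POWER_W = 19e12    # ~19 TW, figure quoted earlier in the thread

waste_heat_w = POWER_DENSITY_W_PER_KG * NANOTECH_KG_PER_PERSON * POPULATION
ratio_to_today = waste_heat_w / CURRENT_HUMAN_POWER_W

print(f"total waste heat: {waste_heat_w / 1e12:.0f} TW")
print(f"multiple of current consumption: {ratio_to_today:.0f}x")
```

That is 1600 TW, somewhat under a hundred times present-day consumption.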
5ryan_greenblatt
I don't know of an existing citation. My understanding is that there is enough energy generable via fusion that if you did as much fusion as possible on earth, the oceans would boil. Or more minimally, earth would be uninhabitable by humans living as they currently do. I think this holds even if you just fuse lighter elements which are relatively easy to fuse. (As in, just fusing hydrogen.) Of course, it would be possible to avoid doing this on earth and instead go straight to a dyson swarm or similar. And, it might be possible to dissipate all the heat away from earth though this seems hard and not what would happen in the most efficient approach from my understanding. I think if you want to advance energy/compute production as fast as possible, boiling the oceans makes sense for a technologically mature civilization. However, I expect that boiling the oceans advances progress by no more than several years and possibly much, much less than that (e.g. days or hours) depending on how quickly you can build a dyson sphere and an industrial base in space. My current median guess would be that it saves virtually no time (several days), but a few months seems plausible. Overall, I currently expect the oceans to not be boiled because: * It saves only a tiny amount of time (less than several years, probably much less). So, this is only very important if you are in a conflict or you are very ambitious in resource usage and not patient. * Probably humans will care some about not having the oceans boiled and I expect human preferences to get some weight even conditional on AI takeover. * I expect that you'll have world peace (no conflict) by the time you have ocean boiling technology due to improved coordination/negotiation/commitment technology.
4the gears to ascension
Build enough nuclear power plants and we could boil the oceans with current tech, yeah? They're a significant fraction of fusion output iiuc?
2Thomas Kwa
Not quite, there is a finite quantity of fissiles. IIRC it's only an order of magnitude of energy more than fossil fuel reserves.

How do you use a correlation coefficient to do a Bayesian update?

For instance, the wikipedia page on the Heritability of IQ reads:

"The mean correlation of IQ scores between monozygotic twins was 0.86, between siblings 0.47, between half-siblings 0.31, and between cousins 0.15."

I'd like to get an intuitive sense of what those quantities actually mean, "how big" they are, how impressed I should be with them.

I imagine I would do that by working out a series of examples. Examples like...

If I know that Alice has an IQ of 120, what does that tell me about th... (read more)

1JBlack
In theory, you can use measured correlation to rule out models that predict the measured correlation to be some other number. In practice this is not very useful because the space of all possible models is enormous. So what happens in practice is that we make some enormously strong assumptions that restrict the space of possible models to something manageable. Such assumptions may include: that measured IQ scores consist of some genetic base plus some noise from other factors including environmental factors and measurement error. We might further assume that the inherited base is linear in contributions from genetic factors with unknown weights, and the noise is independent and normally distributed with zero mean and unknown variance parameter. I've emphasized some of the words indicating stronger assumptions. You might think that these assumptions are wildly restrictive and unlikely to be true, and you would be correct. Simplified models are almost never true, but they may be useful nonetheless because we have bounded rationality. So there is now a hypothesis A: "The model is adequate for predicting reality". Now that you have a model with various parameters, you can do Bayesian updates to update distributions for parameters - that is the hypotheses "A and (specific parameter values)" - and also various alternative "assumption failure" hypotheses. In the given example, we would very quickly find overwhelming evidence for "the noise is not independent", and consequently employ our limited capacity for evaluation on a different class of (probably more complex) models.   This hasn't actually answered your original question "what does that tell me about the IQ of her twin sister Beth?", because in the absence of a model it tells you essentially nothing. There exist distributions for the conditional distributions of twin IQ (I1,I2) that have a correlation coefficient 0.86 and yield any distribution you like for I1 given I2 = 120. We can rule most of them out on mor
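To make the role of the modeling assumptions concrete, here is a minimal sketch of the update under one particular strong assumption: that the two relatives' scores are jointly normal with the same mean and standard deviation. The mean of 100 and standard deviation of 15 are the usual IQ norming convention; joint normality is an assumption added here, and as the comment stresses, the correlation coefficient alone does not justify it. The function name is made up for illustration.

```python
import math


def conditional_iq(observed_iq, correlation, mean=100.0, sd=15.0):
    """Mean and SD of a relative's IQ given the observed one, ASSUMING bivariate normality."""
    expected = mean + correlation * (observed_iq - mean)
    spread = sd * math.sqrt(1.0 - correlation ** 2)
    return expected, spread


# Correlations quoted from the Wikipedia passage in the question.
correlations = {
    "monozygotic twin": 0.86,
    "sibling": 0.47,
    "half-sibling": 0.31,
    "cousin": 0.15,
}

for relation, r in correlations.items():
    expected, spread = conditional_iq(120, r)
    print(f"{relation}: expected {expected:.1f}, SD {spread:.1f}")
```

Under that assumption, knowing Alice scores 120 moves the estimate for an identical twin to about 117 and roughly halves the uncertainty, while for a cousin it moves the estimate to 103 and leaves the uncertainty almost unchanged. A different joint distribution with the same correlation could give very different answers.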

I remember reading a thread on Facebook, where Eliezer and Robin Hanson were discussing the implications of the Alpha Go (or Alpha Zero) on the content of the AI foom debate, and Robin made an analogy to Linear Regression as one thing that machines can do better than humans, but which doesn't make them super-human.

Does anyone remember what I'm talking about?

2riceissa
Maybe this? (There are a few subthreads on that post that mention linear regression.)

Question: Have Moral Mazes been getting worse over time? 

Could the growth of Moral Mazes be the cause of cost disease? 

I was thinking about how I could answer this question. I think that the thing that I need is a good quantitative measure of how "mazy" an organization is. 

I considered the metric of "how much output for each input", but 1) that metric is just cost disease itself, so it doesn't help us distinguish the mazy cause from other possible causes, 2) if you're good enough at rent seeking maybe you can get high revenue despite your poor production. 

What metric could we use?

6Raemon
This is still a bit superficial/goodharty, but I think "number of layers of hierarchy" is at least one thing to look at. (Maybe find pairs of companies that output comparable products that you're somehow able to measure the inputs and outputs of, and see if layers of management correlate with cost disease)

This is my current take about where we're at in the world:

Deep learning, scaled up, might be basically enough to get AGI. There might be some additional conceptual work necessary, but the main difference between 2020 and the year in which we have transformative AI is that in that year, the models are much bigger.

If this is the case, then the most urgent problem is strong AI alignment + wise deployment of strong AI.

We'll know if this is the case in the next 10 years or so, because either we'll continue to see incredible gains from increasingly bigger Deep L... (read more)

1niplav
(This question is only related to a small point) You write that one possible foundational strategy could be to "radically detraumatize large fractions of the population". Do you believe that 1. A large part of the population is traumatized 2. That trauma is reversible 3. Removing/reversing that trauma would improve the development of humanity drastically? If yes, why? I'm happy to get a 1k page PDF thrown at me. I know that this has been a relatively popular talking point on twitter, but without a canonical resource, and I also haven't seen it discussed on LW.
6Eli Tyre
I was wondering if I would get a comment on that part in particular. ; ) I don't have a strong belief about your points one through three, currently. But it is an important hypothesis in my hypothesis space, and I'm hoping that I can get to the bottom of it in the next year or two. I do confidently think that one of the "forces for badness" in the world is that people regularly feel triggered or threatened by all kinds of different proposals, and reflexively act to defend themselves. I think this is among the top three problems in having good discourse and cooperative politics. Systematically reducing that trigger response would be super high value, if it were feasible. My best guess is that that propensity to be triggered is not mostly the result of infant or childhood trauma. It seems more parsimonious to posit that it is basic tribal stuff. But I could imagine it having its root in something like "trauma" (meaning it is the result of specific experiences, not just general dispositions, and it is practically feasible, if difficult, to clear or heal the underlying problem in a way that completely prevents the symptoms). I think there is no canonical resource on trauma-stuff because 1) the people on twitter are less interested, on average, in that kind of theory building than we are on LessWrong and 2) because mostly those people are (I think) extrapolating from their own experience, in which some practices unlocked subjectively huge breakthroughs in personal well-being / freedom of thought and action. Does that help at all?
2Hazard
I plan to blog more about how I understand some of these trigger states and how it relates to trauma. I do think there's a decent amount of written work, not sure how "canonical", but I've read some great stuff from sources I'm surprised I haven't heard more hype about. The most useful stuff I've read so far is the first three chapters of this book. It has hugely sharpened my thinking. I agree that a lot of trauma discourse on our chunk of twitter is more focused on the personal experience/transformation side, and doesn't lend itself well to bigger Theory of Change type scheming. http://www.traumaandnonviolence.com/chapter1.html
2Eli Tyre
Thanks for the link! I'm going to take a look!
1niplav
Yes, it definitely does–you just created the resource I will link people to. Thank you! Especially the third paragraph is cruxy. As far as I can tell, there are many people who have (to some extent) defused this propensity to get triggered for themselves. At least for me, LW was a resource to achieve that.

I was thinking lately about how there are some different classes of models of psychological change, and I thought I would outline them and see where that leads me. 

It turns out it led me into a question about where and when Parts-based vs. Association-based models are applicable.

Google Doc version.

Parts-based / agent-based models 

Some examples: 

  • Focusing
  • IFS
  • IDC
  • Connection Theory
  • The NLP ecological check

This is the frame that I make the most use of, in my personal practice. It assumes that all behavior is the result of some goal directed subproce... (read more)

6Raemon
I like this a lot, and think it’d make a good top level post. 
2Eli Tyre
Really? I would prefer to have something much more developed and/or to have solved my key puzzle here before I put it up as a top level post.
2Raemon
I saw the post more as giving me a framework that was helpful for sorting various psych models, and the fact that you had one question about it didn't actually feel too central for my own reading. (Separately, I think it's basically fine for posts to be framed as questions rather than definitive statements/arguments after you've finished your thinking)
4Viliam
I wonder how the ancient schools of psychotherapy would fit here. Psychoanalysis is parts-based. Behaviorism is association-based. Rational therapy seems narrative-based. What about Rogers or Maslow? Seems to me that Rogers and the "think about it seriously for 5 minutes" technique should be in the same category. In both cases, the goal is to let the client actually think about the problem and find the solution for themselves. Not sure if this is or isn't an example of narrative-based, except the client is supposed to find the narrative themselves. Maslow comes with a supposed universal model of human desires and lets you find yourself in that system. Jung kinda does the same, but with a mythological model. Sounds like an externally provided narrative. Dunno, maybe the narrative-based should be split into more subgroups, depending on where the narrative comes from (a universal model, an ad-hoc model provided by the therapist, an ad-hoc model constructed by the client)?
2ChristianKl
The way I have been taught NLP, you usually don't use either anchors or an ecological check but both. Behavior changes that are created by changing around anchors are not long-term stable when they violate ecology. Changing around associations allows you to create new strategies in a more detailed way than you get by just doing parts work and I have the impression that it's often faster in creating new strategies. (A) Interventions that are about resolving traumas feel to me like a different model. (B) None of the three models you listed address the usefulness of connecting with the felt sense of emotions. (C) There's a model of change where you create a setting where people can have new behavioral experiences and then hopefully learn from those experiences and integrate what they learned in their lives. CFAR's goal of wanting to give people more agency about ways they think seems to work through C where CFAR wants to expose people to a bunch of experiences where people actually feel new ways to affect their thinking. In the Danis Bois method both A and C are central.

Can someone affiliated with a university, etc. get me a PDF of this paper?

https://psycnet.apa.org/buy/1929-00104-001

It is on Scihub, but that version is missing a few pages in which they describe the methodology.

[I hope this isn't an abuse of LessWrong.]

3romeostevensit
time for a new instance of this? https://www.lesswrong.com/posts/4sAsygakd4oCpbEKs/lesswrong-help-desk-free-paper-downloads-and-more-2014
0Raemon
I edited the image into the comment box, predicting that the reason you didn't was because you didn't know you could (using markdown). Apologies if you prefer it not to be here (and can edit it back if so)

In this case it seems fine to add the image, but I feel disconcerted that mods have the ability to edit my posts.

I guess it makes sense that the LessWrong team would have the technical ability to do that. But editing a user's post, without their specifically asking, feels like a pretty big breach of... not exactly trust, but something like that. It means I don’t have fundamental control over what is written under my name.

That is to say, I personally request that you never edit my posts, without asking (which you did, in this case) and waiting for my response. Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.

4Raemon
Understood, and apologies. A fairly common mod practice has been to fix typos and stuff in a sort of "move first and then ask if it was okay" thing. (I'm not confident this is the best policy, but it saves time/friction, and meanwhile I don't think anyone had had an issue with it). But, your preference definitely makes sense and if others felt the same I'd reconsider the overall policy. (It's also the case that adding an image is a bit of a larger change than the usual typo fixing, and may have been more of an overstep of bounds) In any case I definitely won't edit your stuff again without express permission.
1Eli Tyre
Cool. : )
4Wei Dai
If it's not just you, it's at least pretty rare. I've seen the mods "helpfully" edit posts several times (without asking first) and this is the first time I've seen anyone complain about it.
1Eli Tyre
I knew that I could, and didn’t, because it didn’t seem worth it. (Thinking that I still have to upload it to a third party photo repository and link to it. It’s easier than that now?)
2Raemon
In this case your blog already counted as a third party repository.
4Raemon
Some of these seem likely to generalize and some seem likely to be more specific. Curious about your thoughts on "best experimental approaches to figuring out your own napping protocol."

Doing actual mini-RCTs can be pretty simple. You only need 3 things: 

1. A spreadsheet 

2. A digital coin for randomization 

3. A way to measure the variable that you care about

I think one of the most practically powerful "techniques" of rationality is doing simple empirical experiments like this. You want to get something? You don't know how to get it? Try out some ideas and check which ones work!

There are other applications of empiricism that are not as formal, and sometimes faster. Those are also awesome. But at the very least, I've found that doing ... (read more)
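The three ingredients listed above (a spreadsheet, a digital coin, a measurement) can be sketched in a few lines. This is only an illustration of the setup, not the procedure the post goes on to describe: the function names are made up, the condition labels are placeholders, and the numbers in `example_log` are invented purely to show the arithmetic.

```python
import random
import statistics


def assign_condition(rng):
    """Flip a fair digital coin to decide today's condition."""
    return "treatment" if rng.random() < 0.5 else "control"


def mean_difference(log):
    """Mean measurement under treatment minus mean under control.

    `log` plays the role of the spreadsheet: one (condition, measurement)
    row per trial. Raises statistics.StatisticsError if either group is empty.
    """
    treatment = [value for condition, value in log if condition == "treatment"]
    control = [value for condition, value in log if condition == "control"]
    return statistics.mean(treatment) - statistics.mean(control)


# Step 1: randomize before each trial, so your own choice cannot bias the result.
rng = random.Random()
print("today's condition:", assign_condition(rng))

# Step 2: after several trials, compare the groups.
# These rows are INVENTED placeholder values, not real data.
example_log = [("treatment", 7), ("control", 5), ("treatment", 8), ("control", 6)]
print("mean difference:", mean_difference(example_log))
```

With only a handful of trials the difference in means is noisy, so in practice you would want to keep logging rows until the gap is clearly larger than the day-to-day variation.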

Is there a LessWrong article that unifies physical determinism and choice / "free will"? Something about thinking of yourself as the algorithm computed on this brain?

1Measure
Perhaps This one?

Is there any particular reason why I should assign more credibility to Moral Mazes / Robert Jackall than I would to the work of any other sociologist?

(My prior on sociologists is that they sometimes produce useful frameworks, but generally rely on subjective hard-to-verify and especially theory-laden methodology, and are very often straightforwardly ideologically motivated.)

I imagine that someone else could write a different book, based on the same kind of anthropological research, that highlights different features of the corporate world, to tell the oppo... (read more)

4Raemon
My own take is that moral mazes should be considered in the "interesting hypothesis" stage, and that the next step is to actually figure out how to go be empirical about checking it. I made some cursory attempts at this last year, and then found myself unsure this was even the right question. The core operationalization I wanted was something like: * Does having more layers of management introduce pathologies into an organization? * How much value is generated by organizations scaling up? * Can you reap the benefits of organizations scaling up by instead having them splinter off? (The "middle management == disconnected from reality == bad" hypothesis was the most clear-cut part of the moral maze model to me, although I don't think it was the only part of the model) I have some disagreements with Zvi about this. I chatted briefly with habryka about this and I think he said something like "it seems like a more useful question is to look for positive examples of orgs that work well, rather than try and tease out various negative ways orgs could fail to work." I think there are maybe two overarching questions this is all relevant to: 1. How should the rationality / xrisk / EA community handle scale? Should we be worried about introducing middle-management into ourselves? 2. What's up with civilization? Is maziness a major bottleneck on humanity? Should we try to do anything about it? (My default answer here is "there's not much to be done here, simply because the world is full of hard problems and this one doesn't seem very tractable even if the models are straightforwardly true." But, I do think this is a contender for humanity's Hamming problem)
2Dagon
There are multiple dimensions to the credibility question. You probably should increase your credence from prior to reading it/about it that large organizations very often have more severe misalignment than you thought. You probably should recognize that the model of middle-management internal competition has some explanatory power. You probably should NOT go all the way to believing that the corporate world is homogeneously broken in exactly this way. I don't think he makes that claim, but it's what a lot of readers seem to take. There's plenty of variation, and the Anna Karenina principle applies (paraphrased): well-functioning organizations are alike; dysfunctional organizations are each broken in their own way. But really, it's wrong too - each group is actually distinct, and has distinct sets of forces that have driven it to whatever pathologies or successes it has. Even when there are elements that appear very similar, they have different causes and likely different solutions or coping mechanisms. "is most of the world dominated by moral mazes"? I don't think this is a useful framing. Most groups have some elements of Moral Mazes. Some groups appear dominated by those elements, in some ways. From the outside, most groups are at least somewhat effective at their stated mission, so the level of domination is low enough that it hasn't killed them (though there are certainly "zombie orgs" which HAVE been killed, but don't know it yet).

My understanding is that there was a 10 year period starting around 1868, in which South Carolina's legislature was mostly black, and when the universities were integrated (causing most white students to leave), before the Dixiecrats regained power.

I would like to find a relatively non-partisan account of this period.

Anyone have suggestions?

1_mp_
I would just read W. E. B. Du Bois - Black Reconstruction in America (1935)

When is an event surprising enough that I should be confused?

Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.

In retrospect, I should have been surprised that "Rohin" kept talking about what Eliezer says in the Sequences. I wouldn't have guessed that Rohin was that "culturally rationalist" or that he would be that interested in what Eliezer wrote in the sequences. And indeed, I was updating that Rohi... (read more)

2Raemon
Surprise and confusion are two different things[1], but surprise usually goes along with confusion. I think it's a good rationalist skill-to-cultivate to use "surprise" as a trigger to practice noticing confusion, because you don't get many opportunities to do that. I think for most people this is worth doing for minor surprises, not so much because you're that likely to need to do a major update, but because it's just good mental hygiene/practice. 1. ^ Surprise is "an unlikely thing happened." Confusion is "a thing I don't have a good explanation for happened."

What was the best conference that you ever attended?

2Yoav Ravid
IDEC - International Democratic Education Conference - it's hosted by a democratic school in a different country each year, so I attended when my school was hosting (it was 2 days in our school and then 3 more days somewhere else). It was very open, had very good energy, and had great people whom I got to meet (and, since it wasn't too filled with talks, actually had the time to talk to) - and oh, yeah, also a few good talks :) If you have any more specific questions I'd be happy to answer.

I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn, and then using interpretability tools to study the abstractions that the AI uncovers.

I thought he specifically mentioned "using AI as a microscope."

Is that a real post, or am I misremembering this one? 

4Unnamed
https://www.lesswrong.com/posts/X2i9dQQK3gETCyqh2/chris-olah-s-views-on-agi-safety 

Are there any hidden risks to buying or owning a car that someone who's never been a car owner might neglect?

I'm considering buying a very old (i.e., from the 1990s), very cheap (under $1000, ideally) minivan, as an experiment.

That's inexpensive enough that I'm not that worried about it completely breaking down on me. I'm willing to just eat the monetary cost for the information value.

However, maybe there are other costs or other risks that I'm not tracking, that make this a worse idea.

Things like

- Some ways that a car can break make it dangerous, instead of ... (read more)

3[anonymous]
There are.  https://www.iihs.org/ratings/driver-death-rates-by-make-and-model You can explore the data yourself, but the general trend is that it appears there have been real improvements in crash fatality rates.  Better designed structure, more and better airbags, stability control, and now in some new vehicles automatic emergency braking is standard. Generally a bigger vehicle like a minivan is safer, and a newer version of that minivan will be safer, but you just have to go with what you can afford. Main risk is simply that at this price point that minivan is going to have a lot of miles, and it's simply probability how long it will run until a very expensive major repair is needed.  One strategy is to plan to junk the vehicle and get a similar 'beater' vehicle when the present one fails. If you're so price sensitive $1000 is meaningful, well, uh try to find a solution to this crisis.  I'm not saying one exists, but there are survival risks to poverty.
2Eli Tyre
Lol. I'm not impoverished, but I want to cheaply experiment with having a car. It isn't worth it to throw away $30,000 on a thing that I'm not going to get much value from.
7[anonymous]
Ok, but at the price point you are talking about, you are not going to have a good time. Analogy: would you "experiment with having a computer" by grabbing a Packard Bell from the 1990s and putting an ethernet card in it so it can connect to the internet from Windows 95? Do you need the minivan form factor? A vehicle in decent condition (6-10 years old, under 100k miles, from a reputable brand) is cheapest in the small car form factor.
2Raemon
Not spending $30,000 makes sense, but my impression from car shopping last year was that trying to get a good car for less than $7k was fairly hard. (I get the ‘willingness to eat the cost’ price point of $1k, but wanted to highlight that the next price point up was more like 10k than 30k.) Depending on your experimentation goals, you might want to rent a car rather than buy.
2Dagon
Most auto shops will do a safety/mechanical inspection for a small amount (usually in the $50-200 range, but be aware that the cheaper ones subsidize it by anticipating that they can sell you services to fix the car if you buy it).    However, as others have said, this price point is too low for your first car as a novice, unless you have a mentor and intend to spend a lot of time learning to maintain/fix.  Something reliable enough for you to actually run the experiment and get the information you want about the benefits vs frustrations of owning a car is going to run probably $5-$10K, depending on regional variance and specifics of your needs.   For a first car, look into getting a warranty, not because it's a good insurance bet, but because it forces the seller to make claims of warrantability to their insurance company. You can probably cut the cost in half (or more) if you educate yourself and get to know the local car community.  If the car is a hobby rather than an experiment in transportation convenience, you can take a lot more risk, AND those risks are mitigated if you know how to get things fixed cheaply.

Is there a standard article on what "the critical risk period" is?

I thought I remembered an arbital post, but I can't seem to find it.

I remember reading a Zvi Mowshowitz post in which he says something like "if you have concluded that the most ethical thing to do is to destroy the world, you've made a mistake in your reasoning somewhere." 

I spent some time searching around his blog for that post, but couldn't find it. Does anyone know what I'm talking about?

2Pattern
It sounds like a tagline for a blog.
2Raemon
Probably this one? http://lesswrong.com/posts/XgGwQ9vhJQ2nat76o/book-trilogy-review-remembrance-of-earth-s-past-the-three
2Eli Tyre
Thanks! I thought that it was in the context of talking about EA, but maybe this is what I am remembering? It seems unlikely though, since I wouldn't have read the spoiler part.

Anyone have a link to the sequence post where someone posits that AIs would not do art and science from a drive to compress information, but rather would create and then reveal cryptographic strings (or something)?

1niplav
I think you are thinking of “AI Alignment: Why It’s Hard, and Where to Start”: There's also a mention of that method in this post.

I remember reading a Zvi Mowshowitz post in which he says something like "if you have concluded that the most ethical thing to do is to destroy the world, you've made a mistake in your reasoning somewhere." 

I spent some time searching around his blog for that post, but couldn't find it. Does anyone know what I'm talking about?

2Raemon
The review of the Three-Body Problem trilogy is my first guess.

A hierarchy of behavioral change methods

Follow up to, and a continuation of the line of thinking from: Some classes of models of psychology and psychological change

Related to: The universe of possible interventions on human behavior (from 2017)

This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more light-weight, and faster to use (is that right?), than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before... (read more)

Can anyone get a copy of this paper for me? I'm looking to get clarity about how important cryopreserving non-brain tissue is for preserving personality.

9Raemon
I'm interested in knowing your napping tools
1Eli Tyre
Here you go. New post: Napping Protocol
2Raemon
Thanks!