Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, and then condition on that, to forecast what seems most likely to happen in the next year, and so on, until you reach either human disempowerment or an end of the acute risk period.
This post was my attempt at the time.
I spent maybe 5 hours on this, and there's lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it's too fast, for instance). But nevertheless I'm posting it now, with only a few edits and elaborations, since I'm probably not going to do a full rewrite soon.
2024
An optional feature that I think LessWrong should have: shortform posts that get more than some amount of karma get automatically converted into personal blog posts, including all the comments.
It should have a note at the top "originally published in shortform", with a link to the shortform comment. (All the copied comments should have a similar note).
I think it's reasonable for the conversion to be at the original author's discretion rather than an automatic process.
Whether or not it would happen by default, this would be the single most useful LW feature for me. I'm often really unsure whether a post will get enough attention to be worth making it a longform, and sometimes even post shortforms like "comment if you want this to be a longform".
Disagreed, insofar as by "automatically converted" you mean "the shortform author has no recourse against this".
No. That's why I said the feature should be optional. You can make a general default setting for your shortform, plus there should be a toggle (hidden in the three-dots menu?) to turn this on and off on a post-by-post basis.
Just ask an LLM. The author can always edit it, after all.
My suggestion for how such a feature could be done would be to copy the comment into a draft post, add an LLM-suggested title (and tags?), and alert the author for an opt-in, who may delete or post it.
If it is sufficiently well received and people approve a lot of them, then one can explore opt-out auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".
I think that, in almost full generality, we should taboo the term "values". It's usually ambiguous between a bunch of distinct meanings.
I at least partly buy this, but I want to play devil's advocate.
Let's suppose there's a single underlying thing which ~everyone is gesturing at when talking about (humans') "values". How could a common underlying notion of "values" be compatible with our observation that people talk about all the very distinct things you listed, when you start asking questions about their "values"?
An analogy: in political science, people talk about "power". Right up top, wikipedia defines "power" in the political science sense as:
In political science, power is the social production of an effect that determines the capacities, actions, beliefs, or conduct of actors.
A minute's thought will probably convince you that this supposed definition does not match the way anybody actually uses the term; for starters, actual usage is narrower. That definition probably doesn't even match the way the term is used by the person who came up with that definition.
That's the thing I want to emphasize here: if you ask people to define a term, the definitions they give ~never match their own actual usage of the term, with the important exception of mathematics.
... but that doesn't imply that there's no single underlyin...
New post: Some things I think about Double Crux and related topics
I've spent a lot of my discretionary time working on the broad problem of developing tools for bridging deep disagreements and transferring tacit knowledge. I'm also probably the person who has spent the most time explicitly thinking about and working with CFAR's Double Crux framework. It seems good for at least some of my high level thoughts to be written up some place, even if I'm not going to go into detail about, defend, or substantiate, most of them.
The following are my own beliefs and do not necessarily represent CFAR, or anyone else.
I, of course, reserve the right to change my mind.
[Throughout I use "Double Crux" to refer to the Double Crux technique, the Double Crux class, or a Double Crux conversation, and I use "double crux" to refer to a proposition that is a shared crux for two people in a conversation.]
Here are some things I currently believe:
(General)
People rarely change their mind when they feel like you have trapped them in some inconsistency [...] In general (but not universally) it is more productive to adopt a collaborative attitude of sincerely trying to help a person articulate, clarify, and substantiate [bolding mine—ZMD]
"People" in general rarely change their mind when they feel like you have trapped them in some inconsistency, but people using the double-crux method in the first place are going to be aspiring rationalists, right? Trapping someone in an inconsistency (if it's a real inconsistency and not a false perception of one) is collaborative: the thing they were thinking was flawed, and you helped them see the flaw! That's a good thing! (As it is written of the fifth virtue, "Do not believe you do others a favor if you accept their arguments; the favor is to you.")
Obviously, I agree that people should try to understand their interlocutors. (If you performatively try to find fault in something you don't understand, then apparent "faults" you find are likely to be your own misunderstandings rather than actual faults.) But if someone spots an actual inconsistency in my ideas, I want them to tell me right away. Pe
Old post: RAND needed the "say oops" skill
[Epistemic status: a middling argument]
A few months ago, I wrote about how RAND and the "Defense Intellectuals" of the Cold War represent another precious datapoint of "very smart people, trying to prevent the destruction of the world, in a civilization that they acknowledge to be inadequate to dealing sanely with x-risk."
Since then I've spent some time doing additional research into what cognitive errors and mistakes those consultants, military officials, and politicians made that endangered the world. The idea being that if we could diagnose which specific irrationalities they were subject to, this would suggest errors that might also be relevant to contemporary x-risk mitigators, and might point out some specific areas where development of rationality training is needed.
However, this proved somewhat less fruitful than I was hoping, and I’ve put it aside for the time being. I might come back to it in the coming months.
It does seem worth sharing at least one relevant anecdote and analysis from Daniel Ellsberg's excellent book, The Doomsday Machine, given that I've already written it up.
The missile gap
In the late nineteen-fi...
This was quite valuable to me, and I think I would be excited about seeing it as a top-level post.
New post: What is mental energy?
[Note: I've started a research side project on this question, and it is already obvious to me that this ontology is importantly wrong.]
There's a common phenomenology of "mental energy". For instance, if I spend a couple of hours thinking hard (maybe doing math), I find it harder to do more mental work afterwards. My thinking may be slower and less productive. And I feel tired, or drained (mentally, instead of physically).
Mental energy is one of the primary resources that one has to allocate, in doing productive work. In almost all cases, humans have less mental energy than they have time, and therefore effective productivity is a matter of energy management, more than time management. If we want to maximize personal effectiveness, mental energy seems like an extremely important domain to understand. So what is it?
The naive story is that mental energy is an actual energy resource that one expends and then needs to recoup. That is, when one is doing cognitive work, they are burning calories, depleting their body's energy stores. As they use energy, they have less fuel to burn.
My current understanding is that this story is not physiologically realistic. T...
New post: Some notes on Von Neumann, as a human being
I recently read Prisoner's Dilemma, which is half an introduction to very elementary game theory, and half a biography of John Von Neumann, and watched this old PBS documentary about the man.
I'm glad I did. Von Neumann has legendary status in my circles, as the smartest person ever to live. [1] Many times I've written the words "Von Neumann Level Intelligence" in an AI strategy document, or speculated about how many coordinated Von Neumanns it would take to take over the world. (For reference, I now think that 10 is far too low, mostly because he didn't seem to have the entrepreneurial or managerial dispositions.)
Learning a little bit more about him was humanizing. Yes, he was the smartest person ever to live, but he was also an actual human being, with actual human traits.
Watching this first clip, I noticed that I was surprised by a number of things.
TL;DR: I’m offering to help people productively have difficult conversations and resolve disagreements, for free. Feel free to email me if and when that seems helpful. elitrye [at] gmail.com
Over the past 4-ish years, I've had a side project of learning, developing, and iterating on methods for resolving tricky disagreements and failures to communicate. A lot of this has been in the Double Crux frame, but I've also been exploring a number of other frameworks (including NVC, Convergent Facilitation, Circling-inspired stuff, intuition extraction, and some home-grown methods).
As part of that, I've had a standing offer to facilitate / mediate tricky conversations for folks in the CFAR and MIRI spheres (testimonials below). Facilitating "real disagreements" allows me to get feedback on my current conversational frameworks and techniques. When I encounter blockers that I don't know how to deal with, I can go back to the drawing board to model those problems and the interventions that would solve them, and iterate from there, developing new methods.
I generally like doing this kind of conversational facilitation and am open to do...
[I wrote a much longer and more detailed comment, and then decided that I wanted to think more about it. In lieu of posting nothing, here's a short version.]
I mean I did very little facilitation one way or the other at that event, so I think my counterfactual impact was pretty minimal.
In terms of my value added, I think that one was in the bottom 5th percentile?
In terms of how useful that tiny amount of facilitation was, maybe 15th to 20th percentile? (This is a little weird, because quantity and quality are related. More active facilitation has a quality span: active (read: a lot of) facilitation can be much more helpful when it is good and much more disruptive / annoying / harmful when it is bad, compared to less active backstop facilitation.)
Overall, the conversation served the goals of the participants and had a median outcome for that kind of conversation, which is maybe 30th percentile, but there is a long right tail of positive outcomes (and maybe I am messing up how to think about percentile scores with skewed distributions).
The outcome that occurred ("had an interesting conversation, and had some new thoughts / clarifications") is good, but also far below the sort of outcome that I'm usually aiming for (but often missing): substantive, permanent (epistemic!) change to the way that one or both of the people orient on this topic.
That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better
The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world, is that startups are an engine (maybe even the engine) that drives progress towards a future that's better than the past, by creating new products that add value to people's lives.
I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda.
That it's been 8 years since Match bought and ruined OkCupid, and that no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC, is a huge black mark against the whole SV ideology of technology changing the world for the better.
Finding a partner is such a huge, real, pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet earth, instead of doing something else, that (arguably) would have been more p...
Basically: I don't blame founders or companies for following their incentive gradients, I blame individuals/society for being unwilling to assign reasonable prices to important goods.
I think the bad-ness of dating apps is downstream of poor norms around impact attribution for matches made. Even though relationships and marriages are extremely valuable, individual people are not in the habit of paying that to anyone.
Like, $100k or a year's salary seems like a very cheap value to assign to your life partner. If dating apps could rely on that size of payment when they succeed, then I think there could be enough funding for something at least a good small business. But I've never heard of anyone actually paying anywhere near that. (myself included - though I paid a retroactive $1k payment to the person who organized the conference I met my wife at)
I think keeper.ai tries to solve this with large bounties on dating/marriages, it's one of the things I wish we pushed for more on Manifold Love. It seems possible to build one for the niche of "the ea/rat community"; Manifold Love, the checkboxes thing, dating docs got pretty good adoption for not that much execution.
(Also: be the change! I think building out OKC is one of the easiest "hello world" software projects one could imagine, Claude could definitely make a passable version in a day. Then you'll discover a bunch of hard stuff around getting users, but it sure could be a good exercise.)
I mean, it's obviously very dependent on your personal finance situation, but I'm using $100k as an order-of-magnitude proxy for "about a year's salary". I think it's very coherent to give up a year of marginal salary in exchange for finding the love of your life, rather than like $10k or ~1 month's salary.
Of course, the world is full of mispricings, and currently you can save a life for something like $5k. I think these are both good trades to make, and most people should have a portfolio that consists of both "life partners" and "impact from lives saved" and crucially not put all their investment into just one or the other.
It's possible no one tried literally "recreate OkC", but I think dating startups are very oversubscribed by founders, relative to interest from VCs [1] [2] [3] (and I think VCs are mostly correct that they won't make money [4] [5]).
(Edit: I want to note that those are things I found after a bit of googling to see if my sense of the consensus was borne out; they are meant in the spirit of "several samples of weak evidence")
I don't particularly believe you that OkC solves dating for a significant fraction of people. IIRC, a previous time we talked about this, @romeostevensit suggested you had not sufficiently internalised the OkCupid blog findings about how much people prioritised physical attraction.
You mention manifold.love, but also mention it's in maintenance mode – I think because the type of business you want people to build does not in fact work.
I think it's fine to lament our lack of good mechanisms for public good provision, and claim our society is failing at that. But I think you're trying to draw an update that's something like "tech startups should be doing an unbiased search through viable valuable business, but they're clearly not", or maybe, "tech startups are suppose...
I agree that more people should be starting revenue-funded/bootstrapped businesses (including ones enabled by software/technology).
The meme is that if you're starting a tech company, it's going to be a VC-funded startup. This is, I think, a meme put out by VCs themselves, including Paul Graham/YCombinator, and it conflates new software projects and businesses generally with a specific kind of business model called the "tech startup".
Not every project worth doing should be a business (some should be hobbies or donation-funded) and not every business worth doing should be a VC-funded startup (some should be bootstrapped and grow from sales revenue.)
The VC startup business model requires rapid growth and expects 30x returns over a roughly 5-10 year time horizon. That simply doesn't include every project worth doing. Some businesses are viable but are not likely to grow that much or that fast; some projects shouldn't be expected to be profitable at all and need philanthropic support.
I think the narrative that "tech startups are where innovation happens" is...badly incomplete, but still a hell of a lot more correct than "tech startups are net destructive". ...
I worked at Manifold but not on Love. My impression from watching and talking to my coworkers was that it was a fun side idea that they felt like launching and seeing if it happened to take off, and when it didn't they got bored and moved on. Manifold also had a very quirky take on it due to the ideology of trying to use prediction markets as much as possible and making everything very public. I would advise against taking it seriously as evidence that an OKC-like product is a bad idea or a bad business.
Shreeda Segan is working on building it, as a cashflow business. They need $10K to get to the MVP. https://manifund.org/projects/hire-a-dev-to-finish-and-launch-our-dating-site
(Reasonably personal)
I spend a lot of time trying to build skills, because I want to be awesome. But there is something off about that.
I think I should just go after things that I want, and solve the problems that come up on the way. The idea of building skills sort of implies that if I don't have some foundation or some skill, I'll be blocked, and won't be able to solve some thing in the way of my goals.
But that doesn't actually sound right. Like it seems like the main important thing for people who do incredible things is their ability to do problem solving on the things that come up, and not the skills that they had previously built up in a "skill bank".
Raw problem solving is the real thing and skills are cruft. (Or maybe not cruft per se, but more like a side effect. The compiled residue of previous problem solving. Or like a code base from a previous project that you might repurpose.)
Part of the problem with this is that I don't know what I want for my own sake, though. I want to be awesome, which in my conception, means being able to do things.
I note that wanting "to be able to do things" is a leaky sort of motivation: because the...
Thesis: I now think that utility functions might be a pretty bad abstraction for thinking about the behavior of agents in general including highly capable agents.
[Epistemic status: half-baked, elucidating an intuition. Possibly what I’m saying here is just wrong, and someone will helpfully explain why.]
Over the past years, in thinking about agency and AI, I’ve taken the concept of a “utility function” for granted as the natural way to express an entity's goals or preferences.
Of course, we know that humans don't have well-defined utility functions (they're inconsistent, and subject to all kinds of framing effects), but that's only because humans are irrational. To the extent that a thing acts like an agent, its behavior corresponds to some utility function. That utility function might not be explicitly represented, but if an agent is rational, there's some utility function that reflects its preferences.
Given this, I might be inclined to scoff at people who scoff at “blindly maximizing” AGIs. “They just don’t get it”, I might think. “T...
New post: The Basic Double Crux Pattern
[This is a draft, to be posted on LessWrong soon.]
I’ve spent a lot of time developing tools and frameworks for bridging "intractable" disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work on it.
People often express to me something to the effect of, "The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you've understood, operationalizing, etc. The 'Double Crux' framework itself is not very important."
I half agree with that sentiment. I do think that those low level cognitive and conversational patterns are the most important thing, and at Double Crux trainings that I have run, most of the time is spent focusing on specific exercises to instill those low level TAPs.
However, I don’t think that the only value of the Double Crux schema is in training those low level habits. Double cruxes are extremely powerful machines that allow one to identify, if not the most efficient conversational path, a very high efficiency conversationa...
Eliezer claims that dath ilani never give in to threats. But I'm not sure I buy it.
The only reason people will make threats against you, the argument goes, is if those people expect that you might give in. If you have an iron-clad policy against acting in response to threats made against you, then there's no point in making or enforcing the threats in the first place. There's no reason for the threatener to bother, so they don't. Which means in some sufficiently long run, refusing to submit to threats means you're not subject to threats.
This seems a bit fishy to me. I have a lingering suspicion that this argument doesn't apply, or at least doesn't apply universally, in the real world.
I'm thinking here mainly of a prototypical case of an isolated farmer family (like the early farming families of the Greek peninsula, not absorbed into a polis), being accosted by some roving bandits, such as the soldiers of the local government. The bandits say "give us half your harvest, or we'll just kill you."
The argument above depends on a claim about the cost of executing on a threat. "There's no reason to bother" implies that the threatener has a preference not to bother, if they know that the t...
Eliezer, this is what you get for not writing up the planecrash threat lecture thread. We'll keep bothering you with things like this until you give in to our threats and write it.
What you've hit upon is "BATNA," or "Best alternative to a negotiated agreement." Because the robbers can get what they want by just killing the farmers, the dath ilani will give in, and from what I understand, Yudkowsky therefore doesn't classify the original request (give me half your wheat or die) as a threat.
This may not be crazy; it reminds me of the Ancient Greek social mores around hospitality, which seem insanely generous to a modern reader but I guess make sense if the equilibrium number of roving ~~bandits~~ honored guests is kept low by some other force.
Old post: A mechanistic description of status
[This is an essay that I've had bopping around in my head for a long time. I'm not sure if this says anything usefully new, but it might click with some folks. If you haven't read Social Status: Down the Rabbit Hole on Kevin Simler's excellent blog, Melting Asphalt, read that first. I think this is pretty bad and needs to be rewritten and maybe expanded substantially, but this blog is called "musings and rough drafts."]
In this post, I’m going to outline how I think about status. In particular, I want to give a mechanistic account of how status necessarily arises, given some set of axioms, in much the same way one can show that evolution by natural selection must necessarily occur given the axioms of 1) inheritance of traits 2) variance in reproductive success based on variance in traits and 3) mutation.
(I am not claiming any particular skill at navigating status relationships, any more than a student of sports-biology is necessarily a skilled basketball player.)
By “status” I mean prestige-status.
Axiom 1: People have goals.
That is, for any given human, there are some things that they want. This can include just about anything. You might wan...
I've offered to be a point person for folks who believe that they were severely impacted by Leverage 1.0, and have related information, but who might be unwilling to share that info, for any of a number of reasons.
In short,
So it seems like one way that the world could go is:
I could imagine China building a competent domestic chip industry. China seems more determined to do that than the US is.
Though notably, China is not on track to do that currently. It's not anywhere close to its goal of producing 70% of its chips by 2025.
And if the US was serious about building a domestic cutting-edge chip industry again, could it? I basically don't think that American work culture can keep up with Taiwanese/TSMC work culture, in this super-competitive industry.
TSMC is building fabs in the US, but from what I hear, they're not going well.
(While TSMC is a Taiwanese company, having a large fraction of TSMC fabs in the US would preempt the scenario above. TSMC fabs in the US count as "a domestic US chip industry.")
Building and running leading node fabs is just a really really hard thing to do.
I guess the most likely scenario is the continuation of the status quo, where China and the US continue to both awkwardly depend on TSMC's chips for crucial military and economic AI tech.
Something that I've been thinking about lately is the possibility of an agent's values being partially encoded by the constraints of that agent's natural environment, or arising from the interaction between the agent and environment.
That is, an agent's environment puts constraints on the agent. From one perspective, removing those constraints is always good, because it lets the agent get more of what it wants. But sometimes, from a different perspective, we might feel that with those constraints removed, the agent Goodharts or wireheads, or otherwise fails to actualize its "true" values.
The Generator freed from the oppression of the Discriminator
As a metaphor: if I'm one half of a GAN, let's say the generator, then in one sense my "values" are fooling the discriminator, and if you make me relatively more powerful than my discriminator, and I dominate it...I'm loving it, and also no longer making good images.
But you might also say, "No, wait. That is a super-stimulus, and actually what you value is making good images, but half of that value was encoded in your partner."
This second perspective seems a little stupid to me. A little too Aristotelian. I mean if we're going to take that ...
[Real short post. Random. Complete speculation.]
Childhood lead exposure reduces one’s IQ, and also causes one to be more impulsive and aggressive.
I always assumed that the impulsiveness was due, basically, to your executive function machinery working less well. So you have less self control.
But maybe the reason for the IQ-impulsiveness connection is that if you have a lower IQ, all of your subagents / subprocesses are less smart. Because they're worse at planning and modeling the world, the only ways they know how to get their needs met are very direct, very simple action-plans / strategies. It's not so much that you're better at controlling your anger, as that the part of you that would be angry is less so, because it has other ways of getting its needs met.
New post: Metacognitive space
[Part of my Psychological Principles of Personal Productivity, which I am writing mostly in my Roam, now.]
Metacognitive space is a term of art that refers to a particular first person state / experience. In particular it refers to my propensity to be reflective about my urges and deliberate about the use of my resources.
I think it might literally be having the broader context of my life, including my goals and values, and my personal resource constraints loaded up in peripheral awareness.
Metacognitive space allows me to notice aversions and flinches, and take them as object, so that I can respond to them with Focusing or dialogue, instead of being swept around by them. Similarly, it seems, in practice, to reduce my propensity to act on immediate urges and temptations.
[Having MCS is the opposite of being [[{Urge-y-ness | reactivity | compulsiveness}]]?]
It allows me to "absorb" and respond to happenings in my environment, including problems and opportunities, taking considered action instead of the semi-automatic first response that occurred to me. [That sentence there feels a little fake, or maybe about something else, or may...
In this interview, Eliezer says the following:
...I think if you push anything [referring to AI systems] far enough, especially on anything remotely like the current paradigms, like if you make it capable enough, the way it gets that capable is by starting to be general.
And at the same sort of point where it starts to be general, it will start to have its own internal preferences, because that is how you get to be general. You don't become creative and able to solve lots and lots of problems without something inside you that organizes your problem solvi
Does anyone know of a good technical overview of why it seems hard to get Whole Brain Emulations before we get neuromorphic AGI?
I think maybe I read a PDF that made this case years ago, but I don't know where.
There’s a psychological variable that seems to be able to change on different timescales, in me, at least. I want to gesture at it, and see if anyone can give me pointers to related resources.
[Hopefully this is super basic.]
There's a set of states that I occasionally fall into that include what I call "reactive" (meaning that I respond compulsively to the things around me), and what I call "urgy" (meaning that I feel a sort of "graspy" desire for some kind of immediate gratification).
These states all have...
I've decided that I want to make more of a point to write down my macro-strategic thoughts, because writing things down often produces new insights and refinements, and so that other folks can engage with them.
This is one frame or lens that I tend to think with a lot. This might be more of a lens or a model-let than a full break-down.
There are two broad classes of problems that we need to solve: we have some pre-paradigmatic science to figure out, and we have the problem of civilizational sanity.
There are a number ...
New (short) post: Desires vs. Reflexes
[Epistemic status: a quick thought that I had a minute ago.]
There are goals / desires (I want to have sex, I want to stop working, I want to eat ice cream) and there are reflexes (anger, “wasted motions”, complaining about a problem, etc.).
If you try and squash goals / desires, they will often (not always?) resurface around the side, or find some way to get met. (Why not always? What is the difference between those that do and those that don't?) You need to bargain with them, or design outlet poli...
Totally an experiment, I'm trying out posting my raw notes from a personal review / theorizing session, in my short form. I'd be glad to hear people's thoughts.
This is written for me, straight out of my personal Roam repository. The formatting is a little messed up because LessWrong's bullets don't support indefinite levels of nesting.
This one is about Urge-y-ness / reactivity / compulsiveness
New post: Some musings about exercise and time discount rates
[Epistemic status: a half-thought, which I started on earlier today, and which might or might not be a full thought by the time I finish writing this post.]
I’ve long counted exercise as an important component of my overall productivity and functionality. But over the past months my exercise habit has slipped some, without apparent detriment to my focus or productivity. But this week, after coming back from a workshop, my focus and productivity haven’t really booted up.
Her...
New post: Capability testing as a pseudo fire alarm
[epistemic status: a thought I had]
It seems like it would be useful to have very fine-grained measures of how smart / capable a general reasoner is, because this would allow an AGI project to carefully avoid creating a system smart enough to pose an existential risk.
I’m imagining slowly feeding a system more training data (or, alternatively, iteratively training a system with slightly more compute), and regularly checking its capability. When the system reaches “chimpanzee level” (whatever that means), you...
In There’s No Fire Alarm for Artificial General Intelligence Eliezer argues:
A fire alarm creates common knowledge, in the you-know-I-know sense, that there is a fire; after which it is socially safe to react. When the fire alarm goes off, you know that everyone else knows there is a fire, you know you won’t lose face if you proceed to exit the building.
If I have a predetermined set of tests, this could serve as a fire alarm, but only if you've successfully built a consensus that it is one. This is hard, and the consensus would need to be quite strong. To avoid ambiguity, the test itself would need to be demonstrably resistant to being clever Hans'ed. Otherwise it would be just another milestone.
Sometimes people talk about advanced AIs "boiling the oceans". My impression is that there's some specific model for why that is a plausible outcome (something about energy and heat dissipation?), and it's not just a random "big change."
What is that model? Are there existing citations for the idea, including LessWrong posts?
Roughly, Earth's average temperature is:

$$T = \left(\frac{j}{\sigma}\right)^{1/4}$$

where $j$ is the dissipated power per unit area and $\sigma$ is the Stefan-Boltzmann constant.

We can estimate $j$ as

$$j = \frac{S(1 - A)}{4}$$

where $S$ is the solar constant, 1361 W/m^2, and $A$ is Earth's albedo, 0.31. (We take all incoming power and divide it by Earth's surface area.)

After substituting the variables, we get an Earth temperature of 254K (-19C), because we ignore the greenhouse effect here.

How much does humanity's power consumption contribute to direct warming? In 2023, Earth's energy consumption was 620 exajoules (source: first link in Google), which is about 19TW. A modified rough estimate of Earth's temperature is:

$$T = \left(\frac{j + P_{\text{human}}/A_{\text{Earth}}}{\sigma}\right)^{1/4}$$

Human power production per square meter is about 0.04 W/m^2, which gives approximately zero effect of direct heating on Earth's temperature. But what happens if we, say, increase power consumption by a factor of 1000? Earth's temperature rises to about 264K, an increase of roughly 10K (again, ignoring the greenhouse effect). But qualitatively, increasing power consumption 1000x is likely to screw the biosphere really hard, if we count the increasing amount of water vapor, CO2 released from water, and methane from melting permafrost.
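A quick numeric sanity check of the above in Python (a minimal sketch; the constants and the 19TW figure come from the estimate above, while the ~5.1e14 m^2 Earth surface area is an assumption I'm adding):

```python
# Rough check of the no-greenhouse Stefan-Boltzmann estimate above.
SIGMA = 5.67e-8        # Stefan-Boltzmann constant, W/m^2/K^4
S = 1361.0             # solar constant, W/m^2
ALBEDO = 0.31          # Earth albedo
EARTH_AREA = 5.1e14    # Earth's surface area, m^2 (added assumption)
P_HUMAN = 19e12        # ~19 TW of human power consumption, from above

j_solar = S * (1 - ALBEDO) / 4            # absorbed solar flux, ~235 W/m^2
t_baseline = (j_solar / SIGMA) ** 0.25    # ~254 K with no greenhouse effect

for factor in (1, 1000):
    j_total = j_solar + factor * P_HUMAN / EARTH_AREA
    t = (j_total / SIGMA) ** 0.25
    print(f"x{factor} power: {t:.0f} K ({t - t_baseline:+.1f} K vs. baseline)")
```

This reproduces the numbers above to within a kelvin or so.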
How is it realistic to...
How do you use a correlation coefficient to do a Bayesian update?
For instance, the Wikipedia page on the Heritability of IQ reads:
"The mean correlation of IQ scores between monozygotic twins was 0.86, between siblings 0.47, between half-siblings 0.31, and between cousins 0.15."
I'd like to get an intuitive sense of what those quantities actually mean, "how big" they are, how impressed I should be with them.
I imagine I would do that by working out a series of examples. Examples like...
If I know that Alice has an IQ of 120, what does that tell me about th...
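One way I might start cashing this out, as a sketch: assume the pair of scores is bivariate normal with mean 100 and SD 15 (an idealization I'm adding). Then conditioning on one score gives a normal posterior for the relative's score, with the mean pulled toward the observation by a factor of ρ and the SD shrunk by √(1 − ρ²):

```python
import math

def relative_iq_posterior(observed_iq, rho, mean=100.0, sd=15.0):
    """Posterior for a relative's IQ under an (assumed) bivariate-normal model."""
    post_mean = mean + rho * (observed_iq - mean)   # regress toward the mean by factor rho
    post_sd = sd * math.sqrt(1 - rho ** 2)          # leftover uncertainty
    return post_mean, post_sd

# Using the correlations quoted above, and Alice's IQ of 120:
for label, rho in [("MZ twin", 0.86), ("sibling", 0.47), ("half-sibling", 0.31), ("cousin", 0.15)]:
    m, s = relative_iq_posterior(120, rho)
    print(f"{label:12s} rho={rho:.2f} -> expected IQ {m:.1f}, residual SD {s:.1f}")
```

On this model, even the sibling correlation of 0.47 only moves my expectation about 9 of the 20 points toward Alice's score, and leaves most of the prior spread (about 13 of the 15 IQ points) intact, which is one handle on "how big" these numbers are.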
I remember reading a thread on Facebook where Eliezer and Robin Hanson were discussing the implications of AlphaGo (or AlphaZero) for the content of the AI foom debate, and Robin made an analogy to linear regression as one thing that machines can do better than humans, but which doesn't make them super-human.
Does anyone remember what I'm talking about?
Question: Have Moral Mazes been getting worse over time?
Could the growth of Moral Mazes be the cause of cost disease?
I was thinking about how I could answer this question. I think that the thing that I need is a good quantitative measure of how "mazy" an organization is.
I considered the metric of "how much output for each input", but 1) that metric is just cost disease itself, so it doesn't help us distinguish the mazy cause from other possible causes, and 2) if you're good enough at rent seeking, maybe you can get high revenue despite your poor production.
What metric could we use?
This is my current take about where we're at in the world:
Deep learning, scaled up, might be basically enough to get AGI. There might be some additional conceptual work necessary, but the main difference between 2020 and the year in which we have transformative AI is that in that year, the models are much bigger.
If this is the case, then the most urgent problem is strong AI alignment + wise deployment of strong AI.
We'll know if this is the case in the next 10 years or so, because either we'll continue to see incredible gains from increasingly bigger Deep L...
I was thinking lately about how there are some different classes of models of psychological change, and I thought I would outline them and see where that leads me.
It turns out it led me into a question about where and when Parts-based vs. Association-based models are applicable.
Some examples:
This is the frame that I make the most use of, in my personal practice. It assumes that all behavior is the result of some goal directed subproce...
Can someone affiliated with a university, etc. get me a PDF of this paper?
https://psycnet.apa.org/buy/1929-00104-001
It is on Scihub, but that version is missing a few pages in which they describe the methodology.
[I hope this isn't an abuse of LessWrong.]
In this case it seems fine to add the image, but I feel disconcerted that mods have the ability to edit my posts.
I guess it makes sense that the LessWrong team would have the technical ability to do that. But editing a user's post, without their specifically asking, feels like a pretty big breach of... not exactly trust, but something like that. It means I don't have fundamental control over what is written under my name.
That is to say, I personally request that you never edit my posts without asking (which you did, in this case) and waiting for my response. Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.
Doing actual mini-RCTs can be pretty simple. You only need 3 things:
1. A spreadsheet
2. A digital coin for randomization
3. A way to measure the variable that you care about
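For example, here's a minimal sketch of the workflow in Python (the "intervention vs. control" coin flip and the made-up 1-10 focus ratings are just illustrative placeholders; the spreadsheet is where the real log would live):

```python
import random
import statistics

def todays_assignment() -> str:
    """The digital coin: run this each morning to decide whether to apply the intervention."""
    return "intervention" if random.random() < 0.5 else "control"

def estimated_effect(rows):
    """rows: (assignment, outcome) pairs copied back out of the spreadsheet."""
    treated = [y for a, y in rows if a == "intervention"]
    control = [y for a, y in rows if a == "control"]
    return statistics.mean(treated) - statistics.mean(control)

# Made-up log of (assignment, self-rated focus on a 1-10 scale):
log = [("intervention", 7), ("control", 5), ("intervention", 6),
       ("control", 6), ("intervention", 8), ("control", 4)]

print(todays_assignment())                         # e.g. "control"
print("estimated effect:", estimated_effect(log))  # 2.0 on this toy data
```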
I think one of the most practically powerful "techniques" of rationality is doing simple empirical experiments like this. You want to get something? You don't know how to get it? Try out some ideas and check which ones work!
There are other applications of empiricism that are not as formal, and sometimes faster. Those are also awesome. But at the very least, I've found that doing ...
Is there a LessWrong article that unifies physical determinism and choice / "free will"? Something about thinking of yourself as the algorithm computed on this brain?
Is there any particular reason why I should assign more credibility to Moral Mazes / Robert Jackall than I would to the work of any other sociologist?
(My prior on sociologists is that they sometimes produce useful frameworks, but generally rely on subjective hard-to-verify and especially theory-laden methodology, and are very often straightforwardly ideologically motivated.)
I imagine that someone else could write a different book, based on the same kind of anthropological research, that highlights different features of the corporate world, to tell the oppo...
My understanding is that there was a 10 year period starting around 1868, in which South Carolina's legislature was mostly black, and when the universities were integrated (causing most white students to leave), before the Dixiecrats regained power.
I would like to find a relatively non-partisan account of this period.
Anyone have suggestions?
Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.
In retrospect, I should have been surprised that "Rohin" kept talking about what Eliezer says in the Sequences. I wouldn't have guessed that Rohin was that "culturally rationalist" or that he would be that interested in what Eliezer wrote in the sequences. And indeed, I was updating that Rohi...
I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn, and then using interpretability tools to study the abstractions that the AI uncovers.
I thought he specifically mentioned "using AI as a microscope."
Is that a real post, or am I misremembering this one?
Are there any hidden risks to buying or owning a car that someone who's never been a car owner might neglect?
I'm considering buying a very old (ie from the 1990s), very cheap (under $1000, ideally) minivan, as an experiment.
That's inexpensive enough that I'm not that worried about it completely breaking down on me. I'm willing to just eat the monetary cost for the information value.
However, maybe there are other costs or other risks that I'm not tracking, that make this a worse idea.
Things like
- Some ways that a car can break make it dangerous, instead of ...
Is there a standard article on what "the critical risk period" is?
I thought I remembered an arbital post, but I can't seem to find it.
I remember reading a Zvi Mowshowitz post in which he says something like "if you have concluded that the most ethical thing to do is to destroy the world, you've made a mistake in your reasoning somewhere."
I spent some time searching around his blog for that post, but couldn't find it. Does anyone know what I'm talking about?
Anyone have a link to the sequence post where someone posits that AIs wouldn't do art and science from a drive to compress information, but rather would create and then reveal cryptographic strings (or something)?
Follow up to, and a continuation of the line of thinking from: Some classes of models of psychology and psychological change
Related to: The universe of possible interventions on human behavior (from 2017)
This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more lightweight, and faster to use (is that right?) than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before...
I'm mostly going to use this to crosspost links to my blog for less polished thoughts, Musings and Rough Drafts.