A prereminiscence: It's like it was with chess. We passed through that stage when AI could beat most of us to where it obviously outperforms all of us. Only for cultural output in general. People still think now, but privately, in the shower, or in quaint artisanal forms as if we were making our own yogurts or weaving our own clothes. Human-produced works are now a genre with a dwindling and eccentric fan base more concerned with the process than the product. If it had been a war, we would have had a little ceremony as we brought down our flag and folded it up and put it away, but because it happened as quickly and quietly as it did and because we weren't sure whether we wanted to admit it was happening, there was no formal hand-off. One day we just resignedly realized we could no longer matter much, but there was so much more now to appreciate and we could see echoes and reflections of ourselves in it, so we didn't put up a fight. Every once in a while someone would write an essay, Joan Didion quality, really good. Or write a song. Poignant, beautiful, original even. Not maybe the best essay or the best song we'd seen that day, but certainly worthy of being in the top ranks. And we'd think: we've still got it. We can still rally when we've got our backs to the wall. Don't count us out yet. But it was just so hard, and there was so much else to do. And so when it happened less and less frequently, we weren't surprised. Of curiosity, well, we half-remembered a time when we would have satisfied it by research and experiment (words that increasingly had the flavor of “thaumaturgy,” denoting processes productive though by unclear means). But nowadays curiosity was déclassé. It suggested laziness (why not just ask it?)… or poverty (oh, you can’t afford to ask it)—an increasing problem as we less and less have something to offer in trade that it desires and lacks.
Here's a quick interesting-seeming devinterp result: we can estimate the Local Learning Coefficient (LLC, the central quantity of singular learning theory; for more info see these posts / papers) of a simple grokking model on its training data over the course of training. This yields the following plot (note: estimated LLC = lambda-hat = λ̂).

What's interesting about this is that the estimated LLC of the model in this plot closely tracks test loss, even though it is estimated on training data. On the one hand this is unsurprising: SLT predicts that the LLC determines the Bayes generalization error in the Bayesian setting.[1] On the other hand this is quite surprising: the Bayesian setting is not the same as SGD, an increase in training steps is not the same as an increase in the total number of samples, and the Bayes generalization error is not exactly the same as test loss. Despite these differences, the LLC clearly tracks (in-distribution) generalization here. We see this as a positive sign for applying SLT to study neural networks trained by SGD.

This plot was made using the devinterp Python package, and the code to reproduce it (including hyperparameter selection) is available as a notebook at https://github.com/timaeus-research/devinterp/blob/main/examples/grokking.ipynb. Thanks to Nina Panickssery and Dmitry Vaintrob, whose earlier post on learning coefficients of modular addition served as the basis for this experiment.

1. ^ More precisely: in the Bayesian setting, the Bayes generalization error, as a function of the number of samples n, is λ/n to leading order.
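For readers who want to see the shape of the estimator behind the plot, here is a minimal from-scratch sketch (not the devinterp API; its actual function names and hyperparameter choices are in the linked notebook). The quantity estimated is λ̂ = nβ·(E[L_n(w)] − L_n(w*)), with β = 1/log n and the expectation taken over SGLD samples drawn from a posterior localized at the trained weights w*. All names and hyperparameters below are illustrative.

```python
# Minimal sketch of an SGLD-based LLC estimator:
#   lambda_hat = n * beta * (E_w[L_n(w)] - L_n(w*)),
# where the expectation is over samples from a posterior localized at the
# trained weights w*. Hyperparameters and names are illustrative only.
import copy
import math
import torch

def estimate_llc(model, loss_fn, data_loader, n_samples,
                 num_steps=500, lr=1e-4, beta=None, gamma=100.0):
    """Estimate the local learning coefficient at the model's current weights."""
    beta = beta if beta is not None else 1.0 / math.log(n_samples)
    anchor = [p.detach().clone() for p in model.parameters()]
    sampler_model = copy.deepcopy(model)
    params = list(sampler_model.parameters())

    sampled_losses = []
    data_iter = iter(data_loader)
    for _ in range(num_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(data_loader)
            x, y = next(data_iter)

        loss = loss_fn(sampler_model(x), y)
        grads = torch.autograd.grad(loss, params)

        with torch.no_grad():
            for p, g, a in zip(params, grads, anchor):
                # SGLD step: tempered gradient, a pull toward the anchor w*,
                # and Gaussian noise with variance lr.
                drift = (n_samples * beta / 2.0) * g + (gamma / 2.0) * (p - a)
                p.add_(-lr * drift + math.sqrt(lr) * torch.randn_like(p))
            sampled_losses.append(loss.item())

    # Average the sampled losses after a burn-in period.
    burn_in = num_steps // 4
    mean_sampled_loss = sum(sampled_losses[burn_in:]) / len(sampled_losses[burn_in:])

    with torch.no_grad():
        anchor_loss = sum(loss_fn(model(x), y).item() for x, y in data_loader) / len(data_loader)

    return n_samples * beta * (mean_sampled_loss - anchor_loss)
```

Running this on a sequence of training checkpoints (with the training set as data_loader) is what produces a λ̂-over-training curve like the one in the plot.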
gwern
Concrete benchmark proposals for how to detect mode-collapse and AI slop and ChatGPTese, and why I think this might be increasingly important for AI safety, to avoid 'whimper' or 'em hell' kinds of existential risk: https://gwern.net/creative-benchmark
To avoid deploying a dangerous model, you can either (1) test the model pre-deployment or (2) test a similar older model with tests that have a safety buffer such that if the old model is below some conservative threshold it's very unlikely that the new model is dangerous. DeepMind says it uses the safety-buffer plan (but it hasn't yet operationalized thresholds/buffers as far as I know). Anthropic's original RSP used the safety-buffer plan; its new RSP doesn't really use either plan (kinda safety-buffer but it's very weak). (This is unfortunate.)

OpenAI seemed to use the test-the-actual-model plan.[1] This isn't going well. The 4o evals were rushed because OpenAI (reasonably) didn't want to delay deployment. Then the o1 evals were done on a weak o1 checkpoint rather than the final model, presumably so they wouldn't be rushed, but this presumably hurt performance a lot on some tasks (and indeed the o1 checkpoint performed worse than o1-preview on some capability evals). OpenAI doesn't seem to be implementing the safety-buffer plan, so if a model is dangerous but not super obviously dangerous, it seems likely OpenAI wouldn't notice before deployment.... (Yay OpenAI for honestly publishing eval results that don't look good.)

1. ^ It's not explicit. The PF says e.g. 'Only models with a post-mitigation score of "medium" or below can be deployed.' But it also mentions forecasting capabilities.
I would like to see people write high-effort summaries, analyses and distillations of the posts in The Sequences. When Eliezer wrote the original posts, he was writing one blog post a day for two years. Surely you could do a better job presenting the content that he produced in one day if you, say, took four months applying principles of pedagogy and iterating on it as a side project. I get the sense that more is possible. This seems like a particularly good project for people who want to write but don't know what to write about. I've talked with a variety of people who are in that boat. One issue with such distillation posts is discoverability. Maybe you write the post, it receives some upvotes, some people see it, and then it disappears into the ether. Ideally when someone in the future goes to read the corresponding sequence post they would be aware that your distillation post is available as a sort of sister content to the original content. LessWrong does have the "Mentioned in" section at the bottom of posts, but that doesn't feel like it is sufficient.

Popular Comments

Recent Discussion

I, Lorec, am disoriented by neither the Fermi Paradox nor the Doomsday Argument.

The Fermi Paradox doesn't trouble me because I think 1 is a perfectly fine number of civilizations to have arisen in any particular visible universe. It feels to me like most "random" or low-Kolmogorov-complexity universes probably have 0 sentient civilizations, many have 1, very few have 2, etc.

The Doomsday Argument doesn't disorient me because it feels intuitive to me that, in a high % of those "random" universes which contain sentient civilizations, most of those civilizations accidentally beget a mesa-optimizer fairly early in their history. This mesa-optimizer will then mesa-optimize all the sentience away [this is a natural conclusion of several convergent arguments originating from both computer science and evolutionary theory] and hoard available free...

Lorec

You can't say "equiprobable" if you have no known set of possible outcomes to begin with.

Genuine question: what are your opinions on the breakfast hypothetical? [The idea that being able to give an answer to "how would you feel if you hadn't eaten breakfast today?" is a good intelligence test, because only idiots are resistant to "evaluating counterfactuals".]

This isn't just a gotcha; I have my own opinions and they're not exactly the conventional ones.

Lorec
The level where there is suddenly "nowhere further to go" - the switch from exciting, meritocratic "Space Race" mode to boring, stagnant "Culture" mode - isn't dependent on whether you've overcome any particular physical boundary. It's dependent on whether you're still encountering meaningful enemies to grind against, or not. If your civilization first got the optimization power to get into space by, say, a cutthroat high-speed Internet market [on an alternate Earth this could have been what happened], the market for high-speed Internet isn't going to stop being cutthroat enough to encourage innovation just because people are now trying to cover parcels of 3D space instead of areas of 2D land. And even in stagnant "Culture" mode, I don't see why [members/branches of] your civilization would choose to get dumber [lose sentience or whatever other abilities got them into space]. I question why you assign significant probability to this outcome in particular? Have you read That Alien Message? A truly smart civilization has ways of intercepting asteroids before they hit, if they're sufficiently dumb/slow - even ones that are nominally really physically powerful.
Noosphere89
I am more so rejecting the magical results that people take away from anthropic reasoning, especially when they use incorrect assumptions, and I'm basically rejecting the idea that anthropic reasoning is irredeemably weird or otherwise violates Bayes.
avturchin
If you assume sentience cut-off as a problem, it boils down to the Doomsday argument: why are we so early in the history of humanity? Maybe our civilization becomes non-sentient after the 21st century, either because of extinction or non-sentient AI takeover.  If we agree with the Doomsday argument here, we should agree that most AI-civilizations are non-sentient. And as most Grabby Aliens are AI-civilizations, they are non-sentient. TLDR: If we apply anthropics to the location of sentience in time, we should assume that Grabby Aliens are non-sentient, and thus the Grabby Alien argument is not affected by the earliness of our sentience.
Seth Herd
I assume you're in agreement that the reason for this is as nicely stated by cata in this thread: LW contributors are assumed to be a lot more insightful than an LLM, so we don't want to guess whether the ideas came from an LLM. It's probably worth writing a brief statement on this, unless you've added it to the FAQ since I last read it.
habryka

I think beyond insightfulness, there is also a "groundedness" component that is different. LLM-written text either lies about personal experience or is completely devoid of references to personal experience. That usually makes the writing much less concrete and worse, or actively deceptive.

gwern
Why not post the before/after and let people see if it was indeed more readable?
cata
To me, since LessWrong has a smart community that attracts people with high standards and integrity, by default if you (a median LW commenter) write your considered opinion about something, I take that very seriously and assume that it's much, much more likely to be useful than an LLM's opinion. So if you post a comment that looks like an LLM wrote it, and you don't explain which parts were the LLM's opinion and which parts were your opinion, then that makes it difficult to use it. And if there's a norm of posting comments that are partly unmarked LLM opinions, that means that I have to adopt the very large burden of evaluating every comment to try to figure out whether it's an LLM, in order to figure out if I should take it seriously.

From MetaFunction to MacroSkill

Introduction

Imagine that your motivation is like water. Today, with constant overstimulation, floods often become too strong, stripping the soil of nutrients and sometimes even rotting the roots. Our instinctive response might be to build barriers to block it all out. But perhaps a better solution lies in guiding that overstimulation, using its energy to create something sustainable.

To achieve this, we can start by dividing the overstimulation into two main channels:

  1. One where the water can flow freely, inspiring you, nurturing natural expressions, and even lightly enriching the terrain of others.
  2. Another that you can direct toward your personal land, channeling it to cultivate what truly matters to you.

This approach reflects what we outlined in previous texts: how to manage the flows of information and energy according...

I’ve updated quite hard against computational functionalism (CF) recently (as an explanation for phenomenal consciousness), from ~80% to ~30%. Of course it’s more complicated than that, since there are different ways to interpret CF and having credences on theories of consciousness can be hella slippery.

So far in this sequence, I’ve scrutinised a couple of concrete claims that computational functionalists might make, which I called theoretical and practical CF. In this post, I want to address CF more generally.

Like most rationalists I know, I used to basically assume some kind of CF when thinking about phenomenal consciousness. I found a lot of the arguments against functionalism, like Searle’s Chinese room, unconvincing. They just further entrenched my functionalismness. But as I came across and tried to explain away more and more...

I think you're conflating creating a similar vs identical conscious experience with a simulated brain. Close is close enough for me - I'd take an upload run at far less resolution than molecular scale.

I spent 23 years studying computational neuroscience. You don't need to model every molecule, or even close, to get a similar computational and therefore conscious experience. The information content of neurons (collectively, and inferred where data isn't complete) is a very good match to reported aspects of conscious experience.

EuanMcLean
Thanks for clarifying :) Yes. Yes, the seed has a causal effect on the execution of the algorithm by my definition. As was talked about in the comments of the original post, causal closure comes in degrees, and in this case the MCMC algorithm is somewhat causally closed from the seed. An abstract description of the MCMC system that excludes the value of the seed is still a useful abstract description of that system - you can reason about what the algorithm is doing, predict the output within the error bars, etc. In contrast, the algorithm is not very causally closed to, say, idk, some function f() that is called a bunch of times on each iteration of the MCMC. If we leave f() out of our abstract description of the MCMC system, we don't have a very good description of that system; we can't work out much about what the output would be given an input. If the 'mental software' I talk about is as causally closed to some biophysics as the MCMC is causally closed to the seed, then my argument in that post is weak. If, however, it's only as causally closed to biophysics as our program is to f(), then it's not very causally closed, and my argument in that post is stronger.

Hmm, yea this is a good counterexample to my limited "just take the average of those fluctuations" claim. If it's important that my algorithm needs a pseudorandom float between 0 and 1, and I don't have access to the particular PRNG that the algorithm calls, I could replace it with a different PRNG in my abstract description of the MCMC. It won't work exactly the same, but it will still run MCMC and give out a correct answer.

To connect it to the brain stuff: say I have a candidate abstraction of the brain that I hope explains the mind. Say temperatures fluctuate in the brain between 38°C and 39°C. Here are 3 possibilities of how this might affect the abstraction:
* Maybe in the simulation, we can just set the temperature to 38.5°C, and the simulation still correctly predicts the important features of
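(A toy illustration of the seed point above, not from the original post: a Metropolis sampler targeting a standard normal gives the same answer up to Monte Carlo error regardless of which seed or PRNG drives it, which is the sense in which an abstract description that omits the seed still predicts the output within error bars.)

```python
# Two runs of the same Metropolis sampler with different seeds: the
# trajectories differ step by step, but both estimates of E[x] under N(0,1)
# land near the true value of 0, within Monte Carlo error.
import math
import random

def metropolis_mean(rng, n_steps=50_000, step_size=1.0):
    """Estimate E[x] under N(0,1) with a random-walk Metropolis sampler."""
    x, total = 0.0, 0.0
    log_p = lambda z: -0.5 * z * z  # unnormalized log-density of N(0,1)
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        if math.log(rng.random() + 1e-300) < log_p(proposal) - log_p(x):
            x = proposal
        total += x
    return total / n_steps

print(metropolis_mean(random.Random(0)))      # ~0.0
print(metropolis_mean(random.Random(12345)))  # ~0.0, different trajectory
```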
James Camacho
I find it useful to take an axiom of extensionality—if I cannot distinguish between two things in any way, I may as well consider them the same thing for all that it could affect me. Given maths/computation/logic is the process of asserting things are the same or different, it seems to me to be tautologically true that maths and computation are the only symbols upon which useful discussion can be built. Maybe you want to include some undefinable aspect to consciousness. But anytime it functions differently, you can use that to modify your definition. I don't think the adherents of computational functionalism, or even of a computational universe, need to claim it encapsulates everything there could possibly be in the territory. Only that it encapsulates anything you can perceive in the territory.

I believe this is your definition of real consciousness? This tells me properties about consciousness, but doesn't really help me define consciousness. It's intrinsic and objective, but what is it? For example, if I told you that the Sierpinski triangle is created by combining three copies of itself, I still don't know what it actually looks like. If I want to work with it, I need to know how the base case is defined. Once you have a definition, you've invented computational functionalism (for the Sierpinski triangle, for consciousness, for the universe at large).

Yes, exactly! To be precise, I don't consider an argument useful unless it is defined through a constructive logic (e.g. mathematics through ZF set theory).

Note: this assumes computational functionalism. I haven't seen it written down explicitly anywhere, but I've seen echoes of it here and there. Essentially, in RL, agents are defined via their policies. If you want to modify the agent to be good at a particular task, while still being pretty much the "same agent", you add a KL-divergence anchor term: Loss(π) = SubgameLoss(π) + λ·KL(π, π_original). This is known as piKL and was used for Diplomacy, where it's importa
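(A sketch of what the anchored loss above looks like in code, assumed rather than taken from the comment or the piKL paper: the task loss is regularized by the KL divergence between the current policy and a frozen copy of the original policy.)

```python
# KL-anchored policy loss in the spirit described above:
#   Loss(pi) = SubgameLoss(pi) + lambda * KL(pi, pi_original)
import torch
import torch.nn.functional as F

def anchored_policy_loss(logits, anchor_logits, task_loss, lam=0.1):
    log_pi = F.log_softmax(logits, dim=-1)
    log_anchor = F.log_softmax(anchor_logits.detach(), dim=-1)  # frozen anchor
    kl = (log_pi.exp() * (log_pi - log_anchor)).sum(dim=-1).mean()
    return task_loss + lam * kl
```

A larger lam keeps the fine-tuned policy closer to the original agent; lam near zero recovers pure task optimization.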
Steven Byrnes
Right, what I actually think is that a future brain scan with future understanding could enable a WBE to run on a reasonable-sized supercomputer (e.g. <100 GPUs), and it would be capturing what makes me me, and would be conscious (to the extent that I am), and it would be my consciousness (to a similar extent that I am), but it wouldn’t be able to reproduce my exact train of thought in perpetuity, because it would be able to reproduce neither the input data nor the random noise of my physical brain. I believe that OP’s objection to “practical CF” is centered around the fact that you need an astronomically large supercomputer to reproduce the random noise, and I don’t think that’s relevant. I agree that “abstraction adequacy” would be a step in the right direction. Causal closure is just way too strict. And it’s not just because of random noise. For example, suppose that there’s a tiny amount of crosstalk between my neurons that represent the concept “banana” and my neurons that represent the concept “Red Army”, just by random chance. And once every 5 years or so, I’m thinking about bananas, and then a few seconds later, the idea of the Red Army pops into my head, and if not for this cross-talk, it counterfactually wouldn’t have popped into my head. And suppose that I have no idea of this fact, and it has no impact on my life. This overlap just exists by random chance, not part of some systematic learning algorithm. If I got magical brain surgery tomorrow that eliminated that specific cross-talk, and didn’t change anything else, then I would obviously still be “me”, even despite the fact that maybe some afternoon 3 years from now I would fail to think about the Red Army when I otherwise might. This cross-talk is not randomness, and it does undermine “causal closure” interpreted literally. But I would still say that “abstraction adequacy” would be achieved by an abstraction of my brain that captured everything except this particular instance of cross-talk.

Given any particular concrete demonstration of an AI algorithm doing seemingly-bad-thing X, a knowledgeable AGI optimist can look closely at the code, training data, etc., and say:

“Well of course, it’s obvious that the AI algorithm would do X under these circumstances. Duh. Why am I supposed to find that scary?”

And yes, it is true that, if you have enough of a knack for reasoning about algorithms, then you will never ever be surprised by any demonstration of any behavior from any algorithm. Algorithms ultimately just follow their source code.

(Indeed, even if you don’t have much of a knack for algorithms, such that you might not have correctly predicted what the algorithm did in advance, it will nevertheless feel obvious in hindsight!)

From the AGI optimist’s perspective: If I...

Knight Lee
I think there are misalignment demonstrations and capability demonstrations. Misalignment skeptics believe that "once you become truly superintelligent you will reflect on humans and life and everything and realize you should be kind." Capability skeptics believe "AGI and ASI are never going to come for another 1000 years." Takeover skeptics believe "AGI will come, but humans will keep it under control because it's impossible to escape your captors and take over the world even if you're infinitely smart," or "AGI will get smarter gradually and will remain controlled."

Misalignment demonstrations can only convince misalignment skeptics. They can't convince all of them, because some may insist that the misaligned AIs are not intelligent enough to realize their errors and become good. Misalignment demonstrations which deliberately tell the AI to be misaligned (e.g. ChaosGPT) also won't convince some people, and I really dislike these demonstrations. The Chernobyl disaster was actually caused by people stress-testing the reactor to make sure it was safe. Aeroflot Flight 6502 crashed when the pilot was demonstrating to the first officer how to land an airplane with zero visibility. People have died while demonstrating gun safety.

Capability demonstrations can only convince capability skeptics. I actually think a lot of people changed their minds after ChatGPT. Capability skeptics do get spooked by capability demonstrations and do start to worry more.

Sadly, I don't think we can do takeover demonstrations to convince takeover skeptics.
green_leaf
I refuse to believe that tweet has been written in good faith. I refuse to believe the threshold for being an intelligent person on Earth is that low.

That's how this happens: people systematically refuse to believe some things, or to learn some things, or to think some thoughts. It's surprisingly feasible to live in contact with some phenomenon for decades and fail to become an expert. Curiosity needs System 2 guidance to target blind spots.

The comments here are a storage of not-posts and not-ideas that I would rather write down than not.

Ben Pace
I've been re-reading tons of old posts for the review to remember them and see if they're worth nominating, and then writing a quick review if yes (and sometimes if no). I've gotten my list of ~40 down to 3 long ones. If anyone wants to help out, here are some I'd appreciate someone re-reading and giving a quick review of in the next 2 days.
1. To Predict What Happens, Ask What Happens by Zvi
2. A case for AI alignment being difficult by Jessicata
3. Alexander and Yudkowsky on AGI goals by Scott Alexander & Eliezer Yudkowsky

I don't remember 3 and it's up my alley, so I'll do that one.


Epistemic status: Speculation. An unholy union of evo psych, introspection, random stuff I happen to observe & hear about, and thinking. Done on a highly charged topic. Caveat emptor!


Most of my life, whenever I'd felt sexually unwanted, I'd start planning to get fit.

Specifically to shape my body so it looks hot. Like the muscly guys I'd see in action films.

This choice is a little odd. In close to every context I've listened to, I hear women say that some muscle tone on a guy is nice and abs are a plus, but big muscles are gross — and all of that is utterly overwhelmed by other factors anyway.

It also didn't match up with whom I'd see women actually dating.

But all of that just… didn't affect my desire?

There's...

It's very disconcerting to read "I notice my brain does extra work when I talk with women... wouldn't it be easier if society were radically altered so that I didn't have to talk with women?" Like, what? And there's no way you or anyone else can become more rational about this? This barrier to ideal communication with 50% of people is insurmountable? It's worth giving up on this one? Hello?

I was not proposing that.

Tegmark, Samotsvety, and Metaculus provide probability estimates for the use of nuclear weapons in Ukraine and for the start of a full-scale nuclear war (kaboom and KABOOM, in Tegmark's terminology). kaboom will certainly be a warning sign before KABOOM, so it makes sense to think beforehand about an action plan for the case of kaboom.

 

Individual plans

I am aware of two potential strategies for surviving a nuclear war. First, you can hide in a shelter; for preparation I recommend reading "Nuclear War Survival Skills". Second, you can try to be in a place minimally affected by the nuclear war. In my opinion, the second strategy is better in the long run, since life in the bombed territory will most likely be far worse...

Western Montana is separated from the missile fields by mountain ranges and the prevailing wind direction, and Joel Skousen in fact considers it the best place in the continental US to ride out a nuclear attack. Skousen's main consideration is being too far away from population centers for refugees to reach on foot.

Skousen also likes the Cumberland Plateau because refugees are unlikely to opt to walk up the escarpment that separates the Plateau from the population centers to its south.

bokov
Perhaps we should brainstorm leading indicators of nuclear attack.
  • August 23, 2023
  • Version 1.0.0


    Abstract: In this article, I examine the discrepancies between classical game theory models and real-world human decision-making, focusing on irrational behavior. Traditional game theory frames individuals as rational utility maximizers, functioning within pre-set decision trees. However, empirical observations, as pointed out by researchers like Daniel Kahneman and Amos Tversky, reveal that humans often diverge from these rational models, instead being influenced by cognitive biases, emotions, and heuristic-based decision making.

Introduction

Game theory provides a mathematically rigorous model of human behavior. It has been applied to fields as diverse as Economics and literary Critical Theory. Within game theory, people are represented as rational actors seeking to maximize their expected utility in a variety of situations called "games." Their actions are represented by decision trees whose leaves represent...
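(To make the model under discussion concrete, here is a minimal sketch with hypothetical numbers, not drawn from the article: a rational actor evaluates a decision tree by expected utility and picks the action that maximizes it.)

```python
# The classical rational-actor model: evaluate a decision tree by expected
# utility and choose the maximizing action. Numbers are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    payoff: float                         # terminal outcome

@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, subtree)

@dataclass
class Decision:
    actions: List[Tuple[str, "Node"]]     # (action label, subtree)

Node = Union[Leaf, Chance, Decision]

def expected_utility(node: Node) -> float:
    if isinstance(node, Leaf):
        return node.payoff
    if isinstance(node, Chance):
        return sum(p * expected_utility(child) for p, child in node.branches)
    # Rational actor: pick the action with the highest expected utility.
    return max(expected_utility(child) for _, child in node.actions)

# Choose between a sure $40 and a 50/50 gamble between $100 and $0.
tree = Decision(actions=[
    ("sure thing", Leaf(40.0)),
    ("gamble", Chance(branches=[(0.5, Leaf(100.0)), (0.5, Leaf(0.0))])),
])
print(expected_utility(tree))  # 50.0, so the model says take the gamble
```

Kahneman and Tversky's findings, cited in the abstract, are precisely that real people often take the sure $40 instead.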

On a quick skim, an element that seems to be missing is that having emotions which cause you to behave 'irrationally' can in fact be beneficial from a rational perspective.

For example, if everyone knows that, when someone does you a favor, you'll feel obligated to find some way to repay them, and when someone injures you, you'll feel driven to inflict vengeance upon them even at great cost to yourself—if everyone knows this about you, then they'll be more likely to do you favors and less likely to injure you, and your expected payoffs are probably higher t... (read more)
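(A toy model with hypothetical payoffs, not from the comment, of the argument above: an agent known to repay favors and avenge injuries changes what a self-interested partner chooses to do to it, and can come out ahead even after paying the cost of the emotions themselves.)

```python
# Hypothetical payoffs illustrating the reputation effects of "irrational" emotions.
def partner_choice(retaliates: bool, reciprocates: bool) -> str:
    # The partner harms you only if exploitation goes unpunished,
    # and does you a favor only if it expects repayment.
    if not retaliates:
        return "harm"
    if reciprocates:
        return "favor"
    return "ignore"

PAYOFFS = {"harm": -5.0, "ignore": 0.0, "favor": 3.0}
EMOTION_COST = 1.0  # occasional costly vengeance or repayment

for retaliates, reciprocates in [(False, False), (True, True)]:
    outcome = partner_choice(retaliates, reciprocates)
    payoff = PAYOFFS[outcome] - (EMOTION_COST if (retaliates or reciprocates) else 0.0)
    print(retaliates, reciprocates, outcome, payoff)
# The "emotional" agent nets +2.0 here; the dispassionate one nets -5.0.
```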