All of quetzal_rainbow's Comments + Replies

I think a large part of this phenomenon is social status. I.e., if you die early, it means that you did something really embarrassingly stupid. Conversely, if you caused someone to die by, say, faulty construction or insufficient medical intervention, you should be really embarrassed. If you can't prove/reliably signal that you are behaving reasonably, you are incentivized to behave unreasonably safely to signal your commitment to not doing stupid things. It's also probably linked to the trade-off between social status and the desire for reproduction. It also explains why ... (read more)

My largest share of probability of survival under business-as-usual AGI (i.e., no major changes in technology compared to LLMs, no pause, no sudden miracles in theoretical alignment, and no sudden AI winters) belongs to the scenario where brain concept representations, efficiently learnable representations, and representations learnable by current ML models secretly have very large overlap, such that even if LLMs develop "alien thought patterns" it happens as an addition to the rest of their reasoning machinery, not as a primary part, which results in human values not on... (read more)

Criminal negligence leading to catastrophic consequences is already ostracized and prosecuted, because, well, it's a crime.

3geoffreymiller
Sure. But if an AI company grows an ASI that extinguishes humanity, who is left to sue them? Who is left to prosecute them?  The threat of legal action for criminal negligence is not an effective deterrent if there is no criminal justice system left, because there is no human species left.

There have been cases where LLMs were "lazier" during common vacation periods. EDIT: see here, for example

I feel like this position is... flimsy? Insubstantial? It's not that I disagree; I just don't understand why you would want to articulate it in this way.

On the one hand, I don't think the biological/non-biological distinction is very meaningful from a transhumanist perspective. Is an embryo, genetically modified to have +9000 IQ, going to be meaningfully considered "transhuman" instead of "posthuman"? Are you going to still be you after one billion years of life extension? "Keeping relevant features of you/humanity after enormous biological changes" seems to be qualitati... (read more)

3lesswronguser123
Virgin here I think your psychoanalysis of my situation is accurate. The thing is I am slightly scared/aversive of using quick takes too, mayhaps I should take more risks like the depicted chad but I'm a bit indecisive right now.

It is indeed surprising, because it indicates much more sanity than I would otherwise have expected.

Terrorism is not effective. The only ultimate result of 9/11, from the perspective of bin Laden's goals, was "Al Qaeda got wiped off the face of the Earth and rival groups replaced it". The only result of firebombing a datacenter would be "every single person in AI safety gets branded a terrorist, destroying literally any chance to influence relevant policy".

I think a more correct picture is that it's useful to have programmable behavior; the programmable system then suddenly becomes a Turing-complete weird machine, and some of the resulting programs are terminal-goal-oriented, which are favored by selection pressures: terminal goals are self-preserving.

Humans in their native environment have programmable behavior in the form of social regulation, information exchange, and communicated instructions; if you add a sufficient amount of computational power to this system, you can get a very wide spectrum of behaviors.

I think this is the general picture of inner misalignment.

3TsviBT
That seems like part of the picture, but far from all of it. Manufactured stone tools have been around for well over 2 million years. That's the sort of thing you do when you already have a significant amount of "hold weeks-long goal in mind long and strong enough that you put in a couple day's effort towards it" (or something like that). Another example is Richard Alexander's hypothesis: warfare --> strong pressure toward cognitive mechanisms for group-goal-construction. Neither of these are mainly about programmability (though the latter is maybe somewhat). I don't think we see "random self-preserving terminal goals installed exogenously", I think we see goals being self-constructed and then flung into long-termness.
4Gunnar_Zarncke
Sure "The Cerebellum Is The Seat of Classical Conditioning." But I'm not sure it's the only one. Delay eyeblink conditioning is cerebellar-dependent, which we know because of lesion studies. This does not generalize to all conditioned responses: 1. Trace eyeblink conditioning requires hippocampus and medial prefrontal cortex in addition to cerebellum (Takehara 2003). 2. Fear conditioning is driven by the amygdala, not cerebellum. 3. Hebbian plasticity isn’t crushed by cerebellar learning. Cerebellum long-term depression is a timing‐sensitive variant of Hebb’s rule (van Beugen et al. 2013).

I think we can clearly conclude that the cortex doesn't do what NNs do, because the cortex is incapable of learning conditioned responses (that's an uncontested fiefdom of the cerebellum), while for NNs learning a conditioned response is the simplest thing to do. It also crushes the Hebbian-rule hypothesis. I think the majority of people in the neurobiology neighbourhood haven't properly updated on this fact.

We now have several different architectures that reach parity with, but do not substantially exceed, the transformer: RWKV (RNN), xLSTM, Mamba, Based, etc. This implies they have a sha

... (read more)
2Eli Tyre
What? This isn't my understanding at all, and a quick check with an LLM also disputes this.
9the gears to ascension
what! big if true. what papers originated this claim for you?

I think the canonical high-degree polynomial problem is high-dimensional search. We usually don't implement exact grid search because we can deploy Monte Carlo or gradient descent instead. I wonder if there are any hard lower bounds on approximation hardness for polynomial-time problems.
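A minimal sketch of the contrast, with a toy quadratic objective and illustrative sizes I chose myself: exact grid search over d dimensions with k points per axis costs k^d evaluations, while Monte Carlo sampling trades a little accuracy for a tiny fraction of that cost.

```python
# A minimal sketch (toy quadratic objective, illustrative sizes) contrasting exact
# grid search with Monte Carlo sampling for high-dimensional minimization.
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 10                                         # 6 dimensions, 10 grid points per axis
f = lambda pts: np.sum((pts - 0.3) ** 2, axis=-1)    # toy objective, minimum at x = 0.3

# Exact grid search: k**d evaluations -- a high-degree polynomial in the resolution k.
axes = np.meshgrid(*([np.linspace(0, 1, k)] * d), indexing="ij")
grid_points = np.stack([a.ravel() for a in axes], axis=1)   # shape (k**d, d)
grid_best = f(grid_points).min()

# Monte Carlo: a few thousand random points, two orders of magnitude fewer evaluations,
# at the cost of a somewhat worse approximate optimum.
samples = rng.random((5000, d))
mc_best = f(samples).min()

print(f"grid search:  {k**d:>7d} evaluations, best value {grid_best:.4f}")
print(f"monte carlo:  {samples.shape[0]:>7d} evaluations, best value {mc_best:.4f}")
```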

I always struggle with this kind of self-evaluation; that's why I thought of this experiment.

1CstineSublime
I would be very interested to see how narrow or broad that range is between the groups.

As far as I understand MIRI's strategy, the point of the book is communication: it's written from the assumption that many people would be really worried if they knew the content of the book, and the main reason they are not worried now is that, before the book, this content existed mostly in Very Nerdy corners of the internet, written in a very idiosyncratic style. The intended purpose of a Pope endorsement is not to signal that the book is true; it's to move the topic of AI x-risk, for 2 billion Catholics, from the spot "something weird discussed in Very Nerdy corners of the internet" to the spot "the literal head of my church is interested in this stuff, maybe I should become interested too".

Can you get the Pope to read the book? He seems to have opinions on AI, and getting his endorsement would be very useful from a book-promotion perspective.

I was thinking about something like this because the Pope is also a math major.

I've suggested a pathway or two for this; if you have independent pathways, please try them / coordinate with Rob Bensinger about trying them. 

11a3orn
Like if you want the "remarkably strong reception" to have strong [retracting: any] evidential value it needs to be sent to skeptics. Otherwise you're just like, "Yeah, we sent it to a bunch of famous people using our connections and managed to get endorsements from some." It's basically just filtered evidence and attempting to persuade someone with it is attempting to persuade them with means disconnected from the truth.

Timelines are really uncertain, and you can always make predictions conditional on "no singularity". Even if the singularity happens, you can always ask the superintelligence "hey, what would be the consequences of this particular intervention in the business-as-usual scenario" and be vindicated.

7No77e
This is true, but then why not state "conditional on no singularity" if they intended that? I somehow don't buy that that's what they meant

If you want an example particularly connected to prisons, you can take the anarchist revolutionary Sergey Nechayev, who was able to propagandize prison guards enough to connect with an outside terrorist cell. The only reason Nechayev didn't escape is that Narodnaya Volya was planning the assassination of the Tsar and didn't want an escape to interfere.

Thanks, I finally understood the problem with UDT 1.0.

Idea for an experiment: have two really large groups of people take the Big Five test. Tell one group to answer fast, based on what feels true. Tell the other group to seriously consider how they would behave in the different situations the questions describe. I think the different modes of introspection would yield systematic biases in the results.
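A minimal analysis sketch for this experiment, using hypothetical per-trait scores (the group sizes, scale, and effect sizes are illustrative assumptions, not real data): a systematic bias between introspection styles would show up as a consistent per-trait mean shift between the two groups.

```python
# Hypothetical analysis sketch: compare per-trait means between a "fast/intuitive"
# group and a "deliberate" group; the data below is simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
traits = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]

n = 5000  # hypothetical participants per group, scores on a 1-5 scale per trait
fast_group = {t: rng.normal(3.4, 0.7, n) for t in traits}
deliberate_group = {t: rng.normal(3.3, 0.7, n) for t in traits}

# A systematic bias would appear as a consistent mean shift across traits, not just noise.
for trait in traits:
    diff = fast_group[trait].mean() - deliberate_group[trait].mean()
    t_stat, p = stats.ttest_ind(fast_group[trait], deliberate_group[trait])
    print(f"{trait:18s} mean shift {diff:+.3f}  p={p:.3g}")
```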

2Thane Ruthenis
That seems interesting! Though my understanding is that it's not directly applicable here. The maximal redund only needs to contain all redundant information that can be contained in any other redund, it doesn't attempt to capture all redundant information in total. So if there's some constraint on what types of redundancy can be captured by any such variable, that's fine. 

My strictly personal hot take is that Quick Takes have become a place to hide from the responsibility of top-level posts. People sometimes write very long Quick Takes which certainly could be top-level posts, but, in my model of these people, top-level posts have stricter moderation, invite harsher judgment, and are scarier overall, so people opt for Quick Takes instead. I'm afraid that if we create an even less strict section of LW, people are going to migrate there entirely.

isn't that gosh darn similar to "darwinism"?

No, it's not. Darwinism, as a scientific position and not vibes, is the claim that any observable feature in biodiversity is observable because it causally produced differential fitness relative to alternative features. That is not what we see in reality.

I suggest you pause the discussion of evolution and return to it after reading any summary of modern evolutionary theory. I recommend "The Logic of Chance: The Nature and Origin of Biological Evolution".

1lumpenspace
Darwinism is, quite simply, the theory that evolution proceeds through the mechanisms of variation and selection. I read Mary Douglas too, btw, but your "any observable feature" is clearly not a necessity, not even for the staunchest Dalton/Dawkins fan, and I am frankly puzzled by the fact that such an obviously tendentious read could be upvoted so much. I have of course read Koonin—not the worst among those still trying to salvage Lewontin, but not really relevant to the above either. No one is arguing that all phenotypes currently extant confer specific evolutionary advantages.

I think it's because these people don't consider their Twitter content fitting for LW? LW has strict requirements for signal-to-noise ratio and a lot of Twitter stuff is just shitposting.

3β-redex
Not even fitting for Quick Takes? We could have a "Quicker Quick Takes" or "LW Shitposts" section for all I care. (More seriously if you really wanted some separate category for this you could name it something like "LW Microblog".) Also a lot of Eliezer's tweets are quite high effort, to the point where some of them get cross-posted as top level posts. (E.g. https://www.lesswrong.com/posts/fPvssZk3AoDzXwfwJ/universal-basic-income-and-poverty )

The other mechanism is very simple: random drift. The majority of complexity arose as accumulated neutral complexity, which accumulated because of slack in the system. Then, sometimes, this complexity gets rearranged during brief adaptationist periods.
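A minimal sketch of the drift mechanism (a Wright-Fisher style simulation with hypothetical parameters; nothing here is from the original comment): a selectively neutral allele can reach fixation by chance alone, at roughly the rate population genetics predicts.

```python
# Minimal Wright-Fisher sketch (hypothetical parameters): a neutral allele starting
# as a single copy fixes by pure drift with probability about 1/N, no selection needed.
import numpy as np

rng = np.random.default_rng(0)
N = 200            # population size (haploid, for simplicity)
trials = 10_000    # independent neutral mutations, each starting as a single copy

fixed = 0
for _ in range(trials):
    freq = 1 / N
    while 0 < freq < 1:
        # binomial resampling each generation: pure drift, no selection term
        freq = rng.binomial(N, freq) / N
    fixed += freq == 1.0

print(f"fixation rate: {fixed / trials:.4f} (neutral expectation ~ 1/N = {1/N:.4f})")
```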

intelligence, at parity of other traits, clearly outcompetes less intelligence.

"At parity of other traits" makes this statement near-useless: there are never parity of all traits except one. Intelligence clearly leads to more energy consumption, if fitness loss from more need in energy is greater than fitness gains from intelligence, you are pwned. 

-5lumpenspace

I think one shouldn't accept darwinism in the sense you mean here, because this sort of darwinism is false: the supermajority of fixed traits are not adaptive, they are neutral. 50%+ of the human genome is integrated viruses and mobile elements; humans don't merely fall short of being the most darwinistically optimized entity, they are extremely far from it. And the evolution of complex systems can't happen in 100% selectionist mode, because complexity requires resources and slack in resources; otherwise all complexity gets ditched.

From a real perspective on evolution, the resu... (read more)

1lumpenspace
I'm not sure how this relates to my point. Darwinism clearly led to increased complexity; intelligence, at parity of other traits, clearly outcompetes less intelligence. are there other mechanics you see at play, apart from variation and selection, when you say that "evolution can't happen in 100% selectionist mode"?

For your information, Ukraine seems to have attacked airfields in Murmansk and Irkutsk Oblasts. These are approximately 1800 and 4500 km from the Ukrainian border, respectively. The suspected method of attack is drones transported by truck.

The problem here is that sequence embeddings should have tons of side-channels conveying non-semantic information (like, say, the frequencies of tokens in the sequence), and you can come a long way with that sort of information.

What would be really interesting is to train embedding models in different languages and check whether you can translate highly metaphorical sentences with no correspondence other than the semantic one, or to train embedding models on different representations of the same math (for example, the matrix mechanics vs. wave mechanics formulations of quantum mechanics) and see if they recognize equivalent theorems.
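A minimal sketch of the cross-lingual check, using a pretrained multilingual sentence-embedding model as a stand-in for separately trained per-language models (the model name and example sentences are illustrative assumptions): the metaphorical query should land closer to its semantic paraphrase than to a literal word-for-word rendering.

```python
# Sketch: rank candidate translations of a metaphorical sentence by embedding similarity.
# Model and sentences are illustrative; a stricter version would use separately trained models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

query = "It's raining cats and dogs outside."              # metaphorical English sentence
candidates = [
    "Es regnet draussen in Stroemen.",                     # German: semantic equivalent
    "Es regnet draussen Katzen und Hunde.",                # German: literal word-for-word
    "Die Katzen und Hunde spielen draussen im Regen.",     # German: unrelated, shared surface words
]

q_emb = model.encode(query, convert_to_tensor=True)
c_emb = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(q_emb, c_emb)[0]                     # cosine similarity per candidate

for sent, score in zip(candidates, scores):
    print(f"{score.item():.3f}  {sent}")
```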

Who's gonna do that? It's not like we have enough young people for rapid cultural evolution.

I think you would appreciate this post

Yeah, they both made up some stuff in response to the same question.

I'm so far not impressed with the Claude 4s. They try to make up superficially plausible stuff for my math questions as fast as possible. Sonnet 3.7, at least, explored a lot of genuinely interesting avenues before making an error. "Making up superficially plausible stuff" sounds like a good strategy for hacking not-very-robust verifiers.

31a3orn
These seem to be even more optimized for the agentic coder role, and in the absence of strong domain transfer (whether or not that's a real thing) that means you should mostly expect them to be at about the same level in other domains, or even worse because of the forgetfulness from continued training. Maybe.
1amitlevy49
same experience for a physics question on my end
1fencebuilder
Did you try both opus and sonnet 4?

I dunno how obvious this is to people who want to try for the bounty, but I only now realized that you can express the criterion for a redund as an inequality involving mutual information, and I find mutual information much nicer to work with, if only for convenience of notation. Proof:

Let's take the criterion for a redund $\Lambda$ w.r.t. $X_1$ of $X = (X_1, X_2)$:

$$D_{KL}\big(P[X_1, X_2, \Lambda] \,\big\|\, P[X_1, X_2]\, P[\Lambda \mid X_2]\big) \le \epsilon$$

expand the expression for the KL divergence:

$$\mathbb{E}_{X_1, X_2, \Lambda}\Big[\log P[X_1, X_2, \Lambda] - \log\big(P[X_1, X_2]\, P[\Lambda \mid X_2]\big)\Big] \le \epsilon$$

expand the joint distribution:

$$P[X_1, X_2, \Lambda] = P[X_1, X_2]\, P[\Lambda \mid X_1, X_2]$$

... (read more)

4johnswentworth
Yup, and you can do that with all the diagrams in this particular post. In general, the DKL error for any of the diagrams X→Y→Z, X←Y←Z, or X←Y→Z is I(X;Z|Y) for any random variables X,Y,Z. Notably, this includes as a special case the "approximately deterministic function" diagram X←Y→X; the error on that one is I(X;X|Y) which is H(X|Y).
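A quick numerical sanity check of this identity (a sketch with a random joint distribution over small finite alphabets; the setup is illustrative): computing DKL(P[X,Y,Z] || P[X,Y] P[Z|Y]) directly and computing I(X;Z|Y) from its definition give the same number.

```python
# Numerical check (random joint distribution, illustrative setup) that
# D_KL( P[X,Y,Z] || P[X,Y] P[Z|Y] ) equals the conditional mutual information I(X;Z|Y).
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4, 5))                      # joint P[x, y, z] over small finite alphabets
p /= p.sum()

p_xy = p.sum(axis=2, keepdims=True)            # P[x, y]
p_yz = p.sum(axis=0, keepdims=True)            # P[y, z]
p_y = p.sum(axis=(0, 2), keepdims=True)        # P[y]
p_z_given_y = p_yz / p_y                       # P[z | y]

dkl = np.sum(p * np.log(p / (p_xy * p_z_given_y)))

# I(X;Z|Y) = sum_xyz P[x,y,z] log( P[x,z|y] / (P[x|y] P[z|y]) )
p_xz_given_y = p / p_y
p_x_given_y = p_xy / p_y
i_xz_given_y = np.sum(p * np.log(p_xz_given_y / (p_x_given_y * p_z_given_y)))

print(dkl, i_xz_given_y)                       # agree up to floating-point error
```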

how should you pick which reference class to use

You shouldn't. This epistemic bath has no baby in it, and we should throw the water out of it.

It's really sad that we still don't have bookmarks for comments

No, the point is that AI x-risk is commonsensical. "If you drink much from a bottle marked poison, it is certain to disagree with you sooner or later," even if you don't know the poison's mechanism of action. We don't expect Newtonian mechanics to prove that hitting yourself with a brick is quite safe; if we found that Newtonian mechanics predicted hitting yourself with a brick to be safe, that would be huge evidence that Newtonian mechanics is wrong. Good theories usually support common intuitions.

The other thing here is an isolated demand for rigor: there is no "technical understanding of today’s deep learning systems" which predicts, say, the success of AGI labs or that their final products are going to be safe.

1Matthew Barnett
If we accept your interpretation—that AI doom is simply the commonsense view—then doesn’t that actually reinforce my point? It suggests that the central concern driving AI doomerism isn't a set of specific technical arguments grounded in the details of deep learning. Instead, it's based on broader and more fundamental intuitions about the nature of artificial life and its potential risks. To borrow your analogy: the belief that a brick falling on someone’s head would cause them harm isn’t ultimately rooted in technical disputes within Newtonian mechanics. It’s based on ordinary, everyday experience. Likewise, our conversations about AI doom should focus on the intuitive, commonsense cruxes behind it, rather than pretending that the real disagreement comes from highly specific technical deep learning arguments. Instead of undermining my comment, I think your point actually strengthens it.

"For a while" is usually, like, day for me. Sometimes even hours. I don't think that whatever damage other addictions inflict on cognitive function is that much easy to reverse.

It doesn't explain why I fully regain concentration ability after abstaining for a while?

2cubefox
That's true. Another theory is that our tolerance for "small pieces of highly engaging information" increases the more we consume, so we need a higher dosage, and if we abstain for a while, the tolerance goes down again (the sensitivity increases), and we no longer need as much. Similar to how you "need" less sugar for food to feel appropriately sweet, if you abstained a while from sugar.

In my personal experience, the main reason why social media causes cognitive decline is fatigue. Evidence from personal experience: like many social media addicts, I struggle with maintaining concentration on books. If I stop using social media for a while, I regain the full ability to concentrate without drawbacks—in a sense, "I suddenly become capable of reading 1,000 pages of a book in two days, which I had been trying to start for two months."

The reason why social media is addictive to me, I think, is the following process:

  1. Social media is entertaining;
... (read more)
2cubefox
Alternative theory: Social media (and the Internet in general) consists of countless small pieces of highly engaging information, which hardly require concentration. This means it is both addictive and underutilizes and therefore weakens our ability to concentrate on longer text. The addictiveness makes it hard to stop quitting Internet-based consumption, and the weakened concentration skill makes it hard to start reading a book.
2Viliam
Similar here. Sometimes I need to switch from my main work to something else, to relax a little. Switching to social media seems driven by the same urge... except that as a result of that, I often do not feel relaxed. Harmless side tasks: doing dishes, exercise. Mostly harmless: watching anime. Harmful: reading social media. What makes the difference? Seems to me that the harmless tasks are mindless, so my mind is free to return to the original topic, and often does so spontaneously. Movies are linear, there is one alternative thing to pay attention to. But when I switch to a medium that has hundreds of comments, it starts a hundred new thoughts in my brain, and that makes it difficult to stop when I need to return to the main task.

almost all the other agents it expects to encounter are CDT agents

Given this particular setup (you both get each other's source code and make decisions simultaneously, without any means to verify the counterparty's choice until the outcomes happen), you shouldn't self-modify into an extortionist, because CDT agents always defect: no amount of reasoning about source code can causally affect your decision, and D-D is the Nash equilibrium. CDT agents can expect with high probability to meet extortionists in the future and self-modify into a weird Son-of-CDT agent, whi... (read more)

Let's suppose that you look at your counterparty's code and it says "I'll Nuke you unless you Cooperate, in which case I Defect"; call it an "extortionist". You have two hypotheses here:

  1. Your counterparty deliberately modified its decision-making procedures in the hope of extorting more utility;
  2. This decision-making procedure is a genuine result of some weird evolutionary/learning process.

If you can't actually get any evidence in favor of either hypothesis, you go with your prior and do whatever is best from the standpoint of UDT/FDT/LDT counterfactual operationa... (read more)

6Oskar Mathiasen
Minor note. Your choice of utilities makes a 50/50 mixture of Cooperate:Defect and Defect:Cooperate better than the Cooperate:Cooperate outcome. So Cooperate:Cooperate isn't on the Pareto frontier.
2CronoDAS
Does any process in which they ended up the way they did without considering your decision procedure count as #2? Like, suppose almost all the other agents it expects to encounter are CDT agents that do give in to extortion, and it thinks the risk of nuclear war with the occasional rock or UDT agent is worth it.

I think the difference between real bureaucracies and HCH is that real functioning bureaucracies need elements capable of saying "screw this arbitrary problem factorization, I'm doing what's useful", and the bosses of the bureaucracy need to be able to say "we all understand that otherwise the system wouldn't be able to work".

There is a conceptual path for interpretability to lead to reliability: you can understand a model in sufficient detail to know how it produces intelligence and then build another model out of the interpreted details. Obviously, it's not something we can expect to happen anytime soon, but it's something that an army of interpretability geniuses in a datacenter could do.

I think the current Russia-Ukraine war is a perfect place to implement such a system. It's a war of attrition; there are not many goals that don't reduce to "kill and destroy as many as you can". There is also a strategic aspect: Russia pays exorbitant compensation to the families of killed soldiers, decades' worth of income for poor regions. So, when a Russian soldier dies, two things can happen:

  1. The Russian government dumps an unreasonable amount of money on the market, contributing to inflation;
  2. The Russian government fails to pay (for a thousand and one stupid bureaucratic reasons), which erode
... (read more)
3Cole Wyeth
I agree, also I think the concerns of OP don’t seem well justified. This seems like a straightforwardly good system, modulo some obvious but probably avoidable goodharting risks 

By "artificial employee" I mean "something than can fully replace human employee, including their agentic capabilities". And, of course, it should be much more useful than generic AI chatbot, it should be useful like owning Walmart (1,200,000 employees) is useful.

2tailcalled
Ok, so then since one can't make artificial general agents, it's not so confusing that an AI-assisted human can't solve the task. I guess it's true though that my description needs to be amended to rule out things constrained by possibility, budget, or alignment.

"Set up a corporation with a million of artificial employees" is pretty legible, but human amount of agency is catastrophically insufficient for it.

2tailcalled
This statement is pretty ambiguous. "Artificial employee" makes me think of some program that is meant to perform tasks in a semi-independent manner. It would be trivial to generate a million different prompts and then have some interface that routes stuff to these prompts in some way. You could also register it as a corporation. It would presumably be slightly less useful than your generic AI chatbot, because the cost and latency would be slightly higher than if you didn't set up the chatbot in this way. But only slightly. Though one could argue that since AI chatbots lack agency, they don't count as artificial employees. But then is there anything that counts? Like at some point it just seems like a confused goal to me.

The emphasis here is not on properties of model behavior but on how developers relate to model testing/understanding.

2faul_sname
So would you say that the hypothetical incident happened because our org had a poor alignment posture with regards to the software we were shipping?

The recent update from OpenAI about 4o sycophancy surely looks like Standard Misalignment Scenario #325:

Our early assessment is that each of these changes, which had looked beneficial individually, may have played a part in tipping the scales on sycophancy when combined. 

<...>

One of the key problems with this launch was that our offline evaluations—especially those testing behavior—generally looked good. Similarly, the A/B tests seemed to indicate that the small number of users who tried the model liked it.

<...>

some expert testers had indicate

... (read more)
3Nullity
I don’t understand how this is an example of misalignment—are you suggesting that the model tried to be sycophantic only in deployment?
2faul_sname
Is every undesired behavior an AI system exhibits "misalignment", regardless of the cause? Concretely, let's consider the following hypothetical incident report. Hypothetical Incident Report: Interacting bugs and features in navigation app lead to 14 mile traffic jam  * Background We offer a GPS navigation app that provides real-time traffic updates and routing information based on user-contributed data. We recently released updates which made four significant changes: 1. Tweak routing algorithm to have a slightly stronger preference for routes with fewer turns 2. Update our traffic model to include collisions reported on social media and in the app 3. More aggressively route users away from places we predict there will be congestion based on our traffic model 4. Reduced the number of alternative routes shown to users to reduce clutter and cognitive load Our internal evaluations based on historical and simulated traffic data looked good, and A/B tests with our users indicated that most users liked these changes individually. A few users complained about the routes we suggested, but that happens on every update. We had monitoring metrics for the total number of vehicles diverted by a single collision, and checks to ensure that the road capacity of the road we were diverting users onto was sufficient to accommodate that many extra vehicles. However, we had no specific metrics monitoring the total expected extra traffic flow from all diversions combined. Incident On January 14, there was an icy section of road leading away from a major ski resort. There were 7 separate collisions within a 30 minute period on that section of road. Users were pushed to alternate routes to avoid these collisions. Over a 2 hour period, 5,000 vehicles were diverted onto a weather-affected county road with limited winter maintenance, leading to a 14 mile traffic jam and many subsequent breakdowns on that road, stranding h

Not really; a classical universe can be spatially infinite and contain an infinite number of your copies.

1atharva
Ooh that makes sense – thank you!

The other aspect is age. Russia in 1917 had half of its population younger than 18, while modern Russia has a median age of 39, with all the implications for political stability and conservatism.

5Arjun Panickssery
Yeah I was reading the other day about the Treaty of Versailles and surrounding periods and saw a quote from a German minister about how an overseas empire would be good merely as an outlet for young men to find something productive to do. A totally different social and political environment when you have a young and growing population versus an aging and shrinking population. And Russia is still relatively young: Italy, Germany, Greece, Portugal, and Austria all have median ages of 45 or higher.

I'm not commenting on the positive suggestions in this post, but it has a really strange understanding of what it argues against.

an 'Axiom of Rational Convergence'. This is the powerful idea that under sufficiently ideal epistemic conditions – ample time, information, reasoning ability, freedom from bias or coercion – rational agents will ultimately converge on a single, correct set of beliefs, values, or plans, effectively identifying 'the truth'.

There is no reason to "agree on values" (in the sense of actively seeking agreement instead of observing... (read more)

3Joel Z. Leibo
  Right, so this is the part we reject. In the long theory of appropriateness paper, instead of having a model with exogenous preferences (that's the term we use for the assumption that values are free parameters in rational behavior), we say that it's better to have a theory where preferences are endogenous so they can change as a function of the social mechanisms being modeled. So, in our theory, personal values are caused by social conventions, norms, and institutions. Combine this with the contrast between thick and thin morality, which we did mention in the post. You get the conclusion that it's very difficult for individuals to tell which of their personal values are part of 'thin' morality that applies cross culturally versus which are just part of their own culture's 'thick' morality. Another way of saying this is, we're surrounded by moral rules and morally-laden preferences and it's very difficult to tell from the inside of any given culture which of those rules are important versus which of them are silly. From the inside perspective they look exactly the same. Transgressions are punished in the same way, etc. Since we the AI safety community are ourselves inside a particular culture, when we talk about CEV as being "about acting where human values are coherent", we still mean that with an implicit "as measured and understood by us, here and now".  But, from the perspective of someone in a different culture, that makes it indistinguishable from "imposing coherence out of nowhere".  You could reply that I'm talking about practical operationalizations of CEV, not the abstract concept of CEV itself. And OK, sure, fair enough. But the abstract concept doesn't do anything on its own. You always have to operationalize it in some practical way. And all practical operationalizations will have this problem.