For (2), I’m gonna uncharitably rephrase your point as saying: “There hasn’t been a sharp left turn yet, and therefore I’m overall optimistic there will never be a sharp left turn in the future.” Right?
Hm, I wouldn't have phrased it that way. Point (2) says nothing about the probability of there being a "left turn", just the speed at which it would happen. When I hear "sharp left turn", I picture something getting out of control overnight, so it's useful to contextualize how much compute you have to put in to get performance out, since this suggests that (...
I like this post but I think it misses / barely covers two of the most important cases for optimism.
Frontier LLMs have a very good understanding of humans, and seem to model them as well as or even better than other humans. I recall seeing repeated reports of Claude understanding its interlocutor faster than they thought was possible, as if it just "gets" them, e.g. from one Reddit thread I quickly found:
scale up to superintelligence in parallel across many different projects / nations / factions, such that the power is distributed
This has always struck me as worryingly unstable. ETA: Because in this regime you're incentivized to pursue reckless behaviour to outcompete the other AIs, e.g. recursive self-improvement.
Is there a good post out there making a case for why this would work? A few possibilities:
I'm fine. Don't worry too much about this. It just made me think, what am I doing here? For someone to single out my question and say "it's dumb to even ask such a thing" (and the community apparently agrees)... I just think I'll be better off not spending time here.
I should have included this in my list from the start. I basically agree with @Seth Herd that this is a promising direction but I'm concerned about the damage that could occur during takeoff, which could be a years-long period.
Pandora's box is a much better analogy for AI risk, nuclear energy / weapons, fossil fuels, and bioengineering than it was for anything in the ancient world. Nobody believes in Greek mythology these days, but if anyone still did, they'd surely use it as a reason to believe their religion.
A different way to think about types of work is within current ML paradigms vs outside of them. If you believe that timelines are short (e.g. 5 years or less), it makes much more sense to work within current paradigms, otherwise there's very little chance your work will be adopted in time to matter. Mainstream AI, with all of its momentum, is not going to adopt a new paradigm overnight.
If I understand you correctly, there's a close (but not exact) correspondence between work I'd label in-paradigm and work you'd label as "streetlighting". On my model th...
Hi Boaz, first let me say that I really like Deliberative Alignment. Introducing a system 2 element is great, not only for higher-quality reasoning but also for producing a legible, auditable chain of thought. That said, I have a couple of questions I'm hoping you might be able to answer.
It's hard to compare across domains but isn't the FrontierMath result similarly impressive?
Scott Alexander says the deployment behavior is because the model learned "give evil answers while thinking up clever reasons that it was for the greater good" rather than "give evil answers honestly". To what degree do you endorse this interpretation?
Thanks for this! I just doubled my donation because of this answer and @kave's.
FWIW a lot of my understanding that Lighthaven was a burden comes from this section:
I initially read this as $3m for three interest payments. (Maybe change the wording so 2 and 3 don't both mention the interest payment?)
I donated $500. I get a lot of value from the website and think it's important for both the rationalist and AI safety communities. Two related things prevented me from donating more:
Though it's the website which I find important, as I understand it, the majority of this money will go towards supporting Lighthaven.
I think this is backwards! As you can see in the budget I posted here (see also the "Economics of Lighthaven" section), Lighthaven itself is actually surprisingly close to breaking even financially. If you ignore our deferred 2024 interest payment, my guess is we will overall either lose or gain some relatively small amount on net (like $100k).
Most of the cost in that budget comes from LessWrong and our other gen...
as I understand it, the majority of this money will go towards supporting Lighthaven
I think if you take Habryka's numbers at face value, a hair under half of the money this year will go to Lighthaven (35% of core staff salaries @ $1.4M = $0.49M, plus $1M for a deferred interest payment, and then the claim that otherwise Lighthaven is breaking even). And in future years, well less than half.
I worry that the future of LW will be endangered by the financial burden of Lighthaven
I think this is a reasonable worry, but I again want to note that Habryka is projecting a neu...
though with an occasional Chinese character once in a while
The Chinese characters sound potentially worrying. Do they make sense in context? I tried a few questions but didn't see any myself.
I saw them in 10-20% of the reasoning chains. I mostly played around with situational awareness-flavored questions, I don't know whether the Chinese characters are more or less frequent in the longer reasoning chains produced for difficult reasoning problems. Here are some examples:
The translation of the Chinese words here (according to GPT) is "admitting to being an AI."
This is the longest string in Chinese that I got. The English translation is "It's like when you see a realistic AI robot that looks very much like a human, but you understand that i...
There are now two alleged instances of full chains of thought leaking (use an appropriate amount of skepticism), both of which seem coherent enough.
I think it's more likely that this is just a (non-model) bug in ChatGPT. In the examples you gave, it looks like there's always one step that comes completely out of nowhere, and the rest of the chain of thought would make sense without it. This reminds me of the bug where ChatGPT would show other users' conversations.
I hesitate to draw any conclusions from the o1 CoT summary since it's passed through a summarizing model.
after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.
https://x.com/sama/status/1834283103038439566
Construction Physics has a very different take on the economics of the Giga-press.
Tesla was the first car manufacturer to adopt large castings, but the savings were so significant — an estimated 20 to 40% reduction in the cost of a car body — that they’re being adopted by many other car manufacturers, particularly Chinese ones. Large, complex castings have been described as a key tool for not only reducing cost but also good EV charging performance.
I think Construction Physics is usually pretty good. In this case my guess is that @bhauth has looked into th...
In physics, the objects of study are mass, velocity, energy, etc. It’s natural to quantify them, and as soon as you’ve done that you’ve taken the first step in applying math to physics. There are a couple reasons that this is a productive thing to do:
Together this means that you benefit from even very simple math and can scale up smoothly to more sophisticated tools. From simply adding masses to ...
Re the choice of kernel, my intuition would have been that something smoother (e.g. approximating a Gaussian, or perhaps Epanechnikov) would have given the best results. Did you use rect just because it's very cheap, or was there a theoretical reason?
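For concreteness, here's roughly the kind of comparison I have in mind, as a minimal Python sketch with made-up data; the signal, kernel width, and kernel definitions are my own assumptions rather than anything from the post:

```python
import numpy as np

# A minimal sketch with made-up data, just to illustrate the comparison;
# the signal, kernel width, and kernel definitions are assumptions, not the post's code.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 500)) + 0.3 * rng.standard_normal(500)

width = 31                           # kernel support in samples (assumed)
x = np.linspace(-1, 1, width)

kernels = {
    "rect": np.ones(width),                        # boxcar / moving average
    "gaussian": np.exp(-0.5 * (3 * x) ** 2),       # truncated Gaussian
    "epanechnikov": np.clip(1 - x ** 2, 0, None),  # parabolic kernel
}

for name, k in kernels.items():
    k = k / k.sum()                                # normalize to unit mass
    smoothed = np.convolve(signal, k, mode="same")
    # Residual variance: a rough sense of how much each kernel alters the raw series.
    print(name, round(float(np.var(signal - smoothed)), 4))
```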
Thanks for this! I ended up reading The Quincunx based on this review and really enjoyed it.
As an aside, I want to recommend a physical book instead of the Kindle version, for a couple reasons:
If, for instance, one minimum’s attractor basin has a radius that is just 0.00000001% larger than that of the other minimum, then its volume will be roughly 40 million times larger (if my Javascript code to calculate this is accurate enough, that is).
Could you share this code? I'd like to take a look.
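In the meantime, here's a minimal Python sketch of the calculation as I understand it, for anyone who wants to sanity-check the "roughly 40 million" figure; the dimension (175 billion, a GPT-3-scale parameter count) is my assumption, not something taken from the original code:

```python
import math

# Minimal sketch of the calculation as I understand it. The dimension is an
# assumption (175 billion, a GPT-3-scale parameter count); the quoted post
# didn't say what value it used.
dim = 175_000_000_000            # assumed number of dimensions (parameters)
radius_ratio = 1 + 1e-10         # one basin's radius is 0.00000001% larger

# Hypersphere volume scales as radius**dim, so work in log space to avoid overflow.
log_volume_ratio = dim * math.log1p(1e-10)
volume_ratio = math.exp(log_volume_ratio)

print(f"volume ratio ≈ {volume_ratio:.3g}")   # ≈ 4e7, i.e. roughly 40 million
```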
For others who want the resolution to this cliffhanger, what does Bostrom predict happens next?
The remainder of this section:
...We observe here how it could be the case that when dumb, smarter is safer; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire. We may call the phenomenon the treacherous turn.
The treacherous turn — While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong — without wa
A slight silver lining: I'm not sure a world in which China "wins" the race is all that bad. I'm genuinely uncertain. Let's take Leopold's objections, for example:
...I genuinely do not know the intentions of the CCP and their authoritarian allies. But, as a reminder: the CCP is a regime founded on the continued worship of perhaps the greatest totalitarian mass-murderer in human history (“with estimates ranging from 40 to 80 million victims due to starvation, persecution, prison labor, and mass executions”); a regime that recently put a million Uyghurs in co
I believe Xi (or choose your CCP representative) would say that the ultimate goal is human flourishing
I'm very much worried that this sort of thinking is a severe case of Typical Mind Fallacy.
I think the main terminal values of the individuals constituting the CCP – and I do mean terminal, not instrumental – are the preservation of their personal status, power, and control, like the values of ~all dictatorships, and most politicians in general. Ideology is mostly just an aesthetics, a tool for internal and external propaganda/rhetoric, and the backdrop for...
My biggest problem with Leopold's project is this: in a world where his models hold up, where superintelligence is right around the corner, a US / China race is inevitable, and the winner really matters; in that world, publishing these essays on the open internet is very dangerous. It seems just as likely to help the Chinese side as to help the US.
If China prioritizes AI (if they decide that it's one tenth as important as Leopold suggests), I'd expect their administration to act more quickly and competently than the US. I don't have a good reason to think ...
Sorry, was in a hurry when I wrote this. What I meant / should have said is: it seems really valuable to me to understand how you can refute Paul's views so confidently and I'd love to hear more.
I put approximately-zero probability on the possibility that Paul is basically right on this delta; I think he’s completely out to lunch.
Very strong claim which the post doesn't provide nearly enough evidence to support
I mean, yeah, convincing people of the truth of that claim was not the point of the post.
I decided to do a check by tallying the "More Safety Relevant Features" from the 1M SAE to see if they reoccur in the 34M SAE (in some related form).
I don't think we can interpret their list of safety-relevant features as exhaustive. I'd bet (80% confidence) that we could find 34M features corresponding to at least some of the 1M features you listed, given access to their UMAP browser. Unfortunately we can't do this without Anthropic support.
Maybe you can say a bit about what background someone should have to be able to evaluate your idea.
Not a direct answer to your question but:
I’m not sure about the premise that people are opposed to Hanson’s ideas because he said them. On the contrary, I’ve seen several people (now including you) mention that they’re fans of his ideas, and never seen anyone say that they dislike them.
My model is more that some ideas are more viral than others, some ideas have loud and enthusiastic champions, and some ideas are economically valuable. I don’t see most of Hanson’s ideas as particularly viral, don’t think he’s worked super hard to champion them, and they’re a mixed bag economically (eg prediction m...
Why does Golden Gate Claude act confused? My guess is that activating the Golden Gate Bridge feature so strongly is OOD. (This feature, by the way, is not exactly aligned with your conception of the Golden Gate Bridge or mine, so it might emphasize fog more or less than you would, but that’s not what I’m focusing on here). Anthropic probably added the bridge feature pretty strongly, so the model ends up in a state with a 10x larger Golden Gate Bridge activation than it’s built for, not to mention in the context of whatever unrelated prompt you’ve...
The Anthropic post itself said more or less the same:
To me the strongest evidence that fine-tuning is based on LoRA or similar is the fact that pricing is based just on training and input / output and doesn't factor in the cost of storing your fine-tuned models. Llama-3-8b-instruct is ~16GB (I think this ought to be roughly comparable, at least in the same ballpark). You'd almost surely care if you were storing that much data for each fine-tune.
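To make the storage contrast concrete, here's a rough back-of-the-envelope sketch; the layer count, adapted modules, and LoRA rank are my assumptions, not anything the provider has disclosed:

```python
# Back-of-the-envelope numbers only; the layer count, adapted modules, and
# LoRA rank below are assumptions, not anything disclosed by the provider.
full_params = 8e9                    # Llama-3-8B parameter count
bytes_per_param = 2                  # fp16 / bf16
full_checkpoint_gb = full_params * bytes_per_param / 1e9
print(f"full fine-tuned checkpoint ≈ {full_checkpoint_gb:.0f} GB")   # ≈ 16 GB

# A LoRA adapter only stores two low-rank matrices per adapted weight matrix.
layers = 32                          # transformer blocks in an 8B model
adapted_per_layer = 4                # e.g. q/k/v/o projections, treated as square (assumed)
hidden = 4096                        # model width
rank = 16                            # typical LoRA rank (assumed)
lora_params = layers * adapted_per_layer * 2 * hidden * rank
lora_mb = lora_params * bytes_per_param / 1e6
print(f"LoRA adapter ≈ {lora_mb:.0f} MB")                            # tens of MB
```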
Measuring the composition of fryer oil at different times certainly seems like a good way to test both the original hypothesis and the effect of altitude.
You're right, my original wording was too strong. I edited it to say that it agrees with so many diets instead of explains why they work.
One thing I like about the PUFA breakdown theory is that it agrees with aspects of so many different diets.
If this was true, how could we tell? In other words, is this a testable hypothesis?
What reason do we have to believe this might be true? Because we're in a world where it looks like we're going to develop superintelligence, so it would be a useful world to simulate?
From the latest Conversations with Tyler interview of Peter Thiel
I feel like Thiel misrepresents Bostrom here. He doesn’t really want a centralized world government or think that’s "a set of things that make sense and that are good". He’s forced into world surveillance not because it’s good but because it’s the only alternative he sees to dangerous ASI being deployed.
I wouldn’t say he’s optimistic about human nature. In fact it’s almost the very opposite. He thinks that we’re doomed by our nature to create that which will destroy us.
Three questions:
This is fantastic. Thank you.
Thanks! I added a note about LeCun's 100,000 claim and just dropped the Chollet reference since it was misleading.
Thanks for the correction! I've updated the post.
I assume the 44k ppm CO2 in exhaled air is the product of respiration (i.e. the lungs have processed it), whereas the air used in mouth-to-mouth is quickly inhaled and exhaled.
What's your best guess for what percentage of cells (in the brain) receive edits?
Are edits somehow targeted at brain cells in particular or do they run throughout the body?
I don't have a well-reasoned opinion here but I'm interested in hearing from those who disagree.
Thanks for your patience: I do think this message makes your point clearly. However, I'm sorry to say, I still don't think I was missing the point. I reviewed §1.5, still believe I understand the open-ended autonomous learning distribution shift, and also find it scary. I also reviewed §3.7, and found it to basically match my model, especially this bit:
...