For (2), I’m gonna uncharitably rephrase your point as saying: “There hasn’t been a sharp left turn yet, and therefore I’m overall optimistic there will never be a sharp left turn in the future.” Right?
Hm, I wouldn't have phrased it that way. Point (2) says nothing about the probability of there being a "left turn", just the speed at which it would happen. When I hear "sharp left turn", I picture something getting out of control overnight, so it's useful to contextualize how much compute you have to put in to get performance out, since this suggests that (...
I like this post but I think it misses / barely covers two of the most important cases for optimism.
Frontier LLMs have a very good understanding of humans, and seem to model them as well as or even better than other humans. I recall seeing repeated reports of Claude understanding its interlocutor faster than they thought was possible, as if it just "gets" them, e.g. from one Reddit thread I quickly found:
scale up to superintelligence in parallel across many different projects / nations / factions, such that the power is distributed
This has always struck me as worryingly unstable. ETA: Because in this regime you're incentivized to pursue reckless behaviour to outcompete the other AIs, e.g. recursive self-improvement.
Is there a good post out there making a case for why this would work? A few possibilities:
I'm fine. Don't worry too much about this. It just made me think, what am I doing here? For someone to single out my question and say "it's dumb to even ask such a thing" (and the community apparently agrees)... I just think I'll be better off not spending time here.
I should have included this in my list from the start. I basically agree with @Seth Herd that this is a promising direction but I'm concerned about the damage that could occur during takeoff, which could be a years-long period.
Pandora's box is a much better analogy for AI risk, nuclear energy / weapons, fossil fuels, and bioengineering than it was for anything in the ancient world. Nobody believes in Greek mythology these days, but if anyone still did, they'd surely use it as a reason to believe their religion.
A different way to think about types of work is within current ML paradigms vs outside of them. If you believe that timelines are short (e.g. 5 years or less), it makes much more sense to work within current paradigms, otherwise there's very little chance your work will be adopted in time to matter. Mainstream AI, with all of its momentum, is not going to adopt a new paradigm overnight.
If I understand you correctly, there's a close (but not exact) correspondence between work I'd label in-paradigm and work you'd label as "streetlighting". On my model th...
Hi Boaz, first let me say that I really like Deliberative Alignment. Introducing a system 2 element is great, not only for higher-quality reasoning but also for producing a legible, auditable chain of thought. That said, I have a couple of questions I'm hoping you might be able to answer.
It's hard to compare across domains but isn't the FrontierMath result similarly impressive?
Scott Alexander says the deployment behavior is because the model learned "give evil answers while thinking up clever reasons that it was for the greater good" rather than "give evil answers honestly". To what degree do you endorse this interpretation?
Thanks for this! I just doubled my donation because of this answer and @kave's.
FWIW a lot of my understanding that Lighthaven was a burden comes from this section:
I initially read this as $3m for three interest payments. (Maybe change the wording so 2 and 3 don't both mention the interest payment?)
I donated $500. I get a lot of value from the website and think it's important for both the rationalist and AI safety communities. Two related things prevented me from donating more:
Though it's the website which I find important, as I understand it, the majority of this money will go towards supporting Lighthaven.
I think this is backwards! As you can see in the budget I posted here (see also the "Economics of Lighthaven" section), Lighthaven itself is actually surprisingly close to breaking even financially. If you ignore our deferred 2024 interest payment, my guess is we will overall either lose or gain some relatively small amount on net (like $100k).
Most of the cost in that budget comes from LessWrong and our other gen...
as I understand it, the majority of this money will go towards supporting Lighthaven
I think if you take Habryka's numbers at face value, a hair under half of the money this year will go to Lighthaven (35% of core staff salaries @ $1.4M = $0.49M, plus $1M for a deferred interest payment, and then the claim that otherwise Lighthaven is breaking even). And in future years, well less than half.
I worry that the future of LW will be endangered by the financial burden of Lighthaven
I think this is a reasonable worry, but I again want to note that Habryka is projecting a neu...
though with an occasional Chinese character once in a while
The Chinese characters sound potentially worrying. Do they make sense in context? I tried a few questions but didn't see any myself.
I saw them in 10-20% of the reasoning chains. I mostly played around with situational awareness-flavored questions, I don't know whether the Chinese characters are more or less frequent in the longer reasoning chains produced for difficult reasoning problems. Here are some examples:
The translation of the Chinese words here (according to GPT) is "admitting to being an AI."
This is the longest string in Chinese that I got. The English translation is "It's like when you see a realistic AI robot that looks very much like a human, but you understand that i...
There are now two alleged instances of full chains of thought leaking (use an appropriate amount of skepticism), both of which seem coherent enough.
I think it's more likely that this is just a (non-model) bug in ChatGPT. In the examples you gave, it looks like there's always one step that comes completely out of nowhere, and the rest of the chain of thought would make sense without it. This reminds me of the bug where ChatGPT would show other users' conversations.
I hesitate to draw any conclusions from the o1 CoT summary since it's passed through a summarizing model.
after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.
https://x.com/sama/status/1834283103038439566
Construction Physics has a very different take on the economics of the Giga-press.
Tesla was the first car manufacturer to adopt large castings, but the savings were so significant — an estimated 20 to 40% reduction in the cost of a car body — that they’re being adopted by many other car manufacturers, particularly Chinese ones. Large, complex castings have been described as a key tool for not only reducing cost but also good EV charging performance.
I think Construction Physics is usually pretty good. In this case my guess is that @bhauth has looked into th...
In physics, the objects of study are mass, velocity, energy, etc. It’s natural to quantify them, and as soon as you’ve done that you’ve taken the first step in applying math to physics. There are a couple reasons that this is a productive thing to do:
Together this means that you benefit from even very simple math and can scale up smoothly to more sophisticated tools. From simply adding masses to ...
Re the choice of kernel, my intuition would have been that something smoother (e.g. approximating a Gaussian, or perhaps Epanechnikov) would have given the best results. Did you use rect just because it's very cheap, or was there a theoretical reason?
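For concreteness, here's roughly the kind of comparison I have in mind, as a minimal Python sketch with made-up data; the signal, kernel width, and kernel definitions are my own assumptions rather than anything from the post:

```python
import numpy as np

# A minimal sketch with made-up data, just to illustrate the comparison;
# the signal, kernel width, and kernel definitions are assumptions, not the post's code.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 500)) + 0.3 * rng.standard_normal(500)

width = 31                           # kernel support in samples (assumed)
x = np.linspace(-1, 1, width)

kernels = {
    "rect": np.ones(width),                        # boxcar / moving average
    "gaussian": np.exp(-0.5 * (3 * x) ** 2),       # truncated Gaussian
    "epanechnikov": np.clip(1 - x ** 2, 0, None),  # parabolic kernel
}

for name, k in kernels.items():
    k = k / k.sum()                                # normalize to unit mass
    smoothed = np.convolve(signal, k, mode="same")
    # Residual variance: a rough sense of how much each kernel alters the raw series.
    print(name, round(float(np.var(signal - smoothed)), 4))
```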
Thanks for this! I ended up reading The Quincunx based on this review and really enjoyed it.
As an aside, I want to recommend a physical book instead of the Kindle version, for a couple reasons:
If, for instance, one minimum’s attractor basin has a radius that is just 0.00000001% larger than that of the other minimum, then its volume will be roughly 40 million times larger (if my Javascript code to calculate this is accurate enough, that is).
Could you share this code? I'd like to take a look.
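In the meantime, here's a minimal Python sketch of the calculation as I understand it, for anyone who wants to sanity-check the "roughly 40 million" figure; the dimension (175 billion, a GPT-3-scale parameter count) is my assumption, not something taken from the original code:

```python
import math

# Minimal sketch of the calculation as I understand it. The dimension is an
# assumption (175 billion, a GPT-3-scale parameter count); the quoted post
# didn't say what value it used.
dim = 175_000_000_000            # assumed number of dimensions (parameters)
radius_ratio = 1 + 1e-10         # one basin's radius is 0.00000001% larger

# Hypersphere volume scales as radius**dim, so work in log space to avoid overflow.
log_volume_ratio = dim * math.log1p(1e-10)
volume_ratio = math.exp(log_volume_ratio)

print(f"volume ratio ≈ {volume_ratio:.3g}")   # ≈ 4e7, i.e. roughly 40 million
```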
For others who want the resolution to this cliffhanger, what does Bostrom predict happens next?
The remainder of this section:
...We observe here how it could be the case that when dumb, smarter is safer; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire. We may call the phenomenon the treacherous turn.
The treacherous turn — While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong — without wa
A slight silver lining: I'm not sure a world in which China "wins" the race is all that bad. I'm genuinely uncertain. Let's take Leopold's objections, for example:
...I genuinely do not know the intentions of the CCP and their authoritarian allies. But, as a reminder: the CCP is a regime founded on the continued worship of perhaps the greatest totalitarian mass-murderer in human history (“with estimates ranging from 40 to 80 million victims due to starvation, persecution, prison labor, and mass executions”); a regime that recently put a million Uyghurs in co
I believe Xi (or choose your CCP representative) would say that the ultimate goal is human flourishing
I'm very much worried that this sort of thinking is a severe case of Typical Mind Fallacy.
I think the main terminal values of the individuals constituting the CCP – and I do mean terminal, not instrumental – are the preservation of their personal status, power, and control, like the values of ~all dictatorships, and most politicians in general. Ideology is mostly just an aesthetics, a tool for internal and external propaganda/rhetoric, and the backdrop for...
My biggest problem with Leopold's project is this: in a world where his models hold up, where superintelligence is right around the corner, a US / China race is inevitable, and the winner really matters; in that world, publishing these essays on the open internet is very dangerous. It seems just as likely to help the Chinese side as to help the US.
If China prioritizes AI (if they decide that it's one tenth as important as Leopold suggests), I'd expect their administration to act more quickly and competently than the US. I don't have a good reason to think ...
Sorry, was in a hurry when I wrote this. What I meant / should have said is: it seems really valuable to me to understand how you can refute Paul's views so confidently and I'd love to hear more.
I put approximately-zero probability on the possibility that Paul is basically right on this delta; I think he’s completely out to lunch.
Very strong claim which the post doesn't provide nearly enough evidence to support
I mean, yeah, convincing people of the truth of that claim was not the point of the post.
I decided to do a check by tallying the "More Safety Relevant Features" from the 1M SAE to see if they reoccur in the 34M SAE (in some related form).
I don't think we can interpret their list of safety-relevant features as exhaustive. I'd bet (80% confidence) that we could find 34M features corresponding to at least some of the 1M features you listed, given access to their UMAP browser. Unfortunately we can't do this without Anthropic support.
Maybe you can say a bit about what background someone should have to be able to evaluate your idea.
Not a direct answer to your question but:
I’m not sure about the premise that people are opposed to Hanson’s ideas because he said them. On the contrary, I’ve seen several people (now including you) mention that they’re fans of his ideas, and never seen anyone say that they dislike them.
My model is more that some ideas are more viral than others, some ideas have loud and enthusiastic champions, and some ideas are economically valuable. I don’t see most of Hanson’s ideas as particularly viral, don’t think he’s worked super hard to champion them, and they’re a mixed bag economically (eg prediction m...
Why does Golden Gate Claude act confused? My guess is that activating the Golden Gate Bridge feature so strongly is OOD. (This feature, by the way, is not exactly aligned with your conception of the Golden Gate Bridge or mine, so it might emphasize fog more or less than you would, but that’s not what I’m focusing on here). Anthropic probably added the bridge feature pretty strongly, so the model ends up in a state with a 10x larger Golden Gate Bridge activation than it’s built for, not to mention in the context of whatever unrelated prompt you’ve...
The Anthropic post itself said more or less the same:
To me the strongest evidence that fine-tuning is based on LoRA or similar is the fact that pricing is based just on training and input / output and doesn't factor in the cost of storing your fine-tuned models. Llama-3-8b-instruct is ~16GB (I think this ought to be roughly comparable, at least in the same ballpark). You'd almost surely care if you were storing that much data for each fine-tune.
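To make the storage contrast concrete, here's a rough back-of-the-envelope sketch; the layer count, adapted modules, and LoRA rank are my assumptions, not anything the provider has disclosed:

```python
# Back-of-the-envelope numbers only; the layer count, adapted modules, and
# LoRA rank below are assumptions, not anything disclosed by the provider.
full_params = 8e9                    # Llama-3-8B parameter count
bytes_per_param = 2                  # fp16 / bf16
full_checkpoint_gb = full_params * bytes_per_param / 1e9
print(f"full fine-tuned checkpoint ≈ {full_checkpoint_gb:.0f} GB")   # ≈ 16 GB

# A LoRA adapter only stores two low-rank matrices per adapted weight matrix.
layers = 32                          # transformer blocks in an 8B model
adapted_per_layer = 4                # e.g. q/k/v/o projections, treated as square (assumed)
hidden = 4096                        # model width
rank = 16                            # typical LoRA rank (assumed)
lora_params = layers * adapted_per_layer * 2 * hidden * rank
lora_mb = lora_params * bytes_per_param / 1e6
print(f"LoRA adapter ≈ {lora_mb:.0f} MB")                            # tens of MB
```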
Measuring the composition of fryer oil at different times certainly seems like a good way to test both the original hypothesis and the effect of altitude.
You're right, my original wording was too strong. I edited it to say that it agrees with so many diets instead of explains why they work.
One thing I like about the PUFA breakdown theory is that it agrees with aspects of so many different diets.
If this was true, how could we tell? In other words, is this a testable hypothesis?
What reason do we have to believe this might be true? Because we're in a world where it looks like we're going to develop superintelligence, so it would be a useful world to simulate?
From the latest Conversations with Tyler interview of Peter Thiel
I feel like Thiel misrepresents Bostrom here. He doesn’t really want a centralized world government or think that’s "a set of things that make sense and that are good". He’s forced into world surveillance not because it’s good but because it’s the only alternative he sees to dangerous ASI being deployed.
I wouldn’t say he’s optimistic about human nature. In fact it’s almost the very opposite. He thinks that we’re doomed by our nature to create that which will destroy us.
Three questions:
This is fantastic. Thank you.
Thanks! I added a note about LeCun's 100,000 claim and just dropped the Chollet reference since it was misleading.
Thanks for the correction! I've updated the post.
I assume the 44k ppm CO2 in exhaled air is the product of respiration (i.e. the lungs have processed it), whereas the air used in mouth-to-mouth is quickly inhaled and exhaled.
What's your best guess for what percentage of cells (in the brain) receive edits?
Are edits somehow targeted at brain cells in particular or do they run throughout the body?
I don't have a well-reasoned opinion here but I'm interested in hearing from those who disagree.
Thanks for your patience: I do think this message makes your point clearly. However, I'm sorry to say, I still don't think I was missing the point. I reviewed §1.5, still believe I understand the open-ended autonomous learning distribution shift, and also find it scary. I also reviewed §3.7, and found it to basically match my model, especially this bit:
...