I think the starting point of this kind of discourse should be different. We should start with "ends", not with "means".
As Michael Nielsen says in https://x.com/michael_nielsen/status/1772821788852146226
As far as I can see, alignment isn't a property of an AI system. It's a property of the entire world, and if you are trying to discuss it as a [single AI] system property you will inevitably end up making bad mistakes
So the starting point should really be: what kind of properties do we want the world to have?
And then the next point should be taking into...
The standard reference for this topic is https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness
The key point of that post is that people are fundamentally divided into two camps, which creates difficulties in conversations about this topic; that's an important meta-consideration for this type of conversation.
This particular post is written by someone from Camp 1, and both camps are already present in the comments.
The 23andme link points to https://waymo.com/blog/2025/03/next-stop-for-waymo-one-washingtondc instead
It should be a different word, to avoid confusion with reward models (the standard term for models used to predict reward in some ML contexts)
One assumption that is very questionable is that it would be difficult to create “multiple new people” with drastically different thinking styles and different approaches to research.
This seems to be an important crux.
collateral damage
then it would be better to use an example not directly aimed against “our atoms” (e.g. if they don’t care about us and other animals we’ll probably perish from unintentional changes in air composition, or something like that)
but the bulk of the risk would be a miscalculation which would be big enough to kill them as well (mucking with quantum gravity too recklessly, or something in that spirit)
which is why we want to 1) give birth to AIs competent enough to at least solve their own existential risk problem, and 2) to also sustainably include us i...
this isn't an "attack", it's "go[ing] straight for execution on its primary instrumental goal"
yes, the OP is ambiguous in this sense
I first wrote my comment, then reread the (tail end of the) post again, and did not post it, because I thought it could have been formulated this way, that this is just an instrumental goal
Then I reread the (tail end of the) post one more time and decided that no, the post does actually make it a "power play"; that's how it is actually written, in terms of "us vs them", not in terms of the ASI's own goals, and then I post...
Two main objections to (the tail end of) this story are:
On one hand, it's not clear whether a system needs to be all that super-smart to design a devastating attack of this kind: we are already at risk of fairly devastating tech-assisted attacks in that general spirit (mostly with synthetic biological viruses at the moment), and those risks are growing regardless of the AGI/superintelligence angle; ordinary tech progress is quite sufficient in this sense.
On the other hand, if one has a rapidly self-improving, strongly super-intelligent distributed system, it's unlikely that
Ah, it's mostly your first figure which is counter-intuitive: when one looks at it, one gets the intuition of f(g(h(...(x)))), so it de-emphasizes the fact that each of these Transformer Block transformations is shaped like x = x + function(x).
yeah... not trying for a complete analysis here, but one thing which is missing is the all-important residual stream. It was rather downplayed in the original "Attention Is All You Need" paper, and has been greatly emphasized in https://transformer-circuits.pub/2021/framework/index.html
but I have to admit that I only started to feel that I more or less understand the principal aspects of the Transformer architecture after I spent some quality time with the pedagogical implementation of GPT-2 by Andrej Karpathy, https://github.com/karpathy/minGPT, speci...
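To make the x = x + function(x) shape concrete, here is a minimal sketch of a pre-norm Transformer block (written from scratch in the spirit of minGPT, not copied from it; the layer names and sizes are arbitrary), where both sub-layers only add their contributions into the residual stream:

```python
# Minimal sketch of a pre-norm Transformer block: x = x + f(x), twice per block.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int = 64, n_head: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        # residual stream: each sub-layer only adds its contribution to x
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out                 # x = x + attention(x)
        x = x + self.mlp(self.ln2(x))    # x = x + mlp(x)
        return x

x = torch.randn(2, 10, 64)        # (batch, sequence, d_model)
print(Block()(x).shape)           # torch.Size([2, 10, 64])
```

Seen this way, the blocks read as a sequence of additive updates to one stream rather than a deeply nested composition.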
Mmm... if we are not talking about full automation, but about being helpful, the ability to do 1-hour software engineering tasks ("train classifier") is already useful.
Moreover, we have recently seen a flood of rather inexpensive fine-tunings of reasoning models, each aimed at a particular benchmark.
Perhaps what one can do is perform a (somewhat more expensive, but still not too difficult) fine-tuning to create a model that helps with a particular, relatively narrow class of meaningful problems (which would be more general than tuning for particular benchmarks, but stil...
which shows how incoherent and contradictory people are – they expect superintelligence before human-level AI; what questions do they think they are answering here?
"the road to superintelligence goes not via human equivalence, but around it"
so, yes, it's reasonable to expect to have wildly superintelligent AI systems (e.g. clearly superintelligent AI researchers and software engineers) before all important AI deficits compared to human abilities are patched
Updating the importance of reducing the chance of a misaligned AI becoming space-faring upwards
does this effectively imply that the notion of alignment in this context needs to be non-anthropocentric and not formulated in terms of human values?
(I mean, the whole approach assumes that "alien Space-Faring Civilizations" would do fine (more or less), and it's important not to create something hostile to them.)
Thanks!
So, the claim here is that this is a better "artificial AI scientist" compared to what we've seen so far.
There is a tech report https://github.com/IntologyAI/Zochi/blob/main/Zochi_Technical_Report.pdf, but the "AI scientist" itself is not open source, and the tech report does not disclose much (besides confirming that this is a multi-agent thing).
This might end up being a new milestone (but it's too early to conclude that; the comparison is not quite "apples-to-apples": there is human feedback in the process of its work, and humans make edits to the f...
Thanks for writing this.
We estimate that before hitting limits, the software feedback loop could increase effective compute by ~13 orders of magnitude (“OOMs”)
This is one place where I am not quite sure we have the right language. On one hand, the overall methodology pushes us towards talking in terms of "orders of magnitude of improvement", a factor of improvement which might be very large, but which is still a constant.
On the other hand, algorithmic improvements are often improvements in algorithmic complexity (e.g. something is no longer exponential, o...
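A toy numerical illustration of why the distinction matters (the numbers are made up; this only shows how a constant factor, however large, interacts with a change in complexity class):

```python
# Toy illustration (hypothetical numbers): a constant-factor speedup,
# even a huge one, behaves very differently from a complexity-class change.
def exponential_cost(n):          # e.g. brute-force search, ~2^n steps
    return 2 ** n

def polynomial_cost(n):           # e.g. a smarter algorithm, ~n^3 steps
    return n ** 3

SPEEDUP = 10 ** 13                # "13 OOMs" applied as a constant factor

for n in (50, 100, 200):
    sped_up_exponential = exponential_cost(n) / SPEEDUP
    print(n, f"{sped_up_exponential:.3g}", polynomial_cost(n))
# At n=50 the 13-OOM constant speedup still wins; by n=100 and beyond the
# complexity-class improvement dominates no matter how large the constant is.
```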
They should actually reference Yudkowsky.
Their paper https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf lists over 70 references, but I don't see them mentioning Yudkowsky anywhere (someone should tell Schmidhuber ;-)).
This branch of the official science is younger than 10 years (and it started as a fairly non-orthodox one; it's only recently that it has started to feel like the official one, certainly no earlier than the formation of Anthropic, and probably quite a bit later than that).
And this part is what Robin Hanson predicted about a decade ago. If I remember it correctly, he wrote that AI Safety was a low-status thing, therefore everyone associated with it was low-status. And if AI Safety ever becomes a high-status thing, then the people in the field will not want to be associated with their low-status predecessors. So instead of referencing them, an alternative history will be established, where someone high-status will be credited for creating the field from scratch (maybe using some inspiration from high-status people in adjacent fields).
This is probably correct, but also this is a report about the previous administration.
Normally, there is a lot of continuity in institutional knowledge between administrations, but this current transition is an exception, as the new admin has decided to deliberately break continuity as much as it can (this is very unusual).
And with the new admin, it's really difficult to say what they think. Vance publicly expresses an opinion worthy of Zuck, only more radical (gas pedal to the floor, forget about brakes). He is someone who believes at the same time that 1...
except easier, because it requires no internal source of discipline
Actually, a number of things that reduce the need for an internal source of discipline do make this easier.
For example, deliberately maintaining a particular breathing pattern (e.g. the so-called "consciously connected breath"/"circular breath", that is, breathing without pauses between inhalations and exhalations, ideally with equal lengths for the inhale and the exhale) makes maintaining one's focus on the breath much easier.
It's a very natural AI application, but why would this be called "alignment", and how is this related to the usual meanings of "AI alignment"?
To a smaller extent, we already have this problem among humans: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness. This stratification into "two camps" is rather spectacular.
But a realistic pathway towards eventually solving the "hard problem of consciousness" is likely to include tight coupling between biological and electronic entities resulting in some kind of "hybrid consciousness" which would be more amenable to empirical study.
Usually one assumes that this kind of research would be initiated by humans tryi...
Artificial Static Place Intelligence
This would be a better title (this points to the actual proposal here)
a future garbage-collected language in the vein of Swift, Scala, C#, or Java, but better
Have you looked at Julia?
Julia does establish a very strong baseline, if one is OK with an "intermediate discipline between dynamic typing and static typing"[1].
(Julia is also a counter-example to some of your thoughts in the sense that they have managed to grow a strong startup around an open-source programming language and a vibrant community. But the starting point was indeed an academic collaboration; only when they had started to experience success did they start ...
Did they have one? Or is it the first time they are filling this position?
I'd say that the ability to produce more energy overall than what is being spent on the whole cycle would count as a "GPT-3 moment". No price constraints, so it does not need to reach the level of "economically feasible", but it should stop being "net negative" energy-wise (when one honestly counts all the energy inputs needed to make it work).
I, of course, don't know how to translate Q into this. GPT-4o tells me that Q=10 is approximately what is needed for that (for "Engineering Break-even (reactor-level energy balance)"), at least for some of...
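For what it's worth, here is a back-of-the-envelope sketch of why a number in the Q≈10 ballpark tends to come up (the efficiencies below are my illustrative assumptions, not data about any particular design):

```python
# Illustrative back-of-the-envelope calculation (all efficiencies are
# assumed round numbers, not measurements from any specific reactor).
eta_heating = 0.7   # wall-plug electricity -> plasma heating power
eta_thermal = 0.4   # fusion heat -> electricity (thermal conversion)

# Plasma gain: Q = P_fusion / P_heating (plasma-level definition).
# Engineering break-even wants electricity out >= electricity in:
#   P_fusion * eta_thermal >= P_heating / eta_heating
# => Q >= 1 / (eta_thermal * eta_heating)
q_needed = 1 / (eta_thermal * eta_heating)
print(f"Q needed for engineering break-even ~ {q_needed:.1f}")  # ~3.6
# With less optimistic efficiencies (and auxiliary plant loads folded in),
# the required Q climbs toward ~10, which matches the usual ballpark.
```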
In the AI community, the transition from the prevailing spirit of cooperation to a very competitive situation happened around the GPT-3 revolution. GPT-3 brought unexpected progress in few-shot learning and in program synthesis, and that was the moment when it became clear to many people that AI was working, that its goals were technologically achievable, and many players in the industry started to estimate time horizons as being rather short.
Fusion has not reached its GPT-3 moment yet; that's one key difference. Helion has signed a contract selling so...
Note that OpenAI has reported an outdated baseline for the GAIA benchmark.
A few days before the Deep Research presentation, a new GAIA benchmark SOTA was established (see the validation tab of https://huggingface.co/spaces/gaia-benchmark/leaderboard).
The actual SOTA (Jan 29, 2025, Trase Agent v0.3) is 70.3 average, 83.02 Level 1, 69.77 Level 2, 46.15 Level 3.
In the easiest category, Level 1, this SOTA is clearly better than the numbers reported for Deep Research even at pass@64, and overall it is slightly better than Deep Research at pass@1, except for Level 3.
Yes, the technique of formal proofs, in effect, involves translation of high-level proofs into arithmetic.
So self-reference is fully present (that's why we have Gödel's results and other similar results).
What this implies, in particular, is that one can reduce a "real proof" to arithmetic; this would be ugly, and one should not do it in one's informal mathematical practice; but your post is not talking about pragmatics, you are referencing a "fundamental limit of self-reference".
And, certainly, there are some interesting fundamental limits of self-refere...
When a solution is formalized inside a theorem prover, it is reduced to the level of arithmetic (a theorem prover is an arithmetic-level machine).
So a theory might be very high-brow math, but a formal derivation is still arithmetic (if one just focuses on the syntax and the formal rules, and not on the presumed semantics).
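To make the "just syntax and formal rules" point a bit more tangible, here is a tiny Lean 4 sketch (my own illustrative example, not from your post): the statements read as math, but what the kernel checks are finite proof terms manipulated by purely mechanical rules.

```lean
-- Illustrative only: the statements read as "math", but the objects the
-- kernel verifies are finite terms checked by syntactic rules.
theorem two_add_two : 2 + 2 = 4 := rfl

theorem add_zero' (n : Nat) : n + 0 = n := rfl

-- prints the underlying proof term that the kernel checks mechanically
#print two_add_two
```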
The alternative hypothesis does need to be said, especially after someone at a party outright claimed it was obviously true, and with the general consensus that the previous export controls were not all that tight. That alternative hypothesis is that DeepSeek is lying and actually used a lot more compute and chips it isn’t supposed to have. I can’t rule it out.
Re DeepSeek cost-efficiency, we are seeing more claims pointing in that direction.
In a similarly unverified claim, the founder of 01.ai (who is sufficiently known in the US according to https://en...
However, I don't view safe tiling as the primary obstacle to alignment. Constructing even a modestly superhuman agent which is aligned to human values would put us in a drastically stronger position and currently seems out of reach. If necessary, we might like that agent to recursively self-improve safely, but that is an additional and distinct obstacle. It is not clear that we need to deal with recursive self-improvement below human level.
I am not sure that treating recursive self-improvement via tiling frameworks is necessarily a good idea, but settin...
I think this is a misleading clickbait title. It references a popular article with the same misleading clickbait title, and the only thing that popular article references is a YouTube video with an equally misleading title, "Chinese Researchers Just CRACKED OpenAI's AGI Secrets!"
However, the description of that YouTube video does reference the paper in question and a Twitter thread describing this paper:
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective, https://arxiv.org/abs/2412.14135
Right. We should probably introduce a new name, something like narrow AGI, to denote a system which is AGI-level in coding and math.
This kind of system will be "AGI" as redefined by Tom Davidson in https://www.lesswrong.com/posts/Nsmabb9fhpLuLdtLE/takeoff-speeds-presentation-at-anthropic:
“AGI” (=AI that could fully automate AI R&D)
This is what matters for AI R&D speed and for almost all recursive self-improvement.
Zvi is not quite correct when he says
If o3 was as good on most tasks as it is at coding or math, then it would be AGI.
o3 is ...
Indeed
METR releases a report, Evaluating frontier AI R&D capabilities of language model agents against human experts: https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/
Daniel Kokotajlo and Eli Lifland both feel that one should update towards shorter timelines remaining until the start of rapid acceleration via AIs doing AI research based on this report:
Somewhat pedantic correction: they don’t say “one should update”. They say they update (plus some caveats).
the meetup page says 7:30pm, but actually the building asks people to leave by 9pm
Gwern was on Dwarkesh yesterday: https://www.dwarkeshpatel.com/p/gwern-branwen
We recorded this conversation in person. In order to protect Gwern’s anonymity, we created this avatar. This isn’t his voice. This isn’t his face. But these are his words.
Thanks, that's very useful.
If one decides to use galantamine, is it known whether one should take it right before bedtime, at any time during the preceding day, or in some other fashion?
I think it's a good idea to include links to the originals:
https://arxiv.org/abs/2408.08152 - "DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search"
Scott Alexander wrote a very interesting post covering the details of the political fight around SB 1047 a few days ago: https://www.astralcodexten.com/p/sb-1047-our-side-of-the-story
I learned a lot of things that were new to me reading it (which is remarkable given how much material related to SB 1047 I had seen before).
the potential of focusing on chemotherapy treatment timing
More concretely (this is someone else's old idea), what I think is still not done is the following. Chemo kills dividing cells; this is why rapidly renewing tissues and cell populations are particularly vulnerable.
If one wants to spare one of those cell types (say, a particular population of immune cells), one should take the typical period of its renewal and use that as the period of chemo sessions (the time between chemo sessions; a "resonance" of sorts between that and the period of the cell p...
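A toy sketch of the kind of "resonance" I have in mind (purely illustrative: the cycle lengths, the vulnerability window, and the session schedule are all made-up numbers with no medical validity):

```python
# Toy model: cells are vulnerable to chemo only while dividing; scheduling
# sessions at the renewal period of the population we want to spare means the
# same phase window keeps being hit, so survivors of the first session survive
# the later ones too, while faster-cycling cells keep drifting into the window.
import numpy as np

rng = np.random.default_rng(0)

def surviving_fraction(cycle_days, session_interval, n_sessions=6,
                       window=1.0, n_cells=100_000):
    # each cell's division phase is a random offset within its cycle
    phase = rng.uniform(0.0, cycle_days, n_cells)
    alive = np.ones(n_cells, dtype=bool)
    for k in range(n_sessions):
        t = k * session_interval
        dividing = ((t - phase) % cycle_days) < window   # vulnerable right now
        alive &= ~dividing
    return alive.mean()

spared_cycle = 14.0   # assumed renewal period of the cell type to protect
tumor_cycle = 2.3     # assumed faster, incommensurate cycle of the target cells

for interval in (7.0, 10.0, 14.0):
    print(f"interval {interval:>4} d: "
          f"spared surviving = {surviving_fraction(spared_cycle, interval):.2f}, "
          f"tumor surviving = {surviving_fraction(tumor_cycle, interval):.3f}")
# When the interval matches the 14-day renewal period, repeated sessions hit
# the same ~1/14 slice of the protected population, while the faster-cycling
# cells are depleted much more strongly.
```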
This depends on many things: one's skills, one's circumstances, and one's preferences and inclinations (the efficiency of one's contributions depends greatly on the latter).
I have stage 4 cancer, so statistically, my time may be more limited than most. I’m a PhD student in Computer Science with a strong background in math (Masters).
In your case, there are several strong arguments for you to focus on research efforts which can improve your chances of curing it (or, at least, of being able to maintain the situation for a long time), ...
Thanks for the references.
Yes, the first two of those do mention co-occurring anxiety in the title.
The third study suggests a possibility that it might just work as an effective antidepressant as well. (I hope there will be further studies like that; yes, this might be a sufficient reason to try it for depression, even if one does not have anxiety. It might work, but it's clearly not common knowledge yet.)
Your consideration seems to assume that the AI is an individual, not a phenomenon of "distributed intelligence":
The first argument is that AI thinks it may be in a testing simulation, and if it harms humans, it will be turned off.
etc. That is, indeed, the only case we are at least starting to understand well (unfortunately, our understanding of situations where AIs are not individuals seems to be extremely rudimentary).
If the AI is an individual, then one can consider either a "singleton" or a "multipolar" case.
In some sense, for a self-improving ec...
One might consider that some people have strong preferences for the outcome of an election and some people have weak preferences, but that there is usually no way to express the strength of one's preferences during a vote, and the probability that one would actually go ahead and vote in a race does correlate with the strength of one's preferences.
So, perhaps, this is indeed working as intended. People who have stronger preferences are more likely to vote, and so their preferences are more likely to be taken into account in a statistical sense.
It seems that the strength of one's preferences is (automatically, but imperfectly) taken into account via this statistical mechanism.
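A toy Monte Carlo of that statistical mechanism (the turnout model and the numbers are illustrative assumptions, not data):

```python
# Toy simulation: each voter has a side and a preference strength in [0, 1],
# and turns out to vote with probability equal to that strength. The expected
# margin then tracks the strength-weighted sum of preferences, not headcount.
import random

random.seed(0)

def expected_margin(voters, n_trials=20_000):
    total = 0
    for _ in range(n_trials):
        margin = 0
        for side, strength in voters:
            if random.random() < strength:   # turnout ~ preference strength
                margin += side               # side is +1 or -1
        total += margin
    return total / n_trials

# 60 voters weakly prefer A (+1), 40 voters strongly prefer B (-1): made-up numbers
voters = [(+1, 0.2)] * 60 + [(-1, 0.9)] * 40
print(f"expected margin: {expected_margin(voters):+.1f}")   # ~ 60*0.2 - 40*0.9 = -24
# A has the headcount majority, but the strength-weighted outcome favors B,
# because turnout probability acts as an implicit weight on preference strength.
```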
Thanks for the great post!
Also it’s California, so there’s some chance this happens, seriously please don’t do it, nothing is so bad that you have to resort to a ballot proposition, choose life
Why are you saying this? In what sense "nothing is so bad"?
The reason why people with libertarian sensibilities and a distrust of the government's track record in general, and of its track record in tech regulation specifically, are making an exception in this case is the strong potential of future AI for catastrophic and existential risks.
So, why people who generally dislike...
Silexan
For anxiety treatment only, if I understand it correctly.
There is no claim that it works as an antidepressant, as far as I know.
No, not microscopic.
Coherent light produced by lasers is not microscopic; we see its traces in the air. And we see the consequences (old-fashioned holography, and the ability to cut things with focused light, even at large distances). Room temperature is fine for that.
Superconductors used in industry are not microscopic (and the temperatures are high enough to enable their industrial use in rather common devices such as MRI scanners).
It's just... having a proof is supposed to boost our confidence that the conclusion is correct...
if the proof relies on assumptions which are already quite far from the majority opinion about our actual reality (and are probably going to deviate further, as AIs will be better physicists and engineers than us and will leverage the strangeness of our physics much further than we do), then what's the point of that "proof"?
how does having this kind of "proof" increase our confidence in what seems informally correct for a single branch reality (and rather uncer...
...Roon: Unfortunately, I don’t think building nice AI products today or making them widely available matters very much. Minor improvements in DAU or usability especially doesn’t matter. Close to 100% of the fruits of AI are in the future, from self-improving superintelligence [ASI].
Every model until then is a minor demo/pitch deck to hopefully help raise capital for ever larger datacenters. People need to look at the accelerating arc of recent progress and remember that core algorithmic and step-change progress towards self-improvement is what matters.
One a...
Yeah, if one considers not "AGI" per se, but a self-modifying AI or, more likely, a self-modifying ecosystem consisting of a changing population of AIs, then the only properties it is likely to be feasible to keep invariant through the expected drastic self-modifications are those which the AIs would be interested in maintaining for their own intrinsic reasons.
It is unlikely that any properties can be "forcefully imposed from the outside" and kept invariant for a long time during drastic self-modification.
So one needs to find properties which AIs would be intrinsically interested ...