Comments

mishka · 20

this isn't an "attack", it's "go[ing] straight for execution on its primary instrumental goal

yes, the OP is ambiguous in this sense

I first wrote my comment, then reread the (tail end of the) post again, and did not post it, because I thought it could have been formulated this way, that this is just an instrumental goal

then I reread the (tail end of the) post one more time and decided that no, the post does actually make it a "power play"; that's how it is actually written, in terms of "us vs. them", not in terms of the ASI's own goals, and then I posted this comment

maximally increasing its compute scaling

as we know, compute is not everything; algorithmic improvement is even more important, at least judging by current trends (and likely sources of algorithmic improvement should be cherished)

and this is not a static system; it is in the process of making its compute architecture better (just as there is no point in making too many H100 GPUs when better and better GPUs are being designed and introduced)

basically, a smart system is likely to avoid doing an excessive number of irreversible things which might turn out to be suboptimal


But, in some sense, yes, the main danger is of AIs not being smart enough in terms of their ability to manage their own affairs well; the action the ASI takes in the OP is very suboptimal and deprives it of all kinds of options

Just like the bulk of the danger in the "world with superintelligent systems" is ASIs not managing their own existential risk problems correctly, destroying the fabric of reality, themselves, and us as collateral damage

mishka20

Two main objections to (the tail end of) this story are:

  • On one hand, it's not clear that a system needs to be all that super-smart to design a devastating attack of this kind (we are already at risk of fairly devastating tech-assisted attacks in that general spirit, mostly with synthetic biological viruses at the moment, and those risks are growing regardless of the AGI/superintelligence angle; ordinary tech progress is quite sufficient in this sense)

  • On the other hand, if one has a rapidly self-improving, strongly super-intelligent distributed system, it's unlikely that it would find it valuable to directly attack people in this fashion, as it is likely to be able to easily dominate without any particularly drastic measures (and probably would not want to irreversibly destroy important information without good reason)

The actual analysis of the "transition period", of the "world with super-intelligent systems" period, and of the likely risks associated with both periods is a much more involved and open-ended task. (One of the paradoxes is that the risks of the kind described in the OP are probably higher during the "transition period", while the main risks associated with the "world with super-intelligent systems" period are likely to be quite different.)

mishka · 20

Ah, it's mostly your first figure which is counter-intuitive (when one looks at it, one gets the intuition of f(g(h(...(x)))), so it de-emphasizes the fact that each of these Transformer Block transformations is shaped like x = x + function(x))

Answer by mishka · 20

yeah... not trying for a complete analysis here, but one thing which is missing is the all-important residual stream. It was rather downplayed in the original "Attention Is All You Need" paper, and has been greatly emphasized in https://transformer-circuits.pub/2021/framework/index.html

but I have to admit that I only started to feel that I more or less understand the principal aspects of the Transformer architecture after I spent some quality time with the pedagogical implementation of GPT-2 by Andrej Karpathy, https://github.com/karpathy/minGPT, specifically with the https://github.com/karpathy/minGPT/blob/master/mingpt/model.py file. When I don't understand something in a text, looking at a nice, relatively simple-minded implementation allows me to see what exactly is going on
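
(For readers who want the "each block just adds to the residual stream" point spelled out in code: below is a minimal PyTorch sketch of a GPT-2-style pre-LN block, following the same x = x + function(x) pattern as minGPT's model.py. This is not minGPT's actual code; the class and parameter names are mine, and the causal attention mask is omitted for brevity.)

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A GPT-2-style pre-LN Transformer block (simplified sketch).

    The residual stream is the tensor `x` that flows straight through the block;
    attention and the MLP only *add* their outputs to it:
        x = x + attn(ln(x))
        x = x + mlp(ln(x))
    rather than forming the pure composition f(g(h(...(x)))) that a
    stack-of-boxes figure tends to suggest.
    """
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual update #1: attention reads from and writes into the stream
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # residual update #2: the MLP writes into the stream
        x = x + self.mlp(self.ln2(x))
        return x

# Toy usage: the residual stream keeps the same shape through every block.
x = torch.randn(1, 16, 64)                      # (batch, tokens, d_model)
for block in [Block(64, 4) for _ in range(3)]:
    x = block(x)
print(x.shape)                                  # torch.Size([1, 16, 64])
```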

(People have also published some visualizations, some "illustrated Transformers", and those are closer to the style of your sketches, but I don't know which of them are good and which might be misleading. And, yes, at the end of the day, it takes time to get used to Transformers; one understands them gradually.)

mishka · 20

Mmm... if we are not talking about full automation, but about being helpful, the ability to do 1-hour software engineering tasks ("train classifier") is already useful.

Moreover, we have seen a recent flood of rather inexpensive fine-tunings of reasoning models for particular benchmarks.

Perhaps what one can do is perform a (somewhat more expensive, but still not too difficult) fine-tuning to create a model to help with a particular relatively narrow class of meaningful problems (which would be more general than tuning for a particular benchmark, but still reasonably narrow). So, instead of just using an off-the-shelf assistant, one should be able to upgrade it to a specialized one.

For example, I am sure that it is possible to create a model which would be quite helpful with a lot of mechanistic interpretability research.
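
(To make "upgrade an off-the-shelf assistant to a specialized one" a bit more concrete, here is a hedged sketch of one common route: parameter-efficient (LoRA) supervised fine-tuning on a small domain-specific dataset, using Hugging Face transformers + peft. The checkpoint name, dataset file, target module names, and hyperparameters below are placeholders, not recommendations.)

```python
# Sketch: narrow-domain LoRA fine-tuning with Hugging Face transformers + peft.
# "base-model-name" and "domain_tasks.jsonl" are placeholders for a real checkpoint
# and a small file of domain-specific transcripts (one {"text": ...} record per line).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "base-model-name"
tok = AutoTokenizer.from_pretrained(base)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token              # some tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters so only a small fraction of weights is trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],       # module names depend on the architecture
    task_type="CAUSAL_LM",
))

data = load_dataset("json", data_files="domain_tasks.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialist", num_train_epochs=2,
                           per_device_train_batch_size=2, learning_rate=1e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("specialist")            # saves only the small adapter weights
```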

So if we are talking about when AIs can start automating or helping with research, the answer is, I think, "now".

mishka · 110

which shows how incoherent and contradictory people are – they expect superintelligence before human-level AI, what questions are they answering here?

"the road to superintelligence goes not via human equivalence, but around it"

so, yes, it's reasonable to expect wildly superintelligent AI systems (e.g. clearly superintelligent AI researchers and software engineers) before all important deficits of AI relative to human abilities are patched

mishka · 20

Updating the importance of reducing the chance of a misaligned AI becoming space-faring upwards

does this effectively imply that the notion of alignment in this context needs to be non-anthropocentric and not formulated in terms of human values?

(I mean, the whole approach assumes that "alien Space-Faring Civilizations" would do fine (more or less), and it's important not to create something hostile to them.)

mishka · 40

Thanks!

So, the claim here is that this is a better "artificial AI scientist" than what we've seen so far.

There is a tech report https://github.com/IntologyAI/Zochi/blob/main/Zochi_Technical_Report.pdf, but the "AI scientist" itself is not open source, and the tech report does not disclose much (besides confirming that this is a multi-agent thing).

This might end up being a new milestone, but the comparison is not quite "apples-to-apples": there is human feedback in the process of its work, and humans edit the final paper, unlike with Sakana, so it's too early to conclude that this one is substantially better.

mishka · 90

Thanks for writing this.

We estimate that before hitting limits, the software feedback loop could increase effective compute by ~13 orders of magnitude (“OOMs”)

This is one place where I am not quite sure we have the right language. On one hand, the overall methodology pushes us towards talking in terms of "orders of magnitude of improvement", a factor of improvement which might be very large, but which is still a constant.

On the other hand, algorithmic improvements are often improvements in algorithmic complexity (e.g. something is no longer exponential, or something has a lower-degree polynomial complexity than before, like linear instead of quadratic). Here the factor of improvement grows with the size of the problem in an unlimited fashion.

And then, if one wants to express this kind of improvement as a constant, one needs to average the efficiency gain over the practical distribution of problems (which itself might be a moving target).[1]


  1. In particular, one might think about algorithms searching for better architectures of neural machines, or algorithms searching for better optimization algorithms. The complexity improvements in those algorithms might be particularly consequential.
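
(A toy illustration of this point: a quadratic-to-linear algorithmic improvement gives a speedup that grows with problem size, and it only turns into a single "effective OOMs" number once you average over some assumed distribution of problem sizes. Both the cost models and the workload distribution below are made up purely for illustration.)

```python
# Toy illustration: a complexity-class improvement (O(n^2) -> O(n)) is not a fixed
# constant; the speedup grows with problem size, and any single "orders of magnitude"
# figure is an average over an assumed workload. All numbers here are made up.
import math
import random

def old_cost(n):   # e.g. a quadratic-time algorithm
    return n ** 2

def new_cost(n):   # e.g. a linear-time replacement
    return n

for n in (10, 1_000, 100_000):
    speedup = old_cost(n) / new_cost(n)
    print(f"n={n:>7}: speedup {speedup:,.0f}x ({math.log10(speedup):.1f} OOMs)")

# Averaging the gain over a (hypothetical, log-uniform) distribution of problem sizes
# yields one "effective" factor -- but it shifts as the workload shifts toward larger n.
random.seed(0)
sizes = [int(10 ** random.uniform(1, 5)) for _ in range(10_000)]
avg_ooms = sum(math.log10(old_cost(n) / new_cost(n)) for n in sizes) / len(sizes)
print(f"average effective gain over this workload: ~{avg_ooms:.1f} OOMs")
```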

mishka · 100

They should actually reference Yudkowsky.

Even though their paper https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf lists over 70 references, I don't see them mentioning Yudkowsky (someone should tell Schmidhuber ;-)).

This branch of official science is younger than 10 years (and started as a fairly non-orthodox one; it's only recently that it has started to feel like the official one, certainly no earlier than the formation of Anthropic, and probably quite a bit later than that).
