This is a special post for quick takes by Cole Wyeth. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

A fun illustration of survivorship/selection bias is that nearly every time I find myself reading an older paper, I find it insightful, cogent, and clearly written.

Selection bias isn't the whole story. The median paper in almost every field is notably worse than it was in, say, 1985. Academia is less selective than it used to be—in the U.S., there are more PhDs per capita, and the average IQ/test scores/whatever metric has dropped for every level of educational attainment.

Grab a journal that's been around for a long time, read a few old papers and a few new papers at random, and you'll notice the difference.

1David James
To what degree is this true regarding elite-level Ph.D. programs that are likely to lead to publication in (i) mathematics and/or (ii) computer science? Separately, we should remember that academic selection is a relative metric, i.e. graded on a curve. So, when it comes to Ph.D. programs, is the median 2024 Ph.D. graduate more capable (however you want to define it) than the corresponding graduate from 1985? This is complex, involving their intellectual foundations, the depth of their specialized knowledge, various forms of raw intelligence, attention span, collaborative skills, communication ability (including writing skills), and computational tools. I realize what I'm about to say next may not be representative of the median Ph.D. student, but it feels to me that the 2024 graduates of, say, Berkeley or MIT (not to mention, say, Thomas Jefferson High School) are significantly more capable than the corresponding 1985 graduates. Does my sentiment resonate with others and/or correspond to some objective metrics?
4wonder
Based on my observations, I would also think the current publication-chasing culture could push people to put out papers more quickly (in some particular domains like CS), even though some papers may be only partially complete.

The primary optimization target for LLM companies/engineers seems to be making them seem smart to humans, particularly the nerds who seem prone to using them frequently. A lot of money and talent is being spent on this. It seems reasonable to expect that they are less smart than they seem to you, particularly if you are in the target category. This is a type of Goodharting. 

In fact, I am beginning to suspect that they aren't really good for anything except seeming smart, and most rationalists have totally fallen for it, for example Zvi insisting that anyone who is not using LLMs to multiply their productivity is not serious (this is a vibe not a direct quote but I think it's a fair representation of his writing over the last year). If I had to guess, LLMs have 0.99x'ed my productivity by occasionally convincing me to try to use them which is not quite paid for by very rarely fixing a bug in my code. The number is close to 1x because I don't use them much, not because they're almost useful. Lots of other people seem to have much worse ratios because LLMs act as a superstimulus for them (not primarily a productivity tool). 

Certainly this is an impressive technology, surpris... (read more)

8Alexander Gietelink Oldenziel
I use LLMs throughout my personal and professional life. The productivity gains are immense. Yes, hallucination is a problem, but it's just like spam/ads/misinformation on wikipedia/the internet - a small drawback that doesn't negate the ginormous potential of the internet/LLMs. I am 95% certain you are leaving value on the table.  I do agree straight LLMs are not generally intelligent (in the sense of universal intelligence/AIXI) and therefore not completely comparable to humans.
2ZY
On LLMs vs search on the internet: agree that LLMs are very helpful in many ways, both personally and professionally, but the worse parts of misinformation from LLMs compared to wikipedia/the internet in my opinion include: 1) it is relatively more unpredictable when the model will hallucinate, whereas for wikipedia/the internet, you would generally expect higher accuracy for simpler/purely factual/mathematical information; 2) it is harder to judge credibility without knowing the source of the information, whereas on the internet, we can get some signals from the website domain, etc.
8abramdemski
From my personal experience, I agree. I find myself unexcited about trying the newest LLM models. My main use-case in practice these days is Perplexity, and I only use it when I don't care much about the accuracy of the results (which ends up being a lot, actually... maybe too much). Perplexity confabulates quite often even with accurate references in hand (but at least I can check the references). And it is worse than me at the basics of googling things, so it isn't as if I expect it to find better references than me; the main value-add is in quickly reading and summarizing search results (although the new Deep Research option on Perplexity will at least iterate through several attempted searches, so it might actually find things that I wouldn't have). I have been relatively persistent about trying to use LLMs for actual research purposes, but the hallucination rate seems to go to 100% almost whenever an accurate result would be useful to me.  The hallucination rate does seem adequately low when talking about established mathematics (so long as you don't ask for novel implications, such as applying ideas to new examples). For this and for other reasons I think they can be quite helpful for people trying to get oriented to a subfield they aren't familiar with -- it can make for a great study partner, so long as you verify what it says by checking other references.  Also decent for coding, of course, although the same caveat applies -- coders who are already experts in what they are trying to do will get much less utility out of it. I recently spoke to someone who made a plausible claim that LLMs were 10xing their productivity in communicating technical ideas in AI alignment with something like the following workflow:
* Take a specific cluster of failure modes for thinking about alignment which you've seen often.
* Hand-write a large, careful prompt document about the cluster of alignment failure modes, which includes many specific trigger-action patterns (i
6Vladimir_Nesov
Found the following in the Jan 23 newsletter:
6Vladimir_Nesov
I expect he'd disagree, for example I vaguely recall him mentioning that LLMs are not useful in a productivity-changing way for his own work. And 10x specifically seems clearly too high for most things even where LLMs are very useful, other bottlenecks will dominate before that happens.
1Cole Wyeth
10x was probably too strong, but his posts are very clear that he thinks it's a large productivity multiplier. I'll try to remember to link the next instance I see. 

Mathematics students are often annoyed that they have to worry about "bizarre or unnatural" counterexamples when proving things. For instance, differentiable functions without continuous derivative are pretty weird. Engineers in particular tend to protest that these things will never occur in practice, because they don't show up physically. But these adversarial examples show up constantly in the practice of mathematics - when I am trying to prove (or calculate) something difficult, I will try to cram the situation into a shape that fits one of the theorems in my toolbox, and if those tools don't naturally apply I'll construct all kinds of bizarre situations along the way while changing perspective. In other words, bizarre adversarial examples are common in intermediate calculations - that's why you can't just safely forget about them when proving theorems. Your logic has to be totally sound as a matter of abstraction or interface design - otherwise someone will misuse it. 
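For concreteness, the standard textbook counterexample of this kind (my addition, not from the post) is

$$f(x) = \begin{cases} x^2 \sin(1/x), & x \neq 0 \\ 0, & x = 0, \end{cases} \qquad f'(x) = \begin{cases} 2x\sin(1/x) - \cos(1/x), & x \neq 0 \\ 0, & x = 0. \end{cases}$$

Here $f'(0) = \lim_{h \to 0} h \sin(1/h) = 0$ exists, but near $0$ the $\cos(1/x)$ term keeps $f'$ oscillating between roughly $\pm 1$, so the derivative exists everywhere yet is not continuous.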

7Noosphere89
While I think the reaction against pathological examples can definitely make sense, and in particular some people have a bad habit of overfocusing on pathological examples, I do think mathematics is quite different from other fields: you want to prove that a property holds for all objects of a certain kind, or that there exists an object with a certain property, and in these cases you can't ignore the pathological examples, because they can either provide you with solutions to your problem or show why your approach can't work. This is why I didn't exactly like Dalcy's point 3 here: https://www.lesswrong.com/posts/GG2NFdgtxxjEssyiE/dalcy-s-shortform#qp2zv9FrkaSdnG6XQ
2cubefox
There is also the reverse case, where it is common practice in math or logic to ignore bizarre and unnatural counterexamples. For example, first-order Peano arithmetic is often identified with Peano arithmetic in general, even though the first-order theory allows the existence of highly "unnatural" numbers that are certainly not the natural numbers Peano arithmetic is meant to describe. Another example is the power set axiom in set theory. It is usually assumed to imply the existence of the power set of each infinite set. But the axiom only implies that the existence of such power sets is possible, i.e. that they can exist (in some models), not that they exist full stop. In general, non-categorical theories are often tacitly assumed to talk about some intuitive standard model, even though the axioms don't specify it. Eliezer talks about both cases in his Highly Advanced Epistemology 101 for Beginners sequence.
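For readers unfamiliar with why first-order PA admits such numbers, the standard compactness argument (my addition, not part of the comment) goes: add a new constant $c$ and the theory

$$T = \mathrm{PA} \cup \{\, c > \underline{0},\ c > \underline{1},\ c > \underline{2},\ \dots \,\}.$$

Every finite subset of $T$ is satisfied by the standard natural numbers (interpret $c$ as a large enough numeral), so by compactness $T$ has a model, and in that model $c$ is greater than every standard numeral - a "number" the first-order axioms cannot exclude.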

Particularly after my last post, I think my lesswrong writing has had a bit too high of a confidence / effort ratio. Possibly I just know the norms of this site well enough lately that I don't feel as much pressure to write carefully. I think I'll limit my posting rate a bit while I figure this out.

LW doesn't punish, it upvotes-if-interesting and then silently judges.

confidence / effort ratio

(Effort is not a measure of value, it's a measure of cost.)

5Cole Wyeth
Yeah, I was thinking greater effort is actually necessary in this case. For context, my lower effort posts are usually more popular. Also the ones that focus on LLMs, which is really not my area of expertise.

For context, my lower effort posts are usually more popular.

mood

Perhaps LLMs are starting to approach the intelligence of today's average human: capable of only limited original thought, unable to select and autonomously pursue a nontrivial coherent goal across time, and having learned almost everything they know from reading the internet ;)

This doesn't seem to be reflected in the general opinion here, but it seems to me that LLMs are plateauing and possibly have already plateaued a year or so ago. Scores on various metrics continue to go up, but this tends to provide weak evidence because they're heavily gamed and sometimes leak into the training data. Still, those numbers overall would tend to update me towards short timelines, even with their unreliability taken into account - however, this is outweighed by my personal experience with LLMs. I just don't find them useful for practically ... (read more)

Huh, o1 and the latest Claude were quite huge advances to me. Basically within the last year LLMs for coding went from "occasionally helpful, maybe like a 5-10% productivity improvement" to "my job now is basically to instruct LLMs to do things, depending on the task a 30% to 2x productivity improvement".

1Cole Wyeth
I'm in Canada so can't access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I'm not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine. 
6Vladimir_Nesov
Use Chatbot Arena, both versions of Claude 3.5 Sonnet are accessible in Direct Chat (third tab). There's even o1-preview in Battle Mode (first tab), you just need to keep asking the question until you get o1-preview. In general Battle Mode (for a fixed question you keep asking for multiple rounds) is a great tool for developing intuition about model capabilities, since it also hides the model name from you while you are evaluating the response.
3core_admiral
Just an FYI unrelated to the discussion - all versions of Claude are available in Canada through Anthropic, you don't even need third party services like Poe anymore.  Source: https://www.anthropic.com/news/introducing-claude-to-canada 
4Vladimir_Nesov
Base model scale has only increased maybe 3-5x in the last 2 years, from 2e25 FLOPs (original GPT-4) up to maybe 1e26 FLOPs[1]. So I think to a significant extent the experiment of further scaling hasn't been run, and the 100K H100s clusters that have just started training new models in the last few months promise another 3-5x increase in scale, to 2e26-6e26 FLOPs. Right, the metrics don't quite capture how smart a model is, and the models haven't been getting much smarter for a while now. But it might be simply because they weren't scaled much further (compared to original GPT-4) in all this time. We'll see in the next few months as the labs deploy the models trained on 100K H100s (and whatever systems Google has).

[1] This is 3 months on 30K H100s, $140 million at $2 per H100-hour, which is plausible, but not rumored about specific models. Llama-3-405B is 4e25 FLOPs, but not MoE. Could well be that 6e25 FLOPs is the most anyone trained for with models deployed so far.
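As a rough sanity check on that footnote's numbers, here is a back-of-the-envelope sketch (my own; the H100 peak throughput and ~40% utilization below are assumptions, not figures from the comment):

```python
# Back-of-envelope check of the "3 months on 30K H100s ~ 1e26 FLOPs" figure.
# Assumed numbers (not from the comment): ~989e12 dense BF16 FLOP/s peak per
# H100, ~40% model FLOPs utilization (MFU).
gpus = 30_000
seconds = 90 * 24 * 3600          # ~3 months
peak_flops_per_gpu = 989e12       # assumed dense BF16 peak
mfu = 0.4                         # assumed utilization

total_flops = gpus * seconds * peak_flops_per_gpu * mfu
cost = gpus * (seconds / 3600) * 2.0   # $2 per H100-hour, as in the footnote

print(f"~{total_flops:.1e} FLOPs")     # ~9e25, i.e. order 1e26
print(f"~${cost / 1e6:.0f}M")          # ~$130M, consistent with ~$140 million
```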
3cdt
I've noticed they perform much better on graduate-level ecology/evolution questions (in a qualitative sense - they provide answers that are more 'full' as well as technically accurate). I think translating that into a "usefulness" metric is always going to be difficult though.
3eigen
The last few weeks I felt the opposite of this. I kind of go back and forth on thinking they are plateauing and then I get surprised with the new Sonnet version or o1-preview. I also experiment with my own prompting a lot.
1Cole Wyeth
I've noticed occasional surprises in that direction, but none of them seem to shake out into utility for me.
2cubefox
Is this a reaction to OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows?
1Cole Wyeth
No that seems paywalled, curious though?
1Cole Wyeth
I've been waiting to say this until OpenAI's next larger model dropped, but this has now failed to happen for so long that it's become its own update, and I'd like to state my prediction before it becomes obvious. 

@Thomas Kwa will we see task length evaluations for Claude Opus 4 soon?

Anthropic reports that Claude can work on software engineering tasks coherently for hours, but it’s not clear if this means it can actually perform tasks that would take a human hours. I am slightly suspicious because they reported that Claude was making better use of memory on Pokémon, but this did not actually cash out as improved play. This seems like a fairly decisive test of my prediction that task lengths would stagnate at this point; if it does succeed at hours long tasks, I will... (read more)

I don't run the evaluations but probably we will; no timeframe yet though as we would need to do elicitation first. Claude's SWE-bench Verified scores suggest that it will be above 2 hours on the METR task set; the benchmarks are pretty similar apart from their different time annotations.

3Aaron Staley
That's a bit higher than I would have guessed. I compared the known data points that have both SWE-bench and METR medians (Sonnet 3.5, 3.6, 3.7, o1, o3, o4-mini) and got an r^2 = 0.96 model assuming linearity between log(METR_median) and log(SWE-bench error). That gives an estimate more like 110 minutes for a SWE-bench score of 72.7%, which works out to a Sonnet doubling time of ~3.3 months. (If I throw out o4-mini, the estimate is ~117 minutes... still below 120.) It would also imply an 85% SWE-bench score is something like a 6-6.5 hour METR median.
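For anyone who wants to redo this estimate as new models come out, here is a minimal sketch of the fit described above (the arrays below are hypothetical placeholder values, not the actual benchmark numbers):

```python
import numpy as np

# Hypothetical placeholder values -- substitute the real SWE-bench Verified
# scores and METR 50%-time-horizon medians (in minutes) for each model.
swe_bench = np.array([0.49, 0.53, 0.62, 0.48, 0.69, 0.68])
metr_median_min = np.array([18, 28, 55, 35, 90, 75])

# Fit log(METR median) as a linear function of log(SWE-bench error rate).
x = np.log(1.0 - swe_bench)
y = np.log(metr_median_min)
slope, intercept = np.polyfit(x, y, 1)
r2 = np.corrcoef(x, y)[0, 1] ** 2

# Predict the METR median implied by a 72.7% SWE-bench score.
pred = np.exp(intercept + slope * np.log(1.0 - 0.727))
print(f"r^2 = {r2:.2f}, predicted METR median ≈ {pred:.0f} minutes")
```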
3Vladimir_Nesov
Since reasoning trace length increases with more steps of RL training (unless intentionally constrained), probably underlying scaling of RL training by AI companies will be observable in the form of longer reasoning traces. Claude 4 is more obviously a pretrained model update, not necessarily a major RLVR update (compared to Claude 3.7), and coherent long task performance seems like something that would greatly benefit from RLVR if it applies at all (which it plausibly does). So I don't particularly expect Claude 4 to be much better on this metric, but some later Claude ~4.2-4.5 update with more RLVR post-training released in a few months might do much better.
3Cole Wyeth
We can still check if it lies on the projected slower exponential curve before reasoning models were introduced.

Sure, but trends like this only say anything meaningful across multiple years, any one datapoint adds almost no signal, in either direction. This is what makes scaling laws much more predictive, even as they are predicting the wrong things. So far there are no published scaling laws for RLVR, the literature is still developing a non-terrible stable recipe for the first few thousand training steps.

It looks like Gemini is self-improving in a meaningful sense:

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Some quick thoughts:

This has been going on for months; on the bullish side (for ai progress, not human survival) this means some form of self-improvement is well behind the capability frontier. On the bearish side, we may not expect a further speed up on the log scale (since it’s already factored in to some calculations).

I did not expect this degree of progress so soon; I am now much ... (read more)

1Person
Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI. But I do have quick thoughts as well: kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek). It seems to me like AlphaEvolve is more-or-less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor), notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to the improvement of AI hardware. What AlphaEvolve seems to do is to unify all of that into a superhuman model for those multiple uses. In the accompanying podcast they give us some further information:
* The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
* They have not tried to distill all that data into a new model yet, which seems strange to me considering they've had it for a year now.
* They say that a lot of improvements come from the base model's quality.
* They do present the whole thing as part of research rather than a product.
So yeah, I can definitely see a path for large gains in the future, though for now those are still on similar timetables as per their own admission. They expect further improvements when base models improve and are hoping that future versions of AlphaEvolve can in turn shorten the training time for models, improve the hardware pipeline, and improve models in other ways. And for your point about novel discoveries, previous Alpha models seemed to already be able to do the same categories of research back in 2023, on mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare with previous models of

Unfortunate consequence of sycophantic ~intelligent chatbots: everyone can get their theories parroted back to them and validated. Particularly risky for AGI, where the chatbot can even pretend to be running your cognitive architecture. Want to build a neuro-quantum-symbolic-emergent-consciousness-strange-loop AGI? Why bother, when you can just put that all in a prompt!

A lot of new user submissions these days to LW are clearly some poor person who was sycophantically encouraged by an AI to post their crazy theory of cognition or consciousness or recursion or social coordination on LessWrong after telling them their ideas are great. When we send them moderation messages we frequently get LLM-co-written responses, and sometimes they send us quotes from an AI that has evaluated their research as promising and high-quality as proof that they are not a crackpot.

Basic sanity check: We can align human children, but can we align any other animals? NOT to the extent that we would trust them with arbitrary amounts of power, since they obviously aren't smart enough for this question to make much sense. Just like, are there other animals that we've made care about us at least "a little bit?" Can dogs be "well trained" in a way where they actually form bonds with humans and will go to obvious personal risk to protect us, or not eat us even if they're really hungry and clearly could? How about species further on the evolutionary tree like hunting falcons? Where specifically is the line?

Sometimes I wonder if people who obsess over the "paradox of free will" are having some "universal human experience" that I am missing out on. It has never seemed intuitively paradoxical to me, and all of the arguments about it seem either obvious or totally alien. Learning more about agency has illuminated some of the structure of decision making for me, but hasn't really affected this (apparently) fundamental inferential gap. Do some people really have this overwhelming gut feeling of free will that makes it repulsive to accept a lawful universe? 

I used to, as a child. I did accept a lawful universe, but I thought my perception of free will was in tension with that, so that perception must be "an illusion". 

My mother kept trying to explain to me that there was no tension between these things, because it was correct that my mind made its own decisions rather than some outside force. I didn't understand what she was saying though. I thought she was just redefining 'free will' from a claim that human brains effectively had a magical ability to spontaneously ignore the laws of physics to a boring tautological claim that human decisions are made by humans rather than something else.

I changed my mind on this as a teenager. I don't quite remember how, it might have been the sequences or HPMOR again. I realised that my imagination had still been partially conceptualising the "laws of physics" as some sort of outside force, a set of strings pulling my atoms around, rather than as a predictive description of me and the universe. Saying "the laws of physics make my decisions, not me" made about as much sense as saying "my fingers didn't move, my hand did." That was what my mother had been trying to tell me.

3ProgramCrafter
I don't think so, as I had success explaining away the paradox with the concept of "different levels of detail" - saying that free will is a very high-level concept and further observations reveal a lower-level view, calling upon an analogy with algorithmic programming's segment tree. (A segment tree is a data structure that replaces an array, allowing one to modify its values and compute a given function over all array elements efficiently. It is based on a tree of nodes, each of them representing a certain subarray; each position is therefore handled by several - specifically, O(log n) - nodes.)
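For readers who haven't seen the data structure being referenced, here is a minimal range-sum segment tree sketch (my own illustration, not from the comment):

```python
class SegmentTree:
    """Minimal range-sum segment tree: point updates and range queries in O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        # Leaves hold the array; each internal node summarizes its two children.
        for i, v in enumerate(values):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, pos, value):
        # Each position is covered by O(log n) nodes, as the comment notes.
        i = self.n + pos
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, left, right):
        """Sum of values[left:right] (half-open interval)."""
        result = 0
        lo, hi = self.n + left, self.n + right
        while lo < hi:
            if lo & 1:
                result += self.tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                result += self.tree[hi]
            lo //= 2
            hi //= 2
        return result


tree = SegmentTree([5, 2, 7, 1, 3])
print(tree.query(1, 4))  # 2 + 7 + 1 = 10
tree.update(2, 0)
print(tree.query(1, 4))  # 2 + 0 + 1 = 3
```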
2Viliam
This might be related to whether you see yourself as a part of the universe, or as an observer. If you are an observer, the objection is like "if I watch a movie, everything in the movie follows the script, but I am outside the movie, therefore outside the influence of the script". If you are religious, I guess your body is a part of the universe (obeys the laws of gravity etc.), but your soul is the impartial observer. Here the religion basically codifies the existing human intuitions. It might also depend on how much you are aware of the effects of your environment on you. This is a learned skill; for example little kids do not realize that they are hungry... they just get kinda angry without knowing why. It requires some learning to realize "this feeling I have right now -- it is hunger, and it will probably go away if I eat something". And I guess the more knowledge of this kind you accumulate, the easier it is to see yourself as a part of the universe, rather than being outside of it and only moved by "inherently mysterious" forces.

To what extent would a proof about AIXI’s behavior be normative advice?

Though AIXI itself is not computable, we can prove some properties of the agent - unfortunately, there are fairly few examples because of the “bad universal priors” barrier discovered by Jan Leike. In the sequential case we only know things like e.g. it will not indefinitely keep trying an action that yields minimal reward, though we can say more when the horizon is 1 (which reduces to the predictive case in a sense). And there are lots of interesting results about the behavior of Solom... (read more)

Can AI X-risk be effectively communicated by analogy to climate change? That is, the threat isn’t manifesting itself clearly yet, but experts tell us it will if we continue along the current path.

Though there are various disanalogies, this specific comparison seems both honest and likely to be persuasive to the left?

4MondSemmel
I don't like it. Among various issues, people already muddy the waters by erroneously calling climate change an existential risk (rather than what it was, a merely catastrophic one, before AI timelines made any worries about climate change in the year 2100 entirely irrelevant), and it's extremely partisan-coded. And you're likely to hear that any mention of AI x-risk is a distraction from the real issues, which are whatever the people cared about previously. I prefer an analogy to gain-of-function research. As in, scientists grow viruses/AIs in the lab, with promises of societal benefits, but without any commensurate acknowledgment of the risks. And you can't trust the bio/AI labs to manage these risks, e.g. even high biosafety levels can't entirely prevent outbreaks.
2cdt
I agree that there is a consistent message here, and I think it is one of the most practical analogies, but I get the strong impression that tech experts do not want to be associated with environmentalists.
1Ariel Cheng
I think it would be persuasive to the left, but I'm worried that comparing AI x-risk to climate change would make it a left-wing issue to care about, which would make right-wingers automatically oppose it (upon hearing "it's like climate change"). Generally it seems difficult to make comparisons/analogies to issues that (1) people are familiar with and think are very important and (2) not already politicized.
1CstineSublime
I'm looking at this not from a CompSci point of view but from a rhetoric point of view: Isn't it much easier to make tenuous or even flat-out wrong links between climate change and highly publicized natural disaster events that have lots of dramatic, visceral footage than it is to ascribe danger to a machine that hasn't been invented yet, whose nature and inclinations we don't know? I don't know about nowadays, but for me the two main pop-culture touchstones for "evil AI" are Skynet in Terminator and HAL 9000 in 2001: A Space Odyssey (and, by inversion, the Butlerian Jihad in Dune). Wouldn't it be more expedient to leverage those? (Expedient - I didn't say accurate.)

Most ordinary people don't know that no one understands how neural networks work (or even that modern "Generative A.I." is based on neural networks). This might be an underrated message since the inferential distance here is surprisingly high. 

It's hard to explain the more sophisticated models that we often use to argue that human disempowerment is the default outcome, but this message is perhaps much better leveraged to explain these three points: 

1) No one knows how A.I models / LLMs / neural nets work (with some explanation of how this is conceptually possibl... (read more)

"Optimization power" is not a scalar multiplying the "objective" vector. There are different types. It's not enough to say that evolution has had longer to optimize things but humans are now "better" optimizers:  Evolution invented birds and humans invented planes, evolution invented mitochondria and humans invented batteries. In no case is one really better than the other - they're radically different sorts of things.

Evolution optimizes things in a massively parallel way, so that they're robustly good at lots of different selectively relevant things ... (read more)

Optimality is about winning. Rationality is about optimality.  

I guess Dwarkesh believes ~everything I do about LLMs and still thinks we probably get AGI by 2032:

https://www.dwarkesh.com/p/timelines-june-2025

2Noosphere89
@ryan_greenblatt made a claim that continual learning/online training can already be done, but that right now the returns aren't super high and it requires annoying logistical/practical work, and current AI issues are elsewhere, like sample efficiency and robust self-verification. That would explain the likelihood of getting AGI by the 2030s being pretty high: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/#pEBbFmMm9bvmgotyZ Ryan Greenblatt's original comment: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/#xMSjPgiFEk8sKFTWt
1Roman Malov
What are your timelines?
2Cole Wyeth
My distribution is pretty wide, but I think probably not before 2040. 

I still don't think that a bunch of free-associating inner monologues talking to each other gives you AGI, and it still seems to be an open question whether adding RL on top just works.

The "hallucinations" of the latest reasoning models look more like capability failures than alignment failures to me, and I think this points towards "no." But my credences are very unstable; if METR task length projections hold up or the next reasoning model easily zero-shots Pokemon I will just about convert. 

2Cole Wyeth
Investigating preliminary evaluations of o3 and o4-mini I am more convinced that task length is scaling as projected.  Pokémon has fallen, but as far as I can tell this relied on scaffolding improvements for Gemini 2.5 pro customized during the run, NOT a new smarter model. Overall, I am already questioning my position one week later.
7Thane Ruthenis
Pokémon is actually load-bearing for your models? I'm imagining a counterfactual world in which Sonnet 3.7's initial report involved it beating Pokémon Red, and I don't think my present-day position would've been any different in it. Even aside from tons of walkthrough information present in LLMs' training set, and iterative prompting allowing to identify and patch holes in LLMs' pretrained instinctive game knowledge, Pokémon is simply not a good test of open-ended agency. At the macro-scale, the game state can only progress forward, and progressing it requires solving relatively closed-form combat/navigational challenges. Which means if you're not too unlikely to blunder through each of those isolated challenges, you're fated to "fail upwards". The game-state topology doesn't allow you to progress backward or get stuck in a dead end: you can't lose a badge or un-win a boss battle. I. e.: there's basically an implicit "long-horizon agency scaffold" built into the game. Which means what this tests is mainly the ability to solve somewhat-diverse isolated challenges in sequence. But not the ability to autonomously decompose long-term tasks into said isolated challenges in a way such that the sequence of isolated challenges implacably points at the long-term task's accomplishment.
4Cole Wyeth
Hmm, maybe I’m suffering from having never played Pokémon… who would’ve thought that could be an important hole in my education? 
2Noosphere89
I think the hallucinations/reward hacking are actually a real alignment failure, but an alignment failure that happens to degrade capabilities a lot. At least some of the misbehavior is probably due to context, but I have seen evidence that the alignment failures are more deliberate than regular capabilities failures. That said, if this keeps happening, the likely answer is that capabilities progress is to a significant degree bottlenecked on alignment progress, such that you need a significant degree of progress on preventing specification gaming to get new capabilities, and this would definitely be a good world for misalignment issues if the hypothesis is true (which I put some weight on). (Also, it's telling that the areas where RL has worked best are areas where you can basically create unhackable reward models, like many games/puzzles, and once reward hacking is on the table, capabilities start to decrease.)

GDM has a new model: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#advanced-coding

At a glance, it is (pretty convincingly) the smartest model overall. But progress still looks incremental, and I continue to be unconvinced that this paradigm scales to AGI. If so, the takeoff is surprisingly slow. 

Back-of-the-envelope math indicates that an ordinary NPC in our world needs to double their power like 20 times over to become a PC. That’s a tough ask. I guess the lesson is either give up or go all in. 

4Phiwip
Can you expand on this? I'm not sure what you mean but am curious about it.
2Cole Wyeth
There are around 8 billion humans, so an ordinary person has a very small fraction of the power needed to steer humanity in any particular direction. A very large number of doublings are required to be a relevant factor. 
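To spell out the arithmetic (my gloss on the comment's numbers):

$$\log_2(8 \times 10^9) \approx 33, \qquad 2^{20} \approx 10^6,$$

so matching all of humanity's combined power would take about 33 doublings starting from one ordinary person's share, and 20 doublings only gets you to roughly a million people's worth of power, about 0.01% of the total - which is roughly the scale at which the comment counts someone as a relevant factor.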
2Viliam
That's an interesting idea. However, people who read this comments probably already have power much greater than the baseline -- a developed country, high intelligence, education, enough money and free time to read websites... Not sure how many of those 20 doublings still remain.
2Cole Wyeth
I thought the statement was pretty clearly not about the average lesswronger.  But in terms of the “call to action” - 20 was pretty conservative, so I think it’s still in that range, and doesn’t change the conclusions one should draw much. 

That moment when you want to be updateless about risk but updateful about ignorance, but the basis of your epistemology is to dissolve the distinction between risk and ignorance.

(Kind of inspired by @Diffractor)

Did a podcast interview with Ayush Prakash on the AIXI model (and modern AI), very introductory/non-technical:

2Cole Wyeth
Some errata: The bat thing might have just been Thomas Nagel, I can't find the source I thought I remembered. At one point I said LLMs forget everything they thought previously between predicting (say) token six and seven and have to work from scratch. Because of the way the attention mechanism works it is actually a little more complicated (see the top comment from hmys). What I said is (I believe) still overall right but I would put that detail less strongly. Hofstadter apparently was the one who said a human-level chess AI would rather talk about poetry.

Gary Kasparov would beat me at chess in some way I can't predict in advance. However, if the game starts with half his pieces removed from the board, I will beat him by playing very carefully. The first above-human level A.G.I. seems overwhelmingly likely to be down a lot of material - massively outnumbered, running on our infrastructure, starting with access to pretty crap/low bandwidth actuators in the physical world and no legal protections (yes, this actually matters when you're not as smart as ALL of humanity - it's a disadvantage relative to even the... (read more)

I suspect that human minds are vast (more like little worlds of our own than clockwork baubles) and even a superintelligence would have trouble predicting our outputs accurately from (even quite) a few conversations (without direct microscopic access) as a matter of sample complexity.

Considering the standard rhetoric about boxed A.I.'s, this might have belonged in my list of heresies: https://www.lesswrong.com/posts/kzqZ5FJLfrpasiWNt/heresies-in-the-shadow-of-the-sequences

3CstineSublime
There is a large body of non-AI literature that already addresses this, for example the research of Gerd Gigerenzer, which shows that heuristics and "fast and frugal" decision trees often substantially outperform fine-grained analysis because of the sample complexity matter you mention. Pop frameworks which elaborate on this, and how it may be applied, include David Snowden's Cynefin framework, which is geared for government and organizations, and of course Nassim Nicholas Taleb's Incerto. I seem to recall also that the gist of Dunbar's Number, and the reason why certain parrots and corvids seem to have larger prefrontal-cortex equivalents than non-monogamous birds, is basically so that they can have an internal model of their mating partner. (This is very interesting to think about in terms of intimate human relationships, what I'd poetically describe as the "telepathy" when wordlessly you communicate, intuit, and predict a wide range of each other's complex and specific desires and actions because you've spent enough time together.) The scary thought to me is that a superintelligence would quite simply not need to accurately model us; it would just need to fine-tune its models in a way not dissimilar from the psychographic models utilized by marketers. Of course that operates at scale, so the margin of error is much greater but more 'acceptable'. Indeed, dumb algorithms already do this very well - think about how 'addictive' people claim their TikTok or Facebook feeds are. The rudimentary sensationalist clickbait that ensures eyeballs and clicks. A superintelligence doesn't need accurate modelling - and this is without having individual conversations with us; to my knowledge (or rather my experience) most social media algorithms are really bad at taking the information on your profile and using things like sentiment and discourse analysis to make decisions about which content to feed you; they rely on engagement like sharing, clicking like, watch time and rudimentary
2Alexander Gietelink Oldenziel
One can showcase very simple examples of data that are easy to generate (a simple data source) yet very hard to predict. E.g. there is a 2-state generating hidden Markov model whose optimal prediction hidden Markov model is infinite. I've heard it explained as follows: it's much harder for the fox to predict where the hare is going than it is for the hare to decide where to go to shake off the fox.
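A concrete sketch of this kind of process (my own illustration of what I believe is the standard example, the "simple nonunifilar source"; generating from it is trivial, but an optimal predictor has to carry a real-valued belief about the hidden state that never settles into finitely many values):

```python
import random

# A 2-state hidden Markov model that is easy to simulate (the "hare") but hard
# to predict optimally (the "fox"): the symbol "1" is emitted by several edges,
# so an observer must track a continuous belief about the hidden state.
# Each entry is (probability, emitted symbol, next state).
MACHINE = {
    "A": [(0.5, "0", "A"), (0.5, "1", "B")],
    "B": [(0.5, "1", "B"), (0.5, "1", "A")],
}

def generate(n, state="A"):
    out = []
    for _ in range(n):
        probs, symbols, nexts = zip(*MACHINE[state])
        i = random.choices(range(len(probs)), weights=probs)[0]
        out.append(symbols[i])
        state = nexts[i]
    return "".join(out)

def belief_update(p_a, symbol):
    """P(next hidden state is A) given current P(state is A) and one observation."""
    p_next_a = 0.0
    total = 0.0
    for state, p_state in (("A", p_a), ("B", 1 - p_a)):
        for prob, sym, nxt in MACHINE[state]:
            if sym == symbol:
                total += p_state * prob
                if nxt == "A":
                    p_next_a += p_state * prob
    return p_next_a / total

data = generate(30)
belief = 1.0  # generation above started in state A
for sym in data:
    # An optimal predictor must carry this real-valued belief forward; after a
    # run of k consecutive 1s it takes a different value for each k.
    belief = belief_update(belief, sym)
print(data, f"P(hidden state is A) ≈ {belief:.3f}")
```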

I'm starting a google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues (infrequently) added - also some people outside of lesswrong are interested. 

Presented the Sherlockian abduction master list at a Socratica node:


In MTG terms, I think Mountainhead is the clearest example I’ve seen of a mono-blue dystopia.

@Duncan Sabien (Inactive) 

I seem to recall EY once claiming that insofar as any learning method works, it is for Bayesian reasons. It just occurred to me that even after studying various representation and complete class theorems I am not sure how this claim can be justified - certainly one can construct working predictors for many problems that are far from explicitly Bayesian. What might he have had in mind?

A "Christmas edition" of the new book on AIXI is freely available in pdf form at http://www.hutter1.net/publ/uaibook2.pdf 

Over-fascination with beautiful mathematical notation is idol worship. 

6Seth Herd
So is the fascination with applying math to complex real-world problems (like alignment) when the necessary assumptions don't really fit the real-world problem.
3gwern
(Not "idle worship"?)
2Hastings
Beauty of notation is an optimization target and so should fail as a metric, but especially compared to other optimization targets I’ve pushed on, in my experience it seems to hold up. The exceptions appear to be string theory and category theory, and two failures in a field the size of math is not so bad.

I wonder if it’s true that around the age of 30 women typically start to find babies cute and consequently want children, and if so is this cultural or evolutionary? It’s sort of against my (mesa-optimization) intuitions for evolution to act on such high-level planning (it seems that finding babies cute can only lead to reproductive behavior through pretty conscious intermediary planning stages). Relatedly, I wonder if men typically have a basic urge to father children, beyond immediate sexual attraction?
