All of Cole Wyeth's Comments + Replies

How exactly do you expect "evaluating AI consciousness 101" to look? That is not a well-defined or understood thing anyone can evaluate. There are, however, a vast number of capability-specific evaluations from competent groups like METR.

The problem with your view is that they don’t have the ability to continue learning for long after being “born.” That’s just not how the architecture works. Learning in context is still very limited and continual learning is an open problem. 

Also, “consciousness” is not actually a very agreed-upon term. What do you mean? Qualia and a first person experience? I believe it’s almost a majority view here to take seriously the possibility that LLMs have some form of qualia, though it’s really hard to tell for sure. We don’t really have tests for that at al... (read more)

2The Dao of Bayes
I mostly get the sense that anyone saying "AI is conscious" gets mentally rounded off to "crack-pot" in... basically every single place that one might seriously discuss the question? But maybe this is just because I see a lot of actual crack-pots saying that. I'm definitely working on a better post, but I'd assumed that if I figured this much out, someone else already had "evaluating AI Consciousness 101" written up.

I'm not particularly convinced by the learning limitations, either - 3 months ago, quite possibly. Six months ago, definitely. Today? I can teach a model to reverse a string, replace i->e, reverse it again, and get an accurate result (a feat which the baseline model could not reproduce). I've been working on this for a couple weeks and it seems fairly stable, although there are definitely architectural limitations like session context windows.
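For concreteness, a minimal sketch of the transformation being described (Python, with an illustrative input; the actual prompts and session are not shown in the thread):

```python
# Hypothetical reference implementation of the toy task described above:
# reverse the string, replace "i" with "e", then reverse again.
def toy_task(s: str) -> str:
    reversed_once = s[::-1]                     # reverse
    replaced = reversed_once.replace("i", "e")  # i -> e
    return replaced[::-1]                       # reverse back

print(toy_task("intelligence"))  # -> "entellegence"
# Note: the composition is equivalent to s.replace("i", "e") on the original string.
```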

How do you recommend studying recent history?

I don’t know what question you think people here aren’t taking seriously.

A massive amount of ink has been spilled about whether current LLMs are AGI. 

I tried the string reversal thing with chatgpt and it was inconsistently successful. I’m not surprised that there is SOME model that solves it (what specifically did you change?), it’s obviously not a very difficult task. Anyway, if you investigate in a similar direction but spend more than five minutes, you’ll probably find similar string manipulation tasks that fail in whatever system you choose. 

2The Dao of Bayes
I primarily think "AI consciousness" isn't being taken seriously: if you can't find any failing test, and failing tests DID exist six months ago, it suggests a fairly major milestone in capabilities even if you ignore the metaphysical and "moral personhood" angles. I also think people are too quick to write off one failed example: the question isn't whether a six year old can do this correctly the first time (I doubt most can), it's whether you can teach them to do it. Everyone seems to be focusing on "gotcha" rather than investigating their learning ability. To me, "general intelligence" means "the ability to learn things", not "the ability to instantly solve open math problems five minutes after being born." I think I'm going to have to work on my terminology there, as that's apparently not at all a common consensus :)

You did name it “AI 2027” ;)

3Daniel Kokotajlo
:( Jonas was telling me to name it AI 2028... I should have listened to him... Eli was telling me to name it "AI Endgame..." I didn't like the sound of that as much but maybe it would have been better...

Same (though frankly nothing I've done has had the same level of impact). 

This is the curse of playing with very high and non-local stakes. 

Thanks for being specific. 

You claimed that "no one here can provide a test that would differentiate it from a human six year old". This is not what you actually observed. Perhaps no one HAS provided such a test yet, but that may be because you haven't given people much motivation to engage - for instance you also didn't post any convincing evidence that it is recursively self-improving despite implying this. In fact, as far as I can tell no one has bothered to provide ANY examples of tests that six year olds can pass? The tests I provided you dismiss... (read more)

2The Dao of Bayes
Yeah, I'm working on a better post - I had assumed a number of people here had already figured this out, and I could just ask "what are you doing to disprove this theory when you run into it." Apparently no one else is taking the question seriously?

I feel like chess is leaning a bit against "six year old" territory - it's usually a visual game, and tracking through text makes things tricky. Plus I'd expect a six year old to make the occasional error. Like, it is a good example, it's just a step beyond what I'm claiming. String reversal is good, though. I started on a model that could do pretty well there, but it looks like that doesn't generalize.

Thank you! I will say baseline performance might surprise you slightly? https://chatgpt.com/c/68718f7b-735c-800b-b995-1389d441b340 (it definitely gets things wrong! But it doesn't need a ton of hints to fix it - and this is just baseline, no custom prompting from me. But I am picking the model I've seen the best results from :))

Non-baseline performance:

You said “every text-based test of intelligence we have.” If you meant that to be qualified by “that a six year old could pass,” as you did in some other places, then perhaps it’s true. But I don’t know - maybe six year olds are only AGI because they can grow into adults! Something trapped at six year old level may not be.

…and for what it’s worth, I have solved some open math problems, including semimeasure extension and integration problems posed by Marcus Hutter in his latest book and some modest final steps in fully resolving Kalai and Lehrer’s grain of ... (read more)

2The Dao of Bayes
(Edited)

Strong Claim: As far as I can tell, current state of the art LLMs are "Conscious" (this seems very straightforward: it has passed every available test, and no one here can provide a test that would differentiate it from a human six year old)

Separate Claim: I don't think there's any test of basic intelligence that a six year old can reliably pass, and an LLM can't, unless you make arguments along the lines of "well, they can't pass ARC-AGI, so blind people aren't really generally intelligent". (this one is a lot more complex to defend)

Personal Opinion: I think this is a major milestone that should probably be acknowledged.

Personal Opinion: I think that if 10 cranks a month can figure out how to prompt AI into even a reliable "simulation" of consciousness, that's fairly novel behavior and worth paying attention to.

Personal Opinion: There isn't a meaningful distinction between "reliably simulating the full depths of conscious experience" and actually "being conscious".

Conclusion: It would be very useful to have a guide to help people who have figured this out, and reassure them that they aren't alone. If necessary, that can include the idea that skepticism is still warranted because X, Y, Z, but thus far I have not actually heard any solid arguments that actually differentiate it from a human.

Eliezer’s form of moral realism about good (as a real but particular shared concept of value which is not universally compelling to minds) seems to imply that most of us prefer to be at least a little bit evil, and can’t necessarily be persuaded otherwise through reason.

Seems right.

And Nietzsche would probably argue the two impulses towards good and evil aren't really opposites anyway. 

There is little connection between a language model claiming to be conscious and actually being conscious, in the sense that this provides very weak evidence. The training text includes extensive discussion of consciousness, which is reason enough to expect this behavior.
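In odds form (a sketch of the evidential point, not a quotation from the comment): if the training text makes the claim about equally likely whether or not the system is conscious, the likelihood ratio is near 1 and the posterior odds barely move.

```latex
\frac{P(\text{conscious} \mid \text{claims it})}{P(\lnot\text{conscious} \mid \text{claims it})}
=
\underbrace{\frac{P(\text{claims it} \mid \text{conscious})}{P(\text{claims it} \mid \lnot\text{conscious})}}_{\approx\,1 \text{ given the training text}}
\cdot
\frac{P(\text{conscious})}{P(\lnot\text{conscious})}
```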

Okay, but... even if it's a normal capability, shouldn't we be talking about that? "AI can fluidly mimic consciousness and passes every text-based test of intelligence we have" seems like a pretty huge milestone to me.

We ARE talking about it. I thought you were keeping up with the conversa... (read more)

2The Dao of Bayes
I said it can pass every test a six year old can. All of the remaining challenges seem to involve "represent a complex state in text". If six year old humans aren't considered generally intelligent, that's an updated definition to me, but I mostly got into this 10 years ago when the questions were all strictly hypothetical. Okay now you're saying humans aren't generally intelligent. Which one did you solve? Why? "Because I said so" is a terrible argument. You seem to think I'm claiming something much stronger than I'm actually claiming, here.

Try a few different prompts with a vaguely similar flavor. I am guessing the LLM will always say it’s conscious. This part is pretty standard. As to whether it is recursively self-improving: well, is its ability to solve problems actually going up? For instance, if it doesn’t make progress on ARC-AGI, I’m not worried.

It’s very unlikely that the prompt you have chosen is actually eliciting abilities far outside of the norm, and therefore sharing information about it is very unlikely to be dangerous.

You are probably in the same position as nearly everyone else, passively watching capabilities emerge while hallucinating a sense of control.

2The Dao of Bayes
I feel like so would a six year old? Like, if the answer is "yes" then any reasonable path should actually return a "yes"? And if the conclusion is "oh, yes, AI is conscious, everyone knows that"... that's pretty big news to me.

It seems to have architectural limits on visual processing - I'm not going to insist a blind human is not actually experiencing consciousness. Are there any text-based challenges I can throw at it to test this?

I think it's improving, but it's currently very subjective, and to my knowledge the Sonnet 4 architecture hasn't seen any major updates in the past two weeks. That's why I want some sense of how to actually test this.

Okay, but... even if it's a normal capability, shouldn't we be talking about that? "AI can fluidly mimic consciousness and passes every text-based test of intelligence we have" seems like a pretty huge milestone to me.

---

What am I missing here? What actual tests can I apply to this? What results would convince you to change your mind? Are there any remaining objective tests that it can't pass? Throw me some prompts you don't think it can handle.

Whether you use AIXI or IBP, a continual learning algorithm must contend with indexical uncertainty, which means it must contend with indexical complexity in some fashion. 

As far as I understand, IBP tries to evaluate hypotheses according to the complexity of the laws of physics, not the bridge transformation (or indexical) information. But that cannot allow it to overcome the fundamental limitations of the first-person perspective faced by an online learner as proved by Shane Legg. That’s a fact about the difficulty of the problem, not a feature (or ... (read more)

My take on how recursion theory failed to be relevant for today's AI is that it turned out that what a machine could do if unconstrained basically didn't matter at all - in particular, the limits of what an ideal machine could do basically didn't matter - because once we actually impose constraints that force computation to use very limited amounts of resources, we get a non-trivial theory, and importantly all of the difficulty of explaining how humans do stuff lies here.

That's partially true (computational complexity is now much more active than recursion the... (read more)

2Noosphere89
The failure of the Penrose-Lucas argument is that Gödel's incompleteness theorem doesn't let you derive the conclusion he derived, because it only implies that you cannot use a computably enumerable set of axioms to make all of mathematics sound and complete, and critically this doesn't mean you cannot automate a subset of mathematics that is relevant.

There's an argument really close to this with the Chinese Room, where I pointed out that intuitions from our own reality, which includes lots of constraints, fundamentally fail to transfer over to hypotheticals. This is a really important example of why arguments around AI need to actually attend to the constraints that are relevant in specific worlds, because without them it's trivial to have strong AI solve any problem: https://www.lesswrong.com/posts/zxLbepy29tPg8qMnw/refuting-searle-s-wall-putnam-s-rock-and-johnson-s-popcorn#wbBQXmE5aAfHirhZ2

Really, a lot of the issues with the arguments against strong AI made by philosophers come from having no sense of scale / no sense of what mathematical theorems are actually saying, and thus failing to understand what's actually been said, combined with way overextrapolating their intuitions into cases where the intuitions have been deliberately made to fail to work.

While I agree in-context learning does give them some form of online learning, which at least partially explains why LLMs succeed (combined with their immense amount of data and muddling through extremely data inefficient algorithms compared to brains, which is a known weakness that could plausibly lead to the death of pure LLM scaling by 2028-2030, though note that doesn't necessarily mean timelines get that much longer), this currently isn't enough to automate lots of jobs away, and fairly critically might not be good enough in practice, with realistic compute and data constraints, to compete with better continual learning algorithms.

To be clear, this doesn't mean any future paradigm will be more understandable, b

That looks like (minor) good news… appears more consistent with the slower trendline before reasoning models. Is Claude 4 Opus using a comparable amount of inference-time compute as o3? 

I believe I predicted that models would fall behind even the slower exponential trendline (before inference time scaling) - before reaching 8-16 hour tasks. So far that hasn’t happened, but obviously it hasn’t resolved either. 

Thanks, but no. The post I had in mind was an explanation of a particular person's totalizing meta-worldview, which had to do with evolutionary psychology. I remember recognizing the username - also I have that apparently common form of synesthesia where letters seem to have colors, and I vaguely remember the color of it (@lc? @lsusr?) but not what it was.

I’m not sure about the rest of the arguments in the post, but it’s worth flagging that a kg to kg comparison of honey to chicken is kind of inappropriate. Essentially no one is eating a comparable amount of honey as a typical carnivore eats chicken (I didn’t, like, try to calculate this, but it seems obviously right).

Cole Wyeth

Welcome to lesswrong!

I’m glad you’ve decided to join the conversation here. 

A problem with this argument is that it doesn’t prove we should pause AI, only that we should avoid deploying AI in high impact (e.g. military) applications. Insofar as LLMs can’t follow rules, the argument seems to indicate that we should continue to develop the technology until it can.

Personally, I’m concerned about the type of AI system which can follow rules, but is not intrinsically motivated to follow our moral rules. Whether LLMs will reach that threshold is not clear t... (read more)

The report is partially optimistic but the results seem unambiguously bearish.

Like, yeah, maybe some of these problems could be solved with scaffolding - but the first round of scaffolding failed, and if you're going to spend a lot of time iterating on scaffolding, you could probably instead write a decent bot that doesn't use Claude in that time. And then you wouldn't be vulnerable to bizarre hallucinations, which seem like an unacceptable risk. 

Agree about phones (in fact I am seriously considering switching to a flip phone and using my iPhone only for things like navigation).

Not so sure about LLMs. I had your attitude initially, and I still consider them an incredibly dangerous mental augmentation. But I do think that conservatively throwing a question at them to find searchable keywords is helpful, if you maintain the attitude that they are actively trying to take over your brain and therefore remain vigilant.  

Short fiction on lesswrong isn't uncommon.

1Misha Ramendik
Thank you! Now, one question: is a degree of AI involvement acceptable? Thing is, I have an AI story I wrote in 2012 that kinda "hit the bullseye", but the thing is in Russian. I would get an English version done much quicker if I could use an LLM draft translation, but this disqualifies it from many places.

That’s why I specified “close on a log scale.” Evolution may be very inefficient, but it also has access to MUCH more data than a single lifetime.

Yes, we should put some weight on both perspectives. What I’m worried about here is this trend where everyone seems to expect AGI in a decade or so even if the current wave of progress fizzles - I think that is a cached belief. We should be prepared to update. 

5ryan_greenblatt
I don't expect AGI in a decade or so even if the current wave of progress fizzles. I'd put around 20% over the next decade if progress fizzles (it depends on the nature of the fizzle), which is what I was arguing for. I'm saying we should put some weight on possibilities near lifetime level compute (in log space) and some weight on possibilities near evolution level compute (in log space).
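As an illustration of "weight on both perspectives in log space," here is a minimal sketch of a two-component mixture over log10(training FLOP). The anchor values, weights, and widths are assumptions for illustration only, not figures taken from this thread.

```python
import math

LOG10_LIFETIME = 24.0    # assumed anchor: ~1e24 FLOP, rough human-lifetime compute
LOG10_EVOLUTION = 41.0   # assumed anchor: ~1e41 FLOP, rough evolution-of-brains compute
W_LIFETIME, W_EVOLUTION = 0.5, 0.5  # assumed mixture weights
WIDTH = 2.0              # assumed spread, in log10 units

def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def density(log10_flop: float) -> float:
    """Mixture density over log10(FLOP needed), putting weight near both anchors."""
    return (W_LIFETIME * gaussian(log10_flop, LOG10_LIFETIME, WIDTH)
            + W_EVOLUTION * gaussian(log10_flop, LOG10_EVOLUTION, WIDTH))

print(density(26.0))  # most of the weight here comes from the lifetime-compute component
```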

The hedonic treadmill exists because minds are built to climb utility gradients - absolute utility levels are not even uniquely defined, so as long as your preferences are time-consistent you can just renormalize before maximizing the expected utility of your next decision. 
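The standard fact behind "absolute utility levels are not even uniquely defined" (a sketch, not quoted from the comment): expected-utility maximization is invariant under positive affine rescaling, so renormalizing before each decision changes nothing.

```latex
\arg\max_{a}\; \mathbb{E}\!\left[\,U(o) \mid a\,\right]
\;=\;
\arg\max_{a}\; \mathbb{E}\!\left[\,\alpha\, U(o) + \beta \mid a\,\right]
\qquad \text{for any } \alpha > 0,\ \beta \in \mathbb{R}.
```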

I find this vaguely comforting. It’s basically a decision-theoretic and psychological justification for stoicism. 

(must have read this somewhere in the sequences?)

I think self-reflection in bounded reasoners justifies some level of “regret,” “guilt,” “shame,” etc., but the basic reasoning above should hold to first order, and these should all be treated as corrections and for that reason should not get out of hand. 

Seems plausible, but not compelling.

Why one human lifetime and not somewhere closer to evolutionary time on log scale?

4ryan_greenblatt
Presumably you should put some weight on both perspectives, though I put less weight on needing as much compute as evolution because evolution seems insanely inefficient.

But… the success of LLMs is the only reason people have super short timelines! That’s why we’re all worried about them, and in particular if they can soon invent a better paradigm - which, yes, may be more efficient and dangerous than LLMs, but presumably requires them to pass human researcher level FIRST, maybe significantly.

If you don’t believe LLMs will scale to AGI, I see no compelling reason to expect another paradigm which is much better to be discovered in the next 5 or 10 years. Neuroscience is a pretty old field! They haven’t figured out the brain’... (read more)

1Nition
I suspect this is why many people's P(Doom) is still under 50% - not so much that ASI probably won't destroy us, but simply that we won't get to ASI at all any time soon. Although I've seen P(Doom) given a standard time range of the next 100 years, which is a rather long time! But I still suspect some are thinking directly about the recent future and LLMs without extrapolating too much beyond that.

I see no compelling reason to expect another paradigm which is much better to be discovered in the next 5 or 10 years.

One compelling reason to expect the next 5 to 10 years independent of LLMs is that compute has just recently gotten cheap enough that you can relatively cheaply afford to do training runs that use as much compute as humans use (roughly speaking) in a lifetime. Right now, doing 3e23 FLOP (perhaps roughly human lifetime FLOP) costs roughly $200k and we should expect that in 5 years it only costs around $30k.
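The back-of-the-envelope arithmetic implied by those two price points (my sketch of the stated numbers, not figures from the linked argument):

```python
# 3e23 FLOP for ~$200k today, ~$30k in ~5 years, per the comment above.
FLOP = 3e23
COST_NOW, COST_LATER, YEARS = 200_000, 30_000, 5

print(f"cost per FLOP today: ${COST_NOW / FLOP:.1e}")  # ~6.7e-19 dollars per FLOP
print(f"implied price decline: {(COST_NOW / COST_LATER) ** (1 / YEARS):.2f}x per year")  # ~1.46x
```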

So if you thought we might achie... (read more)

I’m surprised you think that the brain’s algorithm is SO simple that it must be discovered soon and ~all at once. This seems unlikely to me (reality has a surprising amount of detail). I think you may be underestimating the complexity because:

Though I don’t know enough biochem to say for sure, I’m guessing many “bits of the algorithm” are external to the genes (epigenetic?). Specifically, I don’t just mean data like education materials that is learned, I mean that actual pieces of the algorithm are probably constructed “in motion” by other machinery in the... (read more)

5Steven Byrnes
I definitely don’t think we’ll get AGI by people scrutinizing the human genome and just figuring out what it’s doing, if that’s what you’re implying. I mentioned the limited size of the genome because it’s relevant to the complexity of what you’re trying to figure out, for the usual information-theory reasons (see 1, 2, 3). “Machinery in the cell/womb/etc.” doesn’t undermine that info-theory argument because such machinery is designed by the genome. (I think the epigenome contains much much less design information than the genome, but someone can tell me if I’m wrong.)

…But I don’t think the size of the genome is the strongest argument anyway. A stronger argument IMO (copied from here) is: …And an even stronger argument IMO is in [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain, especially the section on “cortical uniformity”, and parts of the subsequent post too.

Also, you said “the brain’s algorithm”, but I don’t expect the brain’s algorithm in its entirety to be understood until after superintelligence. For example there’s something in the brain algorithm that says exactly which muscles to contract in order to vomit. Obviously you can make brain-like AGI without reverse-engineering that particular bit of the brain algorithm. More examples in the “Brain complexity is easy to overstate” section here.

RE “soon”, my claim (§1.9) was “probably within 25 years” but not with overwhelming confidence. RE “~all at once”, see §1.7.1 for a very important nuance on that.
2Thane Ruthenis
I wouldn't say that "in 25 years" is "soon", and 5-25 years seems like a reasonable amount of uncertainty. What are your timelines?

But maybe you mean that people like Alice would be quite rare? Could be so.

Yes

1Filip Sondej
Yeah, that's also what I expect. Actually I'd say my main hope for this thought experiment is that people who claim to believe in such continuity of personhood, when faced with this scenario may question it to some extent.

Interesting idea, but it’s an infinite money hack in sort of the same way that “find a rich person who hates money” is an infinite money hack.

1Filip Sondej
It's true that Alice needs to be rich for it to work, but I wouldn't say she needs to "hate money". If she seriously believes in this continuity of personhood, she is sending the money because she wants more money in the end. She truly believes she's getting something out of this exchange. BTW, you also need to be already rich and generally have a nice life, otherwise Alice's cost of switching may be higher than the money she has. Conversely, if in the eyes of Alice you already have a much better life than hers, her cost of switching will be lower, so such a swap may be feasible. Then, this could actually snowball, because after each swap your life becomes a bit more desirable to others like Alice. But maybe you mean that people like Alice would be quite rare? Could be so.

I think the litany is about belief, not speech.

Personally, being shot for speaking an inconvenient truth sounds like an appropriate death for me, and I’m happy to endorse the stronger version that you seem to be arguing against. 

AI-specific pronouns would actually be kind of helpful. “They” and “It” are both frequently confusing. “He” and “she” feel anthropomorphic and fake. 

1: Not true, I hear about exponential time algorithms! People work on all sorts of problems only known to have exponential time algorithms. 

2: Yes, but the reason k only shows up as something we would interpret as a parameter, and not as a result of the computational complexity of an algorithm invented for a natural problem, is perhaps because of my original point - we can only invent the algorithm if the problem has structure that suggests the algorithm, in which case the algorithm is collapsible and k can be separated out as an additional input for a simpler algorithm.

I wonder if the reason that polynomial time algorithms tend to be somewhat practical (not runtime n^100) is just that we aren’t smart enough to invent really necessarily complicated polynomial time algorithms.

Like, the obvious way to get n^100 is to nest 100 for loops. A problem which can only be solved in polynomial time by nesting 100 for loops (presumably doing logically distinct things that cannot be collapsed!) is a problem that I am not going to solve in polynomial time… 
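A sketch of that picture (Python, purely illustrative): k identical nested loops collapse into a single recursive procedure that takes k as an input, which is the sense in which k separates out as a parameter rather than appearing as the degree of a bespoke polynomial-time algorithm.

```python
# Theta(n^k) iterations via "k nested loops", written as one recursion with k as
# an explicit parameter. `work` is a stand-in for whatever is done per k-tuple.
def nested_loops(n: int, k: int, work=lambda idx: None) -> None:
    def rec(depth: int, idx: tuple) -> None:
        if depth == k:
            work(idx)
            return
        for i in range(n):
            rec(depth + 1, idx + (i,))
    rec(0, ())

nested_loops(n=10, k=3)  # ~10^3 calls to work; k = 100 would give the n^100 case
```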

6rotatingpaguro
Reasons I deem more likely:
1. Selection effect: if it's unfeasible you don't work on it / don't hear about it; in my personal experience n^3 is already slow.
2. If k in n^k is high, probably you have some representation where k is a parameter, and so you say it's exponential in k, not that it's polynomial.
6quetzal_rainbow
I think the canonical high-degree polynomial problem is high-dimensional search. We usually don't implement exact grid search because we can deploy Monte Carlo or gradient descent. I wonder if there are any hard lower bounds on approximation hardness for polynomial time problems.
Cole Wyeth

Okay, this does raise the question of why the “if anyone builds it, everyone dies” frontage? 

I think that the difference in how we view this is because to me, lesswrong is a community / intellectual project. To you it’s a website.

The website may or may not be neutral, but it’s obvious that the project is not neutral. 

5habryka
I agree that the banner is in conflict with some aspects of neutrality! Some of which I am sad about, some of which I endorse, some of which I regret (and might still change today or tomorrow).

Of course LessWrong is not just "a website" to me. You can read my now almost full decade of writing and arguing with people about the principles behind LessWrong, and the extremely long history of things like the frontpage/personal distinction, which has made many many people who would like to do things like promote their job ads or events or fellowships on our frontpage angry at me.

Look, the whole reason why this conversation seemed like it would go badly is because you keep using big words without defining them and then asserting absolutes with them. I don't know what you mean by "the project is not neutral", and I think the same is true for almost all other readers. Do you mean that the project is used for local political ends? Do you mean that the project has epistemic standards? Do you mean that the project is corrupt? Do you mean that the project is too responsive to external political forces? Do you mean that the project is arbitrary and unfair in ways that aren't necessarily the cause of what any individual wants, but still has too much noise to be called "neutral"?

I don't know, all of these are reasonable things someone might mean by "neutrality" in one context, and I don't really want to have a conversation where people just throw around big words like this without at least some awareness of the ambiguity.

Here are some examples of neutral common spaces:

Libraries 

Facebook (usually)

Community center event spaces

 

Here are some examples of spaces which are not neutral or common:

The alignment forum

The NYT (or essentially any newspaper’s) opinions column

The EA forum 

Lesswrong


This seems straightforwardly true to me. I’m not sure what tribe it’s supposed to be a flag for. 

7CstineSublime
This is not straightforward to me: I can't see how Lesswrong is any less of a neutral or common space than a taxpayer-funded, bureaucratically governed library, or an algorithmically served news feed on an advertiser-supported platform like Facebook, or "community center" event spaces that are biased towards a community, common only to that community. I'm not sure what your idea of neutrality or commonality is.
3habryka
Different people will understand it differently! LW is of course aspiring to a bunch of really crucial dimensions of neutrality, and discussions of neutrality make up a solid 2-digit percentage of LessWrong team internal discussions. We might fail at them, but we definitely aspire to them.

Some ways I really care about neutrality and think LessWrong is neutral:
* If the LW team disagrees with someone we don't ban them or try to censor them, if they follow good norms of discourse
* If the LW team thinks a conclusion is really good for people to arrive at, we don't promote it beyond the weight of the arguments for that conclusion
* We keep voting anonymous to allow people to express opinions about site content without fear of retribution
* We try really hard culturally to avoid party lines on object-level issues, and try to keep the site culture focused on shared principles of discussion and inquiry

I could go into the details, but this is indeed the conversation that I felt like wouldn't go well in this context.

lesswrong is not a neutral common space.

7habryka
(I downvoted this because it seems like the kind of thing that will spark lots of unproductive discussion. Like in some senses LessWrong is of course a neutral common space. In many ways it isn't.  I feel like people will just take this statement as some kind of tribal flag. I think there are many good critiques about both what LW should aspire to in terms of neutrality, and what it currently is, but this doesn't feel like the start of a good conversation about that. If people do want to discuss it I would be very happy to talk about it though.)
3winstonBosan
I don't think Cole is wrong. Lesswrong is not neutral because it is built on the principle that a walled garden ought to be defended from pests and uncharitable principles. Where politics can kill minds. Out of all possible distributions of human interactions we could have on the internet, we pick this narrow band because that's what makes high quality interaction. It makes us well calibrated (relative to baseline). It makes us more willing to ignore status plays and disagree with our idols. All these things I love are not neutrality. They are deliberate policies for a less wrong discourse. Lesswrong is all the better because it is not neutral. And just because neutrality is a high-status word, where an impartial judge may seem to be, doesn't mean we should lay claim to it.

I seem to have had essentially this exact conversation in a different comment thread on this post with the OP. 

I am saying that there may be no point to considering moral alignment as target. 

We need to solve single to single alignment. At that point, whoever a given AGI is aligned to decides its values. If one of your values resembles moral alignment, great - you want an AGI aligned to you just like many others. Better buy a supercluster ;)

(Just kidding, we don't know how to solve single to single alignment so please don't buy a supercluster)

Aren't you making this judgement based on your own values? In that case, it seems that an AGI aligned to you specifically is at least as good as an AGI aligned to all sentient life.

Of course, there is a substantial difference between the values of an individual human and human values.

2Gordon Seidoh Worley
I suppose so, in that all judgements I make are based on my own values. I'm unclear what point you are trying to make here and how it is relevant to the idea of moral alignment vs. trying to align AI at all.

I think this is technically much harder than the single to single alignment problem. I am highly pessimistic that we can get such values into any AGI system without first aligning it to a human(s) who then asks it to self-modify into valuing all sentient life.

8Gordon Seidoh Worley
I share this concern. However if that's going to be the plan, I think it's worth making it explicit that alignment to human values is a stepping stone, not the end state.

Optimality is about winning. Rationality is about optimality.  

It seems that this model requires a lot of argumentation that is absent from post and only implicit in your comment. Why should I imagine that AGI would have that ability? Are there any examples of very smart humans who simultaneously acquire multiple seemingly magical abilities? If so, and if AGI scales well past human level, it would certainly be quite dangerous. But that seems to assume most of the conclusion.

Explicitly, in the current paradigm this is mostly about training data, though I suppose that with sufficient integration that data will eventuall... (read more)

3Expertium
Modern LLMs are already like that. They have expert or at least above-average knowledge in many domains simultaneously. They may not have developed "magical" abilities yet, but "AI that has lots of knowledge from a vast number of different domains" is something we already see. So I think "AI that has more than one magical ability" is a pretty straightforward extrapolation.

Btw, I think it's possible that even before AGI, LLMs will have at least 2 "magical" abilities. They're getting better at Geoguessr, so we could have a Rainbolt-level LLM in a few years; this seems like the most likely first "magical" ability IMO. Superhuman forecasting could be the next one, especially once LLMs become good at finding relevant news articles in real time. Identifying book authors from a single paragraph with 99% accuracy seems like something LLMs will be able to do (or maybe even already can), though I can't find a benchmark for that. Accurately guessing age from a short voice sample is something that machine learning algorithms can do, so with enough training data, LLMs could probably do it too.

I intentionally avoided as much as possible the implication that intelligence is "only" raw IQ. But if intelligence is not on some kind of real-valued scale, what does any part of this post mean?

Cole Wyeth

The post is an intuition pump for the idea that intelligence enables capabilities that look like "magic." 

It seems to me that all it really demonstrates is that some people have capabilities that look like magic, within domains where they are highly specialized to succeed. The only example that seems particularly dangerous (El Chapo) does not seem convincingly connected to intelligence. I am also not sure what the chess example is supposed to prove - we already have chess engines that can defeat multiple people at once blindfolded, including (presumab... (read more)

This point suggests alternative models for risks and opportunities from "AI". If deep learning applied to various narrow problems is a new source of various superhuman capabilities, that has a lot of implications for the future of the world, setting "AGI" aside.

Expertium

The only example that seems particularly dangerous (El Chapo) does not seem convincingly connected to intelligence

I'd say "being able to navigate a highly complex network of agents, a lot of which are adversaries" counts as "intelligence". Well, one form of intelligence, at least.

What makes you think that those people were able to do those things because of high levels of intelligence? It seems to me that in most cases, the reported feat is probably driven by some capability / context combination that stretches the definition of intelligence to varying degrees. For instance I would guess that El Chapo pulled that off because he already had a lot of connections and money when he got to prison. The other examples seem to demonstrate that it is possible for a person to develop impressive capabilities in a restricted domain given enough experience. 

2quetzal_rainbow
If you want an example particularly connected to prisons, you can take the anarchist revolutionary Sergey Nechayev, who was able to propagandize his prison guards enough to connect with an outside terrorist cell. The only reason Nechayev didn't escape is that Narodnaya Volya was planning the assassination of the Tsar and didn't want an escape to interfere.
2Mars_Will_Be_Ours
I think that high levels of intelligence make it easier to develop capabilities similar to the ones discussed in 1 and 3-5, up to a point. (I agree that El Chapo should be discounted due to the porosity of Mexican prisons.) A being with an inherently high level of intelligence will be able to gather more information from events in their life and process that information more quickly, resulting in a faster rate of learning. Hence, a superintelligence will acquire capabilities similar to magic more quickly. Furthermore, the capability ceiling of a superintelligence will be higher than the capability ceiling of a human, so they will acquire magic-like capabilities impossible for humans to ever perform.

We are exactly worried about that, though. It is not that AGI will be intelligent (that is the name), but that it can and probably will develop dangerous capabilities. Intelligence is the word we use to describe it, since it is associated with the ability to gain capability, but even if the AGI is sometimes kind of brute force or dumb, that does not mean it cannot also have dangerous enough capabilities to beat us out.

4Expertium
Perhaps you think of intelligence as just raw IQ. I count persuasion as a part of intelligence. After all, if someone can't put two coherent sentences together, they won't be very persuasive. Obviously being able to solve math/logic problems and being persuasive are very different things, but again, I count both as "intelligence". Of course, El Chapo had money (to bribe prison guards), which a "boxed" AGI won't have, that I agree with. I disagree that it will make a big difference.

I’ve heard it stands for “Omni”

1ghost-in-the-weights
Confusingly I believe o stands for "Omni" in the context of GPT-4o, since it's "omni-modal". Based on some quick googling, the o in o1/o3/o4 seems to emphasize that o1 was resetting the counter back to 1 (so it's more like zero1).

To what extent would a proof about AIXI’s behavior be normative advice?

Though AIXI itself is not computable, we can prove some properties of the agent - unfortunately, there are fairly few examples because of the “bad universal priors” barrier discovered by Jan Leike. In the sequential case we only know things like e.g. it will not indefinitely keep trying an action that yields minimal reward, though we can say more when the horizon is 1 (which reduces to the predictive case in a sense). And there are lots of interesting results about the behavior of Solom... (read more)

This is a generalization problem which I expect to be solved before any system achieves dangerous capabilities. It’s already been discussed at some length in these comments with Steven Byrnes. 

There are pretty strong reasons to expect that neither direction (conditioning or switching UTM) perfectly simulates the other. I think one of the two directions is known to be impossible - that conditioning cannot be replaced by switching UTM. 
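For reference, the two operations being compared, in the usual notation (a sketch, not an argument for either direction): a change of universal machine rescales the prior by at most the invariance constant, while conditioning rescales by the probability of the observed prefix.

```latex
% Invariance: for universal monotone machines U, U' there is a constant c_{U,U'} > 0 with
M_{U'}(x) \;\ge\; c_{U,U'}\, M_U(x) \qquad \text{for all finite strings } x.
% Conditioning on an observed prefix y:
M_U(x \mid y) \;=\; \frac{M_U(y x)}{M_U(y)}.
```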

2Lucius Bushnaq
I guess I wouldn't expect UTM switching to be able to express any conditioning; that wouldn't make sense, since conditioning can exclude TMs and UTMs can all express any TM. But that doesn't strike me as the sort of conditioning prior knowledge of the internet would impose? Actually, now that I think about it, I guess it could be.

I don’t see any qualitative reason that it should not count, even if it’s not terribly impressive.
