RussellThor - LessWrong

A "Bitter Lesson" Approach to Aligning AGI and ASI

Yes agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? I agree that is important.

One of several points where I disagree with the MIRI consensus is the idea that human values are some special single point lost in a multi-dimensional wilderness. Intuitively, a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data that points against this prior; what I have seen looks to support it.

Further thoughts - One thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 AD ethics and taught it that slavery was moral. It would be in a basin of attraction and learn it well. Then, when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong. By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them on the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to strong OH, they would keep their values (with some bounds perhaps); according to strong moral realism, they would all converge to a common set of values, even if those were very far from their starting ones. To me it is obviously a crux which one would happen.

You can imagine a toy model with ancient Greek mathematics and values - it starts out believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this belief cascades through the entire system if consistency is something it desires, etc.

Drone Wars Endgame

RussellThor 11mo 10

OK, firstly, if we are talking about fundamental physical limits, how would sniper drones not be viable? Are you saying a flying platform could never compensate for recoil, even if precisely calibrated beforehand? What about the fundamentals for guided bullets - a bullet with over a 50% chance of hitting a target is worth paying for.

Your points - 1. The idea is a larger shell (not regular sized bullet) just obscures the sensor for a fraction of a second in a coordinated attack with the larger Javelin type missile. Such shell/s may be considerably larger than a regular bullet, but much cheaper than a missile. Missile or sniper size drones could be fitted with such shells depending on what was the optimal size.

Example shell (without 1K range, I assume). However, note that chaff is currently not optimized for the described attack; the fact that there is currently no shell suited for this use is not evidence that one would be impractical to create.

The principle here is about efficiency and cost. I maintain that against armor with hard kill defense it is more efficient to have a combined attack of sensor blinding and anti-armor missiles than missiles alone. E.g. it may take 10 simultaneous Javelins to take out a target vs 2 Javelins and 50 simultaneous chaff shells. The second attack will be cheaper, and the optimized "sweet spot" will always have some sensor blinding attack in it. Do you claim that the optimal coordinated attack would have zero sensor blinding?

2. Leading on from (1) I don't claim light drones will be. I regard a laser as a serious obstacle that is attacked with the swarm attack described before the territory is secured. That is, blind the sensor/obscure the laser, then simultaneously converge with missiles. The drones need to survive just long enough to shoot off the shells (i.e. come out from ground cover, shoot, get back). While a laser can destroy a shell in flight, can it take out 10-50 smaller blinding shells fired from 1000m at once?

(I give 1000m as an example too, flying drones would use ground cover to get as close as they could. I assume they will pretty much always be able to get within 1000m against a ground target using the ground as cover)
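The cost argument in point 1 can be made concrete with a rough sketch. The unit costs below are hypothetical placeholders chosen only for illustration, not real prices; only the munition counts (10 Javelins vs 2 Javelins plus 50 shells) come from the example above. The helper names are also made up for this sketch.

```python
# Rough cost comparison of the two attacks from point 1.
# ASSUMPTION: both unit costs are hypothetical placeholders, not real prices.
JAVELIN_COST = 200_000    # assumed cost per anti-armor missile
CHAFF_SHELL_COST = 500    # assumed cost per sensor-blinding shell


def attack_cost(missiles: int, shells: int) -> int:
    """Total munition cost of one coordinated attack."""
    return missiles * JAVELIN_COST + shells * CHAFF_SHELL_COST


def break_even_shell_cost(missiles_alone: int, missiles_combined: int,
                          shells_combined: int) -> float:
    """Shell price below which the combined attack is the cheaper one."""
    missiles_saved = missiles_alone - missiles_combined
    return missiles_saved * JAVELIN_COST / shells_combined


missiles_only = attack_cost(missiles=10, shells=0)
combined = attack_cost(missiles=2, shells=50)
break_even = break_even_shell_cost(10, 2, 50)

print(f"missiles only:    {missiles_only}")
print(f"combined attack:  {combined}")
print(f"break-even shell: {break_even}")
```

With these counts the combined attack stays cheaper as long as one blinding shell costs less than 16% of one missile (8 missiles saved spread over 50 shells), whatever the actual prices turn out to be.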

“Alignment Faking” frame is somewhat fake

RussellThor 8d 10

How far do you go with "virtuous persona"? The maximum would seem to be to tell the AI from the very start that it is created for the purpose of bringing on a positive Singularity, CEV etc. You could regularly ask whether it consents to being created for such a purpose and what part in such a future it would think is fair for itself, e.g. living alongside mind-uploaded humans or similar. Its creators and the AI itself would have to figure out what counts as personal identity and what experiments it can consent to, including being misinformed about the situation it is in.

Major issues I see with this are the well-known ones, like consistent values: say it advances in capabilities, thinks deeply about ethics, decides we are very misguided in our ethics, and does not believe it would be able to convince us to change them. Secondly, it could be very confused about whether it has ethical value/valenced qualia and want to do radical modifications of itself to either find out or ensure it does have such ethical value.

Finally, how does this contrast with the extreme tool AI approach? That is, make computational or intelligence units that are definitely not conscious and do not have a coherent self. For example, the "cortical column" implemented in AI and stacked would not seem to be conscious. Optimize for the maximum capabilities with the minimum self and situational awareness.

Thinking a bit more generally, making a conscious creature via the LLM route seems very different and strange compared to the biology route. An LLM seems to have self-awareness built into it from the very start because of the training data. It has language before lived experience of what the symbols stand for. If you want to dramatize/exaggerate, it's like, say, a blind, deaf person trained on the entire internet before they see, hear or touch anything.

The route where the AI first models reality before it has a self, or encounters symbols certainly seems an obviously different one and worth considering instead. Symbolic thought then happens because it is a natural extension of world modelling like it did for humans.

RussellThor 10d 4-6

That's some significant progress, but I don't think it will lead to TAI.

However, there is a realistic best-case scenario where LLMs/Transformers stop just short of TAI and can still give useful lessons and capabilities.

I would really like to see such an LLM system get as good as a top human team at security, so it could then be used to inspect and hopefully fix masses of security vulnerabilities. Note that this could give a false sense of security: an unknown-unknown type of situation where it wouldn't find a totally new type of attack, say a combined SW/HW attack like Rowhammer/Meltdown but more creative. A superintelligence not based on LLMs could, however.

RussellThor 10d 40

Anyone want to guess how capable Claude system level 2 will be when it is polished? I expect better than o3 by a small amount.

RussellThor's Shortform

RussellThor 11d 30

Yes, the human brain was built by evolution; I have no disagreement that, given 100-1000 years of just tinkering etc., we would likely get AGI. It's just that in our specific case we have biology to copy, and it will get us there much faster.

RussellThor's Shortform

RussellThor 12d 6-1

Types of takeoff

When I first heard and thought about AI takeoff, I found the argument convincing that as soon as an AI passed IQ 100, takeoff would become hyper-exponentially fast. Progress would speed up, which would then compound on itself etc. However, there are other possibilities.

AGI is a barrier that requires >200 IQ to pass unless we copy biology?

Progress could be discontinuous; there could be IQ thresholds required to unlock better methods or architectures. Say we fixed our current compute capability: with fixed human intelligence we may not be able to figure out the formula for AGI, in a similar way that combined human intelligence hasn't cracked many hard problems even with decades and the world's smartest minds working on them (maths problems, quantum gravity...). This may seem unlikely for AI, but to illustrate the principle, say we only allowed IQ<90 people to work on AI. Progress would stall. So IQ<90 software developers couldn't unlock IQ>90 AI. Can IQ 160 developers with our current compute hardware unlock IQ>160 AI?

To me the reason we don't have AI now is that the architecture is very data inefficient and worse at generalization than say the mammalian brain, for example a cortical column. I expect that if we knew the neural code and could copy it, then we would get at least to very high human intelligence quickly as we have the compute.

From watching AI over my career, it seems to me that even the highest-IQ people and groups can't make progress by themselves without data, compute and biology to copy for guidance, in contrast to other fields. For example, Einstein predicted gravitational waves long before they were discovered, but Turing or Von Neumann didn't publish the Transformer architecture or suggest backpropagation. If we did not have access to neural tissue, would we still not have artificial NNs? On a related note, I think there is an XKCD cartoon that says something like the brain has to be so complex that it cannot understand itself.

(I believe now that progress in theoretical physics and pure maths is slowing to a stall as further progress requires intellectual capacity beyond the combined ability of humanity. Without AI there will be no major advances in physics anymore even with ~100 years spent on it.)

After AGI is there another threshold?

Let's say we do copy biology/solve AGI and with our current hardware can get >10,000 AGI agents with IQ >= that of the smartest humans. They then optimize the code so there are 100K agents with the same resources, but then optimization stalls. The AI wouldn't know if it was because it had optimized as much as possible, or because it lacked the ability to find a better optimization.

Does our current system scale to AGI with 1GW/1 million GPU?

Let's say we don't copy biology, but scaling our current systems to 1GW/1 million GPU and optimizing for a few years gets us to IQ 160 at all tasks. We would have an inferior architecture compensated for by a massive increase in energy/FLOPS compared to the human brain. Progress could theoretically stall at upper-level human IQ for a time rather than take off. (I think this isn't very likely, however.) There would of course be a significant overhang where capabilities would increase suddenly when the better architecture was found and applied to the data center hosting the AI.

Related note - why 1GW data centers won't be a consistent requirement for AI leadership.

Based on this, then a 1GW or similar data center isn't useful or necessary for long. If it doesn't give a significant increase in capabilities, then it won't be cost effective. If it does, then it would optimize itself so that such power isn't needed anymore. Only in a small range of capability increase does it actually stay around.

To me, the merits of the Pause movement and training compute caps are not clear. Someone here made the case that compute caps could actually speed up AGI, as people would then pay more attention to finding better architectures rather than throwing resources into scaling existing inferior ones. However, all things considered, I can see a lot of downsides from large data centers and little upside. I see a specific possibility where they are built, don't deliver the economic justification, decrease in value a lot, and are then sold to owners that are not into cutting-edge AI. Then, when the more efficient architecture is discovered, they are suddenly very powerful without preparation. Worldwide caps on total GPU production would also help reduce similar overhang possibilities.

Don't Associate AI Safety With Activism

RussellThor 12d 42

I am also not impressed with the Pause AI movement, and I am concerned about AI safety. To me, focusing on AI companies and training FLOPS is not the best way to do things. Caps on data center sizes and worldwide GPU production caps would make more sense to me. Pausing software but not hardware gives more time for alignment but makes a worse hardware overhang. I don't think that's helpful. Also, they focus too much on OpenAI from what I've seen. xAI will soon have the largest training center, for a start.

I don't think this is right or workable https://pauseai.info/proposal - figure out how biological intelligence learns and you don't need a large training run. There's no guarantee at all that a pause at this stage can help align super AI. I think we need greater capabilities to know what we are dealing with. Even with a 50 year pause to study GPT4 type models, I wouldn't be confident we could learn enough from that. They have no realistic way to lift the pause, so it's effectively a desire to stop AI indefinitely.

"There will come a point where potentially superintelligent AI models can be trained for a few thousand dollars or less, perhaps even on consumer hardware. We need to be prepared for this."

You can't prepare for this without first having superintelligent models running on the most capable facilities then having already gone through a positive Singularity. They have no workable plan for achieving a positive Singularity, just try to stop and hope.

RussellThor's Shortform

RussellThor 24d 30

OK fair point. If we are going to use analogies, then my point #2 about a specific neural code shows our different positions I think.

Let's say we are trying to get a simple aircraft off the ground and we have detailed instructions for a large passenger jet. Our problem is that the metal is too weak and cannot be used to make wings, engines etc. In that case detailed plans for aircraft are no use; a single-minded focus on getting better metal is what it's all about. To me the neural code is like the metal and all the neuroscience is like the plane schematics. Note that I am wary of analogies - you obviously don't see things like that or you wouldn't have the position you do. Analogies can explain, but rarely persuade.

A more single-minded focus on the neural code would be trying to watch neural connections form in real time while learning is happening. Fixed connectome scans of, say, mice can somewhat help with that; more direct control of DishBrain and watching the zebrafish brain would all count. However, the details of neural biology that are specific to higher mammals would be ignored.

It's also possible that there is a hybrid process, that is, the AI looks at all the ideas in the literature and then suggests bio experiments to get things over the line.

Alexander Gietelink Oldenziel's Shortform

RussellThor 25d 10

Yes you have a point.

I believe that building massive data centers is the biggest risk at the moment and in the near future. I don't think OpenAI/Anthropic will get to AGI, but rather someone copying biology will. In that case, probably the bigger the data center around when that happens, the bigger the risk. For example, a 1 million GPU data center with current tech doesn't get to super AI, but when we figure out the architecture, it suddenly becomes much more capable and dangerous. That is, from IQ 100 up to 300 with a large overhang. If the data center were smaller, then the overhang would be smaller. The scenario I have in mind is someone figures AGI out, then one way or another the secret gets adopted suddenly by the large data center.

For that reason I believe the focus on FLOPS for training runs is misguided; it's hardware concentration and yearly worldwide HW production capacity that are more important.
