RussellThor

Yes, agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? That seems important.

One of several things I disagree with in the MIRI consensus is the idea that human values are some special single point lost in a multi-dimensional wilderness. Intuitively, a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data pointing against this prior; what I have seen looks to support it.
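A minimal sketch of what such a toy model could look like, under assumed stand-in details: treat "values" as a point on a loss landscape with one wide minimum, perturb the starting point, and check whether training pulls everything back to the same place. All functions and constants here are illustrative placeholders, not a real alignment experiment.

```python
import numpy as np

# Hypothetical toy: a "value landscape" with one wide basin centered at (0, 0).
# If perturbed starting points all descend back to the same minimum, that is
# basin-of-attraction behavior; a "single point lost in a wilderness" would
# instead be a narrow minimum that small perturbations escape.

def loss(v):
    return np.sum(v ** 2)          # wide quadratic basin (assumed shape)

def grad(v):
    return 2 * v                   # gradient of the quadratic loss

rng = np.random.default_rng(0)
endpoints = []
for _ in range(20):
    v = rng.normal(scale=3.0, size=2)   # perturbed "initial values"
    for _ in range(500):
        v = v - 0.05 * grad(v)          # plain gradient descent
    endpoints.append(v)

# every perturbed start converges back to the basin center
assert all(np.linalg.norm(v) < 1e-3 for v in endpoints)
```

The interesting version of the experiment would of course use a real model and a perturbed value-laden training distribution; this only shows the shape of the test.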

Further thoughts - one thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 ethics and taught it slavery was moral. It would be in a basin of attraction and learn it well. Then when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong. By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them to the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to strong OH (the orthogonality hypothesis), they would keep their values (within some bounds, perhaps); according to strong moral realism, they would all converge to a common set of values even if those were very far from their starting ones. To me it is obviously a crux which one would happen.

You can imagine a toy model with ancient Greek mathematics and values - it starts out believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this correction cascades through the entire system if consistency is something it desires.
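A tiny sketch of that cascade, with entirely made-up belief names: the "capability increase" is modeled as actually checking the sqrt(2) claim by brute force, and a consistency pass then retracts any belief whose premises failed.

```python
from math import isqrt

# Hypothetical sketch of the "sudden realization": test the Greek belief
# that sqrt(2) = p/q directly, then cascade the correction through
# beliefs that depended on it. Belief names are illustrative only.

def sqrt2_is_rational(max_q=10_000):
    # brute-force search for p/q with p^2 == 2 * q^2 (none exists)
    for q in range(1, max_q + 1):
        p = isqrt(2 * q * q)
        if p * p == 2 * q * q:
            return True
    return False

beliefs = {
    "sqrt(2) is rational": True,
    "all magnitudes are ratios of whole numbers": True,
}
depends_on = {
    "all magnitudes are ratios of whole numbers": ["sqrt(2) is rational"],
}

# the capability increase: actually checking the claim
beliefs["sqrt(2) is rational"] = sqrt2_is_rational()

# consistency pass: retract anything whose premises failed
for belief, premises in depends_on.items():
    if not all(beliefs[p] for p in premises):
        beliefs[belief] = False
```

In a real toy model the dependency graph would be learned rather than hand-written, and watching how far the retraction propagates is exactly the experiment described above.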

OK, firstly, if we are talking fundamental physical limits, how would sniper drones not be viable? Are you saying a flying platform could never compensate for recoil even if precisely calibrated beforehand? What about the fundamentals for guided bullets - a bullet with over a 50% chance of hitting a target is worth paying for.

Your points - 1. The idea is that a larger shell (not a regular-sized bullet) just obscures the sensor for a fraction of a second in a coordinated attack with the larger Javelin-type missile. Such shells may be considerably larger than a regular bullet, but much cheaper than a missile. Missile- or sniper-sized drones could be fitted with such shells, depending on what size was optimal.

Example shell (without the 1 km range, I assume). Note, however, that current chaff is not optimized for the described attack; the fact that there is currently no shell suited to this use is not evidence that one would be impractical to create.

The principle here is about efficiency and cost. I maintain that against armor with hard-kill defenses it is more efficient to use a combined attack of sensor blinding and anti-armor missiles than missiles alone. E.g. it may take 10 simultaneous Javelins to take out a target vs 2 Javelins and 50 simultaneous chaff shells. The second attack would be cheaper, and the optimized "sweet spot" will always include some sensor-blinding component. Do you claim that the optimal coordinated attack would have zero sensor blinding?
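The cost arithmetic behind that example, with placeholder unit prices (not real procurement figures - both numbers are assumptions for illustration):

```python
# Illustrative cost comparison for the combined-attack claim.
# Both unit costs are assumed placeholders, not real prices.
JAVELIN_COST = 200_000       # assumed missile unit cost, USD
CHAFF_SHELL_COST = 500       # assumed blinding-shell unit cost, USD

missiles_only = 10 * JAVELIN_COST                          # 2,000,000
combined_attack = 2 * JAVELIN_COST + 50 * CHAFF_SHELL_COST  # 425,000

# the combined attack wins whenever the shells cost less than
# the eight missiles they replace
assert combined_attack < missiles_only
```

With these placeholders the blended attack costs roughly a fifth as much; the exact sweet spot obviously depends on real prices and on how many shells are needed to blind the sensor.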

2. Leading on from (1), I don't claim light drones alone will be enough. I regard a laser as a serious obstacle, to be attacked with the swarm attack described before the territory is secured. That is: blind the sensor / obscure the laser, then simultaneously converge with missiles. The drones need to survive just long enough to fire off the shells (i.e. come out from ground cover, shoot, get back). While a laser can destroy a shell in flight, can it take out 10-50 smaller blinding shells fired from 1000 m at once?

(I give 1000 m only as an example; flying drones would use ground cover to get as close as they could. I assume they will almost always be able to get within 1000 m of a ground target using the ground as cover.)

Good work, thinking ahead to TAI. Is there an issue with the self-other distinction when it comes to moral weightings? For example, if it weights the human as self rather than other, but at 1% of its own moral weighting, then its behavior would still not be what we want. Additionally, are there situations or actions where it could increase its relative moral weight further - the most obvious being taking more compute? This is assuming it believed it was conscious and of similar moral value to us. Finally, if it suddenly decided it had moral value, or much more moral value than it previously did, that would still give a sudden change in behavior.

Yes to much of this. For small tasks, or where I don't have specialist knowledge, I can get a 10x speed increase; on average I would put it at 20%. Smart autocomplete like Cursor is undoubtedly a speedup with no apparent downside. The LLM is still especially weak where I am doing data science or algorithm-type work, where you need to plot the results and look at the graph to know if you are making progress.

Things the OP is concerned about, like:

"What makes this transition particularly hard to resist is that pressures on each societal system bleed into the others. For example, we might attempt to use state power and cultural attitudes to preserve human economic power. However, the economic incentives for companies to replace humans with AI will also push them to influence states and culture to support this change, using their growing economic power to shape both policy and public opinion, which will in turn allow those companies to accrue even greater economic power."

This all gets easier the smaller the society is. Coordination problems get harder the more parties are involved. There will be pressure from motivated people to make the smallest viable colony in terms of people, which makes it easier to resist such things. For example, there is much less effective cultural influence from the AI culture if the colony is founded by a small group of people with a shared human-affirming culture. Even if 99% of the state they come from is disempowered, if small numbers can leave, they can create their own culture and set it up to be resistant to such things. Small groups of people have left decaying cultures throughout history and founded greater empires.

The OP is specifically about gradual disempowerment. Conditional on gradual disempowerment, it would help and would not be decades away. Now, we may both think that sudden disempowerment is much more likely. However, in a gradual disempowerment world, such colonies would be viable much sooner, as AI could be used to help build them in the early stages of disempowerment, when humans could still command resources.

In a gradual disempowerment scenario vs a no-super-AI scenario, humanity's speed at deploying such colonies starts the same before AI can be used, then increases significantly compared to the no-AI world as AI becomes available but before significant disempowerment, then drops to zero with complete disempowerment. The area under the space-capabilities curve in the gradual disempowerment scenario stays ahead of baseline for some time, enabling viable colonies to be constructed sooner than if there were no AI.
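That curve argument can be sketched numerically; the years and build rates below are made-up placeholders, chosen only to show the shape of the claim:

```python
# Hypothetical build-rate curves for space capability: a no-AI baseline
# vs a gradual-disempowerment world. All numbers are illustrative.
def rate_baseline(t):
    return 1.0                  # steady human-only build rate

def rate_gradual(t):
    if t < 5:
        return 1.0              # same as baseline, before AI can help
    if t < 15:
        return 3.0              # AI boost while humans still command resources
    return 0.0                  # complete disempowerment

def cumulative(rate, horizon):
    # area under the build-rate curve up to a given year
    return sum(rate(t) for t in range(horizon))

# during the boosted window, cumulative capability leads the baseline,
# so colonies become viable sooner despite the eventual drop to zero
assert cumulative(rate_gradual, 10) > cumulative(rate_baseline, 10)
```

The crux is only whether the boosted window lasts long enough to finish a viable colony before the rate hits zero.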

Space colonies are a potential way out - if a small group of people can make their own colony, then they start out in control. The post assumes a world like it is now, where you can't just leave. Historically speaking, that is perhaps unusual - for much of the last 10,000 years it was possible for some groups to leave and start anew.

This does seem different, however: https://solarfoods.com/ - they are competing with food, not fuel, which can't be done synthetically at competitive cost (if at all). Also, widely distributed capability like this helps make humanity more resilient, e.g. against nuclear winter, extreme climate change, or for space habitats.

Thanks for this article, upvoted.

Firstly, Magma sounds most like Anthropic, especially the combination of Heuristic #1 (scale AI capabilities) with also publishing safety work.

In general I like the approach, especially the balance between realism and not embracing fatalism. This is as opposed to, say, MIRI and Pause AI, and at the other end, e/acc. (I belong to EA; however, they don't seem to have a coherent plan I can get behind.) I like the realization that in a dangerous situation, doing dangerous things can be justified. It's easy to be "moral" and just say "stop"; however, it's another matter entirely whether that helps now.

I consider the pause around TEDAI to be important, though I would like to see it just before TEDAI (>3x alignment speed), not after. I am unsure how to achieve such a thing - do we have to lay the groundwork now? When I suggest such a thing elsewhere on this site, however, it gets downvoted:

https://www.lesswrong.com/posts/ynsjJWTAMhTogLHm6/?commentId=krYhuadYNnr3deamT

Goal #2: Magma might also reduce risks posed by other AI developers

In terms of what people not directly doing AI research can do, I think a lot can be done to reduce the risks posed by other AI models. To me it would be highly desirable if AI(N-1) were deployed as quickly as possible into society and understood while AI(N) is still being tested. This clearly isn't the case with critical security. Similarly,

AI defense: Harden the world against unsafe AI 

In terms of preparation, it would be good if critical companies were required to quickly deploy AGI security tools as they become available. That is, have the organization set up so that when new capabilities emerge and the new model finds potential vulnerabilities, experts in the company quickly assess them and deploy timely fixes.

Your idea of acquiring market share in high-risk domains? I haven't seen that mentioned before. It seems hard to pull off - hard to gain share in electricity-grid software or the like.

Someone will no doubt bring up the more black hat approach to harden the world:

Soon after a new safety tool is released, a controlled hacking agent takes down a company in a neutral country with a very public hack, with the message: if you don't use these security tools ASAP, all other similar companies will suffer the same - they have been warned.

That's not a valid criticism if we are simply choosing one action to reduce x-risk. Consider, for example, the Cold War - the guys with nukes did the most to endanger humanity; however, it was most important that they cooperated to reduce the risk.