Signer

Posts

Signer's Shortform (2y)

Wikitag Contributions

No wikitag contributions to display.

Comments (sorted by newest)

[Question] What the discontinuity is, if not FOOM?
Signer · 14h

If all AIs are scheming, they can take over together. If you instead assume a world with a powerful AI that is actually on humanity's side, then at some level of power of the friendly AI you can probably run an unaligned AI and it won't be able to do much harm. But just assuming there are many AIs doesn't solve scheming by itself - if training actually works as badly as predicted, then none of the many AIs would be aligned enough.

Beyond the Zombie Argument
Signer1d10

Russellian monism struggles with Epiphenomenality: if the measurable, structural properties are sufficient to predict what happens, then the phenomenal properties are along for the ride.

I mean, it's monism - it's supposed to have only one type of stuff; obviously the structural properties only work because of the underlying phenomenal/physical substrate.

furthermore, since mental states are ultimately identical to physical brain states, they share the causal powers of brain states (again without the need to posit special explanatory apparatus such as “psychophysical laws”), and in that way epiphenomenalism is avoided.

I don't see how having two special maps has anything to do with the monistic ontology that enables causal closure. What's the problem with just having a neutral-monistic ontology, like you say Dual-aspect neutral monism has, and using normal physical epistemology?

the epistemic irreducibility of the mental to the physical is also accepted.

Why? If ontologically there is only one type of stuff, then you can reduce the mental description to the physical one, because they describe one reality - the same way you reduce an old physical theory to a new one.

[Question] What the discontinuity is, if not FOOM?
Signer · 1d

Why would that be discontinuous?

Because incremental progress missed deception.

I’m arguing against 99%

I agree such confidence lacks justification.

Beyond the Zombie Argument
Signer · 1d

I don't think there is a need to qualify it as a potential solution - Russellian Monism just solves the Hard Problem.

[Question] What the discontinuity is, if not FOOM?
Signer · 1d

I don't think anyone is against incremental progress. It's just that if the AI takes over after incremental progress, then the alignment wasn't good enough. And what's the source of confidence in it being enough?

"Final or nonexistent" seems to be appropriate for scheming detection - if you missed only one way for AI to hide it's intentions, it will take over. So yes, degree of scheming in broad sense and how much you can prevent it is a crux and other things depend on it. Again, I don't see how you can be confident that future AI wouldn't scheme.

A non-review of "If Anyone Builds It, Everyone Dies"
Signer · 2d

I just think that it wouldn’t be the case that we had one shot but we missed it, but rather had many shots and missed them all.

This interpretation only works if by missed shots you mean "missed opportunities to completely solve alignment". Otherwise you can observe multiple failures along the way and fix observable scheming, but you only need to miss one alignment failure at the last capability level. The point is just that your monitoring methods, even after being improved in response to many failures to catch scheming in the pre-takeover regime, are finally tested only when the AI really can take over, because a real ability to take over is hard to fake. And if you fail that test, you can't repeat it after improving your monitoring. Maybe your alignment training after a previously observed failure in the pre-takeover regime really did make the AI non-scheming. But maybe you just missed some short thought where the AI decided not to think about takeover since it can't win yet. And you'll need to rely on your monitoring without actually testing whether it can catch all such possibilities, which depend on an actual environment that allows takeover.

[Question] What the discontinuity is, if not FOOM?
Signer · 2d

if ASI is developed gradually, alignment can be tweaked as you go along.

The whole problem is that alignment, as in "the AI doesn't want to take over in a bad way", is not assumed to be solved. So you think your alignment training works for your current pre-takeover version of the ASI, but actually previous versions have already been scheming for a long time, so running a version capable of takeover creates a discontinuity that is sudden for you, where the ASI takes over because now it can. It means all your previous alignment work and scheming detection is finally tested when you run a version capable of takeover, and you can only fail this test once. And training against scheming is predicted not to work, just to create stealthier schemers. And "the AI can take over" is predicted to be hard to fake convincingly to the AI, so you can't confidently check for scheming just by observing what it would do in a fake environment.

Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?")
Signer · 2d

The technical intuitions we gained from this process are the real reason for our particularly strong confidence in this problem being hard.

I don't understand why anyone would expect such a reason to be persuasive to other people. Like, relying on illegible intuitions in matters of human extinction just feels crazy. Yes, certainty doesn't matter - we need to stop either way. But still, is it even rational to be so confident when you rely on illegible intuitions? Why not check yourself with something more robust, like actually writing out your hypotheses and reasoning and counting the evidence? Surely there is something better than saying "I base my extreme confidence on intuitions".

And it's not only about corrigibility - "you don't get what you train for" being a universal law of intelligence in the real world, or utility maximization, especially in the limit, being a good model of real things, or pivotal real-world science being definitely so hard that you can't possibly be distracted even once and still figure it out - everything is insufficiently justified.

The title is reasonable
Signer · 11d

Once you do that, it’s a fact of the universe, that the programmers can’t change, that “you’d do better at these goals if you didn’t have to be fully obedient”, and while programmers can install various safeguards, those safeguards are pumping upstream and will have to pump harder and harder as the AI gets more intelligent. And if you want it to make at least as much progress as a decent AI researcher, it needs to be quite smart.

Is there a place where this whole hypothesis about deep laws of intelligence is connected to reality? Like, how hard do they have to pump? What exactly is the evidence that they will have to pump harder? Why can't the "quite smart" point be reached while the safeguards still work? Right now it's no different from saying "the world is NP-hard, so ASI will have to try harder and harder to solve problems, and killing humanity is quite hard".

If there were a natural shape for AIs that let you fix mistakes you made along the way, you might hope to find a simple mathematical reflection of that shape in toy models. All the difficulties that crop up in every corner when working with toy models are suggestive of difficulties that will crop up in real life; all the extra complications in the real world don’t make the problem easier.

If there were a natural shape for AIs that don't wirehead, you might hope to find a simple mathematical reflection of that shape in toy models. So, by that logic, MIRI failing to find such a model means NNs are anti-natural. Again, what's the justification for a significant update from MIRI failing to find a mathematical model?

Arguments About AI Consciousness Seem Highly Motivated And At Best Overconfident
Signer · 1mo (comment collapsed)