I agree that consequentialist reasoning is an assumption, and am divided about how consequentialist an ASI might be. Training a non-consequentialist ASI seems easier, and the way we train them seems to actually optimize against deep consequentialism (they're rewarded for improving with each incremental step, not for moves that only pay off 100 steps later). But, on the other hand, humans don't seem to have been heavily optimized for this either*, yet we're capable of forming multi-decade plans (even if sometimes poorly).

*Actually, the Machiavellian Intelligence Hypothesis does seem to describe selection for consequentialist reasoning (if I attack Person A, how will Person B react, etc.)

This is the kind of political reasoning that I've seen poisoning LW discourse lately, and it gets in the way of having actual discussions. Will posits essentially an impossibility proof (or, in its more humble form, a plausibility proof). I humor this being true, and state why the implications, even then, might not be what Will posits. The premise is that alignment is not enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong". The premise grants that the ASI stays aligned long enough to act in the world (it seems mostly timescale-agnostic), so I operate on that premise too. My points are not about what is most likely to actually happen, the danger of less-than-perfect alignment, the AI having other goals it might pursue over the wellbeing of humans, or how we should act based on the information we have.

I'm not sure who you are debating here, but it doesn't seem to be me.

First, I mentioned that this was an analogy, and that I dislike even using them, which I hoped implied I was not making any kind of assertion of truth. Second, "works to protect" was not intended to mean "control all relevant outcomes of". I'm not sure why you would get that idea; it certainly isn't what I think of first when someone says a person is "working to protect" something or someone. Soldiers defending a city from raiders are not violating control theory or the laws of physics. Third, the post is premised on "even if we created an aligned ASI", so I was working with the premise that the ASI could be aligned in a way that it deeply cared about humans. Fourth, I did not assert that it would stay aligned over time... the story was all about the ASI not remaining aligned. Fifth, I really don't think control theory is relevant here. Killing yourself to save a village does not break any laws of physics, and is well within most humans' control.

My ultimate point, in case it was lost, was that if we as human intelligences could figure out that an ASI would not stay aligned, an ASI could also figure it out. If we, as humans, would not want this (and the ASI was aligned with what we want), then the ASI presumably would also not want this. If we would want to shut down an ASI before it became misaligned, the ASI (if it wants what we want) would also want this.

None of this requires disassembling black holes, breaking the laws of physics, or doing anything outside of that entity's control.

I've heard of many such cases from EA Funds (including my own). My impression is that they had only one person working full-time managing all three funds (no idea whether this has changed since I applied).

An incapable man would kill himself to save the village. A more capable man would kill himself to save the village AND ensure no future werewolves are able to bite villagers again.

Though I tend to dislike analogies, I'll use one, supposing it is actually impossible for an ASI to remain aligned. Suppose a villager cares a whole lot about the people in his village, and routinely works to protect them. Then, one day, he is bitten by a werewolf. He goes to the shaman, who tells him that when the full moon rises again, he will turn into a monster and kill everyone in the village. His friends, his family, everyone. And he will no longer know himself. He is told there is no cure, and that the villagers would be unable to fight him off. He will grow too strong to be caged, and cannot be subdued or controlled once he transforms. What do you think he would do?

MIRI "giving up" on solving the problem was probably a net negative for the community, since it severely demoralized many young, motivated individuals who might have worked toward actually solving it. An excellent way to foreclose pathways to victory is to convince people those pathways are unattainable. A positive, I suppose, is that many have stopped looking to Yudkowsky and MIRI for solutions, since it's obvious they have none.

I don't think this is the case. For a while, the post with the highest karma was Paul Christiano explaining all the reasons he thinks Yudkowsky is wrong.

Fair. What would you call a "mainstream ML theory of cognition", though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis).

It tends not to get talked about much today, but there was the PDP (connectionist) camp of cognition vs. the camp of "everything else" (including ideas such as symbolic reasoning). The connectionist camp created a rough model of how they thought cognition worked, a lot of cognitive scientists scoffed at it, and Hinton tried putting it into actual practice, but it took several decades to demonstrate that it actually works. I think a lot of people were confused by why the "stack more layers" approach kept working, but under the model of connectionism, this is expected. Connectionism is rather too general to make great predictions, but it doesn't seem to allow for FOOM-type scenarios. It also seems to favor agents that are satisficers settling into local optima, rather than greedy utility maximizers.

"My position is that there are many widespread phenomena in human cognition that are expected according to my model, and which can only be explained by the more mainstream ML models either if said models are contorted into weird shapes, or if they engage in denialism of said phenomena."

Such as? I wouldn't call Shard Theory mainstream, and I'm not saying mainstream models are correct either. On humans trying to be consistent decision-makers, I have some theories about that (some of which are probably wrong). But judging by how bad humans are at it, and how much they struggle to do it, they probably weren't optimized too strongly biologically for it. Memetically, though, developing ideas for consistent decision-making was probably useful, so we have software that uses our processing power to be better at this, even if the hardware is very stubborn at times. But even that isn't optimized too hard toward coherence. Someone might prefer pizza to hot dogs, but they probably won't always choose pizza over any other food just because they want their preference ordering of food to be consistent. And, sure, maybe what they "truly" value is something like health, but I imagine even if they didn't, they still wouldn't do this.

But all of this is still just one piece in the Jenga tower. We could debate every piece in the tower, and even reach 90% confidence that each piece is correct... but if there are more than 10 pieces in the tower, the whole thing is still probably going to come crashing down. (This is the part where I feel obligated to say, even though I shouldn't have to, that your tower being wrong doesn't mean "everything will be fine and we'll be safe", since the "everything will be fine" towers are looking pretty Jenga-ish too. I'm not saying we should just shrug our shoulders and embrace uncertainty. What I want is to build non-Jenga-ish towers.)
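The compounding above can be made concrete. A minimal sketch, assuming (as the argument does) that each piece of the tower holds independently with 90% confidence:

```python
def tower_survival(p: float, n: int) -> float:
    """Probability that an argument with n independently load-bearing
    pieces survives, if each piece holds with probability p."""
    return p ** n

# At 90% confidence per piece, ten pieces already make collapse
# more likely than not: 0.9 ** 10 is roughly 0.35.
print(f"{tower_survival(0.9, 10):.2f}")  # prints 0.35
```

The independence assumption is doing real work here: correlated pieces would fail (or survive) together, but the qualitative point, that many conjunctive steps erode overall confidence fast, still holds.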
