The male AI alignment solution

TekhneMakre

LESSWRONG

-25

22nd Feb 2023

1 min read

[Edit: for reasons I still don't understand, people dislike this post. Here is a version of the post that people like, you may want to read that one instead.]

There's an idea described here that says: (some of) the neocortex is a mostly-aligned tool-like AI with respect to the brain of some prior ancestor species. (Note that this is different from the claim that brains are AIs partially aligned with evolution.) So, maybe we can learn some lessons about alignment by looking at how older brain structures command and train newer brain structures.

Here's another analogy: female hominids make huge investments in their offspring, which are extremely neotenous. To provide the childcare that enables prolonged neoteny, females have to gain the loyalty of males. To gain loyalty of males, females have to suss out in detail whether a given male is reliable / trustworthy / allied, as well as capable, before mating with them. (Hence females tending on average to be relatively more interested in people over things, compared to males.)

So by some combination of hardwired skill and learned skill, females with some success determine the fundamental intentions of males. This determination has to be high precision. I.e., there can't be too many false positives, because a false positive means trying to raise children without devoted support, though maybe with support of other tribe members. (I'm ignoring the complexity of tribal living, so this is all a somewhat cartoon picture.) This determination also has to be pretty robust to the passage of time and surprising circumstances.

This is disanalogous to AGI alignment in that AGIs would be smarter than humans and very different from humans, whereas males are pretty much the same as females. But there is some analogy, in that males are general intelligences, albeit very bounded ones, being kind of aligned with other general intelligences. So, women, what can you say about ferreting out the fundamental intentions of men?

[-]eukaryote3y138

I don't think there's much crossover. I hope you know that there are lots and lots of incentives for active deception and responding to deception in various parts of the natural world and evolutionary psychology - if you're interested in the workings of and responses to deception, definitely read more about it. Like, the argument you make for females being interested in "people over things" could also explain the reverse - males are incentivized to deceive females, which you can do better the better you model people, right? I think you are observing something real about relevant preferences, but if that's the extent of your understanding, I'd learn more about evolution and alternate explanations e.g. cultural pressure towards taking on emotional labor.

Anyhow, this example is narrow and specific to a human problem. As you say, the concern about AGI is mainly about intelligences significantly past humans that do not share a basic substrate or set of biological imperatives. Like, even a person who I think might be lying to me can be modeled as fundamentally human - having limited amounts of information, limited physical strength, needing to eat, fearing death, etc. Heck, if I'm looking for a partner and am concerned that the partner is going to try to deceive me to get sex or whatever from me, I'm already aware of the threat!

The current environment in which you're asking about people's experience is also pretty damn different from the ancestral environment we evolved for - as far as resource constraints, information availability, and I guess most other things go - so I doubt that this example applies much.

[-]TekhneMakre3y60

Rewritten more abstractly: https://www.lesswrong.com/posts/ypvs4asdFq7riDWmd/interpersonal-alignment-intuitions

[-]Guillaume Charrier3y30

Nice link on the Wikipedia article, thank you for that. "Koko, a female gorilla, was trained to use a form of American Sign Language. It has been claimed that she once tore a steel sink out of its moorings and when her handlers confronted her, Koko signed "cat did it" and pointed at her innocent pet kitten". That animal, Koko, was just incredible. Having watched her on a few videos, I find that story perfectly plausible...

[-]TekhneMakre3y20

in various parts of the natural world

Humans are pretty clearly very especially generally intelligent, and so will display far more of the problems with aligning a general intelligence than displayed in animal interactions.

Like, the argument you make for females being interested in "people over things" could also explain the reverse - males are incentivized to deceive females, which you can do better the better you model people, right?

Males are hypothetically less incentivized to get alignment. So the knowledge about alignment would hypothetically be more concentrated in females. It would still be relevant to understand how males (or anyone) deceive others, specifically for understanding deceptive alignment.

Like, even a person who I think might be lying to me can be modeled as fundamentally human

Yes, I agree it's much easier of a problem, e.g. for the reasons you list. It's a very common tactic when dealing with an impossible seeming problem, to focus on easier but still very nontrivial versions of the problem.

[-][anonymous]3y104

I expect people are downvoting without explanation because, frankly, this reads like sufficiently obvious sexism it's difficult to believe that the author hasn't noticed. Assuming you want an actual explanation of what's wrong with this post, I think there are two main parts:

Epistemically speaking you are making very confident sweeping generalizations about something which is at best a tentative evopsych theory and at worst utter nonsense.

Socially, this is incredibly dehumanizing and othering. Women are not alien intelligences. We think the same way you do. Ferreting out the fundamental intentions of men works the exact same way as ferreting out the fundamental intentions of women.

[-]Ruby3y6-2

I want to push back on anyone downvoting this because it's sexist, dehumanizing, and othering (rather than just being a bad model). I am sad if a model/analogy has those negative effects, but supposing the model/analogy in fact held and was informative, I think we should be able to discuss it. And even the possibility that something in the realm of gender relations has relevant lessons for Alignment seems like something we should be able to discuss.

Or alternatively stated, I want to push for Decoupling norms here.

[-]the gears to ascension3y81

In contexts where the model will not be used to make decisions about humans (which are rare!), sexist is when something is a bad model in the direction of sexism. There are real differences; accurate representations of them are not sexism. Those differences are quite small, and are often misunderstood as large in ways that produce nonsensical models. As @eukaryote wrote, the specific evopsych proposal under consideration here is privileging a hypothesis.

Alternatively stated, you cannot convince me to decouple when there are real mechanistic reasons that the coupling exists, because then you're simply asking me to suspend my epistemic evaluation of the model.

Of course, I also simply don't believe in decoupling norms in general because reductionism doesn't work to find the true mechanisms of reality in contexts where the mechanisms have significant amounts of complexity which is computationally intractable to discover by simulation, and therefore for practical purposes only exist as shapes in the macroscopic structure of worldstate; and decoupling/reductionism based models reliably mismodel those sorts of complex systems. One needs instead to figure out how to abstract over the coupling.

[-]TekhneMakre3y20

What do you mean "privileging a hypothesis"? The LW concept https://www.lesswrong.com/tag/privileging-the-hypothesis is about raising a hypothesis to consideration without enough to point to that hypothesis. I gave reasons for raising this hypothesis.

What does decoupling have to do with reductionism? Decoupling doesn't mean "do reductionism", it means decoupling factual questions from social / political tone and conflict. [Edit: I was partially wrong. The concept of "high/low-decouplers" described here https://www.reddit.com/r/slatestarcodex/comments/8fnch2/high_decouplers_and_low_decouplers/ is sort of related to reductionism, though is far from the same thing (because what you're decoupling can be a high-level claim, holistic in the sense of abstract, if not holistic in the sense of letting in all the context). The idea of decoupling norms as described in the post Ruby linked, https://www.lesswrong.com/posts/7cAsBPGh98pGyrhz9/decoupling-vs-contextualising-norms , is as I said, though more precisely stated as being about implications in general.]

[-][anonymous]3y10

In addition to what gears said, I think the sexist othering etc is not actually critical to the analogy, which is kind of the problem. "Figuring out the motives of people who kind of share goals with you but also have reasons to lie" is a pretty universal human experience. Adding some gender evopsych on top is just annoying (and prevents thinking about many of the more interesting ways in which this dynamic can play out).

[-]TekhneMakre3y20

I agree it's not strictly critical to the analogy, and my rewrite removes the evopsych. But I actually think that this specific dynamic is plausibly the single most intense case of this dynamic, which is why I wrote about it specifically, and why the rewrite seems less interesting to me. What are some other cases where there are comparably strong pressures?

[-]TekhneMakre3y21

I appreciate you trying to explain. I literally still don't understand.

Epistemically speaking you are making very confident sweeping generalizations about something which is at best a tentative evopsych theory and at worst utter nonsense.

The post is definitely speculative. Would it seem less bad if it were labeled as speculative? One of the sentences in the post is

(I'm ignoring the complexity of tribal living, so this is all a somewhat cartoon picture.)

The basic observation that women are relatively more interested in people is a standardly claimed psych finding. Not saying it's not controversial, just that I'm not making it up. E.g. this paper https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00189/full has 335 citations. (I should have included that in the post.)

Epistemically speaking you are making very confident sweeping generalizations about something which is at best a tentative evopsych theory and at worst utter nonsense.

Could you be more specific? I think all the claims here are pretty obvious, except that this one is pretty speculative:

(Hence females tending on average to be relatively more interested in people over things, compared to males.)

Socially, this is incredibly dehumanizing and othering.

I agree that dehumanizing and othering is bad. I literally don't see what's dehumanizing here or what's othering here. Can someone explain? I reread my post twice and still don't get it. My guess is that trying to describe something about a feature of a group of people is being taken as othering. But like, surely that's an okay thing to do somehow?

Women are not alien intelligences.

Of course. One of the sentences in the post ends with:

males are pretty much the same as females

[-]TekhneMakre3y20

Is it related to this experience that I've had? https://www.lesswrong.com/posts/bTsYQHfghTwZGnqPS/defensiveness-as-hysterical-invisibility

[-][anonymous]3y10

It would be somewhat less bad if it had been more clearly labeled speculative, but that's not the fundamental issue. "Cartoon" implies to me something like Newton's laws - not correct exactly, but a good enough model to be going on for the purposes of the conversation. I think your object-level evopsych statements are closer to, uh, I don't actually know physics nearly well enough to complete the analogy. Some sort of theory of a phenomenon that is not entirely proven to even exist, with some evidence for and some against, which a small minority group of scientists present as settled science and proceed to write further papers using it as an assumption.

I was not saying you had made the claim up, but presenting controversial claims with no hedging is not great. As for everything else, your post implies strongly, without stating outright, various narratives about human motivations/evolution that are not, in fact, obvious. For instance, that women want to secure the loyalty of one man, while men want to have sex with as many women as possible, and that this adversarial dynamic is present in the modern day and results in women, in particular, having unique insight into figuring out the motives of partially aligned intelligences due to practice on men.

It's okay to describe features of a group of people. Which features you're describing, how you present your claims, and whether you're in fact right all matter. In this case, you are, implicitly, making the claim that the difference between men and women is large enough that it makes sense to try to draw an analogy to the difference between humans and AIs, even though you explicitly stated that of course the difference is not as large.

To put it another way, I don't actually see what using women and men here adds to the analogy beyond "sometimes, humans have to suss out the true intentions of other humans who partially share goals with them when those other humans have motive to deceive them". To the extent that you are claiming there is a meaningful difference, I think that is [not entirely sure I am phrasing the following correctly] privileging gender as a special axis of human difference in a way that I think is meaningfully wrong and also find unpleasant.

(Somewhat more incidentally, I and many other women I know dislike the use of "females", "mate", etc in this context, though that is somewhat trivial and not actually a big deal so much as often correlated with things that do actually bother me.)

[-]TekhneMakre3y20

Thanks for engaging though, I continue to be grateful for you making the effort to help me understand what's happening, including harms.

[-]TekhneMakre3y20

A guess about what's happening: you're seeing that I said "X" and you're inferring that I believe "Y" because a lot of people who go around saying "X" also say "Y". And you're worried about that, because people who say "Y" have a disturbing pattern of going around mysteriously not noticing all the counterevidence against Y, and also advocating for harming others on the basis of Y being true. That's a reasonable thing to worry about if you have good reason to think there are such people. But I think responding by punishing people who say "X", while understandable, is an escalatory sort of action, and is a bad long term solution, and adds to the big pile of people silencing each other. So my somewhat prickly olive branch is: if this is something like what's really going on, let's talk about that explicitly.

[-]TekhneMakre3y20

As for everything else, your post implies strongly, without stating outright, various narratives about human motivations/evolution that are not, in fact, obvious. For instance, that women want to secure the loyalty of one man, while men want to have sex with as many women as possible, and that this adversarial dynamic is present in the modern day and results in women, in particular, having unique insight into figuring out the motives of partially aligned intelligences due to practice on men.

How does the post imply that? As you've stated them, I don't agree with any of those things, and I didn't say them, and I didn't say anything that implied them, except that I said there is some (other) reason that might result in women in particular having unique insight.

[-]TekhneMakre3y20

In this case, you are, implicitly, making the claim that the difference between men and women is large enough that it makes sense to try to draw an analogy to the difference between humans and AIs, even though you explicitly stated that of course the difference is not as large.

No I'm not! Men and women are the same on any "human to AI" dimension! The analogy doesn't rest on differences between men and women, except that there's a desire to align in that direction, as described, coming from different incentives. I'm not making this claim that you're saying I'm making! It's other people's fault if they make up an interpretation that I didn't say and then ding me for saying that thing I didn't say. The only analogy is that it's a general intelligence trying to align another general intelligence.

I don't actually see what using women and men here adds to the analogy

It's an especially strong case of incentive to interpersonally suss out intentions. It's the strongest one I could think of. What are some other very strong cases?

in a way that I think is meaningfully wrong

Why do you think it's meaningfully wrong? Do you mean incorrect, or morally wrong?

[-]the gears to ascension3y58

I can say that as far as I've been able to tell, the only difference between testosterone and estrogen brains is that testosterone is a slightly different flavor of stimulant than progesterone. Overall, human brains have a ridiculously powerful architecture that changes wildly between individuals and has just a little sexual dimorphism - only about the same amount as the rest of our bodies, which is to say almost none. Society has accumulated memetics that create dramatically more performative dimorphism than our genes encode. The different ratio of people vs things is a small difference genetically, and an enormous difference memetically.

[-]TekhneMakre3y20

Could be roughly no hardwired skill, yeah. Either way, at least some people claim that women are more interested in people than men are, e.g. here: https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00189/full

So women would have more skill / knowledge about this task, regardless of where they got it from.

[-]M. Y. Zuo3y20

Since this was sitting at -29 karma with 17 votes at the time of reading, I strongly upvoted. Because TekhneMakre seems to be making a genuine effort here. Nor does there appear to be any sign of trolling.

[-]mesaoptimizer3y10

Evolution has made men and women adaptation-executors, not fitness maximizers. I'm unsure why you believe that women are somehow better than men at being able to "determine the fundamental intentions of males", when it is clear that isn't the case if you talk to most women.

Even more important: we now see a distribution shift between the environment humans were optimized for, and the environment they find themselves in. Heterosexual women haven't succeeded at the alignment problem any more than heterosexual men have.

[-]TekhneMakre3y20

For the reasons I stated.

The basic observation that women are relatively more interested in people is a standardly claimed psych finding. Not saying it's not controversial, just that I'm not making it up. E.g. this paper https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00189/full has 335 citations. It stands to reason that if women are relatively more interested in people, they'll spend relatively more time thinking about them, and so have relatively more knowledge and understanding. That's all.

Anyway, you can see a version of this post that's the same but without references to women specifically, if you like: https://www.lesswrong.com/posts/ypvs4asdFq7riDWmd/interpersonal-alignment-intuitions

[-]TekhneMakre3y10

I repeat that it really sucks to downvote stuff without giving some indication as to why.
