All of tailcalled's Comments + Replies

Actually, even if your personality is good enough, you should probably still pretend to be Flynn Rider, because his personality is better. It was, after all, carefully designed by a crack team of imagineers. Was yours? Didn't think so.

Personalities don't just fall into a linear ranking from worse to better.

Imagineers' job isn't to design a good personality for a friendless nerd, it's to come up with children's stories that inspire and entertain parents and which they proudly want their children to consume.

The parents think they should try to balance the de... (read more)

I'm not trying to present johnswentworth's position, I'm trying to present my position.

tailcalled3-20

The entire field is based on fears that consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency. This is basically wrong. Yes, people attempt to justify it with coherence theorems, but obviously you can be approximately-coherent/approximately-consequentialist and yet still completely un-agentic, so this justification falls flat. Since the field is based on a wrong assumption with bogus justification, it's all fake.

2Steven Byrnes
(IMO this is kinda unrelated to the OP, but I want to continue this thread.) Have you elaborated on this anywhere? Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.”  ;-) I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
1Kajus
I don't think this is the claim that the post is making but still makes sense to me. The post is saying something opposite, that the people working on the field are not doing prioritization right and so on or not thinking clearly about things while the risk is real

The big picture is plausible but one major error you make is assuming "academics" will be a solid bastion of opposition. My understanding is that academics are often some of the first ones to fall (like when teachers struggle with students who use ChatGPT to cheat on homework), and many of the academic complaints about AI are just as slop-y as what the AI produces.

Maybe someone who believes in following the will of the majority even if he/she disagrees (and could easily become a dictator)?

Do you mean "resigns from a presidential position/declines a dictatorial position because they disagree with the will of the people" or "makes policy they know will be bad because the people demand it"?

Maybe a good parent who listens to his/her child's dreams?

Can you expand on this?

1Knight Lee
Maybe someone like George Washington who was so popular he could easily stay in power, but still chose to make America democratic. Let's hope it stays democratic :/ No human is 100% corrigible and would do anything that someone else wants. But a good parent might help his/her child get into sports and so forth but if the child says he/she wants to be a singer instead the parent helps him/her on that instead. The outcome the parent wants depends on what the child wants, and the child can change his/her mind.

Can you give 1 example of a person choosing to be corrigible to someone they are not dependent upon for resources/information and who they have much more expertise than?

1Knight Lee
* Maybe someone who believes in following the will of the majority even if he/she disagrees (and could easily become a dictator)? * Maybe a good parent who listens to his/her child's dreams? Very good question though. Humans usually aren't very corrigible, and there aren't many examples!

I feel like "evil" and "corruption" mean something different.

Corruption is about selfish people exchanging their power within a system for favors (often outside the system) when they're not supposed to according to the rules of the system. For example a policeman taking bribes. It's something the creators/owners of the system should try to eliminate, but if the system itself is bad (e.g. Nazi Germany during the Holocaust), corruption might be something you sometimes ought to seek out instead of to avoid, like with Schindler saving his Jews.

"Evil" I've in t... (read more)

If the AI can't do much without coordinating with a logistics and intelligence network and collaborating with a number of other agents, and its contact to this network routes through a commanding agent that is as capable if not more capable than the AI itself, then sure, it may be relatively feasible to make the AI corrigible to said commanding agent, if that is what you want it to be.

(This is meant to be analogous to the soldier-commander example.)

But was that the AI regime you expect to find yourself working with? In particular I'd expect you expect that the commanding agent would be another AI, in which case being corrigible to them is not sufficient.

1Knight Lee
Oops I didn't mean that analogy. It's not necessarily a commander, but any individual that a human chooses to be corrigible/loyal to. A human is capable of being corrigible/loyal to one person (or group), without accruing the risk of listening to prompt injections, because a human has enough general intelligence/common sense to know what is a prompt injection and what is a request from the person he is corrigible/loyal to. As AI approach human intelligence, they would be capable of this too.

Discriminating on the basis of the creators vs a random guy on the street helps with many of the easiest cases, but in an adversarial context, it's not enough to have something that works for all the easiest cases, you need something that can't predictably made to fail by a highly motivated adversary.

Like you could easily do some sort of data augmentation to add attempts at invoking the corrigibility system from random guys on the street, and then train it not to respond to that. But there'll still be lots of other vulnerabilities.

1Knight Lee
I still think, once the AI approaches human intelligence (and beyond), this problem should start to go away, since a human soldier can choose to be corrigible to his commander and not the enemy, even in very complex environments. I still feel the main problem is "the AI doesn't want to be corrigible," rather than "making the AI corrigible enables prompt injections." It's like that with humans. That said, I'm highly uncertain about all of this and I could easily be wrong.

Let's say you are using the AI for some highly sensitive matter where it's important that it resists prompt-hacking - e.g. driving a car (prompt injections could trigger car crashes), something where it makes financial transactions on the basis of public information (online websites might scam it), or military drones (the enemy might be able to convince the AI to attack the country that sent it).

A general method for ensuring corrigibility is to be eager to follow anything instruction-like that you see. However, this interferes with being good at resisting prompt-hacking.

1Knight Lee
I think the problem you mention is a real challenge, but not the main limitation of this idea. The problem you mention actually decreases with greater intelligence and capabilities, since a smarter AI clearly understands the concept of being corrigible to its creators vs. a random guy on the street, just like a human does. The main problem is still how reinforcement learning trains the AI behaviours which actually maximize reward, while corrigibility only trains the AI behaviours which appear corrigibile.

My current best guess is that:

  • Like for most other concepts, we don't have rigorous statistics and measurements showing that there is a natural clustering of autism symptoms, (there are some non-rigorous ones though)
  • When various schools of psychotherapy, psychiatry and pediatrics sorted children with behavioral issues together, they often ended up with an autistic group,
  • Each school has their own diagnosis on what exactly is wrong in the case of autism, and presumably they aren't all correct about all autistic people, so to know the True Reason autism is "a
... (read more)

https://www.lesswrong.com/posts/gebzzEwn2TaA6rGkc/deep-learning-systems-are-not-less-interpretable-than-logic

The assumption of virtue ethics isn't that virtue is unknown and must be discovered - it's that it's known and must be pursued.

If it is known, then why do you not ever answer my queries about providing an explicit algorithm for converting intelligence into virtuous agency, instead running in circles about how There Must Be A Utility Function!?

If the virtuous action, as you posit, is to consume ice cream, intelligence would allow an agent to acquire more ice cream, eat more over time by not making themselves sick, etc.

I'm not disagreeing with this, I'm sayi... (read more)

No, that's not my argument.

Let's imagine that True Virtue is seeking and eating ice cream, but that you don't know what true virtue is for some reason.

Now let's imagine that we have some algorithm for turning intelligence into virtuous agency. (This is not an assumption that I'm willing to grant (since you haven't given something like argmax for virtue), and really that's the biggest issue with my proposal, but let's entertain it to see my point.)

If the algorithm is run on the basis of some implementation of intelligence that is not good enough, then the r... (read more)

2Davidmanheim
The assumption of virtue ethics isn't that virtue is unknown and must be discovered - it's that it's known and must be pursued. If the virtuous action, as you posit, is to consume ice cream, intelligence would allow an agent to acquire more ice cream, eat more over time by not making themselves sick, etc. But any such decision algorithm, for a virtue ethicist, is routing through continued re-evaluation of whether the acts are virtuous, in the current context, not embracing some farcical LDT version of needing to pursue ice cream at all costs. There is an implicit utility function which values intelligence, but it's not then inferring back what virtue is, as you seem to claim. Your assumption, which is evidently that the entire thing turns into a compressed and decontextualized utility function ("algorithm") is ignoring the entire hypothetical.

I didn't say you need to understand what an argument is, I said you need to understand your own argument.

It is true that if the utility functions cover a sufficiently broad set of possibilities, any "reasonable" policy (for a controversial definition of "reasonable") maximizes a utility function, and if the utility functions cover an even broader set of possibilities, literally any policy maximizes a utility function.

But, if you want to reference these facts, you should know why they are true. For instance, here's a rough sketch of a method for finding a u... (read more)

2Davidmanheim
OK, so your argument against my claim is that a stupid and biased decision procedure wouldn't know that intelligence would make it more effective at being virtuous. And sure, that seems true, and I was wrong to assert unconditionally that "for virtue ethics, the derivative of that utility with respect to intelligence is positive." I should have instead clarified that I meant that any not idiotic virtue ethics decision procedure would have a positive first derivative in intelligence - because as your claim seems to admit, a less stupid decision procedure would not make that mistake, and would then value intelligence as it bootstrapped its way to greater intelligence.

I'm showing that the assumptions necessary for your argument don't hold, so you need to better understand your own argument.

2Davidmanheim
I understand what an argument is, but I don't understand why you think that converting policies to.utility functions needs to assume no systematic errors, or why, if true, that would make it incompatible with varying intelligence.

The methods for converting policies to utility functions assume no systematic errors, which doesn't seem compatible with varying the intelligence levels.

2Davidmanheim
I don't understand your argument here.

This.

In particular imagine if the state space of the MDP factors into three variables x, y and z, and the agent has a bunch of actions with complicated influence on x, y and z but also just some actions that override y directly with a given value.

In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue.

While there are certain circum... (read more)

I didn't claim virtue ethics says not to predict consequences of actions. I said that a virtue is more like a procedure than it is like a utility function. A procedure can include a subroutine predicting the consequences of actions and it doesn't become any more of a utility function by that.

The notion that "intelligence is channeled differently" under virtue ethics requires some sort of rule, like the consequentialist argmax or Bayes, for converting intelligence into ways of choosing.

2Davidmanheim
Yes, virtue ethics implies a utility function, because anything that outputs decisions implies a utility function. In this case, I'm noting that for virtue ethics, the derivative of that utility with respect to intelligence is positive. 
tailcalled18-3

Consequentialism is an approach for converting intelligence (the ability to make use of symmetries to e.g. generalize information from one context into predictions in another context or to e.g. search through highly structured search spaces) into agency, as one can use the intelligence to predict the consequences of actions and find a policy which achieves some criterion unusually well.

While it seems intuitively appealing that non-consequentialist approaches could be used to convert intelligence into agency, I have tried a lot and not been able to come up ... (read more)

4Davidmanheim
I think this is confused about how virtue ethics works. Virtue ethics  is centered on the virtues of the moral agent, but it certainly does not say not to predict consequences of actions. In fact, one aspect of virtue, in the Aristotelian system, is "practical wisdom," i.e. intelligence which is critical for navigating choices - because practical wisdom includes an understanding of what consequences will follow actions. It's more accurate to say that intelligence is channeled differently — not toward optimizing outcomes, but toward choosing in a way consistent with one's virtues. And even if virtues are thought of as policies, as in the "loyal friend" example, the policies for being a good friend require interpretation and context-sensitive application. Intelligence is crucial for that.

Not sure what you mean. Are you doing a definitional dispute about what counts as the "standard" definition of Bayesian networks?

Your linked paper is kind of long - is there a single part of it that summarizes the scoring so I don't have to read all of it?

Either way, yes, it does seem plausible that one could create a market structure that supports latent variables without rewarding people in the way I described it.

1Abhimanyu Pallavi Sudhir
No; I mean a standard Bayesian network wouldn't work for latents.

I'm not convinced Scott Alexander's mistakes page accurately tracks his mistakes. E.g. the mistake on it I know the most about is this one:

56: (5/27/23) In Raise Your Threshold For Accusing People Of Faking Bisexuality, I cited a study finding that most men’s genital arousal tracked their stated sexual orientation (ie straight men were aroused by women, gay men were aroused by men, bi men were aroused by either), but women’s genital arousal seemed to follow a bisexual pattern regardless of what orientation they thought they were - and concluded that althou

... (read more)
1Mo Putera
Thanks, good example.

I mean I don't really believe the premises of the question. But I took "Even if you're not a fan of automating alignment, if we do make it to that point we might as well give it a shot!" to imply that even in such a circumstance, you still want me to come up with some sort of answer.

Answer by tailcalled0-7

Life on earth started 3.5 billion years ago.  Log_2(3.5 billion years/1 hour) = 45 doublings. With one doubling every 7 months, that makes 26 years, or in 2051.

(Obviously this model underestimates the difficulty of getting superalignment to work. But also extrapolating the METR trend is questionable for 45 doublings is dubious in an unknown direction. So whatever.)

1Christopher King
You're saying that if you assigned 1 human contractor the task of solving superalignment, they would succeed after ~3.5 billion years of work? 🤔 I think you misunderstood what the y-axis on the graph is measuring.

I talk to geneticists (mostly on Twitter, or rather now BlueSky) and they don't really know about this stuff.

(Presumably there exists some standard text about this that one can just link to lol.)

I don't think so.

I'm still curious whether this actually happens.... I guess you can have the "propensity" be near its ceiling.... (I thought that didn't make sense, but I guess you sometimes have the probability of disease for a near-ceiling propensity be some number like 20% rather than 100%?) I guess intuitively it seems a bit weird for a disease to have disjunctive causes like this, but then be able to max out at the risk at 20% with just one of the disjunctive causes

... (read more)
2TsviBT
How confident are you / why do you think this? (It seems fairly plausible given what I've heard about the field of genomics, but still curious.) E.g. "I have a genomics PhD" or "I talk to geneticists and they don't really know about this stuff" or "I follow some twitter stuff and haven't heard anyone talk about this". Ok I'm too tired to follow this so I'll tap out of the thread for now. Thanks again!

Ok, more specifically, the decrease in the narrowsense heritability gets "double-counted" (after you've computed the reduced coefficients, those coefficients also get applied to those who are low in the first chunk and not just those who are high, when you start making predictions), whereas the decrease in the broadsense heritability is only single-counted. Since the single-counting represents a genuine reduction while the double-counting represents a bias, it only really makes sense to think of the double-counting as pathological.

2TsviBT
Ah... ok I think I see where that's going. Thanks! (Presumably there exists some standard text about this that one can just link to lol.) I'm still curious whether this actually happens.... I guess you can have the "propensity" be near its ceiling.... (I thought that didn't make sense, but I guess you sometimes have the probability of disease for a near-ceiling propensity be some number like 20% rather than 100%?) I guess intuitively it seems a bit weird for a disease to have disjunctive causes like this, but then be able to max out at the risk at 20% with just one of the disjunctive causes? IDK. Likewise personality...

It would decrease the narrowsense (or additive) heritability, which you can basically think of as the squared length of your coefficient vector, but it wouldn't decrease the broadsense heritability, which is basically the phenotypic variance in expected trait levels you'd get by shuffling around the genotypes. The missing heritability problem is that when we measure these two heritabilities, the former heritability is lower than the latter.

2TsviBT
Why not? Shuffling around the second chunk, while the first chunk is already high, doesn't do anything, and therefore does not contribute phenotypic variance to broadsense heritability.

If some amount of heritability is from the second chunk, then to that extent, there's a bunch of pairs of people whose trait differences are explained by second chunk differences. If you made a PGS, you'd see these pairs of people and then you'd find out how specifically the second chunk affects the trait.

This only applies if the people are low in the first chunk and differ in the second chunk. Among the people who are high in the first chunk but differ in the second chunk, the logarithm of their trait level will be basically the same regardless of the sec... (read more)

2TsviBT
Wouldn't this also decrease the heritability?
2TsviBT
Because if some of the heritability is from the second chunk, that means that for some pairs of people, they have roughly the same first chunk but somewhat different second chunks; and they have different traits, due to the difference in second chunks. If some amount of heritability is from the second chunk, then to that extent, there's a bunch of pairs of people whose trait differences are explained by second chunk differences. If you made a PGS, you'd see these pairs of people and then you'd find out how specifically the second chunk affects the trait. I could be confused about some really basic math here, but yeah, I don't see it. Your example for how the gradient doesn't flow seems to say "the gradient doesn't flow because the second chunk doesn't actually affect the trait".

Some of the heritability would be from the second chunk of genes.

2TsviBT
To the extent that the heritability is from the second chunk, to that extent the gradient does flow, no?

The original discussion was about how personality traits and social outcomes could behave fundamentally differently from biological traits when it comes to genetics. So this isn't necessarily meant to apply to disease risks.

2TsviBT
Well you brought up depression. But anyway, all my questions apply to personality traits as well. ..... To rephrase / explain how confused I am about what you're trying to tell me: It kinda sounds like you're saying "If some trait is strongly determined by one big chunk of genes, then you won't be able to see how some other chunk affects the trait.". But this can't explain missing heritability! In this scenario, none of the heritability is even from the second chunk of genes in the first place! Or am I missing something?

Let's start with the basics: If the outcome  is a linear function of the genes , that is , then the effect of each gene is given by the gradient of , i.e. . (This is technically a bit sketchy since a genetic variant is discrete while gradients require continuity, but it works well enough as a conceptual approximation for our purposes.) Under this circumstance, we can think of genomic studies as finding . (This is also technically a bit sketchy because of linkage disequillibrium and such, but it works we... (read more)

2TsviBT
Ah. Thank you, this makes sense of what you said earlier. (I / someone could have gotten this from what you had written before, by thinking about it more, probably.) I agree with your analysis as math. However, I'm skeptical of the application to the genetics stuff, or at least I don't see it yet. Specifically, you wrote: And your argument here says that there's "gradient interference" between the summed products specifically when one of the summed products is really big. But in the case of disease risk, IIUC the sum-of-products f(x) is something like logits. So translating your argument, it's like: In this case, yes the analysis is valid, but it's not very relevant. For the diseases that people tend to talk about, if there are several substantial disjunctive causes (I mean, the risk is a sum of a few different sub-risks), then they all would show substantial signal in the data. None of them drowns out all the others. Maybe you just meant to say "In theory this could happen". Or am I missing what you're suggesting? E.g. is there a way for there to be a trait that: * has lots of variation (e.g. lots of sick people and lots of non-sick people), and * it's genetic, and * it's a fairly simple functional form like we've been discussing, * but you can't optimize it much by changing a bunch of variants found by looking at some millions of genotype/phenotype pairs?

It kind-of applies to the Bernoulli-sigmoid-linear case that would usually be applied to binary diagnoses (but only because of sample size issues and because they usually perform the regression one variable at a time to reduce computational difficulty), but it doesn't apply as strongly as it does to the polynomial case, and it doesn't apply to the purely linear (or exponential-linear) case at all.

If you have a purely linear case, then the expected slope of a genetic variant onto an outcome of interest is proportional to the effect of the genetic variant.

Th... (read more)

It doesn't matter if depression-common is genetic or environmental. Depression-common leads to the genetic difference between your cases and controls to be small along the latent trait axis that causes depression-rare. So the effect gets estimated to be not-that-high. The exact details of how it fails depends on the mathematical method used to estimate the effect.

2TsviBT
Ok I think I get what you're trying to communicate, and it seems true, but I don't think it's very relevant to the missing heritability thing. The situation you're describing applies to the fully linear case too. You're just saying that if a trait is more polygenic / has more causes with smaller effects, it's harder to detect relevant causes. Unless I still don't get what you're saying.

Not right now, I'm on my phone. Though also it's not standard genetics math.

2TsviBT
Ok. I don't get why you think this. It doesn't seem to make any sense. You'd still notice the effect of variants that cause depression-rare, exactly like if depression-rare was the only kind of depression. How is your ability to detect depression-rare affected by the fact that there's some genetic depression-common? Depression-common could just as well have been environmentally caused. I might be being dumb, I just don't get what you're saying and don't have a firm grounding myself.

Isn't the derivative of the full variable in one of the multiplicands still noticeable? Maybe it would help if you make some quantitative statement?

Taking the logarithm (to linearize the association) scales the derivative down by the reciprocal of the magnitude. So if one of the terms in the sum is really big, all the derivatives get scaled down by a lot. If each of the terms are a product, then the derivative for the big term gets scaled up to cancel out the downscaling, but the small terms do not.

I mean, I think depression is heritable, and I think there

... (read more)
2TsviBT
Can you please write down the expressions you're talking about as math? If you're trying to invoke standard genetics knowledge, I'm not a geneticist and I'm not picking it up from what you're saying.

It becomes more complex once you take the sum of the product of several things. At that point the log-additive effect of one of the terms in the sum disappears if the other term in the sum is high. If you've got a lot of terms in the sum and the distribution of the variables is correct, this can basically kill the bulk of common additive variance. Conceptually speaking, this can be thought of as "your system is a mixture of a bunch of qualitatively distinct things". Like if you imagine divorce or depression can be caused by a bunch of qualitatively unrelated things.

2TsviBT
Hm.... Not sure how to parse this. (What do you mean " the distribution of the variables is correct"?) Isn't the derivative of the full variable in one of the multiplicands still noticeable? Maybe it would help if you make some quantitative statement? I mean, I think depression is heritable, and I think there are polygenic scores that do predict some chunk of this. (From a random google: https://jamanetwork.com/journals/jamapsychiatry/fullarticle/2783096 ) Quite plausibly yes these heritability estimates and PGSes are picking up on heterogeneous things, but they still work, and you can still construct the PGS; you find the additive variants when you look. (Also I am interested in the difference between traits that are OR / SUM of some heritable things and some non-heritable things. E.g. you can get lung cancer from lung cancer genes, or from smoke 5 packs a day. This matters for asking "just how low exactly can we drive down disease risk?". But this would not show up as missing heritability!)
1Caleb Biddulph
Interesting, strong-upvoted for being very relevant. My response would be that identifying accurate "labels" like "this is a tree-detector" or "this is the Golden Gate Bridge feature" is one important part of interpretability, but understanding causal connections is also important. The latter is pretty much useless without the former, but having both is much better. And sparse, crisply-defined connections make the latter easier. Maybe you could do this by combining DLGNs with some SAE-like method.

Couldn't it also end if all the AI companies collapse under their own accumulated technical debt and goodwill lost to propaganda, and people stop wanting to use AI for stuff?

And as a separate note, I'm not sure what the appropriate human reference class for game-playing AIs is, but I challenge the assumption that it should be people who are familiar with games. Rather than, say, people picked at random from anywhere on earth.

Should maybe restrict it to someone who has read all the documentation and discussion for the game that exists on the internet.

4MondSemmel
Fair. But then also restrict it to someone who has no hands, eyes, etc.

The defining difference was whether they have contextually activating behaviors to satisfy a set of drives, on the basis that this makes it trivial to out-think their interests. But this ability to out-think them also seems intrinsically linked to them being adversarially non-robust, because you can enumerate their weaknesses. You're right that one could imagine an intermediate case where they are sufficiently far-sighted that you might accidentally trigger conflict with them but not sufficiently far-sighted for them to win the conflicts, but that doesn't mean one could make something adversarially robust under the constraint of it being contextually activated and predictable.

2Mateusz Bagiński
Alright, fair, I misread the definition of "homeostatic agents".

That would be ones that are bounded so as to exclude taking your manipulation methods into account, not ones that are truly unbounded.

2Mateusz Bagiński
I interpreted "unbounded" as "aiming to maximize expected value of whatever", not "unbounded in the sense of bounded rationality". 

That's not something unique to homeostatic agents, though. If a model-based maximizer has some gap between its model and the real world, that gap can be exploited by another agent for its own gain, and that's game over for the maximizer.

I don't think of my argument as model-based vs heuristic-reactive, I mean it as unbounded vs bounded. Like you could imagine making a giant stack of heuristics that makes it de-facto act like an unbounded consequentialist, and you'd have a similar problem. Model-based agents only become relevant because they seem like an ea... (read more)

Homeostatic agents are easily exploitable by manipulating the things they are maintaining or the signals they are using to maintain them in ways that weren't accounted for in the original setup. This only works well when they are basically a tool you have full control over, but not when they are used in an adversarial context, e.g. to maintain law and order or to win a war.

As capabilities to engage in conflict increase, methods to resist losing to those capabilities have to get optimized harder. Instead of thinking "why would my coding assistant/tutor bot ... (read more)

3faul_sname
I agree that a homeostatic agent in a sufficiently out-of-distribution environment will do poorly - as soon as one of the homeostatic feedback mechanisms starts pushing the wrong way, it's game over for that particular agent. That's not something unique to homeostatic agents, though. If a model-based maximizer has some gap between its model and the real world, that gap can be exploited by another agent for its own gain, and that's game over for the maximizer. Sorry, I'm having some trouble parsing this sentence - does "they" in this context refer to homeostatic agents? If so, I don't think they make particularly great tools even in a non-adversarial context. I think they make pretty decent allies and trade partners though, and certainly better allies and trade partners than consequentialist maximizer agents of the same level of sophistication do (and I also think consequentialist maximizer agents make pretty terrible tools - pithily, it's not called the "Principal-Agent Solution"). And I expect "others are willing to ally/trade with me" to be a substantial advantage. Can you expand on "turn evil"? And also what I was trying to accomplish by making my comms-screening bot into a self-directed goal-oriented agent in this scenario?
3Mateusz Bagiński
Unbounded consequentialist maximizers are easily exploitable by manipulating the things they are optimizing for or the signals/things they are using to maximize them in ways that weren't accounted for in the original setup.  

What if humanity mistakenly thinks that ceding control voluntarily is temporary, when actually it is permanent because it makes the systems of power less and less adapted to human means of interaction?

When asking this question, do you include scenarios where humanity really doesn't want control and is impressed by the irreproachability of GPTs, doing our best to hand over control to them as fast as possible, even as the GPTs struggle and only try in the sense that they accept whatever tasks are handed to them? Or do the GPTs have to in some way actively attempt to wrestle control from or trick humans?

2Seth Herd
Yes, that's what I meant by "takeover." It's distinct from ceding control voluntarily. I do not see humanity ever fully ceding control, as distinct from accepting a lot of help and advice from AIs. Why cede control if you can get all of the upsides without losing the ability to change your mind? Of course, if you're accepting all of the advice, you have temporarily ceded control. But I'm primarily concerned with humanity accidentally losing its choice through the creation of AGI that's not aligned with the majority of humanity's interests or desires - the classic alignment question.

Consider this model.

Suppose the state threatens people to do the following six things for their citizens:

* Teach the young
* Cure the sick
* Maintain law and order
* Feed, clothe and house people with work injuries
* Feed, clothe and house the elderly
* Feed, clothe and house people with FUBAR agency

(Requesting roughly equally many resources to be put into each of them.)

People vary in how they react to the threats, having basically three actions:

1. Assist with what is asked
2. Develop personal agency for essentially-selfish reasons, beyond what is useful on the ... (read more)

I feel like the case of bivariate PCA is pretty uncommon. The classic example of PCA is over large numbers of variables that have been transformed to be short-tailed and have similar variance (or which just had similar/small variance to begin with before any transformations). Under that condition, PCA gives you the dimensions which correlate with as many variables as possible.

Load More