All of Elias Schmied's Comments + Replies

Random spurious accusations with zero factual backing are usually considered town/vanilla/Arthurian moves in werewolf games; IRL this breeds chaos and is a classic DARVO tactic.


In my experience this is only true for beginner play (where werewolves are often too shy to say anything), and in advanced play it is a bad guy tactic for the same reasons as IRL. Eg I think in advanced Among Us lobbies it's an important skill to subtly push an unproductive thread of conversation without making it obvious that you were the one who distracted everybody.

It's not clear...

Linch

Eg I think in advanced Among Us lobbies it's an important skill to subtly push an unproductive thread of conversation without making it obvious that you were the one who distracted everybody.

I'm not much of an avid Among Us player, but I suspect this only works in Among Us because of the (much) heavier-than-usual time pressures. In the other social deception games I'm aware of, the structural incentives continue to point in the other direction, so the main reason for bad guys to make spurious accusations is for anti-inductive reasons (if everybody knows th...

Linch
Sorry, that was awkwardly worded. Here's a simplified rephrase:

Because of the structure of games like Avalon (it's ~impossible for all the bad guys to avoid being found out, minions know who each other are, all minions just want their "team" to win, so having sacrificial lambs makes sense, etc.), there are often equilibria where, even in slightly advanced play, minions (bad guys) want to be seen disagreeing with other minions early on. So if you find someone disagreeing with minions a lot (in voting history etc.), especially in non-decision-relevant ways, this is not much evidence one way or another (and in some cases might even be negative evidence about your goodness).

Similarly, if Mildred constantly speaks highly of you, and we later realize that Mildred is a minion, this shouldn't be a negative update on you (and in some cases is a positive one), because minions often have structural reasons to praise/bribe good guys.

At higher levels people obviously become aware of this dynamic, so there's some anti-inductive play going on, but still: frequently the structural incentives prevail. In real life there's a bit of this dynamic, but the level-one model ("birds of a feather flock together") is more accurate, more of the time.
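To make the "shouldn't be a negative update" point concrete, here is a toy Bayes calculation; every number in it is made up purely for illustration, not a statistic from real games:

```python
# Toy Bayesian update for the "Mildred the minion praised you" point above.
# All numbers are illustrative assumptions, not measurements from real games.

def posterior_you_are_minion(prior_minion, p_praise_given_minion, p_praise_given_good):
    """P(you are a minion | a known minion praised you), by Bayes' rule."""
    prior_good = 1 - prior_minion
    numerator = p_praise_given_minion * prior_minion
    denominator = numerator + p_praise_given_good * prior_good
    return numerator / denominator

prior = 2 / 5  # e.g. 2 minions among 5 still-unidentified players (assumption)

# If minions have a structural incentive to praise good players (to buy credibility),
# a minion praising you is *more* likely when you are good:
print(posterior_you_are_minion(prior, p_praise_given_minion=0.3, p_praise_given_good=0.6))
# -> 0.25: the praise is weak evidence that you are good, not a negative update.

# Only if minions mostly praised fellow minions would the praise be incriminating:
print(posterior_you_are_minion(prior, p_praise_given_minion=0.7, p_praise_given_good=0.3))
# -> ~0.61
```

Whether the update is positive or negative depends entirely on whether minions praise good players more or less often than they praise each other, which is exactly the structural-incentives question.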
simon
Yes, I was thinking along those lines myself, and I suspect that we've already left the optimal conditions for democracy.

Consider how people say, for example, that it's impossible to revolt against the government using just personal firearms, given that the government has nukes, fighter jets, etc. If that's true, democracy depends on the ideological commitment of members of the relevant institutions. And I don't think that's necessarily an especially stable situation: if the incentive is there, the ideology will shift eventually.

Moreover, I think alyssavance (OP) is perhaps a bit too dismissive of wokeism, in part precisely for the above reasons: woke ideology has disproportionate institutional influence compared with its popular support. But another, perhaps more important, reason to be concerned about woke ideology is that its institutional influence is leading de facto policy, as actually implemented, to be (as I see it) considerably more woke-oriented than is popularly supported. This naturally could lead to support among anti-woke people for political crackdowns on woke-influenced institutions to prevent this. But of course, such crackdowns are exactly the sort of thing that would enable a takeover. And that sort of support could also lead to increased fervor among the woke: "see, we have to stop those terrible people", etc. (which is also what the anti-woke are saying, of course). Classic toxoplasma, potentially.

Edit: to be clear, I do think it's a bad thing that democracy may be unstable now.

Oh, I see what you're saying now. Thanks for clarifying.

But this would apply to the visual cortex as well, right? So it doesn't explain the discrepancy.

jacob_cannell
It does of course apply to the visual cortex, so I don't understand your comment. Essentially the answer is #2 in the OP's list. CNNs are like the visual cortex but highly compressed through weight sharing, which is easy for a von Neumann machine but isn't really feasible for a slow neuromorphic computer like the brain. The OP is mistaken about vision transformers: they can also exploit parameter sharing, just in a different way.
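A rough parameter count shows how much compression weight sharing buys. The layer sizes below are arbitrary assumptions chosen only to illustrate the shared-vs-unshared comparison; they are not a model of the visual cortex:

```python
# Rough parameter-count comparison illustrating the weight-sharing point above.
# Layer sizes are illustrative assumptions, not a model of V1.

h, w = 224, 224          # input spatial size
c_in, c_out = 3, 64      # input / output channels
k = 7                    # kernel size

# Convolution: one k*k*c_in filter per output channel, shared across all positions.
conv_params = c_out * (k * k * c_in)

# "Locally connected" layer (same local wiring, but no sharing): every output
# position stores its own private filter, which is closer to what a brain-like
# substrate would have to do.
local_params = h * w * c_out * (k * k * c_in)

print(conv_params)                   # 9_408
print(local_params)                  # ~472 million
print(local_params // conv_params)   # 50_176x compression from weight sharing
```

The compression factor is exactly the number of spatial positions each filter is reused over, which is what a slow, local substrate cannot easily replicate.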

I appreciate the charity!

I'm not claiming that people don't care about other people's internal states; I'm saying that it introspectively doesn't feel like that caring is implemented via empathy (the same part of my world model that predicts my own emotions), but via a different part of my model (dedicated to modeling other people), and that this would solve the "distinguishing-empathy-from-transient-feelings" mystery you talk about.

Additionally (but relatedly), I'm also skeptical that those beliefs are better described as being about other people's internal state...

Steven Byrnes
Hmm. Continuing with the schadenfreude example, let’s say Alice stole my kettle and I would feel good if she burned her fingers on it. (Serves her right!) My introspection says: if Alice is alone when she burns her fingers, I’m still happy; that still counts. If I never see her again after that, that still counts. Heck, if she becomes a hermit and never sees another human again, that still counts. And therefore, that thought of Alice burning her fingers is pleasing in a way that is tightly connected to how I believe Alice feels, and disconnected from how I believe Alice is behaving socially, I think.

You mention “I imagine Alice acting happy, smiling and uncaring”. But the following two things feel very different to me:

* “I imagine that Alice is acting happy, smiling and uncaring, and this is straightforwardly related to how she really feels”, versus
* “I imagine that Alice is acting happy, smiling and uncaring, but on the inside she’s miserable, and she’s hiding how she really feels”.

What do you think?

I don’t update much on that, because I think almost all of the discourse and intuitions and literature surrounding the word “empathy” are not talking about the same thing that I want to talk about. Thus I tend to avoid the word “empathy” altogether where possible. I’ve been using other terms like “empathetic simulation” or “little glimpse of empathy”. I talk about that a bit in Section 13.5.2 here.

More specifically, I’m guessing that it doesn’t “feel like empathy” when you imagine Alice burning her fingers on the kettle she stole from me, because that thought feels good, whereas empathizing with Alice would be unpleasant. Here, my model says “yes, the thought feels good, and if that’s not what you think of as ‘empathy’, then the thing you think of as ‘empathy’ is not what I’m talking about”. When we think of emotion concepts / categories, the valence / arousal / etc. associated with them are central properties. E.g. righteous indignation has to hav…

Thanks for the reply!

  • In envy, if a little glimpse of empathy indicates that someone is happy, it makes me unhappy.
  • In schadenfreude, if a little glimpse of empathy indicates that someone is unhappy, it makes me happy.
  • When I’m angry, if a little glimpse of empathy indicates that the person I’m talking to is happy and calm, it sometimes makes me even more angry!

How sure are you that these are instances of empathy (defining it as "prediction by our own latent world model of ourselves being happy/unhappy soon")? If I imagine myself in these examples, ...

Steven Byrnes
I think I probably don’t follow what you’re saying. It seems to me that people care very much about the internal state of other people. (Not in the sense of “people care that they have veridical beliefs about the internal state of other people”, but in the sense of “people spend a lot of time thinking about the internal state of other people, and their beliefs about those states are very relevant to their reactions”.)

Like, if I am to feel schadenfreude at Alice’s misfortune, it seems to me that it really matters that it’s a misfortune from Alice’s perspective. If I hate swimming and Alice loves it, and then Alice swims, then I wouldn’t feel schadenfreude there, right? And that requires attending to and reacting to (my beliefs about) Alice’s internal state, right? Again, this seems very obvious to me, which suggests that I’m probably misunderstanding you.

In the specific example of chocolate (unless it wasn't supposed to be realistic), are you sure it doesn't get trained away? I don't think that, upon seeing someone eating chocolate, I immediately imagine tasting chocolate. I feel like the chocolate needs to rise to my attention for other reasons, and only then do I viscerally imagine tasting chocolate.

Steven Byrnes
What I really believe is that “the brain does other things with that information”, things more general than “feeling the same feeling as the other person is feeling”. See here:

I do think “feeling the same feeling as the other person is feeling” can happen. The ice cream example is not great for that; maybe consider “seeing someone get unexpectedly punched hard in the stomach”. That makes me cringe a bit, still, even as an adult. Maybe an even better example (that only works for half the population) is “seeing someone get kicked in the balls”.

But it’s a bit subtle. If I saw people getting unexpectedly punched hard in the stomach day after day, sure, maybe I would stop cringing. But how much of that is a natural consequence of the learning algorithm, and how much of that is “empathy is kinda aversive here, so I learn by RL to leverage top-down attention to deliberately avoid triggering that reaction”? I tend to think it’s mostly the latter, but it’s not obvious.
DragonGod
I think this is an obviously wrong assumption of training data for within-lifetime human learning. I think it's likely orders of magnitude off?

* The relevant time frame is childhood
* Text data consumption seems like a more relevant metric
* Children do not read at a rate of 5 words per second
* Children do not read all their lives
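For a sense of scale, here is a back-of-the-envelope comparison under different, explicitly made-up assumptions; the only point is that the resulting totals differ by orders of magnitude:

```python
# Back-of-the-envelope totals for the "within-lifetime training data" question above.
# Every number here is an assumption for illustration, not a measured statistic.

SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumption being disputed: 5 words/second, every waking second, for decades.
naive_words = 5 * SECONDS_PER_YEAR * 30 * (16 / 24)   # 30 years, 16 waking hours/day
# ~3.2e9 words

# A more childhood-focused estimate: spoken language heard during development.
heard_words = 15_000 * 365 * 18                        # ~15k words heard/day for 18 years
# ~1e8 words

# Words actually read by a typical child/teen (assumed ~30 min/day at 200 wpm).
read_words = 200 * 30 * 365 * 12                       # 12 years of reading
# ~2.6e7 words

print(f"{naive_words:.1e}  {heard_words:.1e}  {read_words:.1e}")
# The estimates span one to two orders of magnitude, which is the point above.
```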

Ah, I see what you mean! Interesting perspective. The one thing I disagree with is the "gradient" framing; it doesn't seem like the most natural way to see it. It seems more like a binary: "Is there (accurate) modelling of the counterfactual of your choice being different going on, which actually impacted the choice? If yes, it's acausal. If not, it's not." This intuitively feels pretty binary to me.

ryan_b
I agree the gradient-of-physical-systems framing isn't the most natural way to think about it; I note that it didn't occur to me until this very conversation, despite acausal trade being old hat here.

What I am thinking now is that a more natural way to think about it is overlapping abstraction space. My claim is that in order to acausally coordinate, at least one of the conditions is that all parties need to have access to the same chunk of abstraction space, somewhere in their timeline. This seems to cover the similar-physical-systems intuition we were talking about: two rocks with "cooperate" painted on them are abstractly identical, so check; two superrational AIs need the abstractions to model another superrational AI, so check. This is terribly fuzzy, but it seems to allow in all the candidates for success.

The binary distinction makes sense, but I am a little confused about the work the counterfactual modeling is doing. Suppose I were to choose between two places to go to dinner, conditional on counterfactual modelling of each choice. Would this be acausal in your view?

I don't think the "zero-computation" case should count. Are two ants in an anthill doing acausal coordination? No, they're just two similar physical systems. It seems to stretch the original meaning; it's in no sense "acausal".

ryan_b
I agree two ants in an anthill are not doing acausal coordination; they are following the pheromone trails laid down by each other. This is the ant version of explicit coordination.

But I think the crux between us is this: I agree, it does seem to stretch the original meaning. I think this is because the original meaning was surprising and weird; it seemed counterintuitive, and I had to put quite a few cycles in to work through the examples of AIs negotiating without coexisting.

But consider for a moment that we had begun from the opposite end: if we accept two rocks with "cooperate" painted on them as counting for coordination, starting from there we can make a series of deliberate extensions. By this I mean stuff like: if we can have rocks with cooperate painted on, surely we can have agents with cooperate painted on (which is what I think voting mostly is); if we can have agents with cooperate painted on, we can have agents with decision rules about whether to cooperate; if we can have decision rules about whether to cooperate, they can use information about other decision rules; and so on, until we encompass the original case of superrational AGIs trading acausally with AGIs in the future.

I feel like this progression from cooperating rocks to superrational AGIs is just recognizing a gradient whereby progressively less-similar physical systems can still accomplish the same thing as the 0-computation, 0-information systems which are very similar.
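One way to make the rocks-to-AGIs gradient concrete is a minimal "cooperate iff our decision rules match" toy, in the spirit of program-equilibrium examples. The rule text and agents below are invented purely for illustration, not anyone's proposed formalization:

```python
# Toy sketch of the "similar physical systems" end of the gradient discussed above.
# Purely illustrative; not a formalization of acausal trade.

def make_agent(rule: str):
    """An agent that cooperates iff its counterpart runs the same decision rule."""
    def act(other_rule: str) -> str:
        return "cooperate" if other_rule == rule else "defect"
    return act, rule

# Two separately-instantiated agents that happen to share the same rule text:
RULE = "cooperate iff the other agent's rule is identical to mine"
agent_a, rule_a = make_agent(RULE)
agent_b, rule_b = make_agent(RULE)

print(agent_a(rule_b), agent_b(rule_a))   # cooperate cooperate -- no causal contact needed
print(agent_a("always defect"))           # defect

# The rocks-with-"cooperate"-painted-on case is the 0-computation limit of the same thing:
rock = lambda _other: "cooperate"
print(rock(None))                         # cooperate
```

The two agents never interact; they cooperate only because each models (here, trivially, string-compares) the other's rule. The painted rocks are the degenerate case where no modeling happens at all.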

I disagree. There is no acausal coordination because eg the reasoning "If everyone thought like me, democracy would fall apart" does not actually influence many people's choice, ie they would vote due to various social-emotional factors no matter what that reasoning said. It's just a rationalization.

More precisely, when people say "If everyone thought like me, democracy would fall apart", it's not actually the reasoning that it could be interpreted as; it's a vague emotional appeal to loyalty/the identity of a modern liberal/etc. You can tell because it re...

I've been thinking along similar lines, but instinctively, without a lot of reflection, I'm concerned about negative social effects of having an explicit community-wide list of "trusted people".

After thinking about it a little bit, the only hypothesis I could come up with for what's going on in the negation example is that the smaller models understand the Q&A format and understand negation, but the larger models have learned that negation inside a Q&A is unusual and so disregard it.

Thanks for this post, this looks very useful :) (it comes at a great time for me since I'm starting to work on my first self-directed research project right now).

I'm very interested, but since you've already found someone, please post the results! :)

Thanks! Am probably convinced by the third point, unsure about the others due to not having much time to think at the moment.

This has been my vague intuition as well, and I'm confused as to where exactly people think this argument goes wrong. So I would appreciate some rebuttals to this.

For 9, are you thinking of grokking?

Quintin Pope
See my comment.
blf
I think it would be a good idea to ask the question at the ongoing thread on AGI safety questions.

Thanks for the post. A clarifying question: Are you claiming that / do you think that these framings are extensionally equivalent?

Sorry, I should be more specific. We are talking about AGI safety; it seems unlikely that running narrow AI faster gets you AGI. I'm not sure if you disagree with that. I don't understand what you mean by "imitations of augmented humans" and "planning against a human-level imitation".

This "imitating an optimizer" / "optimizing to imitate" dichotomy seems unnecessarily confusing to me. Isn't it just inner alignment / inner misalignment (with the human behavior you're being trained on)? If you're imitating an optimizer, you're still an optimizer.

David Johnston
I agree with this. If the key idea is, for example, that optimising imitators generalise better than imitations of optimisers, or, for a second example, that they pursue simpler goals, then it seems to me that it'd be better just to draw distinctions based on generalisation or goal simplicity, and not on optimising imitators / imitations of optimisers.

I must be missing something here. Isn't optimizing necessary for superhuman behavior? So isn't "superhuman behavior" a strictly stronger requirement than "being a mesaoptimizer"? So isn't it clear which one happens first?

paulfchristiano
Fast imitations of subhuman behavior or imitations of augmented humans are also superhuman. As is planning against a human-level imitation. And so on. It's unclear if systems trained in that way will be imitating a process that optimizes, or will be optimizing in order to imitate. (Presumably they are doing both to varying degrees.) I don't think this can be settled a priori.
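As a cartoon of the two readings (the states, actions, and scoring function below are entirely made up, not anything from the original post), here is a toy where an "imitation of an optimizer" and an "optimizer that imitates" behave identically on the training distribution but diverge off it:

```python
# A cartoon of the "imitating an optimizer" vs "optimizing to imitate" readings above.
# Everything here is a made-up toy for illustration.

train = {"state1": "a", "state2": "b"}      # recorded human/optimizer behavior
actions = ["a", "b", "c"]

# (a) Imitation of an optimizer: a compressed copy of the recorded mapping;
#     no search happens at inference time.
def imitation(state):
    return train.get(state, "a")            # falls back on the most common action

# (b) Optimizing to imitate: searches at inference time for the action a learned
#     "human-likeness" score prefers; the score itself was only fit to the same data.
def humanlikeness(state, action):
    return 1.0 if train.get(state) == action else 0.3 if action == "c" else 0.0

def optimizer_imitator(state):
    return max(actions, key=lambda a: humanlikeness(state, a))

for s in ["state1", "state2", "novel_state"]:
    print(s, imitation(s), optimizer_imitator(s))
# Identical on the training states, different on the novel one -- which is why the
# distinction matters for generalization but can't be read off training behavior alone.
```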

Great post. Would add as an example: "While thinking about something and trying to figure out your viewpoint on it, track internal feelings of cognitive dissonance and confusion"

Have you gotten farther with this? It seems like a potentially very impactful thing to me. I also had the idea recently of paying skeptical AI researchers to spend a few hours discussing/debating their reasons for skepticism.