Oh, I see what you're saying now. Thanks for clarifying.
But this would apply to the visual cortex as well, right? So it doesn't explain the discrepancy.
I appreciate the charity!
I'm not claiming that people don't care about other people's internal states, I'm saying that it introspectively doesn't feel like that is implemented via empathy (the same part of my world model that predicts my own emotions), but via a different part of my model (dedicated to modeling other people), and that this would solve the "distinguishing-empathy-from-transient-feelings" mystery you talk about.
Additionally (but relatedly), I'm also skeptical that those beliefs are better described as being about other people's internal state...
Thanks for the reply!
- In envy, if a little glimpse of empathy indicates that someone is happy, it makes me unhappy.
- In schadenfreude, if a little glimpse of empathy indicates that someone is unhappy, it makes me happy.
- When I’m angry, if a little glimpse of empathy indicates that the person I’m talking to is happy and calm, it sometimes makes me even more angry!
How sure are you that these are instances of empathy (defining it as "prediction by our own latent world model of ourselves being happy/unhappy soon")? If I imagine myself in these examples, ...
In the specific example of chocolate (unless it wasn't supposed to be realistic), are you sure it doesn't get trained away? I don't think that, upon seeing someone eating chocolate, I immediately imagine tasting chocolate. I feel like the chocolate needs to rise to my attention for other reasons, and only then do I viscerally imagine tasting chocolate.
Katja Grace's p(doom) is 8% IIRC
Ah, I see what you mean! Interesting perspective. The one thing I disagree with is that a "gradient" doesn't seem like the most natural way to see it. It seems like it's more of a binary, "Is there (accurate) modelling of the counterfactual of your choice being different going on that actually impacted the choice? If yes, it's acausal. If not, it's not". This intuitively feels pretty binary to me.
I don't think the "zero-computation" case should count. Are two ants in an anthill doing acausal coordination? No, they're just two similar physical systems. Counting them seems to stretch the original meaning; it's in no sense "acausal".
I disagree. There is no acausal coordination because eg the reasoning "If everyone thought like me, democracy would fall apart" does not actually influence many people's choice, ie they would vote due to various social-emotional factors no matter what that reasoning said. It's just a rationalization.
More precisely, when people say "If everyone thought like me, democracy would fall apart", it's not actually the reasoning that it could be interpreted as, it's a vague emotional appeal to loyalty/the identity of a modern liberal/etc. You can tell because it re...
I've been thinking along similar lines, but instinctively, without a lot of reflection, I'm concerned about negative social effects of having an explicit community-wide list of "trusted people".
After thinking about it a little bit, the only hypothesis I could come up with for what's going on in the negation example is that the smaller models understand the Q&A format and understand negation, but the larger models have learned that negation inside a Q&A is unusual and so disregard it.
Very useful, thank you!
Thanks for this post, this looks very useful :) (it comes at a great time for me since I'm starting to work on my first self-directed research project right now).
I'm very interested, but since you've already found someone, please post the results! :)
Thanks! Am probably convinced by the third point, unsure about the others due to not having much time to think at the moment.
This has been my vague intuition as well, and I'm confused as to where exactly people think this argument goes wrong. So I would appreciate some rebuttals to this.
For 9, are you thinking of grokking?
Thanks for the post. A clarifying question: Are you claiming that / do you think that these framings are extensionally equivalent?
Sorry, I should be more specific. We are talking about AGI Safety, and it seems unlikely that running narrow AI faster gets you AGI. I'm not sure if you disagree with that. I don't understand what you mean by "imitations of augmented of humans" and "planning against a human-level imitation".
This "imitating an optimizer" / "optimizing to imitate" dichotomy seems unnecessarily confusing to me. Isn't it just inner alignment / inner misalignment (with the human behavior you're being trained on)? If you're imitating an optimizer, you're still an optimizer.
I must be missing something here. Isn't optimizing necessary for superhuman behavior? So isn't "superhuman behavior" a strictly stronger requirement than "being a mesaoptimizer"? So isn't it clear which one happens first?
Great post. Would add as an example: "While thinking about something and trying to figure out your viewpoint on it, track internal feelings of cognitive dissonance and confusion"
Have you gotten farther with this? It seems like a potentially very impactful thing to me. I also had the idea recently of paying skeptical AI researchers to spend a few hours discussing/debating their reasons for skepticism.
In my experience this is only true for beginner play (where werewolves are often too shy to say anything), and in advanced play it is a bad guy tactic for the same reasons as IRL. Eg I think in advanced Among Us lobbies it's an important skill to subtly push an unproductive thread of conversation without making it obvious that you were the one who distracted everybody.
It's not clear...
I'm not much of an avid Among Us player, but I suspect this only works in Among Us because of the (much) heavier-than-usual time pressures. In the other social deception games I'm aware of, the structural incentives continue to point in the other direction, so the main reason for bad guys to make spurious accusations is for anti-inductive reasons (if everybody knows th...