Independent researcher of AI alignment and human capabilities. pi_star on Discord.
Oops, that was a typo. Fixed now, and I've added a comma to clarify that I mean the latter.
I propose the following desideratum for self-referential doxastic modal agents (agents that can think about their own beliefs), where $\square p$ represents "I believe $p$", $W(p)$ represents the agent's world model conditional on $p$, and $\succ$ is the agent's preference relation:
Positive Placebomancy: For any proposition $p$, the agent concludes $p$ from $\square p \to p$, if $W(p) \succ W(\neg p)$.
In natural English: The agent believes any hyperstition that would benefit it if true.
"The placebo effect works on me when I want it to."
A real-life example: in this sequence post, Eliezer Yudkowsky advocates using positive placebomancy on "I cannot self-deceive".
I would also like to formalize a notion of "negative placebomancy" (the agent doesn't believe hyperstitions that don't benefit it), "total placebomancy" (the agent believes hyperstitions iff they are beneficial), "group placebomancy" (the agent believes group hyperstitions that are good for everyone in the group, conditional on all other group members having group placebomancy or something similar), and generalizations to probabilistic self-referential agents (like "ideal fixed-point selection" for logical inductor agents).
I will likely cover all of these in a future top-level post, but I wanted to get this idea out into the open now because I keep finding myself wanting to reference it in conversation.
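As a rough sketch in the same notation as above (tentative phrasings, to be pinned down properly in that post):

Negative Placebomancy: For any proposition $p$ such that $\square p \to p$, if $W(p) \not\succ W(\neg p)$, the agent does not conclude $p$.

Total Placebomancy: For any proposition $p$ such that $\square p \to p$, the agent concludes $p$ iff $W(p) \succ W(\neg p)$.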
I think I know (80% confidence) the identity of this "local Vassarite" you are referring to, and I think I should reveal it, but, y'know, Unilateralist's Curse, so if anyone gives me a good enough reason not to reveal this person's name, I won't. Otherwise, I probably will, because right now I think people really should be warned about them.
People often say things like "do x. Your future self will thank you." But I've found that I very rarely actually thank my past self after x has been done and I've reaped the benefits of x.
This quick take is a preregistration: for the next month, I will thank my past self more often when I reap the benefits of a sacrifice of their immediate utility.
E.g.: when I'm stuck in bed because the activation energy to leave is too high, but I overcome it, go for a run, and feel a lot more energized, I'll look back and say "Thanks, 7 am Morphism!"
(I already do this sometimes, but I will now make a TAP out of it, which will probably cause me to do it more often.)
Then I will make a full post describing in detail what I did and what (if anything) changed about my ability to sacrifice short-term gains for greater long-term gains, along with plausible theories w/ probabilities on the causal connection (or lack thereof), as well as a list of potential confounders.
Of course, it is possible that I completely fail to even install the TAP. I don't think that's very likely, because I'm #1-prioritizing my own emotional well-being right now (I'll shift focus back onto my world-saving pursuits once I'm more stably not depressed). If that happens, I won't write a full post, since the experiment wouldn't even have been done; I'll instead just make a comment on this shortform to that effect.
Edit: There are actually many ambiguities with the use of these words. This post is about one specific ambiguity that I think is often overlooked or forgotten.
The word "preference" is overloaded (and so are related words like "want"). It can refer to one of two things:
I'm not sure how we should distinguish these. So far, my best idea is to call the former "global preferences" and the latter "local preferences", but that clashes with the pre-existing notion of locality of preferences as the quality of terminally caring more about people/objects closer to you in spacetime. Does anyone have a better name for this distinction?
I think we definitely need to distinguish them, however, because they often disagree, and most "values disagreements" between people are just disagreements in local preferences, and so could be resolved by considering global preferences.
I may write a longpost at some point on the nuances of local/global preference aggregation.
Example: Two alignment researchers, Alice and Bob, both want access to a limited supply of compute. The rest of this example is left as an exercise.
Emotions can be treated as properties of the world, to be optimized with respect to constraints like anything else. We can't edit our emotions directly, but we can influence them.
Contrary to what the current wiki page says, Simulacrum levels 3 and 4 are not just about ingroup signalling. See these posts and more, as well as Baudrillard's original work if you're willing to read dense philosophy.
Here is an example where levels 3 and 4 don't relate to ingroups at all, which I think may be more illuminating than the classic "lion across the river" example:
Alice asks "Does this dress make me look fat?" Bob says "No."
Depending on the simulacrum level of Bob's reply, he means:
1. "The dress doesn't make you look fat." (a report about object-level reality)
2. "I want you to believe the dress doesn't make you look fat." (whether or not it does)
3. "I care about your feelings."
4. "I want you to believe that I care about your feelings."
Here are some potentially better definitions, of which the group association definitions are a clear special case:
1. Communication of object-level truth.
2. Optimization over the listener's belief that the speaker is communicating on simulacrum level 1, i.e. the desire to make the listener believe what the speaker says.
These are the standard old definitions. The transition from 1 to 2 is pretty straightforward: when I use 2, I want you to believe I'm using 1. This is not necessarily lying; it is more like Frankfurt's bullshit. I care about the effects of this belief on the listener, regardless of its underlying truth value. This is often (naively) considered prosocial; see this post for some examples.
Now, the transition from 2 to 3 is a bit tricky. Level 3 is the result of a social equilibrium that emerges after communication in a domain gets flooded by prosocial level 2. Eventually, everyone learns that these statements are not about object-level reality, so communication on levels 1 and 2 becomes futile. Instead, we have:
3. Communication of truths about the speaker within social reality. E.g. that Bob cares about Alice's feelings, in the case of the dress, or that I'm with the cool kids who don't cross the river, in the case of the lion. Another example: bids to hunt stag.
4. Optimization over the listener's belief that the speaker is communicating on simulacrum level 3; the jump from 3 to 4 is analogous to the jump from 1 to 2.
Like the jump from 1 to 2, the jump from 3 to 4 has the quality of bullshit, not necessarily lies. Speaker intent matters here.