A_donor

For reasons that maybe no normal person really understands, in an outcome that the theory's inventors found very surprising, some people seem to be insane in a way that seizes very hard on a twisted version of this theory.

Oh... huh. @Eliezer Yudkowsky, I think I figured it out.

In a certain class of altered state,[1] a person's awareness includes a wider part of their predictive world-model than usual. Instead of perceiving primarily the part of the self-model that models them looking out into a world-model, the normal gating mechanisms come apart and they perceive much more of their world-model directly (including being able to introspect on their brain's copy of other people far more vividly).

This world-model includes other agents, and those models of other agents now exist in a much less sandboxed environment. It viscerally feels like there is extremely strong entanglement between their actions and those of the agents that might be modelling them, because their model of the other agents can read their self-model and vice versa; in that state they're kinda running things right on the bare-metal models themselves. Additionally, people's models of other people generally use themselves as a template, so if they're thinking a lot about threats, blackmail, and similar, it's easy for that to leak into expecting that others are modelling these things more than they actually are.

So their systems strongly predict that there is way more subjunctive dependence than is real, due to how the brain handles those kinds of emergencies.[2]

Add in the thing where decision theory has counterintuitive suggestions and tries to operate kinda below the normal layer of decision process, plus people not being intuitively familiar with it, and yeah, I can see why some people can get to weird places. It's a weird pitfall, not reasonably predictable in advance, but in retrospect it fits.

Maybe it's a good idea to write an explainer for this, to try to mitigate this way that people seem to be able to implode. I might talk to some people.

  1. ^

    The schizophrenia/psychosis/psychedelics-like cluster, often brought on by extreme psychological states like those induced by cults and extreme perceived threat, especially with reckless mind exploration thrown into the mix.

  2. ^

    [epistemic status: very speculative] It seems plausible this is in part a feature evolution built for handling situations where you seem to be in extreme danger: accepting a large chance of doing quite badly, damaging your epistemics, or acting in wildly bad ways, in exchange for some chance of finding a path through whatever put you in that state, by running a bunch of unsafe cognitive operations which might hit upon a way out of likely death. It sure seems like the common advice is things like "eat food", "drink water", "sleep at all", and "be around people who feel safe", which feel like the kinds of things that would turn down those alarm bells. Though this could also just be an entirely natural consequence of stress on a cognitive system.

A_donor

I am donating half my net worth over the next few years towards efforts to make it more likely that we end up in one of the better futures (focusing on highly frugal people, as they are more neglected by major grantmakers and my funds will go further), alongside most of my productive energies.

If anyone wants to commit to joining the effort, please reach out. There is plenty of low-hanging fruit in the alignment ecosystem space.

Answer by A_donor
  • There's also the new Alignment Ecosystem Slack, but that's invite only currently. From the tag: "If you'd like to join message plex with an overview of your involvement."
  • I found a great designer/programmer for one of my alignment projects on the EA Creatives and Communicators Slack.
  • Impact Markets is somewhat relevant.

I am getting ready to help launch two more in the next couple of weeks: one for alignment grantmakers (gated to people who can verify they've directed $10k+ towards alignment), and one for software engineers who want to help with alignment. They're not polished yet, so they're not ready for a big announcement, but feel free to turn up early if you're sold on the idea already.

(also, neat, I got cited!)

I'm hopeful that most people would see the difference between "rich people trying to save their own skin" and "allowing researchers who are trying to make sure humanity has a long-term future at all to continue their work", but I would be very happy to have leads on who to talk to about presenting this well.

Rob's is invite-only right now, but I have an invite and have sent it to you. I'm adding links to the two public ones to the main post. The investing one is private and I don't have any invites, but they're considering switching to an open invite next week, so I'll get back to you.

For how to get on the radar of grantmakers, I think applying to microgrant programs like ACX+ or LTFF is a reasonable start (one of my grantees came via ACX+). Another option, which should open up soon with impact certificates, is just going ahead and doing something of value and then selling your impact. Talking to AI Safety Support is also a near-universally good thing to do for people who want to help. I'd also be open to hearing more about your situation in DM.

A lot of that makes sense, except that I don't understand the angle-grating part?