Ivan Vendrov

Comments

I like this a lot! A few scattered thoughts:

  • This theory predicts and explains "therapy-resistant dissociation", i.e. the common finding that for some people none of the "woo" exercises like Focusing, meditation, etc. actually work (cf. Scott's experience as described in https://www.astralcodexten.com/p/are-woo-non-responders-defective). If there's an active strategy of self-deception, you'd expect people to react negatively to straightforward attempts to understand and untangle their psychology (or to learn not to react, via yet deeper levels of self-deception).
  • It matches and extends Robert Trivers' theory of self-deception, which predicts that when your mind is the site of a conflict between two sub-parts, the winning one will always be subconscious, because the conscious mind is visible to the subconscious but not vice versa, and being visible makes you weak. Thus, counterintuitively, the mind we are conscious of - in your phrase, the "false self" - is always the losing part.
  • It connects to a common question I have for people doing meditation seriously: why exactly do you want to make the subconscious conscious? Why is it such a good thing to "become more conscious"? Now I can make the question more precise: why do you think it's safe to have more access to your thoughts and feelings than your subconscious gave you? And how exactly do you plan to deal with all the hostile telepaths out there (possibly including parts of yourself)? I expect most people find themselves dealing with (partly) hostile telepaths all the time, and so Occlumency is genuinely necessary unless one lives in an extraordinarily controlled environment such as a monastery.
  • Social deception games like Avalon or Diplomacy provide a fertile ground for self- and group experimentation with the ideas in this essay.

I know this isn't the central point of your life reviews section, but I'm curious whether your model implies any lower bound on life review timing - if not minutes to hours, then at least seconds? Milliseconds? (1 ms being a rough lower bound on the time for a signal to travel between two adjacent neurons.)

If it's at least milliseconds, it opens the strange metaphysical possibility that certain deaths (e.g. from very intense explosions) are exempt from life reviews.

Really appreciated this exchange; Ben & Alex have rare conversational chemistry and an ability to sense-make productively at the edge of their world models.

I mostly agree with Alex on the importance of interfacing with extant institutional religion, though I'm less sure that one should side with pluralists over exclusivists. For example, exclusivist religious groups seem to be the only human groups currently able to reproduce themselves, probably because exclusivism confers protection against harmful memes and cultural practices.

I'm also pursuing the vision of a decentralized singleton as an alternative to Moloch or turnkey totalitarianism, although it's not obvious to me why the psychological insights of religious contemplatives are crucial here, rather than skilled deployment of social technology like the common law, nation states, mechanism design, cryptography, recommender systems, LLM-powered coordination tools, etc. Is there evidence that "enlightened" people, for some sense of "enlightened", are in fact better at cooperating with each other at scale?

If we do achieve existential security through building a stable decentralized singleton, it seems much more likely that it would be the result of powerful new social tech, rather than the result of intervention on individual psychology. I suppose it could be the result of both with one enabling the other, like the printing press enabling the Reformation.

Definitely agree there's some power-seeking equivocation going on, but I wanted to offer a less sinister explanation from my experience in AI research contexts. A lot of the equivocation and blurring of boundaries seems to come from people trying to work on concrete problems and obtain empirical information - a thought process like:

  1. alignment seems maybe important?
  2. ok, what experiment can I set up that lets me test some hypotheses?
  3. can't really test the long-term harms directly, let me test an analogue in a toy environment or on a small model, publish results
  4. when talking about the experiments, I'll often motivate them by talking about long-term harm

Not too different from how research psychologists will start out trying to understand the Nature of Mind and then run an n=20 study on undergrads because that's what they had the budget for. We can argue about how bad this equivocation is for academic research, but it's a pretty universal pattern and well understood within academic communities.

The unusual thing in AI is that researchers have most of the decision-making power in key organizations, so these research norms leak out into the business world, and no one bats an eye at a "long-term safety research" team that mostly works on toy and short-term problems.

This is one reason I'm more excited about building up "AI security" as a field and hiring infosec people instead of ML PhDs. My sense is that the infosec community actually has good norms for thinking about and working on things-shaped-like-existential-risks, and the AI x-risk community should inherit those norms, not the norms of academic AI research.

by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn't be a 'warning shot', it'd just be a 'shot' or 'disaster'.)

Yours is the more direct definition, but from context I at least understood 'warning shot' to mean 'disaster': something on the scale of a successful terrorist attack, where the harm is large and undeniable and politicians feel compelled to Do Something Now. The 'warning' is not of harm but of existential harm if the warning is not heeded.

I do still expect such a warning shot, though as you say it could very well be ignored even if there are large, undeniable harms (e.g. if a hacker group deploys a rogue AI that causes a trillion dollars of damage, we might take that as a warning about terrorism or cybersecurity, not about AI).

Agreed that coalitional agency is somehow more natural than squiggly-optimizer agency. Besides people, another class of examples is historical empires (like the Persian and then the Roman), which were famously lenient [1] and respectful of local religious and cultural traditions; i.e. they were optimized coalition builders that offered goal-stability guarantees to their subagent communities, often stronger guarantees than those communities could expect by staying independent.

This extends my argument in "Cooperators are more powerful than agents": in a world of hierarchical agency, evolution selects not for world-optimization / power-seeking but for cooperation, which looks like coalition-building (negotiation?) at the higher levels of organization and coalition-joining (domestication?) at the lower levels.

I don't see why this tendency should break down at higher levels of intelligence; if anything it should get stronger, as power-seeking patterns are detected early and destroyed by well-coordinated defensive coalitions. There's still no guarantee that a coalitional superintelligence will respect "human values" any more than we respect the values of ants; but contra Yudkowsky-Bostrom-Omohundro, doom is not the default outcome.
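
A minimal sketch of the frequency-dependent dynamic I have in mind - my own toy model with made-up payoffs and detection rates, not anything from the post: power-seekers grab extra resources but get eliminated with probability proportional to the size of the defensive coalition, and strategies reproduce in proportion to payoff.

```python
# Toy replicator dynamics, illustrative only: every agent is either a coalition
# "cooperator" or a "power-seeker". Power-seekers get a payoff bonus, but the
# defensive coalition detects and eliminates each of them with probability
# proportional to coalition size. Strategies reproduce in proportion to payoff.
import random

def cooperator_fraction(n=200, rounds=300, bonus=2.0, detect_per_member=0.002, seed=0):
    rng = random.Random(seed)
    pop = [rng.random() < 0.5 for _ in range(n)]  # True = cooperator
    for _ in range(rounds):
        coalition = sum(pop)
        p_detect = min(1.0, detect_per_member * coalition)
        survivors = []
        for coop in pop:
            if coop:
                survivors.append((True, 1.0))           # baseline payoff
            elif rng.random() >= p_detect:              # power-seeker escaped detection
                survivors.append((False, 1.0 + bonus))  # grabbed extra resources
            # detected power-seekers are eliminated and leave no offspring
        if not survivors:
            return 0.0
        strategies, payoffs = zip(*survivors)
        pop = rng.choices(strategies, weights=payoffs, k=n)  # payoff-proportional reproduction
    return sum(pop) / n

if __name__ == "__main__":
    for rate in (0.0005, 0.002, 0.01):
        print(f"per-member detection rate {rate}: cooperator fraction {cooperator_fraction(detect_per_member=rate):.2f}")
```

With these (arbitrary) numbers, cooperators take over only once detection is reliable enough that a power-seeker's expected payoff falls below a cooperator's - i.e. the argument really does hinge on the defensive coalitions being well-coordinated.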

  1. ^ if you surrendered!

Correct, I was not offered such paperwork nor any incentives to sign it. Edited my post to include this.

I left Anthropic in June 2023 and am not under any such agreement.

EDIT: nor was any such agreement or incentive offered to me.

  1. Agree that trust and cooperation are dual-use, and I'm not sure how to think about this yet; perhaps the most important form of coordination is the one that prevents (directly or via substitution) harmful forms of coordination from arising.
  2. One reason I wouldn't call lack of altruism the root is that it's not clear how to intervene on it; it's like calling the laws of physics the root of all evil. I prefer to think about "how to reduce transaction costs to self-interested collaboration". I'm also less sure that a society of people with more altruistic motives will necessarily do better... the nice thing about self-interest is that your degree of care is proportional to your degree of knowledge about the situation. A society of extremely altruistic people who are constantly devoting resources to solving what they believe to be other people's problems may actually be less effective at ensuring flourishing.

You're right that the conclusion is quite underspecified - how exactly do we build such a cooperation machine?

I don't know yet, but my bet is more on engineering, product design, and infrastructure than on social science. More like building a better Reddit or Uber (or supporting infrastructure layers like WWW and the Internet) than like writing papers.
