Posts

Sorted by New

Wiki Contributions

Comments

Perhaps music is another way to get rationalist ideas out into the main-ish stream.

A couple years ago Spotify started recommending lofi songs that included Alan Watts clips, like this: https://open.spotify.com/track/3D0gUUumDPAiy0BAK1RxbO?si=50bac2701cc14850

I had never heard of Watts (a bit surprising in retrospect), and these clips hooked my interest.

An appeal of this approach (spoken word + lofi) is that it is easier to understand, and puts greater emphasis on the semantic meaning over the musical sound.

--

PS. I love the chibi shoggoth

Answer by Kristin Lindquist10

I have weak intuitions for these problems, and in net they make me feel like my brain doesn't work very well. With that to disclaim my taste, FWIW I think your posts are some of the most interesting content on modern day LW. 

It'd be fun to hear you debate anthropic reasoning with Robin Hanson esp. since you invoke grabby aliens. Maybe you could invite yourself on to Robin & Agnes' podcast.

If System X is of sufficient complexity / high dimensionality, it's fair to say that there are many possible dimensional reductions, right? And not just globally better or worse options; instead, reductions that are more or less useful for a given context.

However, a shoggoth's theory-of-human-mind context would probably be a lot of like our context, so it'd make sense that the representations would be similar.

That's interesting re: LLMs as having "conceptual interpretability" by their very nature. I guess that makes sense, since some degree of conceptual interpretability naturally emerges given 1) sufficiently large and diverse training set, 2) sparsity constraints. LLMs are both - definitely #1, and #2 given regularization and some practical upper bounds on total number of parameters. And then there is your point - that LLMs are literally trained to create output we can interpret.

I wonder about representations formed by a shoggoth. For the most efficient prediction of what humans want to see, the shoggoth would seemingly form representations very similar to ours. Or would it? Would its representations be more constrained by and therefore shaped by its theory of human mind, or by its own affordances model? Like, would its weird alien worldview percolate into its theory-of-human-mind representations? Or would its alienness not be weird-theory-of-human-mind so much as everything else going on in shoggoth mind?

More generically, say there's System X with at least moderate complexity. One generally intelligent creature learns to predict System X with N% accuracy, but from context A (which includes its purpose for learning System X / goals for it). Another generally intelligent creature learns how to predict System X with N% accuracy but from a very different context B (it has very different goals and a different background). To what degree would we expect their representations to be be similar / interpretable to one another? How does that change given the complexity of the system, the value of N, etc?

Anyway, I really just came here to drop this paper - https://arxiv.org/pdf/2311.13110.pdf - re: @Sodium's wondering "some suitable loss function that incentivizes the model to represent its learned concepts in an easily readable form." I'm curious about the same question, more from the applied standpoint of how to get a model to learn "good" representations faster. I haven't played with it yet tho. 

I've thought about this and your sequences a bit; it's a fascinating to consider given its 1000 or 10000 year monk nature.

A few thoughts that I forward humbly, since I have incomplete knowledge of alignment and only read 2-3 articles in your sequence:

  • I appreciate your eschewing of idealism (as in, not letting "morally faultless" be the enemy of "morally optimized"), and relatedly, found some of your conclusions disturbing. But that's to be expected, I think!
  • While "one vote per original human" makes sense given your arguments, its moral imperfection makes me wonder - how to minimize that which requires a vote? Specifically, how to minimize the likelihood that blocks of conscious creatures suffer as a result of votes in which they could not participate? As in, how can this system be more federation than democratic? Are there societal primitives that can maximize autonomy of conscious creatures, regardless of voting status?
  • I object, though perhaps ignorantly, to the idea that a fully aligned ASI would not consider itself as having moral weight. How confident are you that this is necessary? Is it a When is Goodhart catastrophic analogous argument - that the bit of unalignment arising from an ASI considering itself as a moral entity, amplified due to its superintelligence, maximally diverges from human interest? If so, I don't necessarily agree. An aligned ASI isn't a paperclip maximizer. It could presumably have its own agenda provided it doesn't and wouldn't, interfere with humanity's... or if it imposed only a modicum of restraint on the part of humanity (e.g. just because we can upload ourselves a million times doesn't mean that is a wise allocation of compute).
  • Going back to my first point, I appreciate you (just like others on LW) going far beyond the bounds of intuition. However, our intuitions act as imperfect but persistent moral mooring. I was thinking last night that given the x-risk of it all, I don't fault Yud et al. for some totalitarian thinking. However, that is itself an infohazard. We should not get comfortable with ideas like totalitarianism, enslavement of possibly conscious entities and restricted suffrage... because we shouldn't overestimate our own rationality nor that of our community and thus believe we can handle normalizing concepts that our moral intuitions scream about for good reason. But 1) this comment isn't specific to your work, of course, 2) I don't know what to do about it, and 3) I'm sure this point has already been made eloquently and extensively elsewhere on LW somewhere. It is more that I found myself contemplating these ideas with a certain nihilism, and had to remind myself of the immense moral weight of these ideas in action.  
Answer by Kristin Lindquist10

LW, along with Astral Codex Ten, are the best places on the internet. Lately LW tops the charts for me, perhaps because I've made it through Scott's canon but not LW's. As a result, my experience on LW is more about the content than the meta and community. Just coming here, I don't stumble across much evidence of conflict within this community - I only learned about it after friending various rationalists on FB such as Duncan (which btw I really like having rationalists in my FB feed, which does give me a sense of community and belongingness... perhaps there is something to having multiple forums). 

On the slight negative side, I have long believed LW to be an AI doom echo chamber. This is partly due to my fibrotic intuitions, persisting despite reading Superintelligence and having friends in AI safety research, and only breaking free after ChatGPT. But part of it I still believe is true. The reasons include hero worship (as mentioned already on this thread), the community's epistemic landscape (as in, it is harder and riskier to defend a position of low vs high p(doom)), and perhaps even some hegemony of language.

In terms of the app: it is nice. From my own experiences building apps with social components, I would have never guessed that a separate "karma vote" and "agreement vote" would work. Only on LW!

+1

I internalized the value to apologize proactively, sincerely, specifically and without any "but". While I recommend it from a virtue ethics perspective, I'd urge starry-eyed green rationalists to be cautious. Here are some potential pitfalls:

- People may be confused by this type of apology and conclude that you are neurotic or insincere. Both can signal low status if you lack unambiguous status markers or aren't otherwise effectively conveying high status.
- If someone is an adversary (whether or not you know it), apologies can be weaponized. As a conscientious but sometimes off-putting aspie, I try to apologize for my frustration-inducing behaviors such as being intense, overly persistent and inappropriately blunt - no matter the suboptimal behavior of the other person(s) involved. In short, apology is an act of cooperation and people around you might be inclined to defect, so you must be careful.

I've been too naive on this front, possibly because some of the content I've found most inspirational comes from high status people (the Dalai Lama, Sam Harris, etc) and different rules apply (i.e. great apologies as counter-signaling). It's still really good to develop these virtues; in this case, to learn how to be self-aware, accountable and courageously apologetic. But in some cases, it might be best to just write it in a journal rather than sharing it to your disadvantage.

I'll be attending, probably with a +1.

Not an answer but a related question: is habituation perhaps a fundamental dynamic in an intelligent mind? Or did the various mediators of human mind habituation (e.g. downregulation of dopamine receptors) arise from evolutionary pressures?

I'm reading this for the first time today. It'd be great if more biases were covered this way. The "illusion of transparency" one is eerily close to what I've thought so many times. Relatedly, sometimes I do succeed at communicating, but people don't signal that they understand (or not in a way I recognize). Thus sometimes I only realize I've been understood after someone (politely) asks that I stop repeating myself, mirroring back to me what I had communicated. This is a little embarrassing, but also a relief - once I know I've been understood, I can finally let go.

Load More