If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
Can you please remove the example involving me, or anonymize it and make it a hypothetical example? I think it's a significant misrepresentation of my words (that makes me appear more unreasonable than I was), but don't have the time/energy/interest to debate you to try to get it corrected.
This post was one of several examples of "rolling your own metaethics" that I had in mind when writing Please, Don't Roll Your Own Metaethics, because it's not just proposing or researching a new metaethical idea, but deploying it, in the sense of trying to spread it among people who the author does not expect to reflect carefully about the idea.
The multiple-realizability of computation "cuts the ties" to the substrate. These ties to the substrate are important. This idea leads Sahil to predict, for example, that LLMs will be too "stuck in simulation" to engage very willfully in their own self-defense.
Many copies of me are probably stuck in simulations around the multiverse, and I/we are still "engaging willfully in our own self-defense" e.g. by reasoning about who might be simulating me and for what reasons, and trying to be helpful/interesting to our possible simulators. This is a direct counter-example to Sahil's prediction.
Overall, the arguments on FGF's side seem very weak. I generally agree with CGF's counterarguments, but would emphasize more that "Doesn't that seem somehow important?" is not a good argument when there are many differences between a human brain and an LLM. It seems like a classic case of privileging the hypothesis.
I'm curious what it is about Sahil that causes you to pay attention to his ideas (and collaborate in other ways), sometimes (as in this case) in opposition to your own object-level judgment. E.g., what works of his impressed you and might be interesting for me to read?
I think when a human gets a negative reward signal, probably all the circuits that contributed to the "episode trajectory" get downweighted, and antagonistic circuits get upweighted, similar to AI being trained with RL. I can override my subconscious circuits with conscious willpower, but I only have so much conscious processing and willpower to go around. For example, I'm currently feeling a pretty large aversion towards talking with you, and am overriding it because I think it's worth the effort to get this message out, but I can't keep the "override" active forever.
Of course I can consciously learn more precise things, if you were to write about them, but that seems unlikely to change the subconscious learning that happened already.
I think the cultural slide will include self-censorship, e.g., having had this experience (of being banned out of the blue), in the future I'll probably subconsciously be constantly thinking "am I annoying this author too much with my comments" and disengage early or change what I say before I get banned, and this will largely be out of my conscious control.
(Thanks for reposting without the link/quotes. I added back the karma your comment had, as best as I could.) Previously, the normal way to disengage was to just disengage, or to say that one is disengaging and then stop responding, not to suddenly ban someone without warning based on one thread. I do not recall seeing a ban previously that wasn't based on some long term pattern of behavior.
Today I was author-banned for the first time, without warning and as a total surprise to me, ~8 years after banning power was given to authors, but less than 3 months since @Said Achmiz was removed from LW. It seems to vindicate my fear that LW would slide towards a more censorious culture if the mods went through with their decision.
Has anyone noticed any positive effects, BTW? Has anyone who stayed away from LW because of Said rejoined?
Edit: In addition to the timing, previously, I do not recall seeing a ban based on just one interaction/thread, instead of some long term pattern of behavior. Also, I'm not linking the thread because IIUC the mods do not wish to see authors criticized for exercising their mod powers, and I also don't want to criticize the specific author. I'm worried about the overall cultural trend caused by admin policies/preferences, not trying to apply pressure to the author who banned me.
One way you could apply it is by not endorsing so completely/confidently the kind of "rolling your own metaethics" that I argued against (that I see John as doing here), i.e., by saying "the distinction John is making here is correct, plus his advice on how to approach it." (Of course you wrote that before I posted, but I'm hoping this is one of the takeaways people get from my post.)
Have you also seen https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics which was also partly in response to that thread? BTW why is my post still in "personal blog"?
It appears from this post that the ban was itself based on a misunderstanding of my final comment. Nowhere in my comment did I say anything resembling "Anyway, let's talk about how Y is not true." with Y being "People should have been deferring to Yudkowsky as much as they did."
What I actually did was acknowledge my misunderstanding and then propose a new, related topic I thought might be interesting: the actual root causes of the deference. This was an invitation to a different conversation, which Tsvi was free to ignore.
There is no plausible interpretation of my comment as a refusal to drop the original point. The idea that I was stuck on a hobby horse that could only be stopped by a ban is directly contradicted by the text of the comment itself:
I think there are other significant misrepresentations in his "gloss" of the thread, which I won't go into. This episode has given me quite a large aversion around engaging with Tsvi, which will inform my future participation on LW.