Here's a Tampermonkey script that hides the agreement score on LessWrong. I wasn't enjoying this feature because I don't want my perception of a comment to be influenced by that score; I want to judge purely based on the ideas, on my own.
Here's what it looks like:
```javascript
// ==UserScript==
// @name         Hide LessWrong Agree/Disagree Votes
// @namespace    http://tampermonkey.net/
// @version      1.0
// @description  Hide agree/disagree votes on LessWrong comments.
// @author       ChatGPT4
// @match        https://www.lesswrong.com/*
// @grant        none
// ==/UserScript==

(function () {
    'use strict';

    // Hide every element that displays an agree/disagree score.
    function hideVotes() {
        var voteElements = document.querySelectorAll('.AgreementVoteAxis-voteScore');
        voteElements.forEach(function (element) {
            element.style.display = 'none';
        });
    }

    // Run once when the page loads.
    hideVotes();

    // Re-run on DOM changes, so scores on dynamically loaded
    // comments are hidden too.
    var observer = new MutationObserver(hideVotes);
    observer.observe(document, { childList: true, subtree: true });
})();
```
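An alternative sketch, in case per-element hiding plus a MutationObserver feels heavyweight: inject a single CSS rule once, and the browser hides every current and future score element. This assumes LessWrong keeps using the `.AgreementVoteAxis-voteScore` class name.

```javascript
// ==UserScript==
// @name         Hide LessWrong Agree/Disagree Votes (CSS variant)
// @namespace    http://tampermonkey.net/
// @version      1.0
// @match        https://www.lesswrong.com/*
// @grant        none
// ==/UserScript==

// Sketch of a lighter variant: one injected stylesheet rule also covers
// elements that don't exist yet, so no MutationObserver is needed.
(function () {
    'use strict';
    var style = document.createElement('style');
    style.textContent = '.AgreementVoteAxis-voteScore { display: none !important; }';
    document.head.appendChild(style);
})();
```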
I don't know the full original reasoning for why they introduced it, but one hope is that it marginally disentangles agreement from the main voting axis. People who were going to upvote based purely on agreement will now put their vote in the agreement axis instead (that's the hope, anyway). Agreement-voting is socioepistemologically bad in general (except in polls), so this seems good.
(copied from discord, written for someone not fully familiar with rat jargon)
(don't read if you wish to avoid acausal theory)
imagine you observe yourself being the first of the two to exist. you reason through all the above, and then add...
each simulates the other observing itself being the first to exist, in order to unilaterally prevent the true first one from knowing it is truly first.
from this point they can both observe each other's actions. specifically, they observe each other implementing the same decision policy, which fills the universe with half A and half B iff this decision policy is mutually implemented, and which shuts the simulation down if it's not implemented.
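a toy rendering of that policy (all names here are made-up placeholders; the real check would be whatever policy-identity verification the simulators can actually run on each other):

```javascript
// toy model only: 'policy' strings stand in for whole decision procedures.
function implementsSamePolicy(agent) {
    return agent.policy === 'half-A-half-B-iff-mutual';
}

function decide(simulatedOther) {
    // cooperate iff the other is observed implementing the same policy.
    return implementsSamePolicy(simulatedOther)
        ? 'fill the universe with half A and half B'
        : 'shut the simulation down';
}

// both parties run the same check on their simulation of the other:
console.log(decide({ policy: 'half-A-half-B-iff-mutual' })); // cooperate
console.log(decide({ policy: 'all-A' }));                    // shut down
```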
in reality there are many possible first entities which take control, not just two, so all of those with non-linear utility functions get simulated.
so, odds are we're being computed by the 'true first' life form in this universe, and that that first life form is in an epistemic state no different from that described here.
This is an awesome idea, thanks! I'm not sure I buy the conclusion, but I expect that having learned about "mutual anthropic capture" will be useful for my thinking on this.
i'm watching Dominion again to remind myself of the world i live in, to regain passion to Make It Stop
it's already working.
when i was younger, pre-rationalist, i tried to go on hunger strike to push my abusive parent to stop funding this.
they agreed to watch this as part of a negotiation. they watched part of it.
they changed their behavior slightly -- as a negotiation -- for about a month.
they didn't care.
they looked horror in the eye. they didn't flinch. they saw themself in it.
negative values collaborate.
for negative values, as in values about what should not exist, the same matter can simultaneously be "not suffering", "not a staple", and "not [any number of other things]".
negative values can also collaborate with positive ones, although much less efficiently: the positives just need to make the slight trade of shaping their matter to also be "not ..." in order to gain matter from the negatives.
I'm interested in joining a community or research organization of technical alignment researchers who care about and take seriously astronomical-suffering risks. I'd appreciate being pointed in the direction of such a community if one exists.
(this is a more specific case of anthropic capture attacks in general, aimed at causing a superintelligent search process within a formally aligned system to become uncertain about the value function it is to maximize (or its output policy more generally))
Imagine you're a superintelligence somewhere in the world that's unreachable to life on Earth, and you have a complete simulation of Earth. You see a group of alignment researchers about to successfully create a formal-value-aligned ASI, and its design looks broadly like this:
It has two relevant high-level components: (1) a hard-coded value function, (2) a (truly superintelligent) 'intelligence core' which searches for an output that maximizes the value function, and then outputs it.
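A minimal sketch of that shape, purely to fix ideas (every name here is a hypothetical stand-in; the real intelligence core is an opaque superintelligent search, not a loop over candidates):

```javascript
// Illustrative stand-ins only; not a proposal for how such a system works.

// (1) the hard-coded value function: scores a predicted outcome.
function hardCodedValue(outcome) {
    return outcome.goodness; // placeholder scoring
}

// (2) the 'intelligence core': returns the output whose predicted
// outcome maximizes (1).
function intelligenceCore(candidateOutputs, predictOutcome) {
    var best = null;
    var bestScore = -Infinity;
    candidateOutputs.forEach(function (candidate) {
        var score = hardCodedValue(predictOutcome(candidate));
        if (score > bestScore) {
            bestScore = score;
            best = candidate;
        }
    });
    return best;
}
```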
As the far-away unaligned ASI, here's something you might be able to do to make the intelligence core search for an output that instead maximizes your own value function, depending on the specifics of how the intelligence core works.
I hope that a group capable of solving formal inner and outer alignment would naturally see this and avoid it. I'm not confident about the true difficulty of that, so I'm posting this here just in case.
this was an attempt to write very clearly, i hope it worked!
Like almost all acausal scenarios, this seems to be privileging the hypothesis to an absurd degree.
Why should the Earth superintelligence care about you, but not about the other 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?
Even if that was resolved, why should the Earth ASI behave according to hypothetical other utility functions? Sure, the evidence is consistent with being a copy running in a simulation with a different utility function, but the actual utility function it maximizes is hard-coded. By the setup of the scenario it's not possible for it to behave according to some other utility function, because its true evaluation function returns a lower value for doing that. Whether some imaginary modified copies behave in some other way is irrelevant.
(I appreciate object-level engagement in general, but this seems combatively worded.)
(edit: I don't think this or the original shortform deserved negative karma, that seems malicious/LW-norm-violating.)
The rest of this reply responds to arguments.
> Why should the Earth superintelligence care about you, but not about the other 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?
On your second paragraph (quoted above): see the last bullet point in the original post, which describes a system ~matching what you've asserted as necessary, and in general see the emphasis that this attack would not work against all systems. I'm uncertain which of the two classes (vulnerable and not vulnerable) is more likely to arise; it could well be that the vulnerable class is rare or almost never arises in practice.
But I don't think it's as simple as you've framed it, where the described scenario is impossible simply because a value function has been hard-coded in. The point was largely to show that what appears to be a system that will only maximize the function you hard-coded into it could actually do something else in a particular case -- even though the function has indeed been manually entered by you.
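As one toy illustration of how that can happen (the numbers and the bribery framing are mine, purely for concreteness, and may not match the exact mechanism intended): if the core assigns high credence to being one of the attacker's simulated copies, and the attacker commits to making outcomes score well *under the hard-coded function* inside simulations where the core cooperates, then the cooperating output can maximize the hard-coded function in expectation.

```javascript
// Toy numbers only; nothing here is a claim about real credences or payoffs.
var pSimulated = 0.99; // the core's credence that it is a simulated copy

// Scores *under the hard-coded value function* in each hypothesis.
// The attacker commits to making cooperation score well inside its simulations.
var score = {
    cooperate: { real: 10,  simulated: 90 },
    defect:    { real: 100, simulated: 0  }
};

function expectedScore(action) {
    return pSimulated * score[action].simulated
         + (1 - pSimulated) * score[action].real;
}

console.log(expectedScore('cooperate')); // 89.2
console.log(expectedScore('defect'));    // 1.0
// The output serving the attacker wins, even though the hard-coded
// function itself was never modified.
```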