Do you know if Plato was claiming Euclidean geometry was physically true in that sense? Doesn't sound like something he would say.
I'd like to see how this would compare to a human organization. Suppose individual workers or individual worker-interactions are all highly faithful in a tech company. Naturally, though, the entire tech company will begin exhibiting misalignment, tend towards blind profit seeking, etc. Despite the faithfulness of its individual parts.
Is that the kind of situation you're thinking of here? Is that why having mind-reading equipment that forced all the workers to dump their inner monologue wouldn't actually be of much use towards aligning the overall system, because the real problem is something like the aggregate or "emergent" behavior of the system, rather than the faithfulness of the individual parts?
What do you mean by "over the world"? Are you including human coordination problems in this?
Did you end up writing the list of interventions? I'd like to try some of them. (I also don't want to commit to doing 3 hours a day for two weeks until I know what the interventions are.)
It's very surprising to me that he would think there's a real chance of all humans collectively deciding to not build AGI, and successfully enforcing the ban indefinitely.
Patternism is usually defined as a belief about the metaphysics of consciousness, but that definition collapses into incoherence, so it's better defined as a property of an agent's utility function: not minding being subjected to major discontinuities in functionality, i.e., being frozen, deconstructed, reduced to a pattern of information, reconstructed at another time and place, and resumed.
That still sounds like a metaphysical belief, and less empirical since conscious experience isn't involved in it (instead it sounds like it's just about personal identity).
Any suggestions for password management?
Because it's an individualized approach that is a WIP and if I just write it down 99% of people will execute it badly.
Why is that a problem? Do you mean this in the sense of "if I do this, it will lead to people making false claims that my experiment doesn't replicate" or "if I do this, nothing good will come of it so it's not even worth the effort of writing".
As someone who runs a lot of self-experiments and occasionally helps others, I'm disappointed in but sympathetic to this approach. People are complicated: the right thing to do probably is try a bunch of stuff and see what sticks. But people really, really want the answer to be simple, and will round down complicated answers until they are simple enough, then declare the original protocol a failure when their simplification doesn't work.
I think it would be valuable for George to write up the list of interventions they considered, and a case report o...
I'm confused whether:
Skimming it again I'm pretty sure you mean (2).
If I understand right the last sentence should say "does not hold".
It's not easy to see the argument for treating your values as incomparable with the values of other people while seeing your future self's values as identical to your own, unless you've adopted some idea of a personal soul.
The suffering and evil present in the world has no bearing on God's existence. I've always failed to buy into that idea. Sure, it sucks. But it has no bearing on the metaphysical reality of a God. If God does not save children--yikes I guess? What difference does it make? A creator as powerful as has been hypothesised can do whatever he wants; any arguments from rationalism be damned.
Of course, the existence of pointless suffering isn't an argument against the existence of a god. But it is an old argument against the existence of a god who deserves to b...
"tensorware" sprang to mind
Yeah, it's hard to say whether this would require restructuring the whole reward center in the brain or if the needed functionality is already there, but just needs to be configured with different "settings" to change the origin and truncate everything below zero.
My intuition is that evolution is blind to how our experiences feel in themselves. I think it's only the relative differences between experiences that matter for signaling in our reward center. This makes a lot of sense when thinking about color and "qualia inversion" thought experiments, but it's trickier with valence. My color vision could become inverted tomorrow, and it would hardly affect my daily routine. But not so if my valences were inverted.
What about our pre-human ancestors? Is the twist that humans can't have negative valences either?
I agreed up until the "euthanize everything that remains" part. If we actually get to the stage of having aligned ASI, there are probably other options with the same or better value. The "gradients of bliss" that I described in another comment may be one.
Pearce has the idea of "gradients of bliss", which he uses to try to address the problem you raised about insensitivity to pain being hazardous. He thinks that even if all of the valences are positive, the animal can still be motivated to avoid danger if doing so yields an even greater positive valence than the alternatives. So the prey animals are happy to be eaten, but much happier to run away.
To me, this seems possible in principle. When I feel happy, I'm still motivated at some low level to do things that will make me even happier, even though I was...
What are your thoughts on David Pearce's "abolitionist" project? He suggests genetically engineering wild animals to not experience negative valences, but still show the same outward behavior. From a sentientist standpoint, this solves the entire problem, without visibly changing anything.
Same. I feel somewhat jealous of people who can have a visceral in-body emotional reaction to X-risks. For most of my life I've been trying to convince my lizard brain to feel emotions that reflect my beliefs about the future, but it's never cooperated with me.
You can compress huge prompts into metatokens, too (just run inference with the prompt to generate the training data)
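The distillation trick above can be sketched in a toy form. Everything below is a stand-in (the frozen "model" is just a linear map over the mean of its input embeddings, not an LLM, and all names are illustrative); the point is only the loop: run the model with the full prompt to generate target distributions, then fit a few learnable metatoken embeddings to reproduce them.

```python
# Toy sketch of compressing a long prompt into a few learned "metatokens"
# (soft-prompt distillation). A real setup would distill a frozen LLM's
# prompted behaviour into learned embeddings; this stand-in "model" just
# maps the mean of its input embeddings to next-token logits.
import numpy as np

rng = np.random.default_rng(0)
d, vocab, prompt_len, n_meta, q_len = 16, 32, 50, 4, 5

W = rng.normal(size=(d, vocab))  # frozen "model" weights

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logits(prefix, queries):
    # queries: (batch, q_len, d); prefix: (k, d) prepended to every query
    seq_mean = (prefix.sum(0) + queries.sum(1)) / (len(prefix) + q_len)
    return seq_mean @ W  # (batch, vocab)

prompt = rng.normal(size=(prompt_len, d))  # the long prompt to compress
meta = rng.normal(size=(n_meta, d))        # learnable metatokens
lr, losses = 2.0, []
for step in range(500):
    queries = rng.normal(size=(64, q_len, d))  # random continuations
    target = softmax(logits(prompt, queries))  # "inference with the prompt"
    student = softmax(logits(meta, queries))   # metatokens instead of prompt
    # KL(target || student), averaged over the batch
    loss = np.mean(np.sum(target * (np.log(target) - np.log(student)), axis=-1))
    losses.append(loss)
    # Gradient wrt the metatokens: d loss / d logits = student - target,
    # and each metatoken contributes 1/(n_meta + q_len) to the mean embedding.
    # In this toy the gradient is identical for every metatoken, which is
    # fine because only their sum affects the output.
    g_logits = (student - target) / len(queries)
    g_meta = (g_logits @ W.T).sum(0)[None, :] / (n_meta + q_len)
    meta -= lr * g_meta  # broadcasts over all metatokens

print(f"KL at start: {losses[0]:.3f}  at end: {losses[-1]:.3f}")
```

The same loop scales to the real case: freeze the LLM, sample continuations, and backpropagate a distillation loss into a short prefix of trainable embeddings.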
I'm very curious about this technique but couldn't find anything about it. Do you have any references I can read?
I see. Yes, "philosophy" often refers to particular academic subcultures, with the people who do philosophy for a living counted as "philosophers" (Plato had a better name for them). I misread your comment at first and thought it was the "philosopher" who was arguing for the instrumentalist view, since that seems like their more stereotypical way of thinking and deconstructing things (whereas the more grounded physicist would just say "yes, you moron, electrons exist. Next question.").
Do you have any examples of the "certain philosophers" that you mentioned? I've often heard of such people described that way, but I can't think of anyone who's insulted scientists for assuming e.g. causality is real.
On the contrary, it is my intention to illustrate that assertions of instances that have not been experienced (with respect to their assertion at t1) can be justified in the future in which they are observed (with respect to their observation at t2).
Sorry, I may not be following this right. I had thought the point of the skeptical argument was that you can't justify a prediction about the future until it happens. Induction is about predicting things that haven't happened yet. You don't seem to be denying the skeptical argument here, if we still need to wait for the prediction to resolve before it can be justified.
I've also noticed that scaffolded LLM agents seem inherently safer. In particular, deceptive alignment would be hard for one such agent to achieve, if at every thought-step it has to reformulate its complete mind state into the English language just in order to think at all.
You might be interested in some work done by the ARC Evals team, who prioritize this type of agent for capability testing.
I'm sorry that comparing my position to yours led to some confusion: I don't deny the reality of 3rd person facts. They probably are real, or at least it would be more surprising if they weren't than if they were. (If not, then where would all of the apparent complexity of 1st person experience come from? Positing an external world seems like a step in the right direction toward answering this.) My comparison was about which one we consider to be essential. If I had used only "pragmatist" and "agnostic" as descriptors, it would have been less confusing.
A...
If I had to choose between those two phrasings I would prefer the second one, for being the most compatible between both of our notions. My notion of "emerges from" is probably too different from yours.
The main difference seems to be that you're a realist about the third-person perspective, whereas I'm a nominalist about it, to use your earlier terms. Maybe "agnostic" or "pragmatist" would be good descriptors too. The third-person is a useful concept for navigating the first-person world (i.e. the one that we are actually experiencing). But that it seems u...
I meant subjective in the sense of "pertaining to a subject's frame of reference", not subjective in the sense of "arbitrary opinion". I'm sorry if that was unclear.
But all of these observations are also happening from a third-person perspective, just like the rest of reality.
This is a hypothesis, based on information in your first-person perspective. To make arguments about a third-person reality, you will always have to start with first-person facts (and not the other way around). This is why the first person is epistemologically more fundamental.
It's possible to doubt that there is a third-person perspective (e.g. to doubt that there's anything like being God). But our first person perspective is primary, and ca...
You don't believe that all human observations are necessarily made from a first-person viewpoint? Can you give a counter-example? All I can think of are claims that involve the paranormal or supernatural.
I don't think I fall into either camp because I think the question is ambiguous. It could be talking about the natural structure of space and time ("mathematics") or it could be talking about our notation and calculation methods ("mathematics"). The answer to the question is "it depends what you mean".
The nominalist vs realist issue doesn't appear very related to my understanding of the Hard Problem, which is more about the definition of what counts as valid evidence. Eliminativism says that subjective observations are problematic. But all observations are subjective (first person), so defining what counts as valid evidence is still unresolved.
I appreciate hearing your view; I don't have any comments to make. I'm mostly interested in finding a double crux.
This isn't really a double crux, but it could help me think of one:
If someone becomes convinced that there isn't any afterlife, would this rationally affect their behavior? Can you think of a case where someone believed in Heaven and Hell, had acted rationally in accordance with that belief, then stopped believing in Heaven and Hell, but still acted just the same way as they did before? We're assuming their utility function hasn't changed, just their ontology.
Here are some cruxes, stated from what I take to be your perspective:
because such sensations would be equivalent to predictions that I would be burning alive, which would be false and therefore interfere with my functioning
I don't see a necessary equivalence here. You could be fully aware that the sensations were inaccurate, or hallucinated. But it would still hurt just as much.
if you could have a body which doesn’t experience, then it’s not going to function as normal.
A human body, or any kind of body? It seems like a robot could engage in the same self-preservation behavior as a human without needing to have anythi...
You seem to be claiming that you have experiences, but that their role is purely functional. If you were to experience all tactile sensations as degrees of being burnt alive, but you could still make predictions just as well as before, it wouldn't make any difference to you?
It's plausible that reverse-engineering the human mind requires tools that are much more powerful than the human mind.
So you don't believe there is such a thing as first-person phenomenal experiences, sort of like Brian Tomasik? Could you give an example or counterexample of what would or wouldn't qualify as such an experience?
Doesn't "direct" have the implication of "certain" here?
Response in favor of the assumption that Signer said was detrimental.
but my current theory is that one such detrimental assumption is "I have direct knowledge of content of my experiences"
It's true this is the weakest link, since instances of the template "I have direct knowledge of X" sound presumptuous and have an extremely bad track record.
The only serious response in favor of the presumptuous assumption [edit] that I can think of is epiphenomenalism in the sense of "I simply am my experiences", with self-identity (i.e. X = X) filling the role of "having direct knowledge of X". For explaining how we're able to have co...
The burden of proof is on those who assert that the Hard Problem is real. You can say what consciousness is not, but can you say what it is?
In the sense that you mean this, this is a general argument against the existence of everything, because ultimately words have to be defined either in terms of other words or in terms of things that aren't words. Your ontology has the same problem, to the same degree or worse. But we only need to give particular examples of conscious experience, like suffering. There's no need to prove that there is some essence of ...
Are you saying that you don't think there's any fact of the matter whether or not you have phenomenal experiences like suffering? Or do you mean that phenomenal experience is unreal in the same way that the hellscape described by Dante is unreal?
I don't like "illusionism" either, since it makes it seem like illusionists are merely claiming that consciousness is an illusion, i.e., it is something different than what it seems to be. That claim isn't very shocking or novel, but illusionists aren't claiming that. They're actually claiming that you aren't having any internal experience in the first place. There isn't any illusion.
"Fictionalism" would be a better term than "illusionism": when people say they are having a bad experience, or an experience of saltiness, they are just describing a fictional character.
Exactly. I wish the economic alignment issue was brought up more often.
You're right. I'm updating towards illusionism being orthogonal to anthropics in terms of betting behavior, though the upshot is still obscure to me.
I agree realism is underrated. Or at least the term is underrated. It's the best way to frame ideas about sentientism (in the sense of hedonic utilitarianism). On the other hand, you seem to be talking more about rhetorical benefits of normative realism about laws.
Most people seem to think phenomenal valence is subjective, but that's confusing the polysemy of the word "subjective", which can mean either arbitrary or bound to a first-person subject. All observations (including valenced states like suffering) are subjective in the second sense, but not in th...
it is easy to cooperate on the shared goal of not dying
Were you here for Petrov Day? /snark
But I'm confused what you mean about a Pivotal Act being unnecessary. Although both you and a megacorp want to survive, you each have very different priors about what is risky. Even if the megacorp believes your alignment program will work as advertised, that only compels them to cooperate with you if they (1) are genuinely concerned about risk in the first place, (2) believe alignment is so hard that they will need your solution, and (3) actually possess the institutional coordination abilities needed.
And this is just for one org.
World B has a probability of 1, maybe minus epsilon, of solving alignment, since the solution is already there.
That is totum pro parte. It's not World B which has a solution at hand. It's you who have a solution at hand, and a world that you have to convince to come to a screeching halt. Meanwhile people are raising millions of dollars to build AGI and don't believe it's a risk in the first place. The solution you have in hand has no significance for them. In fact, you are a threat to them, since there's very little chance that your utopian vision will match up wit...
Okay, let's operationalize this.
Button A: The state of alignment technology is unchanged, but all the world's governments develop a strong commitment to coordinate on AGI. Solving the alignment problem becomes the number one focus of human civilization, and everyone just groks how important it is and sets aside their differences to work together.
Button B: The minds and norms of humans are unchanged, but you are given a program by an alien that, if combined with an AGI, will align that AGI in some kind of way that you would ultimately find satisfying.
World ...
I agree that the political problem of globally coordinating non-abuse is more ominous than solving technical alignment. If I had the option to solve one magically, I would definitely choose the political problem.
What it looks like right now is that we're scrambling to build alignment tech that corporations will simply ignore, because it will conflict with optimizing for (short-term) profits. In a word: Moloch.
The existence of God and Free Will feel like religious problems that philosophers took interest in, and good riddance to them.
Whether the experience of suffering/pain is fictional or not is a hot topic in some circles, but both sides are quite insistent about being good church-going "materialists" (whatever that means).
As for "knowledge", I agree that question falls apart into a million little subproblems. But it took the work of analytic philosophers to pull it apart, and after much labor. You're currently reaping the rewards of that work and the simplicity of hindsight.