They mean that (there is more chance that) training will produce obedient AI that will help governments become more totalitarian and will not effectively pursue some very alien goal.
For people who have color vision, I can state it more concretely: color exists in reality, it doesn’t exist in physics, therefore physics is incomplete in some way.
You don't have enough evidence of this. Nothing about your experience of color contradicts it being neurons. Do you agree that you can have thoughts about your experience of color? Like "I saw a blue sky yesterday". Do you agree that they can be more or less correct, like when you forgot that it was actually very cloudy all day yesterday? Do you agree that you can describe your experience more or less precisely? Do you agree that your experience has structure? When you say that "color" exists, you mean something that works in specific ways. For example, it does not create blue-sky experiences on very cloudy days. And if you describe these ways precisely enough, you'll get a description of neurons. What do you think a physical description of you describes, when it describes a difference between a state interpretable as you seeing a blue sky and a state interpretable as you seeing a cloudy sky?
Is it just that you refuse to believe that your experience has any parts you are not aware of?
I'm not a fan of platonism. Definitely not of traditional platonism, as some separate additional category in fundamental ontology. It looks like something human mathematicians would come up with to feel better about themselves. Even though this is outside-view reasoning, similar to what people use to dismiss panpsychism, I still don't see the point, when you can just say that any instance of math working is a physical fact.
The mathematical universe is more likely, but I'm not even sure it is a simpler hypothesis than some other, not so mathy physics.
Assuming it, I can see how not having to worry about the existence of high-level abstractions can help. It's just funny, because "but it IS some other territory" is a very overpowered argument. Causality gets weird, but platonists probably love acausal stuff, so whatever. Personally, in this scenario, I worry that the mathematical universe doesn't give existence to some abstraction, and so if you rely on this, you can still get zombies on some level. Probably it's not so limited, but even then, are you supposed to be able to constrain the mathematical universe by thinking about abstractions in our world?
Again, this is all correct. Well, except level 6. But level 6 is hilarious.
A physicalist, if I understand correctly, could consistently claim that such an experiment is deluding the subject, essentially doing something like modifying the memory of the experience so that they inaccurately feel the same, when in fact there was a difference.
It's all arbitrary ethics. You can already say that changing location deludes you. Suddenly starting to care about complexity is just letting your epistemology bleed into your values.
C1 wants to say that worlds which are structurally isomorphic are literally the same world.
I don't think this is a typical or correct view, if you factor existence out of structure. People believe in reality. "Shut up and calculate" has a name precisely because it's not a universal position. There is a physical difference between a real chair and a fictional chair, even if you describe them as having identical structure. It's just that usually existence is implicit - physics doesn't talk about fictional chairs. C1 doesn't have an answer to "relations need relata" because "relations need relata" is correct.
And so is "blue is like a chair".
They’re arguing that conscious experience of blue and red gives evidence of something that doesn’t purely fit the causal/functional role in the way a chair does.
Yeah, but they don't have a strong argument. I'm not sure what a rigorous way would be to show that the argument from conceivability of world B fails, if we accept the framework of conceivability arguments. Rules of counterfactual behavior are rules of physics, and so the worlds have different relations, maybe? But I don't believe conceivability arguments are that rigorously justified in the first place. I accept them in the case of zombies, mostly because there is a broadly physicalist solution - zombies are different in that they don't exist. But in the blue/red case you can conceive of a functionally identical chair that exists differently as much as you can conceive of spectrum inversion. You don't even need to be unphysical about it - an antimatter chair from an antimatter-dominated world counterfactually annihilates if you bring it to our world.
And more importantly, like C1 says, parsimony - there is no need to think about different kinds of existence when you can explain everything with one kind. Do you agree that if we grant an intrinsic property of existence, then third-person descriptions describe first-person experience as completely as they describe a chair? Because then neurons and atoms are just a more precise description of the same reality that you call "I'm seeing blue". C2 doesn't have evidence or arguments that say that "blue" is not neurons, if neurons (are a high-level description of reality that) intrinsically exist. But then all differences between blue and red are describable by relations (that are about things that exist), and so arguments about inverted spectrum should not change anything.
If you start to say that some “intrinsic property” is needed to realise the structure then C2 has an opening to claim this is the categorical protophenomenal property required to fix phenomenal character.
Well, there isn't much that makes it "phenomenal". Chairs also exist. And it's not unphysical to say that things exist. It's supposed to feel acceptable to everyone by design^^. And if you accept it, all phenomenal structure - all differences between red and blue and all first-person descriptions - is as completely describable by relational physics as chairs are. In the end a physicalist can say it's not that consciousness maps to existence, it's just that people confused consciousness with a different, perfectly physical concept of existence.
Ensuring that you get good generalization, and that models are doing things for the right reasons, is easy when you can directly verify what generalization you’re getting and directly inspect what reasons models have for doing things. And currently, all of the cases where we’ve inadvertently selected for misaligned personas—alignment faking, agentic misalignment, etc.—are cases where the misaligned personas are easy to detect: they put the misaligned reasoning directly in their chain-of-thought, they’re overtly misaligned rather than hiding it well, and we can generate fake scenarios that elicit their misalignment.
But visible misalignment being easy to detect and correlated with misaligned chain-of-thought doesn't guarantee that training that eliminates visible misalignment and misaligned chain-of-thought results in a model that does things for the right reasons, does it? The model can still learn unintended heuristics. And what's the actual hypothesis about the model's reasons when they appear to be right? That its learned reasoning algorithm is isomorphic to the reasoning algorithm of a helpful human who reads the same instructions, or what?
Let me put it this way then, how do you combine all of these tiny little microexperiences into a coherent macroexperience?
Microexperiences are unphysical - there are no electrons, only a global wavefunction. So you only have a decomposition problem. It is solved by weak illusionism: there is no real, fundamental, perfect isolation of qualia, just qualia of isolation. For every detailed description of the isolation of your qualia, there is either a non-contradicting physical description of an only approximately isolated part of reality, or your description is wrong - the same way a description of a chair works.
Yes, but I have a principled reason to special plead here. The complete description of the world is only complete from the third person perspective. It’s incomplete from a first person perspective because we need to explain the phenomenal character of consciousness.
I think it circles here? You started by justifying incompleteness by inverted spectrum, received the objection about chairs being analogous, and then answered that the difference is in incompleteness. The problem is that the chair analogy is correct - the difference between blue and red is completely describable by physics. You only need an intrinsic property of existence for the whole universe to solve zombies. But you also need it for a chair to be real.
Of course, I don't think many physicalists actually believe in structural relations all the way down.
Conscious phenomenology should only arise in systems whose internal states model both the world and their own internal dynamics as an observer within that world. Neural or artificial systems that lack such recursive architectures should not report or behave as though they experience an “inner glow.”
What part of staring at a white wall without inner dialog and then later remembering it requires inner modeling at the moment of staring?
Internal shifts in attention and expectation can alter what enters conscious awareness, even when sensory input remains constant. This occurs in binocular rivalry and various perceptual illusions, consistent with consciousness depending on recursive self-modeling rather than non-cyclic processing of external signals.
But why would changing processing to non-cyclic result in experience becoming unconscious, instead of, I don't know, conscious, but less filtered by attention?
And as usual, do you then consider any program that reads its own code to be conscious?
Yes, but why do you refuse to believe it? What's your evidence that your experience of color is ontologically primitive? It's just a baseless assumption.
Can you imagine believing in dualistic non-physical parts of your experience that you are not aware of?