I think philosophy is basically either (a) conceptual analysis, which turns an unclear question into a well-defined empirical or mathematical one, or (b) normative reasoning about what we ought to do, feel, or believe. I’ve developed and programmed a formal theory of metasemantics and metaethics that explains how to do each of those ideally. I apply them to construct an ethical goal function for AI. It would take some more work to figure out the details, but I think together they also provide the resources needed to solve metaphilosophy.
I think the simplest intentional systems just refer to their own sensory states. It's true that we are able to refer to external things, but that's not because the external causes of our cognitive states somehow differ from those of such simple systems. External reference is earned by reasoning in such a way that attributing content like 'the cause of this and that sensory state ...' is a better explanation of our brain's dynamics and behavior than just 'this sensory state', e.g. by reasoning in accordance with the axioms of Pearl's causal models. This applie...
My aim, in defining the AI's utility function, is to specify our preferences and values in a way that is as philosophically correct as possible. It's compatible with this that in practice, the (eventual scaled-down version of the) AI would use various heuristics and approximations to make its best guess based on "human-related data" rather than direct brain data. But I do think it's important for the AI to have an accurate concept of what these are supposed to be an approximation to.
But it sounds like you have a deeper worry that intentional states are not r...
Officially, my research is metaethical. I tell the AI how to identify someone’s higher-order utility functions but remain neutral on what those actually are in humans. Unofficially, I suspect they amount to some specification of reflective equilibrium and prescribe changing one’s values to be more in line with that equilibrium.
On distortion, I’m not sure what else to say but repeat myself. Distortions are just changes in value not governed by satisfying higher-order decision criteria. The examples I gave are not part of the specification, they’re just thin
...Here, the optimal decisions would be the higher-order outputs which maximize higher-order utility. They are decisions about what to value or how to decide rather than about what to do.
To capture rational values, we are trying to focus on the changes to values that flow out of satisfying one’s higher-order decision criteria. By unrelated distortions of value, I pretty much mean changes in value from any other causes, e.g. from noise, biases, or mere associations.
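The distinction above can be made concrete with a toy sketch. All names here are hypothetical illustrations, not the actual code's API: a value change counts as rational when it is among the options maximizing the agent's higher-order utility, and anything else counts as a distortion.

```python
def rational_changes(candidate_changes, higher_order_utility):
    """Candidate value-changes endorsed by the agent's higher-order
    decision criteria (here modeled as argmax of higher-order utility)."""
    best = max(higher_order_utility(c) for c in candidate_changes)
    return [c for c in candidate_changes if higher_order_utility(c) == best]

def distortions(candidate_changes, higher_order_utility):
    """Everything else -- changes in value from noise, biases,
    or mere associations rather than endorsed reasoning."""
    endorsed = set(rational_changes(candidate_changes, higher_order_utility))
    return [c for c in candidate_changes if c not in endorsed]
```

For example, with a higher-order utility that scores "become more impartial" above "drift toward status-seeking", only the former survives as a rational change; the latter is classed as a distortion.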
In the code and outline I call the lack of distortion Agential Identity (similar to personal ide
...Nice catch. Yes, I think I’ll have to change the ordinal utility functions to range over lotteries rather than simply outcomes.
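To illustrate the change in a toy form (these names are mine, not the code's): a lottery is a probability distribution over outcomes, and ranking lotteries by expected utility induces the ordinal comparison over lotteries rather than bare outcomes.

```python
def expected_utility(lottery, outcome_utility):
    """lottery: dict mapping outcomes to probabilities (summing to 1)."""
    return sum(p * outcome_utility(o) for o, p in lottery.items())

def prefers(lottery_a, lottery_b, outcome_utility):
    """Ordinal preference over lotteries induced by expected utility."""
    return expected_utility(lottery_a, outcome_utility) > \
           expected_utility(lottery_b, outcome_utility)
```

So an agent with utility 1 for winning and 0 for losing prefers a 90% chance of winning to a 50% chance, even though both lotteries range over the same two outcomes.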
In this initial version, I am just assuming the ontology of the world is given, perhaps from just an oracle or the world model the AI has inferred.
I now have a much more readable explanation of my code. I'd be interested to hear your thoughts on it.
Yeah, more or less. In the abstract, I "suppose that unlimited computation and a complete low-level causal model of the world and the adult human brains in it are available." I've tended to imagine this as an oracle that just has a causal model of the actual world and the brains in it. But whole brain emulations would likely also suffice.
In the code, the causal models of the world and brains in it would be passed as parameters to the metaethical_ai_u function in main. The world w and each element of the set bs would be an instance of the ca...
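Schematically, the call shape is as described above: a world model `w` plus a set `bs` of brain models passed to `metaethical_ai_u`. The `CausalModel` class and the placeholder body below are illustrative stand-ins only; just the parameter shape comes from the actual code.

```python
class CausalModel:
    """Stand-in for a low-level causal model (of the world or of a brain)."""
    def __init__(self, name):
        self.name = name

def metaethical_ai_u(w, bs):
    # Placeholder body: the real function derives the AI's utility
    # function from the world model w and the set bs of brain models.
    return {"world": w.name, "brains": sorted(b.name for b in bs)}

w = CausalModel("world")
bs = {CausalModel("alice"), CausalModel("bob")}
u = metaethical_ai_u(w, bs)
```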
If you or anyone else could point to a specific function in my code that we don't know how to compute, I'd be very interested to hear that. The only place that I know of that is uncomputable is in calculating Kolmogorov complexity, but that could be replaced by some finite approximation. The rest should be computable, though its complexity may be super-duper exponentially exponential.
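One standard finite approximation of the kind gestured at here is to use compressed length as a computable upper-bound-style proxy for Kolmogorov complexity; this is an illustration, not what the actual code does.

```python
import zlib

def approx_kolmogorov(data: bytes) -> int:
    """Compressed length in bits: a computable proxy (an upper-bound-style
    estimate) for the uncomputable Kolmogorov complexity of `data`."""
    return 8 * len(zlib.compress(data, level=9))
```

Highly regular data (e.g. a long run of one repeated byte) gets a much shorter description than less regular data, matching the qualitative behavior the exact quantity would have.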
In the early stages, I would often find, as you expect, components that I thought would be fairly straightforward to define technically but would realize upon diggin...
I agree that there can be a skill involved in observation but isn’t there also a cost in attention and energy? In that case, it probably isn’t wise to try to observe anything and everything. Perhaps there are some principles for noticing when observation is likely to be worthwhile.
I also worry about generalizing too much from the example of fiction, which is often crafted to try to make nothing arbitrary. That property seems far less likely to apply to reality.
If you mean an AGI that optimizes for human values exactly as they currently are will be unaligned, you may have a point. But I think many of us are hoping to get it to optimize for an idealized version of human values.
Both eliminative materialism and reductionism can acknowledge that consciousness is not necessary for explanation and seek a physical explanation. But while eliminativists conclude that there is no such thing as consciousness, reductionists say we would simply have discovered that consciousness is different from what we might have initially thought, and is a physical phenomenon. Is there a reason you favor the former?
One might think eliminativism is metaphysically simpler, but reductionism doesn’t really posit more stuff; it’s more like just allowing synonyms for...
I agree that people's actual moral views don't track all that well with correct reasoning from their fundamental norms. Normative reasoning is just one causal influence on our views; there are plenty of biases, such as those from status games, that also play a causal role. That's no problem for my theory. It just carefully avoids the distortions and focuses on the paths with correct reasoning to determine the normative truths. In general, our conscious desires and first-order views don’t matter that much on my view unless they are endorsed by the standards we imp...