I think philosophy is basically either (a) conceptual analysis, which turns an unclear question into a well-defined empirical or mathematical one, or (b) normative reasoning about what we ought to do, feel, or believe. I’ve developed and programmed a formal theory of metasemantics and metaethics that explains how to do each of those ideally. I apply them to construct an ethical goal function for AI. It would take some more work to figure out the details, but I think together they also provide the resources needed to solve metaphilosophy.
I think the simplest intentional systems just refer to their own sensory states. It's true that we are able to refer to external things, but that's not because the external causes of our cognitive states somehow differ from those of such simple systems. External reference is earned by reasoning in such a way that attributing content like 'the cause of this and that sensory state ...' is a better explanation of our brain's dynamics and behavior than just 'this sensory state', e.g. by reasoning in accordance with the axioms of Pearl's causal models. This applie...
My aim, in defining the AI's utility function, is to specify our preferences and values in a way that is as philosophically correct as possible. It's compatible with this that in practice, the (eventual scaled-down version of the) AI would use various heuristics and approximations to make its best guess based on "human-related data" rather than direct brain data. But I do think it's important for the AI to have an accurate concept of what these are supposed to be an approximation to.
But it sounds like you have a deeper worry that intentional states are not r...
Officially, my research is metaethical. I tell the AI how to identify someone’s higher-order utility functions but remain neutral on what those actually are in humans. Unofficially, I suspect they amount to some specification of reflective equilibrium and prescribe changing one’s values to be more in line with that equilibrium.
On distortion, I’m not sure what else to say but repeat myself. Distortions are just changes in value not governed by satisfying higher-order decision criteria. The examples I gave are not part of the specification, they’re just thin
...Here, the optimal decisions would be the higher-order outputs which maximize higher-order utility. They are decisions about what to value or how to decide rather than about what to do.
To capture rational values, we are trying to focus on the changes to values that flow out of satisfying one’s higher-order decision criteria. By unrelated distortions of value, I pretty much mean changes in value from any other causes, e.g. from noise, biases, or mere associations.
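The distinction above can be made concrete with a toy sketch. All names here are hypothetical illustrations, not the actual code's API: a value change counts as rational when it is among the options maximizing the agent's higher-order utility, and anything else counts as a distortion.

```python
def rational_changes(candidate_changes, higher_order_utility):
    """Candidate value-changes endorsed by the agent's higher-order
    decision criteria (here modeled as argmax of higher-order utility)."""
    best = max(higher_order_utility(c) for c in candidate_changes)
    return [c for c in candidate_changes if higher_order_utility(c) == best]

def distortions(candidate_changes, higher_order_utility):
    """Everything else -- changes in value from noise, biases,
    or mere associations rather than endorsed reasoning."""
    endorsed = set(rational_changes(candidate_changes, higher_order_utility))
    return [c for c in candidate_changes if c not in endorsed]
```

For example, with a higher-order utility that scores "become more impartial" above "drift toward status-seeking", only the former survives as a rational change; the latter is classed as a distortion.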
In the code and outline I call the lack of distortion Agential Identity (similar to personal ide
...Nice catch. Yes, I think I’ll have to change the ordinal utility functions to range over lotteries rather than simply outcomes.
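To illustrate the change in a toy form (these names are mine, not the code's): a lottery is a probability distribution over outcomes, and ranking lotteries by expected utility induces the ordinal comparison over lotteries rather than bare outcomes.

```python
def expected_utility(lottery, outcome_utility):
    """lottery: dict mapping outcomes to probabilities (summing to 1)."""
    return sum(p * outcome_utility(o) for o, p in lottery.items())

def prefers(lottery_a, lottery_b, outcome_utility):
    """Ordinal preference over lotteries induced by expected utility."""
    return expected_utility(lottery_a, outcome_utility) > \
           expected_utility(lottery_b, outcome_utility)
```

So an agent with utility 1 for winning and 0 for losing prefers a 90% chance of winning to a 50% chance, even though both lotteries range over the same two outcomes.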
In this initial version, I am just assuming the ontology of the world is given, perhaps from just an oracle or the world model the AI has inferred.
I now have a much more readable explanation of my code. I'd be interested to hear your thoughts on it.
Yeah, more or less. In the abstract, I "suppose that unlimited computation and a complete low-level causal model of the world and the adult human brains in it are available." I've tended to imagine this as an oracle that just has a causal model of the actual world and the brains in it. But whole brain emulations would likely also suffice.
In the code, the causal models of the world and brains in it would be passed as parameters to the metaethical_ai_u function in main. The world w and each element of the set bs would be an instance of the ca...
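Schematically, the call shape is as described above: a world model `w` plus a set `bs` of brain models passed to `metaethical_ai_u`. The `CausalModel` class and the placeholder body below are illustrative stand-ins only; just the parameter shape comes from the actual code.

```python
class CausalModel:
    """Stand-in for a low-level causal model (of the world or of a brain)."""
    def __init__(self, name):
        self.name = name

def metaethical_ai_u(w, bs):
    # Placeholder body: the real function derives the AI's utility
    # function from the world model w and the set bs of brain models.
    return {"world": w.name, "brains": sorted(b.name for b in bs)}

w = CausalModel("world")
bs = {CausalModel("alice"), CausalModel("bob")}
u = metaethical_ai_u(w, bs)
```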
If you or anyone else could point to a specific function in my code that we don't know how to compute, I'd be very interested to hear that. The only place that I know of that is uncomputable is in calculating Kolmogorov complexity, but that could be replaced by some finite approximation. The rest should be computable, though its complexity may be super-duper exponentially exponential.
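One standard finite approximation of the kind gestured at here is to use compressed length as a computable upper-bound-style proxy for Kolmogorov complexity; this is an illustration, not what the actual code does.

```python
import zlib

def approx_kolmogorov(data: bytes) -> int:
    """Compressed length in bits: a computable proxy (an upper-bound-style
    estimate) for the uncomputable Kolmogorov complexity of `data`."""
    return 8 * len(zlib.compress(data, level=9))
```

Highly regular data (e.g. a long run of one repeated byte) gets a much shorter description than less regular data, matching the qualitative behavior the exact quantity would have.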
In the early stages, I would often find, as you expect, components that I thought would be fairly straightforward to define technically but would realize upon diggin...
I agree that there can be a skill involved in observation but isn’t there also a cost in attention and energy? In that case, it probably isn’t wise to try to observe anything and everything. Perhaps there are some principles for noticing when observation is likely to be worthwhile.
I also worry about generalizing too much from the example of fiction, which is often crafted to try to make nothing arbitrary. That property seems far less likely to apply to reality.
If you mean an AGI that optimizes for human values exactly as they currently are will be unaligned, you may have a point. But I think many of us are hoping to get it to optimize for an idealized version of human values.
Both eliminative materialism and reductionism can acknowledge that consciousness is not necessary for explanation and seek a physical explanation. But while eliminativists conclude that there is no such thing as consciousness, reductionists say we would simply have discovered that consciousness is different from what we might have initially thought, and is a physical phenomenon. Is there a reason you favor the former?
One might think eliminativism is metaphysically simpler, but reductionism doesn’t really posit more stuff; it’s more like just allowing synonyms for...
I agree that people's actual moral views don't track all that well with correct reasoning from their fundamental norms. Normative reasoning is just one causal influence on our views; there are plenty of biases, such as those from status games, that also play a causal role. That's no problem for my theory. It just carefully avoids the distortions and focuses on the paths with correct reasoning to determine the normative truths. In general, our conscious desires and first-order views don’t matter that much on my view unless they are endorsed by the standards we imp...