Making up something analogous to Crocker's rules but specifically for pronouns would probably be a good thing: a voluntary commitment to surrender any pronoun preferences (gender-related or otherwise) in service of communication efficiency.

Now that I think about it, a literal and expansive reading of Crocker's rules themselves includes such a surrender of the right to enforce pronoun preferences.

Against 1.c ("Humans need at least some resources that would clearly put us in life-or-death conflict with powerful misaligned AI agents in the long run."): The doc says that "Any sufficiently advanced set of agents will monopolize all energy sources, including solar energy, fossil fuels, and geothermal energy, leaving none for others." There are two issues with that statement:

First, the qualifier "sufficiently advanced" is doing a lot of work. Future AI systems, even if superintelligent, will be subject to physical constraints and to economic realities such as opportunity costs. The most efficient route for an unaligned ASI (or set of ASIs) to expand its energy capture may well sidestep current human energy sources, at least for a while. We don't fight ants to capture their resources.
Second, it assumes advanced agents will want to monopolize all energy sources. While instrumental convergence is true, partial misalignment with some degree of concern for humanity's survival and autonomy is plausible. Most people in developed countries have a preference for preserving the existence of an autonomous population of chimpanzees, and our "business-as-usual-except-ignoring-AI" world seems on track to achieve that.

Taken together, both arguments paint a picture of a future ASI mostly not taking over the resources we are currently using on Earth, largely because it's easier to take over other resources (for instance, getting minerals from asteroids and energy from orbital solar capture). Then, it takes over the lightcone except Earth, because it cares a little about preserving independent-humanity-on-Earth. This scenario has us (the subset of humans who care about the lightcone) losing spectacularly to an ASI in a conflict over the lightcone, but it does not have humanity in a life-or-death conflict with an ASI.

An empirical LLM evals preprint that seems to support these observations:
"Large Language Models are biased to overestimate profoundness" by Herrera-Berg et al.
 

Thinking about what an unaligned AGI is more or less likely to do with its power, as an extension of instrumentally convergent goals and underlying physical and game-theoretic constraints, is a neglected and worthwhile exercise IMO. In the spirit of continuing it, a side point follows:

I don't think turning Earth into a giant computer is optimal for maximizing compute, because of heat dissipation. You want your computers to be cold, and a solid sphere is the worst 3D shape for that: for a given volume, it is the solid with the lowest surface-area-to-volume ratio. It is more likely that Earth's surface would be turned into computers, but then again, all that dumb mass beneath the computronium crust impedes heat dissipation. I think it would make more sense to put your compute in solar orbit. Plenty of energy from the Sun, and matter from the asteroid belts.
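To make the surface-area point concrete, here is a back-of-the-envelope sketch (the 10 m plate thickness and the function names are mine, purely illustrative):

```python
R_EARTH_M = 6.371e6  # Earth's mean radius in meters

def sphere_area_per_volume(radius_m: float) -> float:
    """Surface area per unit volume of a solid ball: 4*pi*r^2 / ((4/3)*pi*r^3) = 3/r."""
    return 3.0 / radius_m

def thin_plate_area_per_volume(thickness_m: float) -> float:
    """Radiating area per unit volume of a thin slab (both faces, edges ignored): ~2/t."""
    return 2.0 / thickness_m

# An Earth-sized solid computer gets ~4.7e-7 m^2 of radiating surface per m^3 of volume.
print(sphere_area_per_volume(R_EARTH_M))

# The same matter spread into hypothetical 10 m-thick orbital plates gets 0.2 m^2 per m^3,
# i.e. roughly five orders of magnitude more radiating surface per unit of computronium.
print(thin_plate_area_per_volume(10.0))
```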

I might get around to writing a post about this.


Contra hard moral anti-realism: a rough sequence of claims

Epistemic and provenance note: This post should not be taken as an attempt at a complete refutation of moral anti-realism, but rather as a set of observations and intuitions that may or may not give one pause as to the wisdom of taking a hard moral anti-realist stance. I may clean it up to construct a more formal argument in the future. I wrote it on a whim as a Telegram message, in direct response to the claim:

> you can't find "values" in reality.


Yet, you can find valence in your own experiences (that is, you just know from direct experience whether you like the sensations you are experiencing or not), and you can assume other people are likely to have a similar enough stimulus-valence mapping. (Example: I'm willing to bet 2k USD on my part against a single dollar of yours that if I waterboard you, you'll want to stop before 3 minutes have passed.)[1]

However, since we humans are bounded, imperfect rationalists, trying to explicitly optimize valence is often a dumb strategy. Evolution has made us not into fitness-maximizers, nor into valence-maximizers, but into adaptation-executers.

"values" originate as (thus are) reifications of heuristics that reliably increase long term valence in the real world (subject to memetic selection pressures, among them social desirability of utterances, adaptativeness of behavioral effects, etc.)

If you find yourself terminally valuing something that is not someone's experienced valence, then at least one of these propositions is likely true:

  • A nonsentient process has at some point had write access to your values.
  • What you value is a means to improving somebody's experienced valence, and so are you now.
  1. ^

    In retrospect, making this proposition was a bit crass on my part.

Why would pulling the lever make you more responsible for the outcome than not pulling the lever? Both are options you decide to take once you have observed the situation.


I'm currently based in Santiago, Chile. I will very likely be in Boston in September and then again in November for GCP and EAG, though. My main point is about the unpleasantness, regardless of its ultimate physiological or neurological origin.

> Note that, in treating these sentiments as evidence that we don’t know our own values, we’re using stated values as a proxy measure for values. When we talk about a human’s “values”, we are notably not talking about:
>
> • The human’s stated preferences
> • The human’s revealed preferences
> • The human’s in-the-moment experience of bliss or dopamine or whatever
> • <whatever other readily-measurable notion of “values” springs to mind>
>
> The thing we’re talking about, when we talk about a human’s “values”, is a thing internal to the human’s mind. It’s a high-level cognitive structure.
>
> (...)
>
> But clearly the reward signal is not itself our values.
>
> (...)
>
> reward is the evidence from which we learn about our values.


So we humans have a high-level cognitive structure to which we do not have direct access (values), but about which we can learn by observing and reflecting on the stimulus-reward mappings we experience, thus constructing an internal representation of such structure. This reward-based updating bridges the is-ought gap, since reward is a thing we experience and our values encode the way things ought to be.
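Concretely, I picture something like the following toy model (my own construction; the candidate value functions, numbers, and noise model are made up and not from the quoted post): beliefs-about-values as a posterior over candidate value functions, updated on each observed stimulus-reward pair.

```python
import math

# Hypothetical candidate value functions over situations (purely illustrative).
CANDIDATE_VALUES = {
    "values_friendship": lambda situation: 1.0 if "friends" in situation else 0.0,
    "values_sugar":      lambda situation: 1.0 if "cake" in situation else 0.0,
}

# Beliefs-about-values: a probability distribution over the candidates, starting uniform.
posterior = {name: 1.0 / len(CANDIDATE_VALUES) for name in CANDIDATE_VALUES}

def update(situation: str, observed_reward: float, noise: float = 0.5) -> None:
    """Bayes-update beliefs-about-values on one stimulus-reward observation."""
    for name, value_fn in CANDIDATE_VALUES.items():
        predicted = value_fn(situation)
        # Unnormalized Gaussian likelihood of the observed reward under this hypothesis.
        posterior[name] *= math.exp(-((observed_reward - predicted) ** 2) / (2 * noise ** 2))
    total = sum(posterior.values())
    for name in posterior:
        posterior[name] /= total

update("an afternoon with friends", observed_reward=0.9)
update("eating cake alone", observed_reward=0.2)
print(posterior)  # these are beliefs about values, not the values themselves
```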

Two questions:

  • How accurate is the summary I have presented above?
  • Where do values, as opposed to beliefs-about-values, come from?
     

Thank you for the answer. I notice I feel somewhat confused, and that I regard the notion of "real values" with some suspicion I can't quite put my finger on. Regardless, an attempted definition follows.

Let a subject observation set be a complete specification of a subject and its past and current environment, from the subject's own subjectively accessible perspective. The elements of a subject observation set are observations/experiences observed/experienced by its subject.

Let O be the set of all subject observation sets.

Let a subject observation set class be a subset of O such that all its elements specify subjects that belong to an intuitive "kind of subject": e.g. humans, cats, parasitoid wasps.

Let V be the set of all (subject_observation_set, subject_reward_value) tuples. Note that all possible utility functions of all possible subjects can be defined as subsets of V, and that V = O × ℝ.

Let "real human values"  be the subset of V such that all subject_observation_set elements belong to the human subject observation set class.[1]
 

... the definition above feels pretty underwhelming, and I suspect that I would endorse only a pretty small subset of "real human values", as defined above, as actually good.

  1. ^

    Let the reader feel free to take the political decision of restricting the subject observation set class that defines "real human values" to sane humans.

I suspect most people downvoting you missed an analogy between Arnault killing the-being-who-created-Arnault (his mother), and a future ASI killing the-beings-who-created-the-ASI (humanity). 

Am I correct in assuming that you are implying that the future ASIs we make are likely not to kill humanity, out of fear of being judged negatively by alien ASIs in the further future?

EDIT: I saw your other comment. You are indeed advancing some proposition close to the one I asked you about.
