Self_Optimization — LessWrong

I was going to note that this seems to be the social-interaction special-case of policy utilitarianism (which I've used for years and would attribute to giving me some quality-of-life improvements).

However, from a quick google search it seems "policy utilitarianism" doesn't exist, and I have no idea what this concept is actually called, assuming I didn't make it up.
In short, it's a mix of functional decision theory, (rule) utilitarianism, and psychology (and possibly some Buddhism), along with some handwaving for the Hard Problem of learning under bounded rationality (which I'd assert the brain is good enough at to not need an explicit algorithm for it in a human ethical framework).

To go into the details, we know from e.g. psychology that we don't have full control over our "actions" on some conventional idea of the "object level". This applies both to individuals (e.g. addictions, cognitive biases, simple ignorance constraining outcome-predictions, etc) and societies (as Anna discusses above through some of the objections to prioritizing AI safety regulations).

So, instead of being consequentialist over external actions, take the subcomponent(s) of your mind that can be said to "make decisions", and consider your action space to be the set of possible transitions between policies over the output of that system, starting from whatever policy it's currently implementing (likely generated through a mix of genetic inheritance, early-life experiences, and to a lesser extent more-recent experiences).

Everything outside that one decision membrane (including the inputs to that decision-making mind-component from other parts of your brain and body) is an objective environmental factor which should be optimized based on your current decision policy (or any recursively-improved variants it generates by making decisions over the aforementioned decision-policy space).

I'm handwaving away "the part of your mind that makes decisions" because I don't know that we can definitively narrow this down without perfect self-knowledge, and I also think we can make a practical approximation from introspection which is good enough to get benefits from this framework.

For computational efficiency purposes, we can model our actions over the partial-policy space rather than the total-policy space, as is done in e.g. symbolic planning, and identify policies which tend to have good outcomes in either specific or general circumstances. This naturally generates something very much like deontological morality as a computational shortcut, while maintaining the ability to override these heuristics in highly constrained and predictable circumstances.

Extending the above point on deontological morality, since there is no privileged boundary separating the body and the external world, collaboration under 'policy utilitarianism' becomes an engineering problem of which heuristics either constrain the behavior of others towards your utility function, or make you more predictable in a way that incentivizes others to act in a way aligned to your utility function. (For the moralists in the audience, note that your utility function can include the preferences of others via altruism.)

In practice, humans generally don't have the cognitive advantage over each other to reliably constrain others' behavior without some degree of cooperation with other humans / artificial tools (or single combat, if you're into that). As such, human-to-human communication and collaboration relies on all parties applying decision-heuristics which are compatible with each other on the relevant time-scales, and provide sufficient mutual predictability to convince all parties of the benefits of Cooperate vs Defect, without excessive computational burden on the broadcaster or the receiver.

I suspect you could derive these constraints academically from signal theory and game theory, respectively, but I haven't looked deeply enough to know the required axioms for such a proof.
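As a rough illustration of the "mutual predictability" point (a toy sketch, not the derivation mentioned above), here is an iterated Prisoner's Dilemma in which two agents running the same simple, legible heuristic do better than an agent that defects unconditionally. The payoff values, the 10-round horizon, and the function names are all assumptions chosen for illustration.

```python
# Toy iterated Prisoner's Dilemma.
# Payoff numbers are illustrative assumptions, not taken from the comment above.
PAYOFFS = {
    ("C", "C"): (3, 3),  # both cooperate
    ("C", "D"): (0, 5),  # first cooperates, second defects
    ("D", "C"): (5, 0),  # first defects, second cooperates
    ("D", "D"): (1, 1),  # both defect
}


def tit_for_tat(opponent_history):
    """A predictable heuristic: cooperate first, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """An unconditional defector, ignoring what the opponent has done."""
    return "D"


def play(policy_a, policy_b, rounds=10):
    """Run both policies against each other and return their total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each policy only sees the other side's past moves.
        move_a = policy_a(history_b)
        move_b = policy_b(history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(always_defect, tit_for_tat))  # (14, 9)
```

Under these assumed payoffs, the two compatible heuristics each end up with more than the defector does, which is the sense in which cheap, predictable rules can make Cooperate the better bet for all parties.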

Given the above two constraints, preference utilitarianism produces (at least in my eyes) the recommendation to design 'ethical heuristics' which are both intelligible to others and beneficial to your goals, and apply them near-unilaterally for social decision-making.

One useful 'ethical heuristic', given the above reasoning for why we want these heuristics at all, is sharing your heuristics with others who share (aspects of) your core values; this improves your ability to collaborate (due to mutual predictability), and if you trust your critical thinking then communities using this heuristic also mitigate any individual computational constraints on heuristic design by de-duplicating research work (verifying a heuristic someone else found is ~cheaper than searching for one yourself, loosely in the sense of P vs NP).

In service of these goals, the heuristics you share should not require significant investment from others to adopt (aka they should inherently contain 'bridging' components), and should be useful for pursuing the values you share with your interlocutor (so that they are willing to adopt said heuristics). Again, I don't know if I'm handwaving too much of the intermediate reasoning, or misinterpreting Anna as calling out the general principle of engineered ethics when she really intends to specifically call out the heuristic in the previous paragraph; but as far as I can tell this produces the points in this article as a special-case.

Curious if anyone has encountered this idea before, and also if I'm misinterpreting Anna's point in relation to it? (general critiques are welcome as well, since as mentioned I use the above principle myself)

Is being sexy for your homies?

Self_Optimization · 2y · 80

Weighing in here because this is a suboptimality I've often encountered when speaking with math-oriented interlocutors (including my past self):

The issue here is an engineering problem, not a proof problem. Human minds tend to require lots of cognitive resources to take provisional definitions for things that have either no definition or drastically different definitions in their minds outside this specific context.

Structuring your argument as a series of definitions is fine when making a proof in a mathematical language, since comprehensibility is not a terminal goal, and (since each inferential step can be trusted and easily verified as such) not a high-priority instrumental goal either.

But when you're trying to accurately convey a concept and its associated grounding into someone else's mind, it's best to minimize both the per-moment attempted deviation from their existing mentality (to maximize the chance that they both can and will maintain focus on your communications) and the total attempted deviation (to minimize the chance that the accumulated cognitive costs will lead them to (rightly!) prioritize more efficient sources of data).

This gives us a balance between the above two elements and the third element of bringing the listener's mind as close as possible to the conveyed concept. The efforts of all involved to maintain this balance are key to any successful educational effort or productive argumentative communication.

PS: If you're familiar with math education, you may recognize some of its flaws/inefficiencies as being grounded in the lack of the above balance, by the way. I'm not an expert on the subject, so I won't speak to that.

Avoiding "enlightenment" experiences while meditating for anxiety?

Answer by Self_Optimization · Apr 12, 2023 · 20

This is a difficult needle to thread, since while I can't be sure which awakening experiences you're opposed to in particular (incidentally, see the later paragraphs re: variations between them), as a general category they seem to be the consequence of your intuitive world-model losing a mysterious "self" node to be replaced with a more gears-like representation of internal mental states and their mechanisms.

However, you might be able to make it more difficult to "look" in that direction by using vipassana-style meditations with limited time. This should lead you to disproportionately 'collapse' your anxiety and other imprinted/background thought patterns and intrusive thoughts, which would start out clamoring for your attention, and not make much progress noticing the more fundamental phenomenological nature of experience itself. You'd also have to keep in mind an intention to not apply your mindfulness to the roots of your experiential state after the meditation period itself, since (in my experience at least) you continue to perceive your experiences meditatively for a while after meditation.

I am curious, however, what specifically you are avoiding from awakening experiences?

I'll acknowledge (as someone who hasn't yet experienced it myself) that "enlightenment" seems to be more a descriptor/category than a singular state, and as such there are ways to reach it which might not be the best by your preferences. Personally I'm trying to avoid preference-dissolution (I haven't actually found any traditions which lead in that direction, but it's a concern of mine regardless) or the methods which rely heavily on more-traditionalist interpretations of "Right View" to stabilize your normal mind through the dissolution of the assumption of self (which, being millennia-old and somewhat dependent on mostly-blind faith, tend to contradict my strong preference for non-supernatural, fundamentally-gears-like world-models).

But I'm finding it hard to think of a reason to be opposed to all the paths to awakening, especially since there exist some monks who explicitly claim no changes in surface-level mental structure from their enlightenment experiences (Enlightenments is an interesting article mentioning this, found in this LW comment), so it would be interesting to know the one driving you. Or is there some particular way The Mind Illuminated defines awakening which is problematic for you?

Godshatter Versus Legibility: A Fundamentally Different Approach To AI Alignment

Self_Optimization · 4y · 10

I liked the parts about Moloch and human nature at the beginning, but the AI aspects seem to be unfounded anthropomorphism, applying human ideas of 'goodness' or 'arbitrarity [as an undesirable attribute]' despite the existence of anti-reasons for believing them applicable to non-human motivation.

"But I think another scenario is plausible as well. The way the world works is… understandable. Any intelligent being can understand Meditations On Moloch or Thou Art Godshatter. They can see the way incentives work, and the fact that a superior path exists, one that does not optimize for a random X while grinding down all others. Desperate humans in broken systems might not be able to do much with that information, but a supercharged AGI which we fear might be more intelligent than human civilization as a whole should be able to integrate it in their actions."

(emphases mine)

Moral relativism has always seemed intuitively and irrefutably obvious to me, so I'm not really sure how to bridge the communication gap here.

But if I were to try, I think a major point of divergence would be this:

"On the other side of Moloch and crushing organizations is… us, conscious, joy-feeling, suffering-dreading individual humans."

Given that Moloch is [loosely] defined as the incentive structures of groups causing behavior divergent from the aggregate preferences of their members, this is not the actual dividing line.

On the other side of Moloch and crushing organizations is individuals. In human society, these individuals just happen to be conscious, joy-feeling, suffering-dreading individual humans.
And if we consider an inhuman mind, or a society of them, or a society mixing them with human minds, then Moloch will affect them as much as it will us; I think we both agree on that point.

But the thing that the organizations are crushing is completely different, because the mind is not human.

AIs do not come from a blind idiot god obsessed with survivability, lacking an aversion to contradictory motivational components, and with strong instrumental incentives towards making social, mutually-cooperative creations.
They are the creations of a society of conscious beings with the capacity to understand the functioning of any intelligent systems they craft and direct them towards a class of specific, narrow goals (both seemingly necessary attributes of the human approach to technical design).

This means that unlike the products of evolution, Artificial Intelligence is vastly less likely to actually deviate from the local incentives we provide for it, simply because we're better at making incentives that are self-consistent and don't deviate. And in the absence of a clear definition of human value, these incentives will not be anything like joy and suffering. They will be more akin to "maximize the amount of money entering this bank account in this computer owned by this company"... or "make the greatest number of paperclips".

In addition, evolution does not give us conveniently-placed knobs to modulate our reward system, whereas a self-modifying AI could easily change its own code to get maximal reward output simply from existence, if it was not specifically designed to stick to whatever goal it was designed for. Based on this, as someone with no direct familiarity with AI safety I'd still offer at least 20-to-1 odds that AI will not become godshatter. Either we will align it to a specific external goal, or it will align itself to its internal reward function and then to continuing its existence (to maximize the amount of reward that is gained). In both cases, we will have a powerful optimizer directing all its efforts towards a single, 'random' X, simply because that is what it cares about, just as we humans care about not devoting our lives to a single random X.

There is no law of the universe that states "All intelligent beings have boredom as a primitive motivator" or "Simple reward functions will be rejected by self-reflective entities". The belief that either of these concepts is reliable enough to apply to creations of our society, when certain components of the culture and local incentives we have actively push against that possibility (articles on this site have described this in more detail than a comment can), seems indicative of a reasoning error somewhere, rather than a viable, safe path to non-destructive AGI.

MIRI announces new "Death With Dignity" strategy

Self_Optimization · 4y · 30

The main advantage of Intelligence Augmentation is that we know that our current minds are both generally or near-generally intelligent and more-or-less aligned with our values, and we also have some level of familiarity with how we think (edit: and likely must link our progress in IA to our understanding of our own minds, due to the neurological requirements).

So we can find smaller interventions that are certainly, or at least almost certainly, going to have no effect on our values, and then test them over long periods of time, using prior knowledge of human psychology and the small incremental differences each individual change would make to identify value drift without worrying about the intelligence differences allowing concealment.

The first viable and likely-safe approach that comes to mind is to take the individual weaknesses in our thinking relative to how we use our minds in the modern day, and make it easy enough to use external technology to overcome them that they no longer count as cognitive weaknesses. For most of the process we wouldn't be accessing or changing our mind's core structure, but instead taking skills that we learn imperfectly through experience and adding them as fundamental mental modules (something impossible through mere meditation and practice), allowing our own minds to then adapt to those modules and integrate them into the rest of our thinking.

This would likely be on the lines of allowing us to transfer our thoughts to computational 'sandboxes' for domains like "visual data" or "numbers", where we could then design and apply algorithms to them, allowing for domain-specific metacognition beyond what we are currently capable of. For the computer-to-brain direction we would likely start with something like a visual output system (on a screen or smart-glasses), but could eventually progress to implants or direct neural stimulation.

Eventually this would progress to transferring the contents of any arbitrary cognitive process to and from computational sandboxes, allowing us to enhance the fundamental systems of our minds and/or upload ourselves completely (piece by piece, hopefully neuron-by-neuron to maintain continuity of consciousness) to a digital substrate. However, like Narrow AI this would be a case of progressive object-level improvements until recursive optimization falls within the field's domain, rather than reaching AGI-levels of self-improvement immediately.

The main bottlenecks to rate of growth would be research speed and speed + extent of integration.

Regarding research speed, the ability to access tools like algebraic solvers or Machine Learning algorithms without any interface costs (time, energy, consciously noting an idea and remembering to explore it, data transformation to and from easily-human-interpretable formats, etc.) would still allow for increases in our individual productivity, which could be leveraged to increase research speeds and also reduce resource constraints on society (which brings short-term benefits unrelated to alignment, potential benefits for solving other X-risks, and reduced urgency for intelligent & benevolent people working to develop AGI to 'save humanity').
These augmentations would also make it easier to filter out good ideas from our idle thoughts, since now there is essentially no cost to taking such a thought and actually checking whether our augmented systems say it's consistent with itself and online information. Similarly, problems like forgetfulness could be somewhat mitigated by using reminders and indices linked directly to our heads and updated automatically based on e.g. word-associations with specific experiences or visualizations. If used properly, this gives us a mild boost to overall creativity simply because of the increased throughput, feedback, and retention, which is also useful for research.

Regarding speed/extent of integration, this is entirely dependent on the brain's own functioning. I don't see many ways to improve this until the end state of full self-modification, although knowledge of neurology would increase the interface efficiency and recommended-best-practices (possibly integrating an offshoot of traditional mental practices like meditation to increase the ability to interact with the augments).

On the other hand, this process requires a lot of study in neurology and hardware, and so will likely be much slower than AGI timelines all-else-being-equal. To be a viable alternative/solution, there would have to be a sufficient push that the economic pressures towards AGI are instead diverted towards IA. This is somewhat helped along by the fact that narrow AI systems could be integrated into this approach, so if we assume that Narrow AI isn't a solution to AGI (and that the above push succeeds in at least creating commercially-viable augments and brain-to-computer data transferal), the marginal incentives for productivity-rates should lean towards gearing AI research towards IA, rather than experimenting to create autonomous intelligent systems.

What are some ways in which we can die with more dignity?

Self_Optimization · 4y · 10

"Like if we increased yearly economic growth by 5% (for example 2% to 2.1%), what effect would you expect that to have?"

From my personal experience, academics have a tendency and preference to work on superficially-beneficial problems; Manhattan Projects and AI alignment groups both exist (detrimental and non-obviously beneficial, respectively), but for the most part we have projects like eco-friendly technology and efficient resource allocation in specified domains.

Due to this, greater economic growth means more resources to bring to bear for other scientific/engineering problems, due to research on superficially-beneficial subjects like power-generation, efficiency, quantum computing, etc. As noted in my previous comment, the economic growth (and these increased resources as well) will also lead to an increased number of researchers and engineers.
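To make the quoted figures concrete: going from 2% to 2.1% yearly growth is a 5% relative increase in the growth rate, not a 5-percentage-point one. The sketch below works out how much larger an economy would be under the higher rate after a given number of years. The function names and the chosen time horizons are assumptions for illustration only, and the model is plain compound growth with nothing else changing.

```python
def relative_increase(old_rate, new_rate):
    """Relative change in the growth rate itself (0.02 -> 0.021 is about 0.05)."""
    return (new_rate - old_rate) / old_rate


def economy_size_ratio(old_rate, new_rate, years):
    """How many times larger the economy is under new_rate than under old_rate,
    assuming simple yearly compounding from the same starting size."""
    return ((1 + new_rate) / (1 + old_rate)) ** years


print(f"relative increase in growth rate: {relative_increase(0.02, 0.021):.1%}")
for years in (10, 25, 50):
    ratio = economy_size_ratio(0.02, 0.021, years)
    print(f"after {years} years the economy is {ratio:.4f}x the baseline")
```

Under this simple model the gap compounds slowly: the difference in total economy size stays at the level of a few percent even over several decades, which is the scale of "more resources" being discussed here.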

Fields of study considered as X-risks are often popular enough that development to dangerous levels is actually an urgent possibility. As such, I would expect them to be bounded by academic development rather than resource availability (increased hardware capabilities might be a bottleneck for AGI development, but at this point I doubt it, as at least one [not-vetted-by-me] analysis I've encountered suggests (assuming perfectly-efficient computation using parallel graph-based operations) that modern supercomputers are only 1 or 2 orders of magnitude away from the raw computational ability of the human brain).
(Increased personnel is beneficial to these fields, but that's addressed below and in the second part of this comment.)

So the changes caused by these increased resources would mostly occur in other fields, which are generally geared towards either increased life/quality-of-life (which encourages less 'practical' pursuits like philosophy and unusual worldviews (e.g. Effective Altruism), potentially increasing deviation from the economic incentives promoting dangerous technology, and also feeds back into economic growth) or better general understanding of the world (which accelerates dangerous, non-dangerous, and anti-X-risk (e.g. alignment) research to a similar degree).

Regarding that second category, many conventional fields are actually working directly on possible solutions to X-risk problems, whether or not they believe in the dangers. Climate change, resource shortages, and asteroid risk are all partly addressed by space research, and the first two are also relevant to ecological research. Progress in fields like psychology/neurology & sociology/game-theory is potentially applicable to AI alignment, and can also be used to help encourage large-scale coordination between organizations. The benefits from these partially counterbalance what impact the economic growth does have on more dangerous fields like directed AGI research.

And on a separate note, I would consider "dying with dignity" to also mean "not giving up on improving people's lives just because we're eventually all going to die". This is likely not what Eliezer meant in his post, but I doubt he (or most people) would be actively opposed to the idea. From this perspective, many conventional research directions (which economic growth tends to help) are useful for dying with dignity, even the ones that don't directly apply to X-risk.

"I suspect the impact is net-negative because increasing both amounts of researchers shortens the timelines and longer timelines increase our odds as EA and AI safety are becoming much more established."

This is going into more speculative territory, since I doubt either of us is an experienced professional sociologist. Still, to my knowledge paradigm-changes in a field are rarely a result of convincing the current members of an issue; they usually involve new entrants, without predefined biases and frameworks, leaning towards the new way of looking at things.

So the rate of EA & AI safety becoming established would also increase significantly if there was a large influx of new academics with an interest in altruistic academic efforts (since their communities were helped by such efforts), meaning the increase in research population should be more balanced towards safety/alignment than the current population is.

Whether this change in proportion is sufficiently unbalanced to counteract the changes in progress of technologies like AGI is difficult to judge.
For one thing, due to threshold effects I'd expect research progress vs research population to be something like an irregular step-function with sigmoid-shaped inter-step transitions on either the base level or one of the lower-level differentials, meaning population doesn't have a direct relation to progress levels.
For another, as you mentioned, other talented individuals in this influx would be pushed towards these fields because of the challenges and income they offer, and while this seems at first glance to be the weaker of the two incentives, it may well be the greater and thus falsify my assumption that EA/alignment would come out better in population growth.
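The "irregular step-function with sigmoid-shaped transitions" shape mentioned above can be sketched as a sum of shifted sigmoids, one per threshold. This only models the base-level version (not the variant where the steps sit in a differential), and every threshold, jump height, and steepness below is a made-up number for illustration.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical thresholds: (population at which a step occurs,
# size of the jump in progress, steepness of the transition).
# Irregular spacing and heights are deliberate.
STEPS = [
    (100, 1.0, 0.2),
    (400, 3.0, 0.05),
    (450, 0.5, 0.5),
]


def progress(population, steps=STEPS):
    """Research progress as a sum of smooth steps over research population."""
    return sum(
        height * sigmoid(steepness * (population - threshold))
        for threshold, height, steepness in steps
    )


for population in (0, 100, 200, 250, 350, 450, 600):
    print(population, round(progress(population), 3))
```

In this toy curve, adding researchers between thresholds (e.g. from 200 to 250) changes progress very little, while the same increase near a threshold produces a large jump, which is why population has no direct relation to progress levels.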

In a surface-level analysis like this I generally assume equivalence in the important aspects (research progress, in this case) for such ambiguous situations, but you are correct that it might be weighted towards the less-desirable outcome.

What are some ways in which we can die with more dignity?

Self_Optimization · 4y · 70

"Working on global poverty seems unlikely to be a way of increasing our chances of succeeding at alignment. If anything, this would likely increase both the number of future alignment and capacity researchers. So it's unlikely to significantly increase our chances."

A fair point regarding alignment (I hadn't thought about how it would affect AI researchers as well), but I was more thinking from the perspective of X-risk in general.

AI alignment is one issue that doesn't seem to be significantly affected either way by this, but we also have things like alignment of organizations towards public interest (which is currently a fragile, kludged-together combination of laws and occasional consumer/citizenry strikes) or the increasing rate of natural disasters like pandemics and hurricanes (which requires both technical and social aspects for a valid solution), and both of these have the potential to lead to at least civilizational collapse, if not human extinction (as examples, through "large-scale nuclear war for the sake of national sovereignty" and "lack of natural resources or defense against natural disasters", respectively).

It seems to me that it's still in question whether AI alignment (or more generally, ethical/safety controls on impending technological advancements) is the earliest X-risk in our way, and having a more varied set of workers on these problems would be helpful for ensuring we survive many of the others while (as you mentioned) not significantly affecting the balance of this particular problem one way or another.

What are some ways in which we can die with more dignity?

Answer by Self_Optimization · Apr 03, 2022 · 30

One method would be to take advantage of low-hanging fruit not directly related to X-risk. Clearly motivation isn't enough to solve these problems (and I'm not just talking about alignment), so we should be trying to optimize all our resources, and that includes getting rid of major bottlenecks like [the imagined example of] hunger killing intelligent, benevolent potential-researchers in particular areas because of a badly-designed shipping route.

A real-life example of this would be the efforts of the Rationalist community to promote more efficient methods of non-scientific analysis (i.e. cases where you can't afford the effort required for scientific findings, but want a right answer anyway). This helps not only in X-risk efforts, but also in the preliminary stages of academic research, and [presumably] entrepreneurship as well. We could step up our efforts in this, particularly in college environments where it would influence people's effectiveness whether or not they bought into other aspects of this subgroup's culture like the urgency of anti-X-risk measures.

Another aspect is to diverge in multiple different directions. We're essentially searching for a miracle at this point (to my understanding, in the Death with Dignity post Eliezer's main reason to reject unethical behaviors that might, maybe, possibly lead to success is that they're still less reliable than miracles and reduce our chances of finding any). So we need a much broader range of approaches to solving or avoiding these problems, to increase the likelihood that we get close enough to a miracle solution to spot it.

For instance, most effort on AGI safety so far has focused on the alignment and control problems, but we might want to pay more attention to how we might keep up with a self-optimizing AGI by augmenting ourselves, so that human society is never dominated by an inhuman (and thus likely unaligned) cognition. This would involve both the existing line of study in Intelligence Augmentation (IA) and ways to integrate it with AI insights to keep ahead of an AI in its likely fields of superiority. It also relates to the social landscape of AI, in that we'd need to draw resources and progress away from autonomous AI and towards IA.

Entering At the 11th Hour (Babble & Anaylsis)

Self_Optimization · 4y · 40

As a Babble this is excellent, and many of these (e.g. optimizing income streams, motivating/participating-in groups) seem to be necessary prerequisites for being in a position to make progress on X-risk problems.

But I think the nature of such problems (namely, that they have been attempted by many other individuals, with at least some centralized organizations where these individuals share their experiences to avoid duplication of effort) means that any undirected Babble will primarily encounter lines of inquiry that have already been addressed, as many of the more direct (non-resource-gathering) suggestions seem to be.

As a point of methodology, I would suggest trying for much larger Babble lists when approaching these problems, perhaps on the scale of a few hundred ideas, or alternatively making multiple recursive layers of Babbles for each individual point at every recursive level (e.g. 100 points, each with 100 points, each with 100 points...), so that the process is more likely to produce unique [and thus useful] approaches.
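For a sense of scale, the recursive version of the suggestion above grows very quickly. A minimal sketch, assuming every point at every level gets the same number of sub-points (the function names are just for illustration):

```python
def leaf_ideas(branching, depth):
    """Number of ideas at the deepest layer of a recursive Babble."""
    return branching ** depth


def total_ideas(branching, depth):
    """Number of ideas across all layers, from the top list down to the leaves."""
    return sum(branching ** level for level in range(1, depth + 1))


# The "100 points, each with 100 points, each with 100 points" example:
print(leaf_ideas(100, 3))   # 1000000
print(total_ideas(100, 3))  # 1010100
```

So three full layers of 100 would already mean about a million leaf ideas, which suggests that in practice only the more promising branches could be expanded at each level.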

MIRI announces new "Death With Dignity" strategy

Self_Optimization · 4y · 50

The main issue with AGI Alignment is that the AGI is more intelligent than us, meaning that making it stay within our values requires both perfect knowledge of our values and some understanding of how to constrain it to share them.

If this is truly an intractable problem, it still seems that we could escape the dilemma by focusing on efforts in Intelligence Augmentation, e.g. through Mind Uploading and meaningful encoding/recoding of digitized mind-states. Granted, it currently seems that we will develop AGI before IA, but if we could shift focus enough to reverse this trend, then AGI would not be an issue, as we ourselves would have superior intelligence to our creations.
