Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: joaolkf 31 March 2015 09:38:18AM *  1 point [-]

In most scientific fields status is defined as access (or entitlement) to resources (i.e.: food and females, mostly). Period. And they tend to take this measure very seriously and stick to it (it has many advantages, easy to measure, evolutionary central, etc.). Both your definitions are only two accidental aspects of having status. Presumably, if you have - and in order to have - higher access to resources you have to be respected, liked, and have influence over your group. I think the definition is elegant exactly because all the things we perceive as status have as major consequences/goals higher access to resources.

Moreover, I don't think it is the case people can have warm fuzzies for everyone they meet. There's a limited amount of warm fuzzies to be spent. Of course, you can hack the warm-fuzzy system by using such and such body language, just like you could hack mating strategies using PUA techniques before everyone knew about it. But that's a zero-sum game.

Different people are comfortable with different levels of status; there are a lot of studies confirming that. If you put a regular gorilla as leader of a group of silverbacks he will freak out, because his trachea is most certainly to be lying on the floor in a few seconds. For very similar reasons, I will freak out if you give me a Jiu-Jitsu black belt and threw me into a dojo. This does not mean that same said regular gorilla will not fight with everything he has to achieve a higher status within certain safety boundaries. People are comfortable with different levels of status, and their current level is not one of them, nor is one too high to be safe. Nobody can be happy. That is the nature of status. (Also, there are limited resources - or so your brain thinks - so it is important to make other people miserable as well.)

Comment author: Kaj_Sotala 31 March 2015 01:20:46PM 0 points [-]

In most scientific fields status is defined as access (or entitlement) to resources (i.e.: food and females, mostly).

Which fields are these? This sounds to me a definition that could be useful in e.g. animal studies, but vastly insufficient when it comes to the complexities of status with regard to humans. E.g. according to this definition, an armed group such as occupiers or raiders who kept forcibly taking resources from the native population would be high status among the population, which seems clearly untrue.

Moreover, I don't think it is the case people can have warm fuzzies for everyone they meet. There's a limited amount of warm fuzzies to be spent. Of course, you can hack the warm-fuzzy system by using such and such body language, just like you could hack mating strategies using PUA techniques before everyone knew about it. But that's a zero-sum game.

What makes you say that?

[link] Thoughts on defining human preferences

1 Kaj_Sotala 31 March 2015 10:08AM


Abstract: Discussion of how we might want to define human preferences, particularly in the context of building an AI intended to learn and implement those preferences. Starts with actual arguments about the applicability of the VNM utility theorem, then towards the end gets into hypotheses that are less well defended but possibly more important. At the very end, suggests that current hypothesizing about AI safety might be overemphasizing “discovering our preferences” over “creating our preferences”.

Comment author: diegocaleiro 31 March 2015 06:30:10AM 4 points [-]

The technical academic term for (1) Is prestige and (2) Is Dominance. Papers which distinguish the two are actually really interesting.

Comment author: Kaj_Sotala 31 March 2015 07:55:32AM 1 point [-]

I second Creutzer's request for links to these papers.

Status - is it what we think it is?

9 Kaj_Sotala 30 March 2015 09:37PM

I was re-reading the chapter on status in Impro (excerpt), and I noticed that Johnstone seemed to be implying that different people are comfortable at different levels of status: some prefer being high status and others prefer being low status. I found this peculiar, because the prevailing notion in the rationalistsphere seems to be that everyone's constantly engaged in status games aiming to achieve higher status. I've even seen arguments to the effect that a true post-scarcity society is impossible, because status is zero-sum and there will always be people at the bottom of the status hierarchy.

But if some people preferred to have low status, this whole dilemma might be avoided, if a mix of statuses could be find that left everyone happy.

First question - is Johnstone's "status" talking about the same thing as our "status"? He famously claimed that "status is something you do, not something that you are", and that

I should really talk about dominance and submission, but I'd create a resistance. Students who will agree readily to raising or lowering their status may object if asked to 'dominate' or 'submit'.

Viewed via this lens, it makes sense that some people would prefer being in a low status role: if you try to take control of the group, you become subject to various status challenges, and may be held responsible for the decisions you make. It's often easier to remain low status and let others make the decisions.

But there's still something odd about saying that one would "prefer to be low status", at least in the sense in which we usually use the term. Intuitively, a person may be happy being low status in the sense of not being dominant, but most people are still likely to desire something that feels kind of like status in order to be happy. Something like respect, and the feeling that others like them. And a lot of the classical "status-seeking behaviors" seem to be about securing the respect of others. In that sense, there seems to be something intuitive true in the "everyone is engaged in status games and wants to be higher-status" claim.

So I think that there are two different things that we call "status" which are related, but worth distinguishing.

1) General respect and liking. This is "something you have", and is not inherently zero-sum. You can achieve it by doing things that are zero-sum, like being the best fan fiction writer in the country, but you can also do it by things like being considered generally friendly and pleasant to be around. One of the lessons that I picked up from The Charisma Myth was that you can be likable by just being interested in the other person and displaying body language that signals your interest in the other person.

Basically, this is "do other people get warm fuzzies from being around you / hearing about you / consuming your work", and is not zero-sum because e.g. two people who both have great social skills and show interest in you can both produce the same amount of warm fuzzies, independent of each other's existence.

But again, specific sources of this can be zero-sum: if you respect someone a lot for their art, but then run across into even better art and realize that the person you previously admired is pretty poor in comparison, that can reduce the respect you feel for them. It's just that there are also other sources of liking which aren't necessarily zero-sum.

2) Dominance and control of the group. It's inherently zero-sum because at most one person can have absolute say on the decisions of the group. This is "something you do": having the respect and liking of the people in the group (see above) makes it easier for you to assert dominance and makes the others more willing to let you do so, but you can also voluntarily abstain from using that power and leave the decisions to others. (Interestingly, in some cases this can even increase the extent to which you are liked, which translates to a further boost in the ability to control the group, if you so desired.)


Morendil and I previously suggested a definition of status as "the general purpose ability to influence a group", but I think that definition was somewhat off in conflating the two senses above.

I've always had the vague feeling that the "everyone can't always be happy because status is zero-sum" claim felt off in some sense that I was unable to properly articulate, but this seems to resolve the issue. If this model were true, it would also make me happy, because it would imply that we can avoid zero-sum status fights while still making everybody content.

Comment author: Mark_Friedenbach 22 March 2015 05:32:22PM *  2 points [-]

First of all, purposefully limiting scope to protecting against only the runaway superintelligence scenario is preventing a lot of good that could be done right now, and keeps your work from having practical applications it otherwise would have. For example, right now somewhere deep in Google and Facebook there are machine learning recommendation engines that are suggesting the display of whisky ads to alcoholics. Learning how to create even a simple recommendation engine whose output is constrained by the values of its creators would be a large step forward and would help society today. But I guess that's off-topic.

Second, even if you buy the argument that existential risk trumps all and we should ignore problems that could be solved today such as that recommendation engine example, it is demonstrably not the case in history that the fastest way to develop a solution is to ignore all practicalities and work from theory backwards. No, in almost every case what happens is the practical and the theoretical move forward hand in hand, with each informing progress in the other. You solve the recommendation engine example not because it has the most utilitarian direct outcomes, but because the theoretical and practical outcomes are more likely to be relevant to the larger problem than an ungrounded problem chosen by different means. And on the practical side, you will have engineers coming forward the beginnings of solutions -- "hey I've been working on feedback controls, and this particular setup seems to work very well in the standard problem sets..." In the real world theoreticians more often than not spend their time proving the correctness of the work of a technologist, and then leveraging that theory to improve upon it.

Third, there are specific concerns I have about the approach. Basically time spent now on unbounded AIXI constructs is probably completely wasted. Real AGIs don't have Solomonoff inductors or anything resembling them. Thinking that unbounded solutions could be modified to work on a real, computable superintelligence betrays a misunderstanding of the actual utility of AIXI. AIXI showed that all the complexity of AGI lies in the practicalities, because the pure uncomputable theory is dead simple but utterly divorced from practice. AIXI brought some respectability to the field by having some theoretical backing, even if that theory is presently worse than useless in as much as it is diverting otherwise intelligent people from making meaningful contributions.

Finally, there's the simple matter that an ignore-all-practicalities theory-first approach is useless until it nears completion. My current trajectory places the first AGI at 10 to 15 years out, and the first self-improving superintelligence shortly thereafter. Will MIRI have practical results in that time frame? The schedule is not going to stop and wait for perfection. So if you want to be relevant, then stay relevant.

Comment author: Kaj_Sotala 26 March 2015 07:59:14PM 0 points [-]

Basically time spent now on unbounded AIXI constructs is probably completely wasted. Real AGIs don't have Solomonoff inductors or anything resembling them.

I wouldn't say that the time studying AIXI-like models is completely wasted, even if real AGIs turned out to have very little to do with AIXI. Even if AIXI approximation isn't the way that actual AGI will be built, to the extent that the behavior of a rational agent resembles the model of AIXI, studying models of AIXI can still give hints of what need to be considered in AGI design. lukeprog and Bill Hibbard advanced this argument in Exploratory Engineering in AI:

...some experts think AIXI approximation isn’t a fruitful path toward human-level AI. Even if that’s true, AIXI is the first model of cross-domain intelligent behavior to be so completely and formally specified that we can use it to make formal arguments about the properties which would obtain in certain classes of hypothetical agents if we could build them today. Moreover, the formality of AIXI-like agents allows researchers to uncover potential safety problems with AI agents of increasingly general capability—problems which could be addressed by additional research, as happened in the field of computer security after Lampson’s article on the confinement problem.

AIXI-like agents model a critical property of future AI systems: that they will need to explore and learn models of the world. This distinguishes AIXI-like agents from current systems that use predefined world models, or learn parameters of predefined world models. Existing verification techniques for autonomous agents (Fisher, Dennis, and Webster 2013) apply only to particular systems, and to avoiding unwanted optima in specific utility functions. In contrast, the problems described below apply to broad classes of agents, such as those that seek to maximize rewards from the environment.

For example, in 2011 Mark Ring and Laurent Orseau analyzed some classes of AIXIlike agents to show that several kinds of advanced agents will maximize their rewards by taking direct control of their input stimuli (Ring and Orseau 2011). To understand what this means, recall the experiments of the 1950s in which rats could push a lever to activate a wire connected to the reward circuitry in their brains. The rats pressed the lever again and again, even to the exclusion of eating. Once the rats were given direct control of the input stimuli to their reward circuitry, they stopped bothering with more indirect ways of stimulating their reward circuitry, such as eating. Some humans also engage in this kind of “wireheading” behavior when they discover that they can directly modify the input stimuli to their brain’s reward circuitry by consuming addictive narcotics. What Ring and Orseau showed was that some classes of artificial agents will wirehead—that is, they will behave like drug addicts.

Fortunately, there may be some ways to avoid the problem. In their 2011 paper, Ring and Orseau showed that some types of agents will resist wireheading. And in 2012, Bill Hibbard (2012) showed that the wireheading problem can also be avoided if three conditions are met: (1) the agent has some foreknowledge of a stochastic environment, (2) the agent uses a utility function instead of a reward function, and (3) we define the agent’s utility function in terms of its internal mental model of the environment. Hibbard’s solution was inspired by thinking about how humans solve the wireheading problem: we can stimulate the reward circuitry in our brains with drugs, yet most of us avoid this temptation because our models of the world tell us that drug addiction will change our motives in ways that are bad according to our current preferences.

Relatedly, Daniel Dewey (2011) showed that in general, AIXI-like agents will locate and modify the parts of their environment that generate their rewards. For example, an agent dependent on rewards from human users will seek to replace those humans with a mechanism that gives rewards more reliably. As a potential solution to this problem, Dewey proposed a new class of agents called value learners, which can be designed to learn and satisfy any initially unknown preferences, so long as the agent’s designers provide it with an idea of what constitutes evidence about those preferences.

Comment author: Mark_Friedenbach 24 March 2015 11:31:34PM 1 point [-]

I have thought about the "create a safe genie, use it to prevent existential risks, and have human researchers think about the full FAI problem over a long period of time" route, and I find it appealing sometimes. But there are quite a lot of theoretical issues in creating a safe genie!

That is absolutely not a route I would consider. If that's what you took away from my suggestion, please re-read it! My suggestion is that MIRI should consider pathways to leveradging superintelligence which don't involve agent-y processes (genies) at all. Processes which are incapable of taking action themselves, and whose internal processes are real-time audited and programmatically constrained to make deception detectable. Tools used as cognitive enhancers, not stand-alone cognitive artifacts with their own in-built goals.

SIAI spent a decade building up awareness of the problems that arise from superintelligent machine agents. MIRI has presumed from the start that the way to counteract this threat is to build a provably-safe agent. I have argued that this is the wrong lesson to draw -- the better path forward is to not create non-human agents of any type, at all!

Comment author: Kaj_Sotala 26 March 2015 08:47:30AM 2 points [-]

How would you prevent others from building agent-type AIs, though?

Comment author: Kaj_Sotala 23 March 2015 07:02:09AM *  4 points [-]

I'm not sure the current implementation of the tiered privileges system is optimal. For instance, after a link I posted got two likes, it become visible to everyone, but it looks like my reply to your comment isn't visible if one isn't logged in. I think that once a non-member link meets the necessary threshold for becoming visible to everyone, the poster's replies to comments in the thread should become visible as well, or otherwise it's just confusing.

Also, I feel that if new contributor comments are hidden in general, that might be a little too discouraging for new people if those comments also need to acquire two member likes in order to become visible.

Comment author: Jayson_Virissimo 21 March 2015 12:21:39AM 1 point [-]

I think "pleasure" and "suffering" are very meaningful and that the prospects of finding decent metrics for each are good over the long term. The problem I have with hedonistic utilitarianism is that hedons are not what I want to maximize. Don't you ever pass up opportunities to do something you know will bring you more pleasure (even in the long run), in order to achieve some other value and don't regret doing so?

Comment author: Kaj_Sotala 21 March 2015 12:29:12AM 2 points [-]

Yeah, I've drifted away from hedonistic utilitarianism over time and don't particularly want to try to defend it here.

Comment author: DeVliegendeHollander 17 March 2015 10:31:46AM 1 point [-]

On AI: are we sure we are not influenced by meta-religious ideas of sci-fi writers who write about sufficiently advanced computers just "waking up into consciousness" i.e. create a hard, almost soul-like, barrier between conscious and not conscious, which carries an assumption that consciousness is a typically human-like feature? It is meta-religious as it is based on the unique specialness of the human soul.

I mean I think the potential variation space of intelligent, conscious agents is very, very large and a randomly selected AI will not be human-like in way we would recognize it to. We will not recognize its consciousness, we will not recognize its intelligence, even its agency, all we would see it does mysterious complicated stuff we don't understand. It may almost look random. It does stuff, maybe it communicates with us although the human-language words it uses will not reflects its thought processes, but it will be profoundly alien.

Comment author: Kaj_Sotala 20 March 2015 06:05:46PM 1 point [-]
Comment author: Meni_Rosenfeld 18 March 2015 06:31:36PM *  2 points [-]

I've written a post on my blog covering some aspects of AGI and FAI.

It probably has nothing new for most people here, but could still be interesting.

I'll be happy for feedback - in particular, I can't remember if my analogy with flight is something I came up with or heard here long ago. Will be happy to hear if it's novel, and if it's any good.

How many hardware engineers does it take to develop an artificial general intelligence?

Comment author: Kaj_Sotala 20 March 2015 09:29:51AM 0 points [-]

The flight analogy, or at least some variation of it, is pretty standard in my experience. (Incidentally, I heard a version of the analogy just recently, when I was reading through the slides of an old university course - see pages 15-19 here.)

View more: Next