All of pinyaka's Comments + Replies

Thanks, Gleb. I will look into this more.

It really seems like you have gone out of your way to not actually share any content until I give you personal information. After looking at a few of your pages, I still have no idea what you're offering except that it's something to help people find meaning in their lives using methods that (probably) have one or more scientific studies to back them up. This is the kind of sales pitch I usually associate very strongly with scammers. The main differences between the way you look and the way a scammer looks are:

a) I fou... (read more)

0Gleb_Tsipursky
I have lots of free content on this topic that does not require sharing personal information, for example here: http://intentionalinsights.org/findyourpurpose You can download a free version of my book (minus the worksheets), without sharing any personal information here: http://intentionalinsights.org/book-find-your-purpose-using-science Let me know if other stuff would be helpful for you.

My wife and I are the high school youth leaders at a UU church. Most of our youths are atheists or agnostics. I looked over the page you linked and subsequent pages on the course itself as I am very interested in helping our youths to think more rationally about goals. As the course is outlined now I would not suggest it to our youths because:

a) The course requires a lot of time and there's not enough information about how that time can be broken up. We meet for two hours per week and could maybe spare an hour of that for a program, but it's not clear th... (read more)

0Gleb_Tsipursky
I gotcha, thanks for the feedback! We have specifically UU-themed content here. For a one-hour session, I suggest using the videotaped workshop. You can have youth take the web app beforehand, watch the video, and then have a discussion, with youth taking the web app afterward. We're also working on developing other UU-themed curricula for youth-oriented classes and covenant groups. Hope this is useful for you!

I purchased Shilov's Linear Algebra and put it on my bookshelf. When I actually needed to use it to refresh myself on how to get eigenvalues and eigenvectors, I found all the references to preceding sections and the choppy lemma->proof style of writing very difficult to parse. This might be great if you actually work your way through the book, but I didn't find it useful as a refresher text.

Instead, I found Gilbert Strang's Introduction to Linear Algebra to be more useful. It's not as thorough as Shilov's text, but it covers topics fairly well, and each section is relatively self-contained, so if there's a section that covers what you want to refresh yourself on, you can read it on its own.

If you aren't good at reading other people's signals, then the following heuristic is a pretty good one: If you like A, and you are wondering whether A likes you, the answer is no.

This heuristic is terrible if you're trying to find a romantic partner, since following it consistently will always lead you to believe that anyone you're interested in, and whose reciprocal interest isn't clear to you, is not interested in you. If you live in a society where your potential partner isn't supposed to make overt signals about their romantic interests (becaus... (read more)

Mixing the high glycemic load foods with the low glycemic load foods will result in a lower peak insulin concentration than if you ate them separately.

Also, millet has a slightly higher glycemic load (12/100g) compared to quinoa (10/100g), but has almost the same calories (~120) and is usually significantly cheaper (in my area, it's about a third the cost when purchased in bulk). It's probably comparable to the basmati brown (which I don't like the taste of).

I never feel full on meat + vegs, but adding a bit of bread does the job. Conversely, I never feel full on bread or rice or generally carbs only, adding a bit of meat does the job. It seems I need the combination.

My subjective experience is that starting a meal with a small amount of insulin spiking carbohydrate and then moving on to protein and fat results in feeling full faster than starting with the protein/fat and moving on to the carbs. I generally have a rule about portioning my food out at the beginning of the meal and not going back for more unt... (read more)

3[anonymous]
Dessert-first? So that is why we don't let kids eat dessert first because they won't eat the vegs afterwards? Kind of checks out...

You should send that to errata@intelligence.org.

True enough. Tea or water would definitely be better choices.

0Good_Burning_Plastic
Thirst from caffeine cravings! Okay, I'll stop now.

Apparently, one reason more intellectual people (typical Silicon Valley types) have less of an addiction problem is that they enjoy their work and thus life enough, they don't need to quickly wash down another suck of a day, so they can have less euphoric hobbies in the evening, say, drawing or painting.

I don't think this is exactly right. There is a correlation between intelligence and addiction, but it's not so strong that you won't still find a lot of addicts among the intelligentsia. Chemical addiction is a process whereby you ingest chemicals to st... (read more)

0[anonymous]
Thanks, it is honest and partially useful. Music is an obvious counter-example to it. People who like it can get completely "crazy" from something like Faithless: Salva Mea or The Prodigy: Firestarter. It is the strongest non-drug drug I know. Parachuting, bungee jumping and motorcycle riding also count. I just don't want to do them. But they do work like that.

Meditation is a funny topic. First of all, 99% of the people I know think it means sitting with an empty mind etc. and that you should expect some mental effect. However, what I practiced for years was entirely different: in the "red hat" tradition of Tibet it was not about empty minds but about using imagination to visualize and also saying mantras, and it was not promised to have any immediate "trippy" effect, and indeed it didn't. The idea was more like long-term improvement. I should also say that in these gompas people tried to sit up straight, but it did not work very well.

At another time I visited a Zen center, and here they made me use a very tall, thick pillow and sit strictly on the edge of it, which was not so in the other one. This moved my lower hip forward and upper hip back, creating a position where the bottom of the spine could be balanced, and it was easy to balance the upper spine, creating a much straighter spine position than before. And it was the more common meditation, just empty mind and watching the breath go out. And this kind instantly had very, very, very trippy effects. However, I read stories from people who do not care that much about position, just sit up in bed roughly straight, and still have effects.

as my after-work fluid intake is mostly beer, I realized that now my brain cannot tell the difference between thirst and alcohol cravings.

Does your at-work brain confuse thirst with alcohol cravings too?

One idea would be thirst-like feeling -> drink water -> re-examine, but water is not a very good thirst quencher.

So test this by drinking something that isn't beer or water but matches your other criteria for good thirst quenchers. Carbonated water with lemon or lime juice in it will meet the criteria that you listed, but actually staying hy... (read more)

0[anonymous]
Update: the following worked for me for two days in a row now. Friday, a bit too much beer, finally convincing my brain it does not need a drink stronger than that to be sloshed. Being disgusted with myself and sticking to non-alcoholic beer for the weekend, thus taking care of the thirst part without getting an effect. In the meantime, I realized this is very similar to the cigarettes -> e-cigs with nicotine -> e-cigs without nicotine, just harmless sweet flavoured mist -> nothing progression. Maybe there is a theory behind this progression, such as "change one variable at a time".

Another good piece of advice in the thread was to try to do the opposite of what you do at work after it. Since I tend to not socialize at all at work, and even afterwards only with my wife and child, mainly talking about everyday things, I am trying to overcome my slight annoyance about music (or sounds in general) and will try to listen to music with interesting lyrics on the way home; maybe this works as a talk-simulator.

I also upgraded my exercise habit to about 3x, trying to become fanatical about it, because I think another obsession is a good idea for people with obsessive or addictive personalities, and the trick is to keep it simple: it must be one very, very simple and repetitive thing so that it gets etched in. I think I will simply become a push-up monster, already 50 in the morning and will try another 50 in the evening, eventually aiming at 1000 a day. By that time, if things go well, it becomes another addictive obsession and can counter the old one.
0Good_Burning_Plastic
That can make it hard to tell thirst from sugar cravings, though.
1[anonymous]
Thanks, those are good ideas. I got two different kinds of de-training completely mixed up. I decided that if you want to understand yourself you may start by first studying others, because you will be more honest and less likely to find excuses, and then applying the lesson to yourself (not allowing new excuses). Is that a good idea?

I studied my late father and my current father-in-law, both classic blue-collar guys with classic blue-collar vices, i.e. drinking more than is healthy and probably being addicted (not textbook alcoholics: they were/are never actually drunk, just elevated, "bubbly" every evening). One thing I have noticed is that the basic idea is that you don't enjoy your work and life much. And when the daily work is done you need a quick pick-me-up, something that quickly makes you feel good: for the blue-collar culture it is booze, for others it is sugar (contributing to the obesity epidemic), drugs or gambling. They all act fast. Apparently, one reason more intellectual people (typical Silicon Valley types) have less of an addiction problem is that they enjoy their work and thus life enough, they don't need to quickly wash down another suck of a day, so they can have less euphoric hobbies in the evening, say, drawing or painting.

I am fairly intellectual but for reasons I don't think I will ever have a very enjoyable job or life. It is mainly a must-do-tasks-to-stay-afloat kind of life. So I need to see how to cope better. A) I started studying what healthy "quick pick-me-ups" other people are using. I found music and socialization, i.e. they put on headphones when riding the subway or Facebook chat with their friends. Neither is to my taste or within my possibilities. Any other ideas? I.e. not the kinds of enjoyable activities that take investment, but the kinds that are as easy as downing a drink or three, calling someone on the phone or putting on music. But it has to be a strong jolt, as I am very easily bored. For example something like playing Settlers of Catan

I guess what I mean is how do you know that it was that tactic that worked? How do you know that the people who showed compassion afterwards did so because it was demanded of them and not because people making angry demands made them feel safer about openly showing pre-existing compassion? I tend to agree with your first impression. I certainly don't respond to hostility by handing over control of my emotions to hostile people. I get defensive of my position.

Of course this is probably me committing the typical mind fallacy and trying to avoid thinking about... (read more)

Why do you think that angrily demanding compassion works?

0[anonymous]
I did not want to give concrete examples to avoid rustling feathers, but I saw this at pro-gay-marriage rallies or slut-walks.

Extrapolating from just the American civil rights and Indian independence movements: both were accompanied by barely contained violent movements with the same goals. Acceding to the demands of the peaceful protests provided a way to give the status of winning the conflict to the peaceful people while meeting the demands of the violent. Conversely, the recent Occupy movement had no real violent wing to speak of, and while a lot of people showed up for the protests and a lot of awareness was raised, there was no legislative impact at all.

8Luke_A_Somers
I think the Occupy movement's bigger problem was their insistence on not actually making any demands at all.
0[anonymous]
I am not talking about violence, I am talking about demanding compassion in a lingo that does not sound harmless and non-threatening, so it is not "pretty please with sugar on top of it", but kind of challenging and angry. Um, Tumblr. I would expect this to create reactions of either fear or anger, fight-or-flight, in other people, and that is supposed to prevent the feeling of compassion. Yet it is working, so apparently some link in this chain is not true: maybe it does not create fight-or-flight, or it does but people can feel compassion while feeling that too.

Meanwhile, in the US, the life expectancy of homeless people.

I think you forgot the rest of this sentence. From the context, I would expect that you were going to say that it's going down, but that's not clear from the linked articles.

2JoshuaZ
Fixed- see edited version of comment. Thank you.

Any example I could give could be disputed because it's always possible to reverse cause and effect and say "he only lacks empathy because of X" rather than "he believes X due to lack of empathy".

Fair enough. It does seem like it would be difficult to tell those two things apart from the outside.

And my impression is that empathy towards only the in-group is a normal human trait and that it is often affected by society only in the trivial sense that society determines what the in-group is.

Also true (probably).

If you're trying to ... (read more)

But individuals who have empathy with some others, but not other others, are more common. They can have terminal values to cause suffering for that portion of the population they don't have empathy with.

I'm having a hard time getting this. Can you provide an example where the lack of empathy for some group isn't driven by another value? My impression is that empathy is a normal human trait and that socializing teaches us who is worthy of empathy and who isn't, but then the lack of empathy is instrumental (because it serves to further the goals of society). People who actually lack empathy suffer from mental disorders like psychopathy as far as I know.

5Jiro
Any example I could give could be disputed because it's always possible to reverse cause and effect and say "he only lacks empathy because of X" rather than "he believes X due to lack of empathy". And my impression is that empathy towards only the in-group is a normal human trait and that it is often affected by society only in the trivial sense that society determines what the in-group is.

Of course empathy-lacking individuals exist, but make up a small portion of the population. It seems more likely that any given instance of one person enjoying harming another is due to instrumental value rather than terminal.

2Jiro
But individuals who have empathy with some others, but not other others, are more common. They can have terminal values to cause suffering for that portion of the population they don't have empathy with.

It took less time to highlight "Why We Are Fighting You" and search on Google than it took for you to ask for a source. Literally it took three clicks.

Full text of bin Laden's "letter to America"

Are you suggesting that people just have a desire to cause suffering and that their reasons (deities, revenge, punishment, etc.) are mostly just attempts to frame that desire in a personally acceptable manner? I ask because it seems like most people probably don't enjoy watching just anyone suffer; they tend to target other groups, which suggests a more strategic reason than just enjoying cruelty.

0James_Miller
Yes, harming others is a terminal value for evil people.

I'm now tempted to include this announcement of the newsletter in the newsletter just for the one-off recursion joke I can make.

I say go for it, but then my highest voted submission to discussion was this.

If this article makes it to 20 votes will it be included in the newsletter?

0Evan_Gaensbauer
Edit: it's going to be weird if this announcement is the only post this week to pass a threshold of 20 upvotes. I count the 'week' on the same cycle as open threads posted on LessWrong. It's only been two days since 2400 hours Sunday night, i.e., Monday night 0000 hours. Still, though, there is nothing new unrelated to HPMoR which passes the threshold. My hypothesis is everyone is too busy reading HPMoR, or discussing it, to bother producing other content. I'm only half-joking. The most upvoted comments for the last week are all predictions about what's coming up in HPMoR. Like, how maybe the final trial for Harry will actually be a test of not letting the AI out of the box... Should I break my rule of not including HPMoR-related content in the digest? If not, there will be nothing... I'm now tempted to include this announcement of the newsletter in the newsletter just for the one-off recursion joke I can make.
3Evan_Gaensbauer
I was thinking about that. No, it won't. I'm using my own judgment to exclude some things which are upvoted to signal things other than the content of the post being 'very rational', or whatever. One of the most upvoted posts of last year was the announcement of a new moderator. That wouldn't have been included in the newsletter, as important as that is. Searching through the 'top comments' for last week, in making the first newsletter, more than half of them were highly upvoted predictions for what will happen next in HPMoR. Maybe some of them would be worth including, since virtually everyone reading the newsletter would also be interested in a compelling prediction of what will happen next in the plot. Maybe I'm biased by the fact I haven't read the latest chapters yet, and I didn't want the plot spoiled, so I skimmed those comments. Still, though, there were over a dozen lengthy HPMoR predictions. They don't strike me as content fit for the newsletter.

I'm only reading this comment as this article indeed has reached 21 upvotes. It'd be pretty funny, and 'meta', if this was included. Your question is interesting, though, because it makes me clarify what the exceptions to inclusion are.

But that's the thing. There is no sensory input for "social deference". It has to be inferred from an internal model of the world itself inferred from sensory data...Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can't use it for social instincts or morality, or anything you can't just build a simple sensor to detect.

Why does it only work on simple signals? Why can't the result of inference work for reinforcement learning?

[This comment is no longer endorsed by its author]

I don't think that humans are pure reinforcement learners. We have all sorts of complicated values that aren't just eating and mating.

We may not be pure reinforcement learners, but the presence of values other than eating and mating isn't a proof of that. Quite the contrary, it demonstrates that either we have a lot of different, occasionally contradictory values hardwired or that we have some other system that's creating value systems. From an evolutionary standpoint reward systems that are good at replicating genes get to survive, but they don't have ... (read more)

0Houshalter
But that's the thing. There is no sensory input for "social deference". It has to be inferred from an internal model of the world itself inferred from sensory data. Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can't use it for social instincts or morality, or anything you can't just build a simple sensor to detect.

Okay, I am convinced. I really, really appreciate you sticking with me through this and persistently finding different ways to phrase your side and then finding ways that other people have phrased it.

For reference it was the link to the paper/book that did it. The parts of it that are immediately relevant here are chapter 3 and section 4.2.1.1 (and optionally section 5.3.5). In particular, chapter 3 explicitly describes an order of operations of goal and subgoal evaluation and then the two other sections show how wireheading is discounted as a failing str... (read more)

5Gram_Stone
And thank you for sticking with me! It's really hard to stick it out when there's no such thing as an honest disagreement and disagreement is inherently disrespectful! ETA: See the ETA in this comment to understand how my reasoning was wrong but my conclusion was correct.

How would that [valuing universe-states themselves] work? Well that's the quadrillion dollar question. I have no idea how to solve it.

Yeah, I think this whole thread may be kind of grinding to this conclusion.

It's certainly not impossible as humans seem to work this way

Seem to perhaps, but I don't think that's actually the case. I think (as mentioned above) that we value reward signals terminally (but are mostly unaware of this preference) and nothing else. There's another guy in this thread who thinks we might not have any terminal values.

I'm no... (read more)

1Houshalter
I don't think that humans are pure reinforcement learners. We have all sorts of complicated values that aren't just eating and mating.

The toy AI has an internal model of the universe. In the extreme, a complete simulation of every atom and every object. Its sensors update the model, helping it get more accurate predictions/more certainty about the universe state. Instead of a utility function that just measures some external reward signal, it has an internal utility function which somehow measures the universe model and calculates utility from it. E.g. a function which counts the number of atoms arranged in paperclip-shaped objects in the simulation. It then chooses actions that lead to the best universe states. Stuff like changing its utility function or fooling its sensors would not be chosen because it knows that doesn't lead to real paperclips. Obviously a real universe model would be highly compressed. It would have a high level representation for paperclips rather than an atom by atom simulation.

I suspect this is how humans work. We can value external objects and universe states. People care about things that have no effect on them.
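
A minimal sketch of the contrast being described here (my own illustration, not code from anyone in the thread; the function names and structure are invented for the example):

    # Illustrative sketch only: names and structure are assumptions made for
    # this example, not any actual AI architecture discussed here.

    def reward_maximizer_step(actions, predict_reward):
        # Picks the action whose predicted reward signal is highest. Tampering
        # with the sensor or the signal ("wireheading") scores just as well as
        # actually making paperclips, so nothing rules it out.
        return max(actions, key=predict_reward)

    def paperclip_maximizer_step(actions, predict_world_state, count_paperclips):
        # Picks the action whose predicted world state contains the most
        # paperclips according to the agent's internal world model. Fooling its
        # own sensors changes the reward signal but not the predicted number of
        # real paperclips, so those actions score poorly and aren't chosen.
        return max(actions, key=lambda a: count_paperclips(predict_world_state(a)))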

It discourages me that he tabooed 'values' and you immediately used it anyway.

In fairness, I only used it to describe how they'd come to be used in this context in the first place, not to try and continue with my point.

I wrote a Python-esque pseudocode example of my conception of what an AGI with an arbitrary terminal value's very high level source code would look like. With little technical background, my understanding is very high level with lots of black boxes. I encourage you to do the same, such that we may compare.

I've never done something l... (read more)

3arundelo
Unfortunately it's a longstanding bug that preformatted blocks don't work.
3Gram_Stone
Something like that. I posted my pseudocode in an open thread a few days ago to get feedback and I couldn't get indentation to work either so I posted mine to Pastebin and linked it. I'm still going through the Sequences, and I read Terminal Values and Instrumental Values the other day. Eliezer makes a pseudocode example of an ideal Bayesian decision system (as well as its data types), which is what an AGI would be a computationally tractable approximation of. If you can show me what you mean in terms of that post, then I might be able to understand you. It doesn't look like I was far off conceptually, but thinking of it his way is better than thinking of it my way. My way's kind of intuitive I guess (or I wouldn't have been able to make it up) but his is accurate. I also found his paper (Paper? More like book) Creating Friendly AI. Probably a good read for avoiding amateur mistakes, which we might be making. I intend to read it. Probably best not to try to read it in one sitting. Even though I don't want you to think of it this way, here's my pseudocode just to give you an idea of what was going on in my head. If you see a name followed by parentheses, then that is the name of a function. 'Def' defines a function. The stuff that follows it is the function itself. If you see a function name without a 'def', then that means it's being called rather than defined. Functions might call other functions. If you see names inside of the parentheses that follow a function, then those are arguments (function inputs). If you see something that is clearly a name, and it isn't followed by parentheses, then it's an object: it holds some sort of data. In this example all of the objects are first created as return values of functions (function outputs). And anything that isn't indented at least once isn't actually code. So 'For AGI in general' is not a for loop, lol. http://pastebin.com/UfP92Q9w

But there is no theoretical reason you can't have an AI that values universe-states themselves.

How would that work? How do you have a learner that doesn't have something equivalent to a reinforcement mechanism? At the very least it seems like there has to be some part of the AI that compares the universe-state to the desired state, and that the real goal is actually to maximize the similarity of those states, which means modifying the goal would be easier than modifying reality.

And if it did have such a goal, why would it change it?

Agreed. I am trying to get someone to explain how such a goal would work.

1Houshalter
Well that's the quadrillion dollar question. I have no idea how to solve it. It's certainly not impossible as humans seem to work this way. We can also do it in toy examples. E.g. a simple AI which has an internal universe it tries to optimize, and its sensors merely update the state it is in. Instead of trying to predict the reward, it tries to predict the actual universe state and selects the ones that are desirable.

Pleasure and reward are not the same thing. For humans, pleasure almost always leads to reward, but reward doesn't only happen with pleasure. For the most extreme examples of what you're describing, ascetics and monks and the like, I'd guess that some combination of sensory deprivation and rhythmic breathing causes the brain to short-circuit a bit and release some reward juice.

0[anonymous]
People lead fulfilling lives guided by a spiritualism that rejects seeking pleasure. Aka reward.

Sure. My terminal goal is an abstraction of my behavior to shoot my laser at the coordinates of blue objects detected in my field of view.

Well, I suppose that does fit the question I asked. We've mostly been talking about an AI with the ability to read and modify its own goal system, which Yvain specifically excludes in the blue-minimizer. We're also assuming that it's powerful enough to actually manipulate its world to optimize itself. Yvain's blue minimizer also isn't an AGI or ASI. It's an ANI, which we use without any particular danger all the time... (read more)

0[anonymous]
How do you explain Buddhism?

I don't think they're necessarily safe. My original puzzlement was more that I don't understand why we keep holding the AI's value system constant when moving from pre-foom to post-foom. It seemed like something was being glossed over when a stupid machine goes from making paperclips to being a god that makes paperclips. Why would a god just continue to make paperclips? If it's super intelligent, why wouldn't it figure out why it's making paperclips and extrapolate from that? I didn't have the language to ask "what's keeping the value system stable through that transition?" when I made my original comment.

0Houshalter
It depends on the AI architecture. A reinforcement learner always has the goal of maximizing its reward signal. It never really had a different goal, there was just something in the way (e.g. a paperclip sensor). But there is no theoretical reason you can't have an AI that values universe-states themselves. That actually wants the universe to contain more paperclips, not merely to see lots of paperclips. And if it did have such a goal, why would it change it? Modifying its code to make it not want paperclips would hurt its goal. It would only ever do things that help it achieve its goal. E.g. making itself smarter. So eventually you end up with a superintelligent AI that is still stuck with the narrow stupid goal of paperclips.

My apologies for taking so long to reply. I am particularly interested in this because if you (or someone) can provide me with an example of a value system that doesn't ultimately value the output of the value function, it would change my understanding of how value systems work. So far, the two arguments against my concept of a value/behavior system seem to rely on the existence of other things that are valuable in and of themselves or that there is just another kind of value system that might exist. The other terminal value thing doesn't hold much promise... (read more)

1Gram_Stone
No problem, pinyaka. I don't understand very much about mathematics, computer science, or programming, so I think that, for the most part, I've expressed myself in natural language to the greatest extent that I possibly can. I'm encouraged that about an hour and a half before my previous reply, DefectiveAlgorithm made the exact same argument that I did, albeit more briefly. It discourages me that he tabooed 'values' and you immediately used it anyway. Just in case you did decide to reply, I wrote a Python-esque pseudocode example of my conception of what an AGI with an arbitrary terminal value's very high level source code would look like. With little technical background, my understanding is very high level with lots of black boxes. I encourage you to do the same, such that we may compare. I would prefer that you write yours before I give you mine so that you are not anchored by my example. This way you are forced to conceive of the AI as a program and do away with ambiguous wording. What do you say? I've asked Nornagest to provide links or further reading on the value stability problem. I don't know enough about it to say anything meaningful about it. I thought that wireheading scenarios were only problems with AIs whose values were loaded with reinforcement learning. On this at least we agree. From what I understand, even if you're biased, it's not a bad assumption. To my knowledge, in scenarios with AGIs that have their values loaded with reinforcement learning, the AGIs are usually given the terminal goal of maximizing the time-discounted integral of their future reward signal. So, they 'bias' the AGI in the way that you may be biased. Maybe so that it 'cares' about the rewards its handlers give it more than the far greater far future rewards that it could stand to gain from wireheading itself? I don't know. My brain is tired. My question looks wrong to me.

Sure. I think if you assume that the goal is paperclip optimization after the AI has reached its "final" stable configuration, then the normal conclusions about paperclip optimizers probably hold true. The example provided dealt more with the transition from dumb-AI to smart-AI, and I'm not sure why Turry (or Clippy) wouldn't just modify their own goals to something that's easier to attain. Assuming that the goals don't change though, we're probably screwed.

0Houshalter
Turry's and Clippy's AI architectures are unspecified, so we don't really know how they work or what they are optimizing. I don't like your assumption that runaway reinforcement learners are safe. If it acquires the subgoal of self-preservation (you can't get more reward if you are dead), then it might still end up destroying humanity anyway (we could be a threat to it).

I think FeepingCreature was actually just pointing out a logical fallacy in a misstatement on my part and that is why they didn't respond further in this part of the thread after I corrected myself (but has continued elsewhere).

If you believe that a terminal goal for the state of the world other than the result of a comparison between a desired state and an actual state is possible, perhaps you can explain how that would work? That is fundamentally what I'm asking for throughout this thread. Just stating that terminal goals are terminal goals by definition is true, but doesn't really show that making a goal terminal is possible.

0[anonymous]
Sure. My terminal goal is an abstraction of my behavior to shoot my laser at the coordinates of blue objects detected in my field of view. That's not what I was saying either. The problem of "how do we know a terminal goal is terminal?" is dissolved entirely by understanding how goal systems work in real intelligences. In such machines goals are represented explicitly in some sort of formal language. Either a goal makes causal reference to other goals in its definition, in which case it is an instrumental goal, or it does not and is a terminal goal. Changing between one form and the other is an unsafe operation no rational agent and especially no friendly agent would perform. So to address your statement directly, making a terminal goal is trivially easy: you define it using the formal language of goals in such a way that no causal linkage is made to other goals. That's it.

That said, it's not obvious that humans have terminal goals. That's why I was saying you are anthropomorphizing the issue. Either humans have only instrumental goals in a cyclical or messy spaghetti-network relationship, or they have no goals at all and are instead better represented as behaviors. The jury is out on this one, but I'd be very surprised if we had anything resembling an actual terminal goal inside us.
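
To make that concrete, here is a toy sketch of such an explicit goal representation (my own illustration of the idea above, not any particular architecture; the class and names are invented for the example):

    # Toy sketch: a goal is instrumental if it makes causal reference to a
    # higher-level goal, and terminal if it does not. Everything here is
    # invented for illustration.

    class Goal:
        def __init__(self, description, parent=None):
            self.description = description
            self.parent = parent  # causal link to a higher-level goal, if any

        def is_terminal(self):
            # Terminal means: no causal linkage to any other goal.
            return self.parent is None

    make_paperclips = Goal("maximize paperclips")                  # terminal
    acquire_steel = Goal("acquire steel", parent=make_paperclips)  # instrumental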

A paperclip maximizer won't wirehead because it doesn't value world states in which its goals have been satisfied, it values world states that have a lot of paperclips

I am not as confident as you that valuing worlds with lots of paperclips will continue once an AI goes from "kind of dumb AI" to "super-AI." Basically, I'm saying that all values are instrumental values and that only mashing your "value met" button is terminal. We only switched over to talking about values to avoid some confusion about reward mechanisms.

A pa... (read more)

Would you care to try and clarify it for me?

0[anonymous]
The way in which artificial intelligences are often written, a terminal goal is a terminal goal is a terminal goal, end of story. "Whatever seemingly terminal goal you've given it isn't actually terminal" is anthropomorphizing. In the AI, a goal is instrumental if it has a link to a higher-level goal. If not, it is terminal. The relationship is very, very explicit.

So how does this relate to the discussion on AI?

As far as I know terminal values are things that are valuable in and of themselves. I don't consider not building baby-mulchers to be valuable in and of itself. There may be some scenario in which building baby-mulchers is more valuable to me than not and in that scenario I would build one. Likewise with doomsday devices. It's difficult to predict what that scenario would look like, but given that other humans have built them I assume that I would too. In those circumstances if I could turn off the parts of my brain that make me squeamish about doing that, ... (read more)

0Lumifer
OK. I appreciate you biting the bullet. No, that is NOT what I am saying. "Biologically hardwired" basically means you are born with these values and while overcoming them is possible, it will take extra effort. It certainly does not mean that you have no choice. Humans do something other than what their biologically hardwired terminal values tell them on a very regular basis. One reason for this is that values are many and they tend not to be consistent.

Again, you've pulled a statement out of the context of a discussion about the behavior of a self-modifying AI. So, fine. In my current condition I wouldn't build a baby mulcher. That doesn't mean that I might not build a baby mulcher if I had the ability to change my values. You might as well say that I terminally value not flying when I flap my arms. The thing you're discussing just isn't physically allowed. People terminally value only what they're doing at any given moment because the laws of physics say that they have no choice.

0Lumifer
I think you're confusing "terminal" and "immutable". Terminal values can and do change. And why is that? Do you, perchance, have some terminal moral value which disapproves? Huh? That makes no sense. How do you define "terminal value"?

Well, the pleasure center and the reward center are different things, but I take your meaning. I think that I could be conditioned to build a baby-mulching machine or a doomsday device. Why not? Other people have done it. Why would I assume that I'm that different from them?

EDIT TO ADD: Even if I have a value that I can't escape currently (like not killing people), that's not to say that if I had the ability to physically modify the parts of my brain that held my values I wouldn't do it for some reason.

0Lumifer
My statement is stronger. If in your current state you don't have any terminal moral values, then in your current state you would voluntarily accept to operate baby-mulching machines in exchange for the right amount of neural stimulation. Now, I don't happen to think this is true (because some "moral values" are biologically hardwired into humans), but this is a consequence of your position.

Two other people in this thread have pointed out that the value collapse into wireheading or something else is a known and unsolved problem, and that scenarios involving an intelligence that optimizes for something assume that the AI makes it through this in some unknown way. This suggests that I am not wrong, I'm just asking a question for which no one has an answer yet.

Fundamentally, my position is that given 1.) an AI is motivated by something 2.) That something is a component (or set of components) within the AI and 3.) The AI can modify that/those compone... (read more)

2DefectiveAlgorithm
A paperclip maximizer won't wirehead because it doesn't value world states in which its goals have been satisfied, it values world states that have a lot of paperclips. In fact, taboo 'values'. A paperclip maximizer is an algorithm the output of which approximates whichever output leads to world states with the greatest expected number of paperclips. This is the template for maximizer-type AGIs in general.
5Gram_Stone
I've seen people talk about wireheading in this thread, but I've never seen anyone say that problems about maximizers-in-general are all implicitly problems about reward maximizers that assume that the wireheading problem has been solved. If someone has, please provide a link. Instead of imagining intelligent agents (including humans) as 'things that are motivated to do stuff,' imagine them as programs that are designed to cause one of many possible states of the world according to a set of criteria. Google isn't 'motivated to find your search results.' Google is a program that is designed to return results that meet your search criteria. A paperclip maximizer for example is a program that is designed to cause the one among all possible states of the world that contains the greatest integral of future paperclips. Reward signals are values that are correlated with states of the world, but because intelligent agents exist in the world, the configuration of matter that represents the value of a reward maximizer's reward signal is part of the state of the world. So, reward maximizers can fulfill their terminal goal of maximizing the integral of their future reward signal in two ways: 1) They can maximize their reward signal by proxy by causing states of the world that maximize values that correlate with their reward signal, or; 2) they can directly change the configuration of matter that represents their reward signal. #2 is what we call wireheading. What you're actually proposing is that a sufficiently intelligent paperclip maximizer would create a reward signal for itself and change its terminal goal from 'Cause the one of all possible states of the world that contains the greatest integral of future paperclips' to 'Cause the one of all possible states of the world that contains the greatest integral of your future reward signal.' The paperclip maximizer would not cause a state of the world in which it has a reward signal and its terminal goal is to maximize said

You are the second person to say that the optimization catastrophe includes an assumption that AI arises with a stable value system. That it "somehow" doesn't become a wirehead. Fair enough. I just missed that we were assuming that.

0FeepingCreature
I think the idea is, you need to solve the wireheading for any sort of self-improving AI. You don't have an AI catastrophe without that, because you don't have an AI without that (at least not for long).

That's helpful to know. I just missed the assumption that wireheading doesn't happen and now we're more interested in what happens next.

I think I understood you. What do you think I misunderstood?

Maybe we should quit saying that evolution rewards anything at all. Replication isn't a reward, it's just a byproduct of a non-intelligent process. There was never an "incentive" to reproduce, any more than there is an "incentive" for any physical process. High pressure air moves to low pressure regions, not because there's an incentive, but because that's just how physics works. At some point, this non-sentient process accidentally invented a reward system and replication,... (read more)

I don't consider morality to be a terminal value. I would point out that even a value that I have that I can't give up right now wouldn't necessarily be terminal if I had the ability to directly modify the components of my mind. Such values are unalterable because I am not able to physically manipulate the hardware, not because I wouldn't alter them if I could (and saw a reason to).

0Lumifer
That implies that you would do anything at all (baby-mulching machines, nuke the world, etc.) for sufficient stimulation of your pleasure center.

whatever terminal goal you've given it isn't actually terminal.

This is a contradiction in terms.

I should have said something more like "whatever seemingly terminal goal you've given it isn't actually terminal."

0[anonymous]
I'm not sure you understood what FeepingCreature was saying.

We don't have the ability to directly fulfil the reward center. I think narcotics are the closest we've got now and lots of people try to mash that button to the detriment of everything else. I just think it's a kind of crude button and it doesn't work as well as the direct ability to fully understand and control your own brain.

0Ishaan
I think you may have misunderstood me - there's a distinction between what evolution rewards and what humans find rewarding. (This is getting hard to talk about because we're using "reward' to both describe the process used to steer a self-modifying intelligence in the first place and one of the processes that implements our human intelligence and motivations, and those are two very different things.) The "rewarded behavior" selected by the original algorithm was directly tied to replication and survival. Drug-stimulated reward centers fall in the "current behaviors that trigger the reward" category, not the original reward. Even when we self-stimulate our reward centers, the thing that we are stimulating isn't the thing that evolution directly "rewards". Directly fulfilling the originally incentivized behavior isn't about food and sex - a direct way might, for example, be to insert human genomes into rapidly dividing, tough organisms and create tons and tons of them and send them to every planet they can survive on. Similarly, an intelligence which arises out of a process set up to incentivize a certain set of behaviors will not necessarily target those incentives directly. It might go on to optimize completely unrelated things that only coincidentally target those values. That's the whole concern. If an intelligence arises due to a process which creates things that cause us to press a big red "reward" button, the thing that eventually arises won't necessarily care about the reward button, won't necessarily care about the effects of the reward button on its processes, and indeed might completely disregard the reward button and all its downstream effects altogether... in the same way we don't terminally value spreading our genome at all. Our neurological reward centers are a second layer of sophisticated incentivizing which emerged from the underlying process of incentivizing fitness.

I guess I don't really believe that I have other terminal values.

0Ishaan
You wouldn't consider the cluster of things which typically fall under morality to be terminal values, which you care about irrespective of your internal mental state?