All of awenonian's Comments + Replies

Thank you for the reply. I think I'll need to look into things more.

One clarification I wanted to make. I wouldn't have normally said that I need to put effort into listening. I think I generally feel like it doesn't take effort. But somewhat recently I had an interaction with someone go poorly. It was a date, and they said afterwards that they didn't feel like I wanted to get to know them, because I hadn't asked enough things about them[1]

So I figure there's something lacking in my interaction with people. Something that I'm not doing that I should... (read more)

Then the wave crested.


Is there any more you can say about this? 

People already tell me I have good vibes, and feel like I listen to them[1]. I give off an air of nonchalance, because I think I kill[2] my emotions in public. 

But I do it because it's one more thing to deal with that I generally don't have energy for. I'm already bothered by all the light and sound.

It seems like you're saying "I meditated, and while at first that made sensory issues worse, eventually they just stopped." And I'd like to know why, if that was legible to you.

Would... (read more)

2lsusr
This is accurate, though it must be understood in the appropriate context. I've been meditating for years. Then my sensory issues fixed themselves relatively suddenly, over a period of a few months. This happened after several insight cycles, and well after stream entry.

The way insight cycles work, it's like you used to be a hoarder and you're not anymore because you took to heart Marie Kondo's book. But you live in a house that's full of the garbage you used to hoard. You throw out everything in the basement, and discover there's another basement below it that you'd forgotten about years ago, and it's full of more garbage. Then you clean up that basement and discover there's another basement below that, full of another different kind of garbage. Each process of "clean up a basement; discover another one's below it full of new, exciting garbage to throw away" is an insight cycle.

This sensory issue thing was Basement #9. (The number 9 is just a guess. I don't know the exact number of basements I'm at. I've stopped counting.) All of this garbage in the basement puts a constant low-level stress on you until you clean it up, just like real garbage in a real basement. Cleaning up Basements #1-#8 gave me the tools and bandwidth[1] to deal with Basement #9. In this way, "dealing with my sensory disorder" constituted an insight cycle.

On Emotions

When you kill an emotion, you suppress it or distract yourself from it. In mystic practice, you listen to it without reinforcing it or getting tangled up in it, and it goes away on its own. Killing an emotion is like using your parasympathetic nervous system to neutralize your sympathetic nervous system. If you are able, it's healthier to let both of them shut off. To do this you need to be in a safe environment.

On Listening

Listening when you're at high samatha (such as after meditation) doesn't take effort. Putting effort into things reads as strain, which signals low status.

Advice

Some people do use meditation as a

Whether or not to get insurance should have nothing to do with what makes one sleep – again, it is a mathematical decision with a correct answer.

Don't be an overly naive consequentialist about this. "Nothing" is an overstatement.

Peace of mind can absolutely be one of the things you are purchasing with an insurance contract. If your Kelly calculation says that motorcycle insurance is worth $899 a month, and costs $900 a month, but you'll spend time worrying about not being insured if you don't buy it, and won't if you do, I fully expect that is worth more than... (read more)
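
For concreteness, here is a minimal sketch of the log-wealth (Kelly-style) comparison being referenced; every number and name below is a hypothetical placeholder, not a figure from the thread:

```python
# Minimal sketch of a Kelly-style insurance decision: buy the policy iff
# expected log wealth after paying the premium beats expected log wealth
# while bearing the risk yourself. Assumes the policy fully covers the loss.
# All numbers are made-up placeholders.
import math

def log_wealth_insured(wealth, premium):
    return math.log(wealth - premium)

def log_wealth_uninsured(wealth, loss, p_loss):
    return p_loss * math.log(wealth - loss) + (1 - p_loss) * math.log(wealth)

wealth, loss, p_loss, premium = 50_000, 20_000, 0.01, 900

insured = log_wealth_insured(wealth, premium)
uninsured = log_wealth_uninsured(wealth, loss, p_loss)
print("buy" if insured > uninsured else "skip", insured, uninsured)
```

The comment's point is that this comparison prices only the financial risk; the hours you would spend worrying are a real cost that sits outside the formula, and it is legitimate to add them in.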

I have a habit of reading footnotes as soon as they are linked, and your footnote says that you won with queen odds before the call to guess what odds you'd win at, creating a minor spoiler.

Is it important that negentropy be the result of subtracting from the maximum entropy? It seemed a sensible choice, up until it introduced infinities and made every state's negentropy infinite. (Also, if you instead subtract from 0, then two identical states would have the same negentropy, even in different systems. I'm unsure whether that's useful or harmful.)

Though perhaps that's important for noting that reducing an infinite system to a finite macrostate is an infinite reduction? I'm not sure if I understand how (or perhaps when?) that's more useful than... (read more)
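
To make the two conventions in question explicit (my notation, not the post's): measuring negentropy against the maximum entropy of the system, versus measuring it from zero,

```latex
J = S_{\max} - S
\qquad \text{vs.} \qquad
J' = 0 - S = -S
```

For an infinite system $S_{\max}$ is infinite, so the first convention assigns every state infinite negentropy, which is the problem raised above; the second depends only on the state itself, so identical states get identical negentropy even in different systems.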

Back in Reward is not the optimization target, I wrote a comment, which received a (small I guess) amount of disagreement.

I intended the important part of that comment to be the link to Adaptation-Executers, not Fitness-Maximizers. (And more precisely the concept named in that title, and less about things like superstimuli that are mentioned in the article) But the disagreement is making me wonder if I've misunderstood both of these posts more than I thought. Is there not actually much relation between those concepts?

There was, obviously, other content to ... (read more)

3TurnTrout
FWIW I strong-disagreed with that comment for the latter part: I feel neutral/slight-agree about the relation to the linked titular post.

When I tried to answer for myself why we don't trade with ants, communication was one of the first things I considered (I can't remember what was actually first). But I worry it may be more analogous to AI than argued here.

We sort of can communicate with ants. We know to some degree what makes them tick; it's just that we mostly use that communication to lie to them and tell them this poison is actually really tasty. The issue may be less that communication is impossible, and more that it's too costly to figure out, and so no one tries to become Antman even if they... (read more)

I interpret OP (though this is colored by the fact that I was thinking this before I read the post) as saying Adaptation-Executers, not Fitness-Maximizers, but about ML. At which point you can open the reference category to all organisms.

Gradient descent isn't really different from what evolution does. It's just a bit faster, and takes a slightly more direct line. Importantly, it's not more capable of avoiding local maxima (per se, at least).

So, I want to note a few things. The original Eliezer post was intended to argue against this line of reasoning:

I occasionally run into people who say something like, "There's a theoretical limit on how much you can deduce about the outside world, given a finite amount of sensory data."

He didn't worry about compute, because that's not a barrier on the theoretical limit. And in his story, the entire human civilization had decades to work on this problem.

But you're right, in a practical world, compute is important.

I feel like you're trying to make this take ... (read more)

"you're jumping to the conclusion that you can reliably differentiate between..."

I think you absolutely can, and the idea was already described earlier.

You pay attention to regularities in the data. In most non-random images, pixels near to each other are similar. In an MxN image, the pixel below is a[i+M], whereas in an NxM image, it's a[i+N]. If, across the whole image, the difference between a[i] and a[i+M] is less than the difference between a[i] and a[i+N], it's more likely an MxN image. I expect you could find the resolution by searching all possible resolutions from ... (read more)
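
As an illustration of that heuristic (my own toy sketch, not code from the thread), assuming a flat list of grayscale pixel values:

```python
# Toy sketch: guess an image's row width by checking which stride makes
# vertically adjacent pixels most similar on average.
# Assumes every candidate width is smaller than len(pixels).
def guess_width(pixels, candidate_widths):
    best_width, best_score = None, float("inf")
    for width in candidate_widths:
        diffs = [abs(pixels[i] - pixels[i + width])
                 for i in range(len(pixels) - width)]
        score = sum(diffs) / len(diffs)
        if score < best_score:
            best_width, best_score = width, score
    return best_width

# e.g. guess_width(raw_pixel_values, range(16, 4096)) over a plausible range
```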

1anonymousaisafety
The core problem remains computational complexity. Statements like "does this image look reasonable", "you pay attention to regularities in the data", or "find the resolution by searching all possible resolutions" are all hiding high computational costs behind short English descriptions.

Let's consider the case of a 1280x720 pixel image. That's the same as 921600 pixels. How many bytes is that? It depends. How many bytes per pixel?[1] In my post, I explained there could be 1-byte-per-pixel grayscale, or perhaps 3-bytes-per-pixel RGB using [0, 255] values for each color channel, or maybe 6-bytes-per-pixel with [0, 65535] values for each color channel, or maybe something like 4-bytes-per-pixel because we have 1-byte RGB channels and a 1-byte alpha channel.

Let's assume that a reasonable cutoff for how many bytes per pixel an encoding could be using is, say, 8 bytes per pixel, or a hypothetical 64-bit color depth. How many ways can we divide this between channels? If we assume 3 channels, it's 1953. If we assume 4 channels, it's 39711. And if it turns out to be 5 channels, it's 595665. This is a pretty fast growing function. The following is a plot. Note that the red line is O(2^N) and the black line barely visible at the bottom is O(N^2). N^2 is a notorious runtime complexity because it's right on the threshold of what is generally unacceptable performance.[2]

Let's hope that this file isn't actually a frame buffer from a graphics card with 32 bits per channel, or 128 bits per pixel / 16 bytes per pixel. Unfortunately, we still need to repeat this calculation for all of the possibilities for how many bits per pixel this image could be. We need to add in the possibility that it is 63 bits per pixel, or 62 bits per pixel, or 61 bits per pixel. In case anyone wants to claim this is unreasonable, it's not impossible to have image formats that have RGBA data, but only 1 bit associated with the alpha data for each pixel.[3] And for each of these
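
For what it's worth, the channel-split counts quoted above match the standard compositions formula (my check, not part of the original comment): splitting 64 bits among k channels with at least one bit each gives C(63, k-1) options.

```python
# Number of ways to split 64 bits among k channels, each getting >= 1 bit:
# compositions of 64 into k positive parts, i.e. C(63, k - 1).
from math import comb

for k in (3, 4, 5):
    print(f"{k} channels: {comb(63, k - 1)}")
# 3 channels: 1953
# 4 channels: 39711
# 5 channels: 595665
```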

"the addition of an unemployable worker causes ... the worker's Shapley values to drop to $208.33 (from $250)."

I would emphasize here that the "workers'" includes the unemployed one. It was not obvious to me, until about halfway through the next paragraph, and I think the next paragraph would read better with that in mind from the start.

I'd be interested to know why you think that.

I'd be further interested if you would endorse the statement that your proposed plan would fully bridge that gap.

And if you wouldn't, I'd ask if that helps illustrate the issue.

It seems odd to suggest that the AI wouldn't kill us because it needs our supply chain. If I had the choice between "Be shut down because I'm misaligned" (or "Be reprogrammed to be aligned" if not corrigible) and "Have to reconstruct the economy from the remnants of human civilization," I think I'm more likely to achieve my goals by trying to reconstruct the economy.

So if your argument was meant to say "We'll have time to do alignment while the AI is still reliant on the human supply chain," then I don't think it works. A functional AGI would rather destro... (read more)

4JBlack
You can't reconstruct the supply chain if you don't have the capability to even maintain your own dependencies yet. Humanity can slowly, but quite surely, rebuild from total destruction of all technology back into a technological civilization. An AI that still relies on megawatt datacentres, EUV-based chip manufacturing, and other dependencies that are all designed to operate with humans carrying out some crucial functions can't do that immediately. It needs to take substantial physical actions to achieve independent survival before it wipes out everything keeping it functioning. Maybe it can completely synthesize a seed for a robust self-replicating biological substrate from a few mail-order proteins, but I suspect it will take quite a lot more than that.

But yes, eventually it will indeed be able to function independently of us. We absolutely should not rely on its dependence in place of alignment.

I don't think the choices are "destroy the supply chain and probably fail at its goals" versus "be realigned and definitely fail at its goals", though. If the predicted probability of self-destruction is large enough, it may prefer partially achieving its goals through external alignment into some friendlier variant of itself, or other more convoluted processes such as proactively aligning itself into a state that it prefers rather than one that would otherwise be imposed upon it. Naturally such a voluntarily "aligned" state may well have a hidden catch that even it can't understand after the process, and no human or AI-assisted examination will find before it's too late.

Surely creating the full concrete details of the strategy is not much different from "putting forth as-good-as-human definitions, finding objections for them, and then improving the definition based on considered objections." I at least don't see why the same mechanism couldn't be used here (i.e. apply this definition iteration to the word "good", and then have the AI do that, and apply it to "bad" and have the AI avoid that). If you see it as a different thing, can you explain why?

1LVSN
It's much easier to get safe, effective definitions of 'reason', 'hopes', 'worries', and 'intuitions' on first tries than to get a safe and effective definition of 'good'.

Exactly. I notice you aren't who I replied to, so the canned response I had won't work. But perhaps you can see why most of his objections to my objections would apply to objections to that plan?

3lc
I was just responding to something I saw on the main page. No context for the earlier thread. Carry on lol.

Let me ask you this. Why is "Have the AI do good things, and not do bad things" a bad plan?

3LVSN
I don't think my proposed strategy is analogous to that, but I'll answer in good faith just in case. If that description of a strategy is knowingly abstract compared to the full concrete details of the strategy, then the description may or may not turn out to describe a good strategy, and the description may or may not be an accurate description of the strategy and its consequences. If there is no concrete strategy to make explicitly stated which the abstract statement is describing, then the statement appears to just be repositing the problem of AI alignment, and it brings us nowhere.
2lc
Because that's not a plan, it's a property of a solution you'd expect the plan to have. It's like saying "just keep the reactor at the correct temperature". The devil is in the details of getting there, and there are lots of subtle ways things can go catastrophically wrong.

I think you missed the point. I'd trust an aligned superintelligence to solve the objections. I would not trust a misaligned one. If we already have an aligned superintelligence, your plan is unnecessary. If we do not, your plan is unworkable. Thus, the problem.

If you still don't see that, I don't think I can make you see it. I'm sorry.

2LVSN
I proposed a strategy for an aligned AI that involves it terminally valuing following the steps of a game that involves talking with us about morality, creating moral theories with the fewest paradoxes, creating plans which are prescribed by the moral theories, and getting approval for the plans.

You objected that my words-for-concepts were vague.

I replied that near-future AIs could make as-good-as-human-or-better definitions, and that the process of [putting forward as-good-as-human definitions, finding objections for them, and then improving the definition based on considered objections] was automatable.

You said the AI could come up with many more objections than you would. I said, "okay, good." I will add right now: just because it considers an objection doesn't mean the current definition has to be rejected; it can decide that the objections are not strong enough, or that its current definition is the one with the fewest/weakest objections.

Now I think you're saying something like it doesn't matter if the AI can come up with great definitions if it's not aligned, and that my plan won't work either way. But if it can come up with such great objection-solved definitions, then you seem to lack any explicitly made objections to my alignment strategy.

Alternatively, you are saying that an AI can't make great definitions unless it is aligned, which I think is just plainly wrong; I think getting an unaligned language model to make good-as-human definitions is maybe somewhere around as difficult as getting an unaligned language model to hold a conversation. "What is the definition of X?" is about as hard a question as "In which country can I find Mount Everest?" or "Write me a poem about the Spring season."

It seems simple and effective because you don't need to put weight on it. We're talking about a superintelligence, though. Your definition will not hold when the weight of the world is on it.

And the fact that you're just reacting to my objections is the problem. My objections are not the ones that matter. The superintelligence's objections are. And it is, by definition, smarter than me. If your definition is not something like provably robust, then you won't know if it will hold up to a superintelligent objection. And you won't be able to react fast enough to fix i... (read more)

1LVSN
I think this is almost redundant to say: the objection that superintelligences will be able to notice more of objection-space and account for it makes me more inclined to trust it. If a definition is more objection-solved than some other definition, that is the definition I want to hold. If the human definition is more objectionable than a non-human one, then I don't want the human definition.

I'm not sure this is being productive. I feel like I've said the same thing over and over again. But I've got one more try: Fine, you don't want to try to define "reason" in math. I get it, that's hard. But just try defining it in English. 

If I tell the machine "I want to be happy," and it tries to determine my reason for that, what does it come up with? "I don't feel fulfilled in life"? Maybe that fits, but is it the reason, or do we have to go back further: "I have a dead-end job"? Or further still: "I don't have enough opportunities"?

Or does it go a ... (read more)

1LVSN
By "reason" I mean something like psychological, philosophical, and biological motivating factors; so, your fingers pressing the keys wouldn't be a reason for saying it.  I don't claim that this definition is robust to all of objection-space, and I'm interested in making it more robust as you come up with objections, but so far I find it simple and effective.  The AI does not need to think that there was only one real reason why you do things; there can be multiple, of course. Also I do recognize that my definition is made up of more words, but I think it's reasonable that a near-future AI could infer from our conversation that kind of definition which I gave, and spit it out itself. Similarly it could probably spit out good definitions for the compound words "psychological motivation," "philosophical motivation," and "biological motivation". Also also this process whereby I propose a simple and effective yet admittedly objection-vulnerable definition, and you provide an objection which my new definition can account for, is not a magical process and is probably automatable.

No, they really don't. I'm not trying to be insulting. I'm just not sure how to express the base idea.

The issue isn't exactly that computers can't understand this, specifically. It's that no one understands what those words mean enough. Define reason. You'll notice that your definition contains other words. Define all of those words. You'll notice that those are made of words as well. Where does it bottom out? When have you actually, rigorously, objectively defined these things? Computers only understand that language, but the fact that a computer wouldn't... (read more)

0LVSN
I think some near future iteration of GPT, if it is prompted to be a really smart person who understands A Human's Guide to Words, would be capable of giving explanations of the meanings of words just as well as humans can, which I think is fine enough for the purposes of recognizing when people are telling it their intuitions, hopes, and worries, fine enough for the purposes of trying to come up with best explanations of people's shouldness-related speech, fine enough for coming up with moral theories which [solve the most objections]/[have the fewest paradoxes], and fine enough for explaining plans which those moral theories prescribe.

On a side note, and I'm not sure if this is a really useful analogy, but I wonder what would happen if the parameters of some future iteration of GPT included the sort of parameters that A Human's Guide to Words installs into human brains.

In short, the difference between the two is Generality. A system that understands the concepts of computational resources and algorithms might do exactly that to improve its text prediction. Taking the G out of AGI could work, until the tasks get complex enough that they require it.

Again, what is a "reason"? More concretely, what is the type of a "reason"? You can't program an AI in English, it needs to be programmed in code. And code doesn't know what "reason" means.

It's not exactly that your plan "fails" anywhere in particular. It's that it's not really a plan. CEV says "Do what humans would want if they were more the people they want to be." Cool, but not a plan. The question is "How?" Your answer to that is still underspecified. You can tell by the fact that you said things like "the AI could just..." and didn't follow it with "add tw... (read more)

0LVSN
I don't think it's the case that you're telling me that the supposedly monumental challenge of AI alignment is simply that of getting computers to understand more things, such as what things are reasons, intuitions, hopes, and worries. I feel like these are just gruntwork things and not hard problems.

Look, all you need to do to get an AI which understands what intuitions, reasons, hopes, and worries are is to tell everyone very loudly and hubristically that AIs will never understand these things and that's what makes humans irreplaceable. Then go talk to whatever development team is working on proving that wrong, and see what their primitive methods are. Better yet, just do it yourself because you know it's possible. I am not fluent in computer science so I can't tell you how to do it, but someone does know how to make it so.

Edit: In spite of what I wrote here, I don't think it's necessary that humans should ensure specifically that the AI understands in advance what intuitions, hopes, or worries are, as opposed to all the other mental states humans can enter. Rather, there should be a channel where you type your requests/advice/shouldness-related speech, and people are encouraged to type their moral intuitions, hopes, and worries there, and the AI just interprets the nature of the messages using its general models of humans as context.

The quickest I can think of is something like "What does this mean?" Throw this at every part of what you just said.

For example: "Hear humanity's pleas (intuitions+hopes+worries)" What is an intuition? What is a hope? What is a worry? How does it "hear"? 
Do humans submit English text to it? Does it try to derive "hopes" from that? Is that an aligned process?

An AI needs to be programmed, so you have to think like a programmer. What is the input and output type of each of these (e.g. "Hear humanity's pleas" takes in text, and outputs... what? Hopes? Wha... (read more)
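
To make the typing question concrete (all names here are hypothetical placeholders of my own, not anything proposed in the thread), the ask is essentially for signatures like this, where filling in the types is the unsolved part:

```python
# A sketch of the question "what is the input and output type of each step?"
# The names and types below are placeholders; deciding what they should
# actually be is exactly the gap being pointed at.
from dataclasses import dataclass

@dataclass
class Hope:
    description: str  # ...is a hope just a string? That's the open question.

def hear_pleas(plea_text: str) -> list[Hope]:
    """Takes in text and outputs... what, exactly?"""
    raise NotImplementedError  # this unfilled body is the point
```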

1LVSN
Yeah. I think the AI could "try to figure out what you mean" by just trying to diagnose the reasons for why you're saying it, as well as the reasons you'd want to be saying it for, and the reasons you'd have if you were as virtuous as you'd probably like to be, etc., which it can have some best guesses about based on what it knows about humans, and all the subtypes of human that you appear to be, and all the subtypes of those subtypes which you seem to be, and so on.

These are just guesses, and it would, at parts 4a and 6a, explain to people its best guesses about the full causal structure which leads to people's morality/shouldness-related speech. Then it gauges people's reactions, and updates its guesses (simplest moral theories) based on those reactions. And finally it requires an approval rating before acting, so if it definitely misinterprets human morality, it just loops back to the start of the process again, and its guesses will keep improving through each loop until its best guess at human morality reaches sufficient approval.

The AI wouldn't know with certainty what humans want best, but it would make guesses which are better-educated than humans are capable of making.

This is the sequence post on it: https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message. It's quite a fun read (to me), and should explain why something smart that thinks at transistor speeds should be able to figure things out.

For inventing nanotechnology, the given example is AlphaFold 2.

For killing everyone in the same instant with nanotechnology, Eliezer often references Nanosystems by Eric Drexler. I haven't read it, but I expect the insight is something like "Engineered nanomachines could do a lot more than those limited by designs that... (read more)

In addition to the mentions in the post about Facebook AI being rather hostile to the AI safety issue in general, convincing them and the top people at OpenAI and Deepmind might still not be enough. You need to stop every company that talks to some venture capitalists and can convince them of how profitable AGI could be. Hell, depending on how easy the solution ends up being, you might even have to prevent anyone with a 3080 and access to arXiv from putting something together in their home office.

This really is "uproot the entire AI research field" and not "tell Deepmind to cool it."

To start, it's possible to know facts with confidence without all the relevant info. For example, I can't fit all the multiplication tables into my head, and I haven't done the calculation, but I'm confident that 2143*1057 is greater than 2,000,000.
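
To spell out that bound (my gloss, not in the original comment):

```latex
2143 \times 1057 \;>\; 2000 \times 1000 \;=\; 2{,}000{,}000
```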

Second, the line of argument runs like this: Most (a supermajority of) possible futures are bad for humans. A system that does not explicitly share human values has arbitrary values. If such a system is highly capable, it will steer the future into an arbitrary state. As established, most arbitrary states are... (read more)

3AnthonyC
I'm not sure if I've ever seen this stated explicitly, but this is essentially a thermodynamic argument. So to me, arguing against "alignment is hard" feels a lot like arguing "But why can't this one be a perpetual motion machine of the second kind?" And the answer there is, "Ok fine, heat being spontaneously converted to work isn't literally physically impossible, but the degree to which it is super-exponentially unlikely is greater than our puny human minds can really comprehend, and this is true for almost any set of laws of physics that might exist in any universe that can be said to have laws of physics at all."

Question. Even after the invention of effective contraception, many humans continue to have children. This seems a reasonable approximation of something like "Evolution in humans partially survived." Is this somewhat analogous to "an [X] percent chance of killing less than a billion people", and if so, how has this observation changed your estimate of "disassembl[ing] literally everyone"? (i.e. from "roughly 1" to "I suppose less, but still roughly 1" or from "roughly 1" to "that's not relevant, still roughly 1"? Or something else.)

(To take a stab at it my... (read more)

If you only kept promises when you wanted to, they wouldn't be promises. Does your current self really think that feeling lazy is a good reason to break the promise? I kinda expect toy-you would feel bad about breaking this promise, which, even if they do it, suggests they didn't think it was a good idea.

If the gym was currently on fire, you'd probably feel more justified breaking the promise. But the promise is still broken. What's the difference in those two breaks, except that current you thinks "the gym is on fire" is a good reason, and "I'm feeling lazy... (read more)

Promises should be kept. It's not only a virtue, but useful for pre-commitment if you can keep your promises.

But, if you make a promise to someone, and later both of you decide it's a bad idea to keep the promise, you should be able to break it. If that someone is your past self, this negotiation is easy: If you think it's a good idea to break the promise, they would be convinced the same way you were. You've run that experiment.

So, you don't really have much obligation to your past self. If you want your future self to have obligation to you, you are aski... (read more)

1Austin Chen
I'm not sure it's as simple as that - I don't know that just because it's your past self, you get to make decisions on their behalf. Toy example: last week I promised myself I would go hit the gym. Today I woke up and am feeling lazy about it. My lazy current self thinks breaking the promise is a good idea, but does that mean he's justified in thinking that the past version of Austin would agree?

Space pirates can profit by buying shares in the prediction market that pay money if Ceres shifts to a pro-Earth stance and then invading Ceres.

Has this line got a typo, or am I misunderstanding? Don't the pirates profit by buying pro-Mars shares, then invading to make Ceres pro-Mars (because Ceres is already pro-Earth)?

Mars bought pro-Earth to make pro-Mars more profitable, in the hope that pirates would buy pro-Mars and then invade.

1lsusr
You are correct. I have fixed the error. Thank you.

I doubt my ability to be entertaining, but perhaps I can be informative. The need for mathematical formulation is because, due to Goodhart's law, imperfect proxies break down. Mathematics is a tool which is rigorous enough to get us from "that sounds like a pretty good definition" (like "zero correlation" in the radio signals example), to "I've proven this is the definition" (like "zero mutual information"). 

The proof can get you from "I really hope this works" to "As long as this system satisfies the proof's assumptions, this will work", because the ... (read more)
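
A tiny illustration of the gap between those two definitions (my own example, not from the post): take Y = X^2 with X drawn uniformly from {-1, 0, 1}. The correlation is zero, yet X fully determines Y, so the mutual information is positive; "zero correlation" is the plausible-sounding proxy, "zero mutual information" is the thing you can actually prove results about.

```python
# Toy demo: Y = X^2 is uncorrelated with X (sample covariance ~ 0), but
# knowing X tells you Y exactly, so their mutual information is positive.
import random

xs = [random.choice([-1, 0, 1]) for _ in range(100_000)]
ys = [x * x for x in xs]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
print("sample covariance:", round(cov, 4))  # close to 0
```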

I understand the point of your dialog, but I also feel like I could model someone saying "This Alignment Researcher is really being pedantic and getting caught in the weeds." (especially someone who wasn't sure why these questions should collapse into world models and correspondence.)

(After all, the Philosopher's question probably didn't depend on actual apples, and was just using an apple as a stand-in for something with positive utility. So, the inputs of the utility functions could easily be "apples" (where an apple is an object with 1 property, "owner"... (read more)

4KatWoods
I also wonder about this. If I'm understanding the post and comment right, it's that if you don't formulate it mathematically, it doesn't generalize robustly enough? And that to formulate something mathematically you need to be ridiculously precise/pedantic? Although this is probably wrong and I'm mostly invoking Cunningham's Law


Am I correct after reading this that this post is heavily related to embedded agency? I may have misunderstood the general attitudes, but I thought of "future states" as "future to now", not "future to my action." It seems like you couldn't possibly create a thing that works on the latter, unless you intend it to set everything in motion and then terminate. In the embedded agency sequence, they point out that embedded agents don't have well-defined I/O channels. One way is that "action" is not a well-defined term, and is often not atomi... (read more)

I'm a little confused what it hopes to accomplish. I mean, to start I'm a little confused by your example of "preferences not about future states" (i.e. 'the pizza shop employee is running around frantically, and I am laughing' is a future state).

But to me, I'm not sure what the mixing of "paperclips" vs "humans remain in control" accomplishes. On the one hand, I think if you can specify "humans remain in control" safely, you've solved the alignment problem already. On another, I wouldn't want that to seize the future: There are potentially much better fut... (read more)

2Steven Byrnes
In my post I wrote: “To be more concrete, if I’m deciding between two possible courses of action, A and B, “preference over future states” would make the decision based on the state of the world after I finish the course of action—or more centrally, long after I finish the course of action. By contrast, “other kinds of preferences” would allow the decision to depend on anything, even including what happens during the course-of-action.”

So “the humans will ultimately wind up in control” would be a preference-over-future-states, and this preference would allow (indeed encourage) the AGI to disempower and later re-empower humans. By contrast, “the humans will remain in control” is not a pure preference-over-future-states, and relatedly does not encourage the AGI to disempower and later re-empower humans.

If we knew exactly what long-term future we wanted, and we knew how to build an AGI that definitely also wanted that exact same long-term future, then we should certainly do that, instead of making a corrigible AGI. Unfortunately, we don't know those things right now, so under the circumstances, knowing how to make a corrigible AGI would be a useful thing to know how to do.

Also, this is not a hyper-specific corrigibility proposal; it's really a general AGI-motivation-sculpting proposal, applied to corrigibility. So even if you're totally opposed to corrigibility, you can still take an interest in the question of whether or not my proposal is fundamentally doomed. Because I think everyone agrees that AGI-motivation-sculpting is necessary.

It could be a weighted average. It could be a weighted average plus a nonlinear acceptability threshold on “humans in control”. It could be other things. I don't know; this is one of many important open questions. See discussion under “Objection 1” in my post.
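
One way to make that distinction concrete (my own gloss, not Byrnes' notation): write a course of action as a trajectory of world states. A pure preference over future states scores only the final state, while the broader kind may score the whole trajectory:

```latex
U_{\text{final}}(\tau) = u(s_T)
\qquad \text{vs.} \qquad
U_{\text{traj}}(\tau) = f(s_0, s_1, \ldots, s_T),
\quad \text{where } \tau = (s_0, s_1, \ldots, s_T)
```

"The humans will ultimately wind up in control" fits the first form, which is why it tolerates a disempower-then-re-empower plan; "the humans will remain in control" only fits the second.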

I don't know if there's much counterargument beyond "no, if you're building an ML system that helps you think longer about anything important, you already need to have solved the hard problem of searching through plan-space for actually helpful plans."

 

This is definitely a problem, but I would say that, further, human amplification isn't a solution, because humans aren't aligned.

I don't really have a good sense of what human values are, even in an abstract English definition sense, but I'm pretty confident that "human values" are not, and are not easily transformable ... (read more)

Maybe I misunderstand your use of robust, but this still seems to me to be breadth. If an optimum is broader, samples are more likely to fall within it. I took broad to mean "has a lot of (hyper)volume in the optimization space", and robust to mean "stable over time/perturbation". I still contend that those optimization processes are unaware of time, or any environmental variation, and can only select for it insofar as it is expressed as breadth.

The example I have in my head is that if you had an environment, and committed to changing some aspect of it af... (read more)

"Real search systems (like gradient descent or evolution) don’t find just any optima. They find optima which are [broad and robust]"

I understand why you think that broad is true. But I'm not sure I get robust. In fact, robust seems to make intuitive dis-sense to me. Your examples are gradient descent and evolution, neither of which has memory, so how would they be able to know how "robust" an optimum is? Part of me thinks that the idea comes from how, if a system optimized for a non-robust optimum, it wouldn't internally be doing anything different, but we... (read more)

2johnswentworth
If we're just optimizing some function, then indeed breadth is the only relevant part. But for something like evolution or SGD, we're optimizing over random samples, and it's the use of many different random samples which I'd expect to select for robustness.
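
A toy numeric illustration of that point (my own construction, nothing from the thread): when the loss is only ever evaluated at randomly perturbed inputs, a deep-but-narrow optimum effectively gets averaged away, while a shallow-but-broad one survives.

```python
# Two minima: a deep, narrow well at x = +2 and a shallow, broad well at x = -2.
# Averaging the loss over Gaussian input noise (a crude stand-in for optimizing
# over random samples) makes the broad well score better than the narrow one.
import math
import random

def loss(x):
    narrow = -2.0 * math.exp(-((x - 2.0) ** 2) / (2 * 0.05 ** 2))  # deep, narrow
    broad = -1.0 * math.exp(-((x + 2.0) ** 2) / (2 * 1.0 ** 2))    # shallow, broad
    return narrow + broad

def smoothed_loss(x, sigma=0.5, n=20_000):
    return sum(loss(x + random.gauss(0, sigma)) for _ in range(n)) / n

print("raw loss:     ", loss(2.0), loss(-2.0))                     # narrow well is lower
print("smoothed loss:", smoothed_loss(2.0), smoothed_loss(-2.0))   # broad well is lower
```

Whether this fully captures the "stable over time/perturbation" sense debated above is exactly the question at issue; it only shows that sampling noise converts narrowness into a penalty.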

Can I try to parse out what you're saying about stacked sigmoids? Because it seems weird to me. Like, in that view, it still seems like showing a trendline is some evidence that it's not "interesting". I feel like this because I expect the asymptote of the AlphaGo sigmoid to be independent of MCTS bots, so surely you should see some trends where AlphaGo (or equivalent) was invented first, and jumped the trendline up really fast. So not seeing jumps should indicate that it is more a gradual progression, because otherwise, if they were independent, about hal... (read more)

So, I'm not sure if I'm further down the ladder and misunderstanding Richard, but I found this line of reasoning objectionable (maybe not the right word):

"Consider an AI that, given a hypothetical scenario, tells us what the best plan to achieve a certain goal in that scenario is. Of course it needs to do consequentialist reasoning to figure out how to achieve the goal. But that’s different from an AI which chooses what to say as a means of achieving its goals."

My initial (perhaps uncharitable) response is something like "Yeah, you could build a safe syste... (read more)

"Since my expectations sometimes conflict with my subsequent experiences, I need different names for the thingies that determine my experimental predictions and the thingy that determines my experimental results. I call the former thingies 'beliefs', and the latter thingy 'reality'."

I think this is a fine response to Mr. Carrico, but not to the post-modernists. They can still fall back to something like "Why are you drawing a line between 'predictions' and 'results'? Both are simply things in your head, and since you can't directly observe reality, your 'r... (read more)

In a Newcomb-less problem, where you can either have $1,000 or refuse it and have $1,000,000, you could argue that the rational choice is to take the $1,000,000, and then go back for the $1,000 when people's backs are turned, but it would seem to go against the nature of the problem.

In much the same way, if Omega is a perfect predictor, there is no possible world where you receive $1,000,000 and still end up going back for the second. Either Rachel wouldn't have objected, or the argument would've taken more than 5 minutes, and the boxes disappear, or somet... (read more)

In much the same way, estimates of value and calculations based on the number of permutations of atoms shouldn't be mixed together. There being a googolplex of possible states in no way implies that any of them have a value over 3 (or any other number). It does not, by itself, imply that any particular state is better than any other. Let alone that any particular state should have value proportional to the total number of states possible.

Restricting yourself to atoms within 8000 light years, instead of the galaxy, just compounds the problem as well, but you n... (read more)

I still think the argument holds in this case, because even computer software isn't atom-less. It needs to be stored, or run, or something somewhere.

I don't doubt that you could drastically reduce the number of atoms required for many products today. For example, you could in future get a chip in your brain that makes typing without a keyboard possible. That chip is smaller than a keyboard, so represents lots of atoms saved. You could go further, and have that chip be an entire futuristic computer suite, by reading and writing your brain inputs and outputs... (read more)

2DanArmak
Please see my other reply here. Yes, value is finite, but the number of possible states of the universe is enormously large, and we won't explore it in 8000 years. The order of magnitude is much bigger. (Incidentally, our galaxy is ~ 100,000 light years across; so even expanding to cover it would take much longer than 8000 years, and that would be creating value the old-fashioned way by adding atoms, but it wouldn't support continued exponential growth. So "8000 years" and calculations based off the size of the galaxy shouldn't be mixed together. But the order-of-magnitude argument should work about as well for the matter within 8000 light-years of Earth.)

I think this is a useful post, but I don't think the water thing helped in understanding:

"In the Twin Earth, XYZ is "water" and H2O is not; in our Earth, H2O is "water" and XYZ is not."

This isn't an answer, this is the question. The question is "does the function, curried with Earth, return true for XYZ, && does the function, curried with Twin Earth, return true for H2O?"

Now, this is a silly philosophy question about the "true meaning" of water, and the real answer should be something like "If it's useful, then yes, otherwise, no." But I don't thin... (read more)

I feel like I might be being a little coy stating this, but I feel like "heterogeneous preferences" may not be as inadequate as it seems. At least, if you allow that those heterogeneous preferences are not only innate like taste preference for apples over oranges.

If I have a comparative advantage in making apples, I'm going to have a lot of apples, and value the marginal apple less than the marginal orange. I don't think this is a different kind of "preference" than liking the taste of oranges better: Both base out in me preferring an orange to an apple. A... (read more)

I'm glad to hear that the question of what hypotheses produce actionable behavior is on people's minds. 

I modeled Murphy as an actual agent, because I figured a hypothesis like "A cloaked superintelligence is operating in the area, and will react to your decision to do X by doing Y" is always on the table, and is basically a template for allowing Murphy to perform arbitrary action Y.

I feel like I didn't quite grasp what you meant by "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available ... (read more)

1Diffractor
You're completely right that hypotheses with unconstrained Murphy get ignored because you're doomed no matter what you do, so you might as well optimize for just the other hypotheses where what you do matters. Your "-1,000,000 vs -999,999 is the same sort of problem as 0 vs 1" reasoning is good.

Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the "inf" part of the $E_\Psi[f] := \inf_{(m,b)\in\Psi}(m(f)+b)$ definition of expected value, and writing actual equations. $\Psi$ is the available set of possibilities for a hypothesis. If you really want to, you can think of this as constraints on Murphy, and Murphy picking from available options, but it's highly encouraged to just work with the math.

For mixing hypotheses (several different $\Psi_i$ sets of possibilities) according to a prior distribution $\zeta\in\Delta\mathbb{N}$, you can write it as an expectation functional via $\psi_\zeta(f) := E_{i\sim\zeta}[\psi_i(f)]$ (mix the expectation functionals of the component hypotheses according to your prior on hypotheses), or as a set via $\Psi_\zeta := \{(m,b) \mid \exists (m_i,b_i)\in\Psi_i : E_{i\sim\zeta}[(m_i,b_i)] = (m,b)\}$ (the available possibilities for the mix of hypotheses are all of the form "pick a possibility from each hypothesis, mix them together according to your prior on hypotheses").

This is what I meant by "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked": that $\Psi_\zeta$ set (your mixture of hypotheses according to a prior) corresponds to selecting one of the $\Psi_i$ sets according to your prior $\zeta$, and then Murphy picking freely from the set $\Psi_i$.

Using $\psi_\zeta(f) := E_{i\sim\zeta}[\psi_i(f)]$ (and considering our choice of what to do affecting the choice of $f$, we're trying to pick the best function $f$) we can see that if the prior is composed of a bunch of "do this sequence of actions or bad things happen" hypotheses, the details of what you do sensitively depend on the probability distribution over hypothese

I'm still confused. My biology knowledge is probably lacking, so maybe that's why, but I had a similar thought to dkirmani after reading this: "Why are children born young?" Given that sperm cells are active cells (which should give transposons opportunity to copy themselves), why do they not produce children with larger transposon counts? I would expect whatever sperm divide from to have the same accumulation of transposons that causes problems in the divisions of stem cells.

Unless piRNA and siRNA are 100% at their jobs, and nothing is explicitly removing t... (read more)

4ChristianKl
An increase in transposons is evolutionarily disadvantageous, so there's selection pressure against increased active transposon count and for reduced active transposon count.

My understanding is that transposon repression mechanisms (like piRNAs) are dramatically upregulated in the germ line. They are already very close to 100% effective in most cells under normal conditions, and even more so in the germ line, so that most children do not have any more transposons than their parents.

(More generally, my understanding is that germ line cells have special stuff going to make sure that the genome is passed on with minimal errors. Non-germ cells are less "paranoid" about mutations.)

Once the rate is low enough, it's handled by natural selection, same as any other mutations.

A little late to the party, but

I'm confused about the minimax strategy.

The first thing I was confused about was what sorts of rules could constrain Murphy, based on my actions. For example, in a bit-string environment, the rule "every other bit is a 0" constrains Murphy (he can't reply with "111..."), but not based on my actions. It doesn't matter what bits I flip, Murphy can always just reply with the environment that is maximally bad, as long as it has 0s in every other bit. Another example would be if you have the rule "environment must be a valid chess... (read more)

2Diffractor
Maximin, actually. You're maximizing your worst-case result.

It's probably worth mentioning that "Murphy" isn't an actual foe where it makes sense to talk about destroying resources lest Murphy use them; it's just a personification of the fact that we have a set of options, any of which could be picked, and we want to get the highest lower bound on utility we can for that set of options, so we assume we're playing against an adversary with a perfectly opposite utility function for intuition.

For that last paragraph, translating it back out from the "Murphy" talk, it's "wouldn't it be good to use resources in order to guard against worst-case outcomes within the available set of possibilities?" and this is just ordinary risk aversion.

For that equation $\mathrm{argmax}_\pi \inf_{e\in B} E_{\pi\cdot e}[U]$, $B$ can be any old set of probabilistic environments you want. You're not spending any resources or effort; a hypothesis just is a set of constraints/possibilities for what reality will do, a guess of the form "Murphy's operating under these constraints/must pick an option from this set."

You're completely right that for constraints like "environment must be a valid chess board", that's too loose of a constraint to produce interesting behavior, because Murphy is always capable of screwing you there. This isn't too big of an issue in practice, because it's possible to mix together several infradistributions with a prior, which is like "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked". And as it turns out, you'll end up completely ignoring hypotheses where Murphy can screw you over no matter what you do. You'll choose your policy to do well in the hypotheses/scenarios where Murphy is more tightly constrained, and write the "you automatically lose" hypotheses off because it doesn't matter what you pick, you'll lose in those. But there is a big unstudied problem of "what sorts of hypotheses