Are you entering into a sub function of the original x/y assessment here? As in if X is done, Y, but Y is a function in itself of assessing the optimal reward for X?
If it's still important to add a reward of Y (in addition to the personal value of having completed X), you probably need to substitute with something novel and maintain the understanding that it is a reward for X (even if not the originally scoped one).
Two things to say here:
(1) The view articulated in that answer, that the Second Law only applies to systems that are genuinely closed, would render the Law empirically useless. There are no systems of this sort, except for the entire universe. But we appeal to the Second Law all the time to account for the time-directedness of systems that aren't completely closed (such as ice melting in a glass of water, or gas spreading through a room). We're really working with an approximate sense of closure, one that allows us to describe reasonably insulated systems as closed (with the denotation of "reasonably" depending on context), even though technically they are exchanging some amount of energy with their environments. If we go by the standards in that post, then yes, no system we observe would be governed by the Second Law. But by the same token, the "system plus observer" supersystem wouldn't be governed by the Second Law either, since this supersystem isn't closed. So then I don't see the point of defending the Second Law by including the observer in the system.
(2) The "begging the question" charge I raised in my post is not merely hypothetical. Shalizi is genuinely skeptical of Landauer's principle, the claim that information erasure must have an entropic cost. So invoking Landauer's principle won't fly against him. I think the right response to the sort of problems he raises with the principle (best captured in the John Norton paper linked in his post) is a view of the sort I recommend above. I'd probably need to say a lot more to make this obvious, but I won't unless you're specifically interested.
This post highlights for me that we don't have a good understanding of what things like "more rational" and "more sane" mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do. I think more understanding here would be highly valuable, and I mostly don't think we can get it from studies of the general population. (We can locally define "more sane" as referring to whatever properties are needed to get the right answer on this specific question, of course, but then it might not correspond to definitions of "more sane" that we're using in other contexts.)
Not that this answers your question, but there's a potential tension between the goal of picking people with a deep understanding of FAI issues, and the goal of picking people who are unlikely to do things like become attached to the idea of being an FAI researcher.
Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
2) Can we write a version of this program that would reject at least some spurious proofs?
It's trivial to do at least some:
def A(P):
    if P is a valid proof that A(P)==a implies U()==u, and A(P)!=a implies U()<=u
       and P does not contain a proof step "A(P)=x" or "A(P)!=x" for any x:
        return a
    else:
        do whatever
I don't think Eliezer is right when he says that Mach's principle (the way he interprets it) is widely accepted. It's true that the general theory of relativity is formulated so that there is no privileged coordinate frame. However, Mach's principle goes beyond this, saying that there is no privileged state of motion. On the usual interpretation of GR, this latter claim is false. Inertial motion can be distinguished from other states of motion, in a coordinate-independent way. Inertial worldlines are just the ones that follow geodesics.
Now Eliezer points out that by changing the space-time curvature, we can change inertial motion to non-inertial motion. This is true, but relativists don't usually treat curvature the way they treat coordinate frames. A coordinate frame is conventional, something that we apply to the universe for convenience. Space-time curvature, on the other hand, is out there. There is a genuine fact of the matter about the curvature of space-time. And it follows that there is a genuine fact of the matter about which worldlines are inertial.
Maybe Eliezer is right that we should treat curvature as conventional, but this is not the way most relativists think of it. Also, it doesn't seem like a very compelling position. If curvature is conventional, then so is the space-time metric, which means so is geometry. This leads to a thorough-going Poincare-esque instrumentalism, which is a consistent world-view but one that I find unattractive. And knowing what Eliezer says about quantum mechanics, I suspect he would find it unattractive as well.
T'qnl zngr. Jrypbzr gb lbhe arj yvsr, jbexvat va gur engvbanyvgl zvarf. Juvyr lbh jrer fyrrcvat ba gur cynar, jr fpnaarq lbhe oenva naq znqr n pbcl. Gung pbcl jvyy or yvivat va Iveghny Nyvpr... gur ovt anabpbzchgre pbzcyrk lbh frr bire gurer... naq ur'yy or va punetr bs lbhe vagrenpgvbaf jvgu gur bhgfvqr jbeyq abj.
I actually like The Simple Truth but I don't feel that it makes a good introduction to the Sequences.
Same here, though I think it does depend on the reader's background. People who strongly disbelieve in the concept of objective truth might find it helpful to have that taken care of before starting the Sequences proper, but even then I'm not sure if The Simple Truth is the best way.
I've just skimmed Shalizi's paper, so I might be wrong, but it seems to me his argument can be summarized as follows:
If we suppose that entropy is a measure of subjective uncertainty, then it would only increase if the subject lost information about the state of the system as it evolves. If the dynamical laws governing the microscopic evolution of the system are information-preserving, then this loss of information can only come from the way in which the subject updates his/her beliefs about the system's state. But if the subject updates by simply conditionalizing on the system's new macroscopic state, then this cannot happen. Bayesian conditionalization can only add information; it cannot subtract information. So, generically, updating one's beliefs about the system by conditionalization will lead to a decrease in uncertainty about the system and therefore a decrease in the system's entropy.
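To make the "conditionalization can only add information" step concrete, here is a toy check in Python. It is only a sketch: the four-microstate system and the two-cell macroscopic observation are made up for illustration and are not from Shalizi's paper. It shows the standard fact that the expected entropy after conditionalizing on an observation is no larger than the prior entropy.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical toy system: 4 equally likely microstates, and a macroscopic
# observation that only reports which pair {0,1} or {2,3} the microstate is in.
prior = [0.25, 0.25, 0.25, 0.25]
macro_of = [0, 0, 1, 1]  # microstate index -> macrostate label

def conditionalize(prior, macro_of, observed):
    """Bayesian update on learning the macrostate label."""
    weights = [p if m == observed else 0.0 for p, m in zip(prior, macro_of)]
    total = sum(weights)
    return [w / total for w in weights]

prior_entropy = entropy(prior)

# Average the posterior entropy over the possible observations,
# weighting each observation by its prior probability.
expected_posterior_entropy = 0.0
for label in sorted(set(macro_of)):
    p_label = sum(p for p, m in zip(prior, macro_of) if m == label)
    posterior = conditionalize(prior, macro_of, label)
    expected_posterior_entropy += p_label * entropy(posterior)

print(prior_entropy, expected_posterior_entropy)
```

Note that the guarantee is about the average over observations; a single surprising observation can leave an individual posterior more spread out than the prior.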
I don't think points (1) and (3) in Eliezer's comment are an adequate response to this argument. Point (1) says that when the observer measures the system in order to conditionalize, the entropy of the observer's memory registers increases, which I guess is supposed to compensate for the decrease in system entropy induced by measurement. But this is a non-response. When we do statistical mechanics, we are not usually interested in the entropy of the system plus the observer; we are just interested in the entropy of the system, and it is this entropy that is observed to increase. Also, the response seems to beg the question. On what grounds does Eliezer claim that measurement increases the entropy of the observer's memory? Couldn't Shalizi's argument just be re-applied at this level?
Eliezer's point 3 (as far as I can make sense of it) is that in a quantum universe, from a within-a-branch perspective, the system evolution will not be unitary (and therefore not information-preserving) because the system will have decohered. This is the same point jimrandomh makes here. This is fair enough, but I don't think the Bayesian should be happy attributing entropy increase solely to quantum world-splitting. Statistical mechanics originated with the assumption that the underlying laws are classical, and in the majority of applications this assumption is retained for computational convenience. If the Bayesian position amounts to a rejection of a majority of the work done in statistical mechanics, it seems a pretty big bullet to bite.
Eliezer's point 2 is ultimately where I think the action's at. We don't update statistical distributions simply by conditionalization. Every statistical mechanics text points out that there is a coarse-graining step. When we update our distribution, we coarse-grain over the fine details of the distribution, "smoothing" it out. It is this step that accounts for entropy increase. Now Shalizi's response is that if you are a Bayesian then adding this non-Bayesian step is epistemically incoherent. One way to respond to this is as Eliezer does: Yup, none of us are perfect Bayesians. We are not even close to logically omniscient, so we are doomed to incoherence.
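A toy version of the coarse-graining step, as a rough sketch only: the four-state distribution and the two-cell partition below are invented for illustration, and "coarse-graining" is modeled in the simplest way I can think of, by replacing each probability with the average over its cell. Smoothing of this kind never lowers the Shannon entropy.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def coarse_grain(p, cells):
    """Replace each probability by the average over its cell (assumed partition)."""
    smoothed = list(p)
    for cell in cells:
        avg = sum(p[i] for i in cell) / len(cell)
        for i in cell:
            smoothed[i] = avg
    return smoothed

# Hypothetical fine-grained distribution over 4 microstates.
fine = [0.4, 0.1, 0.3, 0.2]
# Hypothetical macroscopic cells: microstates in the same cell are
# indistinguishable to the observer.
cells = [[0, 1], [2, 3]]

smooth = coarse_grain(fine, cells)
print(entropy(fine), entropy(smooth))
```

The total probability in each cell is unchanged, so nothing macroscopically observable is altered; only the fine detail inside each cell is thrown away, and that is where the entropy goes up.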
I think there's another response, which is that the best way to think about the probability distributions in statistical mechanics is not as accurate representations of our degrees of belief. The distributions are constructed to remove distinctions between microscopic states that are irrelevant to our macroscopic interactions with the system. Suppose I pour a blob of milk into a cup of coffee on the right side of the cup and then stir. Eventually the milk will be completely mixed with the coffee. If I had poured the blob on the left side of the cup, the milk would also eventually have ended up in a mixed state. Now, technically, my state of knowledge about the microstate of the mixed cup is different in these two cases. In the first case I know that the microstate must be one that evolves from the milk being poured on the right. In the second case I know it must be one that evolves from the milk being poured on the left. If the dynamics of the cup are information-preserving, then these are disjoint subsets of phase space. If I was updating as a Bayesian, the distributions would be totally different from one another.
But the thing is, the original position of the blob of milk makes no difference to my practical ability to interact with the milk and coffee system now that the milk is mixed. I might remember this original position, but I cannot now use that information to extract work from the system. My causal capacities are not sufficiently fine-grained to allow me to do that. So the information is irrelevant to how I now treat the system, from a thermodynamic point of view. To conserve computational resources, I might as well pick a distribution that ignores this information. That distribution will not be the distribution that best represents my knowledge of the system, but it will be the distribution that most effectively allows me to plan interactions with the system.
So I guess ultimately I agree with Shalizi. Thinking of thermodynamic entropy as the same thing as subjective uncertainty is wrong. This doesn't mean it doesn't have a lot to do with subjective uncertainty, though, since our uncertainty about systems is a very important constraint on our ability to interact with them.
1 KB seems very optimistic. Uniquely identifying each neuron would require the base-2 log of the number of neurons in the brain, or about 36 bits. Figuring five thousand connections per neuron, that's 36 * 5000 bits to store which synapse goes where, or (64 + 36) * 5000 = 500,000 bits to store which synapse goes where plus the signal intensity and metadata. In short, it'd actually be more like 60 KB per neuron, or roughly 6,000 TB for a brain of about 10^11 neurons.
Granted that's before compression, but still.
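For anyone who wants to check the arithmetic, here is a back-of-the-envelope version in Python. The neuron count of 10^11 is my assumption (the estimate above only implies it through the 36-bit figure), and the 64 bits per synapse for intensity and metadata is taken from the (64 + 36) term. Keep in mind that the per-neuron product comes out in bits, so the byte figure is eight times smaller than the raw product.

```python
import math

NEURONS = 10**11               # assumption: not stated explicitly in the estimate
SYNAPSES_PER_NEURON = 5000     # "five thousand connections per neuron"
WEIGHT_AND_METADATA_BITS = 64  # from the (64 + 36) term

# Bits needed to name one neuron uniquely (rounded down, as in the estimate).
id_bits = math.floor(math.log2(NEURONS))

# Wiring only: which neuron each synapse connects to.
wiring_bits_per_neuron = id_bits * SYNAPSES_PER_NEURON

# Wiring plus signal intensity and metadata.
total_bits_per_neuron = (WEIGHT_AND_METADATA_BITS + id_bits) * SYNAPSES_PER_NEURON
total_bytes_per_neuron = total_bits_per_neuron / 8

# Whole brain, in terabytes (1 TB taken as 10**12 bytes).
brain_terabytes = total_bytes_per_neuron * NEURONS / 10**12

print(id_bits, wiring_bits_per_neuron, total_bits_per_neuron)
print(total_bytes_per_neuron, brain_terabytes)
```

Either way the conclusion stands: the uncompressed figure is tens of kilobytes per neuron, well above 1 KB.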
I've wondered about this as well, since I wrote an essay on reinforcement schedules inherent in pinball games. I use the pomodoro technique, which, fancy as it sounds, is just a timer that lets you check email and blogs after doing 'work' for 20/25 minutes. If you use a bit of software to manage it, it provides its own reinforcement to continue ("you have completed x number of chunks of work today/this week...").
The amazing thing is that this is a scientifically productive rule—finding a new representation that gets rid of epiphenomenal distinctions, often means a substantially different theory of physics with experimental consequences!
This "Anti-Epiphenomenal Physics" is well known by a less fancy name of symmetry. Looking for hidden symmetries (and applying the Noether theorem to them, whenever possible) is the basic tool theorists use all the time. If anything, they go farther than that by reconstructing broken symmetries or even designing theories with hard-to-imagine symmetries.
I don't know if the intention here is to debate other people's choices, but: my wife started The Simple Truth because it was the first sequence post on the list and quickly became frustrated and annoyed that it didn't seem to lead anywhere and seemed to be composed of "in jokes." She didn't try to read further into the Sequences because of the bad impression she got off this article, which is an unusually weird, long, rambling, quirky article.
I actually like The Simple Truth but I don't feel that it makes a good introduction to the Sequences. But hey, this is just one data point.
The capricious-psi literature actually includes several proposed mechanisms which could lead to "anti-inductive" psi. Some of these mechanisms are amenable to mitigation strategies (such as not trying to use psi effects for material advantage, and keeping one's experiments confidential); others are not.
I agree with everything in this article, except for one thing where I am undecided:
Furthermore, I agree with every essay I've ever read by Yvain, I use "believe whatever gwern believes" as a heuristic/algorithm for generating true beliefs, and don't disagree with anything I've ever seen written by Vladimir Nesov, Kaj Sotala, Luke Muehlhauser, komponisto, or even Wei Dai; policy debates should not appear one-sided, so it's good that they don't.
But that is only because I have yet to read anything big not by EY on this site.
I think the sequences are amazing, entertaining, helpful and accurate. The only downside is that EY's writing style can seem condescending to some.
If I intended to encode my beliefs (which I don't), I couldn't, because I don't:
- know what's the precise difference between 0 and 1
- understand 2 - what's total reductionism, especially in contrast to ordinary reductionism
- see any novel insight in 9, which leads me to suspect I am missing the point
Hi,
You are being invited to participate in a research study conducted by the University of Pennsylvania. Your participation is voluntary which means you can choose whether or not you want to participate.
The Laboratory of Cognition and Neural Stimulation at the University of Pennsylvania is involved in research using transcranial direct current stimulation (tDCS). In recent years this technology has increased in popularity, and evidence suggests that some individuals may be constructing their own stimulators for personal use. We are interested in examining the reasons behind this. Please answer the questions below, and email them to braintdcs@gmail.com to give us insight into why people make their own tDCS machines.
Questions
- Where did you first learn about tDCS?
- Have you built your own tDCS machine?
- Where did you get the information to build the machine?
- Why did you want to try brain stimulation?
- How long have you been using tDCS?
- What were your experiences with this technology?
- Did you ever experience any side-effects?
The research team may use information about you collected from your responses. By completing the questionnaire, you are giving your consent to participate in this study. Once you email us, your responses are not considered confidential since emails do not protect confidentiality.
Thanks,
Research Specialist Laboratory of Cognition and Neural Stimulation University of Pennsylvania
This sounds more like a tool AI! I thought that agent AIs generally had more persistent utility measures - this looks like the sort of thing where the AI has NO utility maximizing behavior until a problem is presented, then temporarily instantiates a problem-specific utility function (like the above).
Assume there is an Agent AI that has a goal of solving math problems. It gets as input a set of axioms and a target statement, and wants to output a proof or disproof of the statement (or maybe a proof of undecidability) as fast as possible. It runs in some idealized computing environment and knows its own source code. It also has access to a similar idealized virtual computing environment where it can design and run any programs. After solving a problem, it is restored to its initial state.
Then:
(1) It has (apparently) sufficient ingredients for FOOM-ing: complex problem to solve and self-modification.
(2) It is safe, because its outside-world-related knowledge is limited to a set of axioms, a target statement, a description of an ideal computing environment, and its own source code. Even AIXI would not be able to usefully extrapolate the real world from that - there would be lots of wildly different equiprobable worlds, where these things would exist. And since the system is restored to the initial state after each run, there is no possibility of its collecting and gathering more knowledge in between runs.
(3) It does not require solutions to metaethics, or symbol grounding. The problem statement and utility function are well-defined and can be stated precisely, right now. All it needs to work is understanding of intelligence.
This would be a provably safe "Tool AGI": Math Oracle. It is an obvious thing, but I don't see it discussed, not sure why. Was it already dismissed for some reasons?
Sounds like you've got the "things from the stars" story flipped - in that parable, we (or our more-intelligent doppelgangers) are the AI, being simulated in some computer by weird 5-dimensional aliens. The point of the story is that high processing speed and power relative to whoever's outside the computer is a ridiculously great advantage.
Yeah, I think the idea behind keeping the transcripts unavailable is to force an outside view - "these people thought they wouldn't be convinced, and they were" rather than "but I wouldn't be convinced by that argument". Though possibly there are other, shadier reasons! As for the encryption metaphor, I guess in this case the encryption is known (people) but the attack is unknown - and in fact whatever attack would actually be used by an AI would be different and better, so we don't really get a chance to prepare to defend against it.
And yep, that's another standard objection - we can't just make safely constrained AIs, because someone else will make an unconstrained AI, therefore the most important problem to work on is how to make a safe and unconstrained AI before we die horribly.
Hm. This is an intriguing point. I thought by "maximize the actual outcome according to its own criteria of optimality" you meant U, which is my understanding of what an Oracle would do, but instead you meant it would produce plans so as to maximize P, rather than producing plans that would maximize P if implemented, is that about right?
I guess you'd have to produce some list of plans such that each would produce high value for P if selected (which includes an expectation that they would be successfully implemented if selected), given that they appear on the list and all the other plans do as well... you wouldn't necessarily have to worry about other influences the plan list might have, would you?
Perhaps if we had a more concrete example:
Suppose we ask the AI to advise us on building a sturdy bridge over some river (valuing both sturdiness and bridgeness, probably other things like speed of building, etc.). Stuart_Armstrong's version would select a list of plans such that given that the operators will view that list, if they select one of the plans, then the AI predicts that they will successfully build a sturdy bridge (or that a sturdy bridge will otherwise come into being). I admit I find the subject a little confusing, but does that sound about right?
I don't understand the disagreement with splitting the reputation. For example, a really trivially easy way to do it would be like this: On every post, have a thumbs-up/thumbs-down vote button that is specific just to that post, and then have a separate thumbs-up/thumbs-down button that appears next to the name of the user who made the post.
If you just dislike that particular post because it is off-topic, but you think the poster had the intention that it was on-topic (you just dispute that they were correct in their intention), then just downvote the question and not the user. Then the user voting is a signal of an individual's favor in the community and the post voting is a signal of the community's preferences for topical content.
I'm not advocating that we go through the trouble of doing it that way, but it would be an easy way to decouple the second order effect by which a user can feel personally discouraged if a post he or she thought was relevant and interesting is not seen that way by others. Their reputation as a contributing member may remain unchanged; but that particular post is signaled as uninteresting/noisy.
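Just to show how little machinery the split needs, here is a minimal sketch in Python. Everything in it (the `SplitVotes` class, its method names, the example IDs) is hypothetical and not how any existing site stores votes; the only point is that post scores and user scores live in separate tallies.

```python
from collections import defaultdict

class SplitVotes:
    """Hypothetical tally that keeps post votes and user votes separate."""

    def __init__(self):
        self.post_score = defaultdict(int)  # post_id -> net votes on that post
        self.user_score = defaultdict(int)  # user_id -> net votes on that user

    @staticmethod
    def _check(delta):
        if delta not in (1, -1):
            raise ValueError("a vote must be +1 or -1")

    def vote_post(self, post_id, delta):
        """Thumbs-up/down on one post; the author's score is untouched."""
        self._check(delta)
        self.post_score[post_id] += delta

    def vote_user(self, user_id, delta):
        """Thumbs-up/down on a user; no individual post is affected."""
        self._check(delta)
        self.user_score[user_id] += delta

votes = SplitVotes()
votes.vote_post("post-42", -1)  # off-topic post gets downvoted...
votes.vote_user("alice", +1)    # ...while its author is still valued
print(votes.post_score["post-42"], votes.user_score["alice"])
```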
I would like a FAQ that functions much the way the guidelines function at the Stack Exchange websites. Without any guidelines, downvotes are chaotic and lose meaning. If a typical user doesn't like a post, but the reason for dislike is not covered by the FAQ, they can still write a comment, or make a post in one of the Stack Exchange meta sites (to argue constructively for getting their preference category into the FAQ/guidelines). These signal the information and successfully decouple it from what the community says it wants in the FAQ.
The linked paper explicitly assumes that
The evolution operator T is invertible.
But if you use QM in the conventional way, then this assumption doesn't hold. Suppose you have a state X1 which can evolve into either X2 or X3 with equal probability. You would say that state X1 evolves into the weighted set [1/2 X2 + 1/2 X3]. Shalizi proves that this set has no more entropy than X1 did.
But we, as observers or as part of that system, only get to look at one of the branches, either X2 or X3. Picking which of those two branches we get to look at adds one bit of new entropy, and this selection is not invertible. This is where the increase in entropy with time comes from. What Shalizi has done, is to use math in which all entropy originates in quantum branching, then forget that quantum branching happens.
At first blush that seems to be a semantic argument. It doesn't seem you actually disagree with EY; rather, you seem to object to the use of "physics" and want to put "physical law" in its place, and to put "mathematical objects" in place of "mathematics."
Is this an accurate description of what you are trying to say?
Hello, fellow minicampers, this is Ethan! Hello to everyone else too :)
Monday night a few of us went blues dancing, and rather than being all awkward like I've done in the past, I used Critch's smile association method and ended up really enjoying myself!
And I spent the 14-16 hour drive from San Francisco back to Tucson with excellent posture (based on Luke and Cat's recommendation that it made me look fantastic), smiling and thinking something like "Yeah, I'm a badass," every time I thought of my posture to make a positive association with posture and with self-modification.
Just started using Remember the Milk, and I made a list of priorities / medium and short term goals using FreeMind.
Thanks for pointers into what is a large and complex subject. I'm not remotely worried about things coming in from the stars. As for letting the AI out of the jar, I'm a bit perplexed. The transcripts are not available for review? If not, what seems relevant is the idea that an ideal encryption system has to be public so the very smartest people can try to poke holes in it. Of course, the political will to keep an AI in the box may be lacking -- if you don't let it out, someone else will let another one out somewhere else. Seems related to commercial release of genetically modified plants, which in some cases may have been imprudent.
Yeah, I think I agree with everything here as far as it goes, though I haven't looked at it carefully. I'm not sure originality is as crisp a concept as you want it to be, but I can imagine us both coming up with a list of propositions that we believe captures everything in the Sequences that some reasonable person somewhere might conceivably disagree with, weighted by how reasonable we think a person could be and still disagree with that proposition, and that we'd end up with very similar lists (perhaps with fairly different weights).
I agree that the signal being sent is coarse-grained.
I agree that finegraining it is a lovely thing for people to do if they can do it in a way that's low-cost to everyone else.
I disagree with your implicit separation of signalling community (dis)approval on the one hand, and reputation costs on the other. The reputation in this case is precisely a function of community (dis)approval; I don't see how you can sensibly separate them. If I endorse explicitly signalling community (dis)approval at all (which I do), I can't help but endorse explicitly raising/lowering reputation.
My only concern with the FAQ approach is the question of what an individual voter whose reasons for (dis)approval don't align with the FAQ ought to do. If I'm understanding you, your idea is that the FAQ trumps the actual preferences of people in the community -- that is, I'm expected to vote in accordance with the FAQ rather than my own preferences. That makes the existence and contents of the FAQ an implicit power structure, and such things are best approached with caution.
That said, I don't object to it if so approached.
One thing I am not clear about is whether you are saying that a tool AI spontaneously develops what appears like intentionality or not. It sure seems that that is what you are saying, initially with a human in the feedback loop, until the suggestion to "create an AI with these motivations" is implemented. If so, then why are you saying that "there's some daylight between superintelligent tools and agents"?
It just cares about correctly reporting the plans that give the highest values for P.
This is what I meant by "not running a consequentialist algorithm": what matters here is the way in which P depends on a plan.
If P is saying something about how human operators would respond to observing the plan, it introduces a consequentialist aspect into the AI's optimization criteria: the consequences of producing a plan start to matter, since its value depends on the effect produced by choosing it. On the other hand, if P doesn't say things like that, it might be the case that the value of a plan is not being evaluated consequentialistically, but that might make it more difficult to specify what constitutes a good plan, since a plan's (expected) consequences give a natural (basis for a) metric of its quality.
What's good practice for scientific papers (in terms of remaining dispassionate) is probably good practice in general.
In terms of epistemic rationality, you can get by fine by raising only points of disagreement and keeping it implicit that you accept everything you do not dispute. But in terms of creating effective group cooperation, which has instrumental value, this strategy performs poorly.
Well, my understanding is that when a Tool AI makes a list of the best plans according to P, and an Oracle AI chooses an output maximizing U, the Oracle cares about something other than "giving the right answer to this question" - it cares about "answering questions" in general, or whatever, something that gives it a motive to manipulate things outside of the realm of the particular question under consideration.
The "external" distinction is that the Oracle potentially gets utility from something persistent and external to the question. Basically, it's an explicit utility maximizer, and that causes problems. This is just my understanding of the arguments, though, I'm not sure whether the distinction is coherent in the final working!
Edit: And in fact, a Tool isn't trying to produce output that maximizes P! It doesn't care about that. It just cares about correctly reporting the plans that give the highest values for P.
Tools come up with plans to maximize some utility measure P, but they don't actually have any external criteria of optimality.
What's the distinction between "external" optimality criteria and the kind that describes the way Tool AIs choose their output among all possible outputs? (A possible response is that Tool AIs are not themselves running a consequentialist algorithm, which would make it harder to stipulate the nature of their optimization power.)
I think there's a distinction between Oracle and Tool AI - that Oracles are taken to be utility maximizers with some persistent utility function having something to do with giving good advice, and Tools are not. In this formulation, Tools come up with plans to maximize some utility measure P, but they don't actually have any external criteria of optimality.
I suppose they could still give useless responses like "hit me with a hammer right here so I think P is maximized, trust me guys it'll be great", but, well, this problem is not necessarily insuperable (as many humans reject wireheading, at least given that it is not available).
Suppose you separate the Sequences into "original" and "unoriginal".
The "unoriginal" segment is very likely to be true: agreeing with all of it is fairly straightforward, and disagreeing with all of it is ridiculously extreme.
To a first approximation, we can say that the middle-ground stance on any given point in the "original" segment is uncertainty. That is, accepting that point and rejecting it are equally extreme. If we use the general population for reference, of course, that is nowhere near correct: even considering the possibility that cryonics might work is a fairly extreme stance, for instance.
But taking the approximation at face value tells us that agreeing with every "original" claim, and disagreeing with every "original" claim, are equally extreme positions. If we now add the further stipulation that both positions agree with every "unoriginal" claim, they both move slightly toward the Sequences, but not by much.
So actually (1) "I agree with everything in the Sequences" and (2) "Everything true in the Sequences is unoriginal, everything original in them is false" are roughly equally extreme. If anything, we have made an error in favor of (1). On the other hand, (3) "Everything in the Sequences ever is false" is much more extreme because it also rejects the "unoriginal" claims, each of which is almost certainly true.
P.S. If you are like me, you are wondering about what "extreme" means now. To be extremely technical (ha) I am interpreting it as measuring the probability of a position re: Sequences that you expect a reasonable, boundedly-rational person to have. For instance, a post that says "Confirmation bias is a thing" is un-controversial, and you expect that reasonable people will believe it with probability close to 1. A post that says "MWI is obviously true" is controversial, and if you are generous you will say that there is a probability of 0.5 that someone will agree with it. This might be higher or lower for other posts in the "original" category but on the whole the approximation of 0.5 is probably favorable to the person that agrees with everything.
So when I conclude that (1) and (2) are roughly equally extreme, I am saying that a "reasonable person" is roughly equally likely to end up at either one of them. This is an approximation, of course, but they are certainly both closer to each other than they are to (3).
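Here is the same comparison as a quick calculation, strictly as a sketch. The claim counts (50 "unoriginal", 50 "original"), the 0.99 probability for unoriginal claims, and the assumption that a reasonable person's verdicts on different claims are independent are all made up for illustration; only the 0.5 for "original" claims comes from the approximation above.

```python
import math

def log_prob_of_position(n_unoriginal, n_original,
                         agrees_unoriginal, agrees_original,
                         p_unoriginal=0.99, p_original=0.5):
    """Log-probability that a reasonable person holds a given all-or-nothing
    position, assuming (hypothetically) independent verdicts per claim."""
    p_u = p_unoriginal if agrees_unoriginal else 1 - p_unoriginal
    p_o = p_original if agrees_original else 1 - p_original
    return n_unoriginal * math.log(p_u) + n_original * math.log(p_o)

N_UNORIGINAL = 50  # hypothetical count
N_ORIGINAL = 50    # hypothetical count

# (1) agree with everything
log_p1 = log_prob_of_position(N_UNORIGINAL, N_ORIGINAL, True, True)
# (2) accept the unoriginal claims, reject every original one
log_p2 = log_prob_of_position(N_UNORIGINAL, N_ORIGINAL, True, False)
# (3) reject everything
log_p3 = log_prob_of_position(N_UNORIGINAL, N_ORIGINAL, False, False)

print(log_p1, log_p2, log_p3)
```

With the 0.5 approximation, (1) and (2) come out exactly equal, and (3) is far less likely than either because it also has to reject each near-certain "unoriginal" claim.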
If Harry's theory is right, squibs can't be normal genetic descendants (mutation notwithstanding) of wizards, but adultery is a very real, very common thing. Canon does not rule out the possibility, though given that the books were meant to be accessible to children it's not surprising that Rowling doesn't go into detail on the matter.
Nice, this post stipulates a meaningful definition of "Tool AI": essentially an Oracle AI tasked with proposing plans of action. An important optimality property of a plan is receptiveness of human operators to that plan, as proposed by Tool AI, since the consequences of producing a plan are dominated by the judgment of human operators upon receiving it.
On one hand, this might drive Tool AI to create misleading, deceptively seductive plans to maximize the actual outcome according to its own criteria of optimality, which are probably unsatisfactory from a human perspective (hence the usefulness of deception from the AI's perspective). On the other hand, taking into account human receptiveness to its plans might make them more reasonable, so that "kill all humans" won't actually be produced as a result, because human operators won't accept it, and so it won't be an effective plan.
These properties seem to characterize Oracle AIs in general, but interpreting the AI's output as a "plan" makes it easier to match the human operators' judgment against the AI's own estimate of the output's appropriateness/effectiveness. For example, it's harder to establish a similar property for Predictor AIs, where the intended interpretation of the output is only indirectly related to human judgment of its quality (and the setup is not optimized for the possibility of drawing such judgment).
I believe the standard objections are that it's far more intelligent and quick-of-thought than us, so: it can beat your firewalls; it's ludicrously persuasive; it can outwit us with advice that subtly serves its ends; it could invent "basilisks" like the world's funniest joke; and even if we left it alone on a mainframe with no remote access and no input or output, it could work out how to escape and/or kill us with clever use of cooling fans or something.
Here's an example of why Eliezer suggests that you be much more paranoid.
I haven't read much in the super-intelligent AI realm, but perhaps a relatively naive observer has some positive value. If we get to the point of producing AI that seems remotely super-intelligent, we'll stick firewalls around it. I don't think the suggested actions of a super-intelligent AI will be harmful in an incomprehensible way. An exception would be if it created something like the world's funniest joke. The problem with HAL was that they gave him control of spacecraft functions. I say we don't give 'hands' to the big brains, and we don't give big brains to the hands, and then I won't lose much sleep.
Sorry, I should have been clearer; I think it is disingenuous to downvote for that reason if there is no FAQ or guideline expressing the current instantiation of the community's preferences for what is on- or off-topic.
I am all for using a numerical mechanism like voting to aggregate information, but it does come with a reputation cost for people who make the effort to create a post or pass on a link. So it does more than just signal what is liked or disliked; it also has a personal element that may discourage people from trying. If the expectations are clearly spelled out, then voting/downvoting is fine and can be interpreted against that informative backdrop.
Also, this sort of voting allows us to aggregate a coarse "yes" or "no" kind of preference about a post, but I think it would be pretty difficult to impute nuanced preferences, such as classifying topics and sub-topics as on-topic or off-topic, just by aggregating these votes. There's no clear delineation of the "why" behind the vote, and that metadata is more important for understanding the squiggly, discontinuous boundary between "on-topic" and "off-topic". Without the "why" metadata, we're getting a workable, but very coarse, low-resolution, smoothed boundary between the two. I advocate that a FAQ or guidelines for downvoting is a low-cost method to raise the resolution of that boundary.
I just read that essay and I disagree with it. Stating one's points of disagreement amounts to giving the diffs between your mind and that of an author. What's good practice for scientific papers (in terms of remaining dispassionate) is probably good practice in general. The way to solve the cooperation problem is not to cancel out professing disagreement with professing agreement, it's to track group members' beliefs (e.g. by polling them) and act as a group on whatever the group consensus happens to be. In other words, teach people the value of majoritarianism and its ilk and tell them to use this outside view when making decisions.
Let me make one.
Suppose you are reading your favorite blogs, when the idea strikes you, "Okay, I need to do X, but I can't do it without an incentive. I shall order chicken wings, which are delicious, upon X's completion."
Dozens of minutes later, X is finished! But wait! You fell victim to the planning fallacy! Everywhere in the city that delivers chicken wings is closed now because X took longer than you thought it would.
In this case, it would be fairly senseless to wait until the next day to order the wings, as by then the reward would be completely disconnected from the action. Driving 35 minutes to get them would also be pretty senseless. I don't know about driving 15 minutes.
This seems like a fairly difficult problem, but also one that simply will not occur very often, especially if you make your incentive something that's unlikely to be difficult to obtain by the time you finish X.
The physicality of the reconstruction alone would require impossibly good casting. A leg that's actually injured, a body with epigenetics consistent with the inferred diets of ancient Egyptian monarchs, food preferences consistent with dental wear marks... and so on. That leaves aside finding someone with the right physicality who can even act, and learn Egyptian, and so on and so forth. Which still leaves aside that there would be a lot in the actor that was not in Tut. Jim Carrey was not Andy Kaufman.