List of Problems That Motivated UDT
I noticed that recently I wrote several comments of the form "UDT can be seen as a step towards solving X" and thought it might be a good idea to list in one place all of the problems that helped motivate UDT1 (not including problems that came up subsequent to that post).
- decision making for minds that can copy themselves
- Doomsday Argument
- Sleeping Beauty
- Absent-Minded Driver
- Presumptuous Philosopher
- anthropic reasoning for non-sentient AIs
- Simulation Argument
- indexical uncertainty in general
- wireheading/Cartesianism (how to formulate something like AIXI that cares about an external world instead of just its sensory inputs)
- How to make decisions if all possible worlds exist? (a la Tegmark or Schmidhuber, or just in the MWI)
- Quantum Immortality/Suicide
- Logical Uncertainty (how to formulate something like Godel machine that can make reasonable decisions involving P=NP)
- uncertainty about hypercomputation (how to avoid assuming we must be living in a computable universe)
- What are probabilities?
- What are decisions and what kind of consequences should be considered when making decisions?
- Newcomb's Problem
- Smoking Lesion
- Prisoner's Dilemma
- Counterfactual Mugging
- FAI
How can we ensure that a Friendly AI team will be sane enough?
One possible answer to the argument "attempting to build FAI based on Eliezer's ideas seems infeasible and increases the risk of UFAI without helping much to increase the probability of a good outcome, and therefore we should try to achieve a positive Singularity by other means" is that it's too early to decide this. Even if our best current estimate is that trying to build such an FAI increases risk, there is still a reasonable chance that this estimate will turn out to be wrong after further investigation. Therefore, the counter-argument goes, we ought to mount a serious investigation into the feasibility and safety of Eliezer's design (as well as other possible FAI approaches), before deciding to either move forward or give up.
(I've been given to understand that this is a standard belief within SI, except possibly for Eliezer, which makes me wonder why nobody gave this counter-argument in response to my post linked above. ETA: Carl Shulman did subsequently give me a version of this argument here.)
This answer makes sense to me, except for the concern that even seriously investigating the feasibility of FAI is risky, if the team doing so isn't fully rational. For example they may be overconfident about their abilities and thereby overestimate the feasibility and safety, or commit sunken cost fallacy once they have developed lots of FAI-relevant theory in the attempt to study feasibility, or become too attached to their status and identity as FAI researchers, or some team members may disagree with a consensus of "give up" and leave to form their own AGI teams and take the dangerous knowledge developed with them.
So the question comes down to, how rational is such an FAI feasibility team likely to be, and is that enough for the benefits to exceed the costs? I don't have a lot of good ideas about how to answer this, but the question seems really important to bring up. I'm hoping this post this will trigger SI people to tell us their thoughts, and maybe other LWers have ideas they can share.
Neuroimaging as alternative/supplement to cryonics?
Paul Christiano recently suggested that we can use neuroimaging to form a complete mathematical characterization of a human brain, which a sufficiently powerful superintelligence would be able to reconstruct into a working mind, and the neuroimaging part is already possible today, or close to being possible.
In fact, this project may be possible using existing resources. The complexity of the human brain is not as unapproachable as it may at first appear: though it may contain 1014 synapses, each described by many parameters, it can be specified much more compactly. A newborn’s brain can be specified by about 109 bits of genetic information, together with a recipe for a physical simulation of development. The human brain appears to form new long-term memories at a rate of 1-2 bits per second, suggesting that it may be possible to specify an adult brain using 109 additional bits of experiential information. This suggests that it may require only about 1010 bits of information to specify a human brain, which is at the limits of what can be reasonably collected by existing technology for functional neuroimaging.
Paul was using this idea as part of an FAI design proposal, but I'm highlighting it here since it seems to have independent value as an alternative or supplement to cryonics. That is, instead of (or in addition to) trying to get your body to be frozen and then preserved in liquid nitrogen after you die, you periodically take neuroimaging scans of your brain and save them to multiple backup locations (1010 bits is only about 1 gigabyte), in the hope that a friendly AI or posthuman will eventually use the scans to reconstruct your mind.
Are there any neuroimaging experts around who can tell us how feasible this really is, and how much such a scan might cost, now or in the near future?
ETA: Given the presence of thermal noise and the fact that a set of neuroimaging data may contain redundant or irrelevant information, 1010 bits ought to be regarded as just a rough lower bound on how much data needs to be collected and stored. Thanks to commenters who pointed this out.
Strong intutions. Weak arguments. What to do?
I thought Ben Goertzel made an interesting point at the end of his dialog with Luke Muehlhauser, about how the strengths of both sides' arguments do not match up with the strengths of their intuitions:
One thing I'm repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree -- for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences -- you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions -- but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that "superhuman AGI might plausibly kill everyone." The intuition about strongly convergent goals for intelligences, corresponds to a rational argument about goals that are convergent for a "wide range" of intelligences. Etc.
On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can't fully rationally substantiate this intuition either -- all I can really fully rationally argue for is something weaker like "It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly." In my case just like yours, reason is far weaker than intuition.
What do we do about this disagreement and other similar situations, both as bystanders (who may not have strong intuitions of their own) and as participants (who do)?
I guess what bystanders typically do (although not necessarily consciously) is evaluate how reliable each party's intuitions are likely to be, and then use that to form a probabilistic mixture of the two sides' positions.The information that go into such evaluations could include things like what cognitive processes likely came up with the intuitions, how many people hold each intuition and how accurate each individual's past intuitions were.
If this is the best we can do (at least in some situations), participants could help by providing more information that might be relevant to the reliability evaluations, and bystanders should pay more conscious attention to such information instead of focusing purely on each side's arguments. The participants could also pretend that they are just bystanders, for the purpose of making important decisions, and base their beliefs on "reliability-adjusted" intuitions instead of their raw intuitions.
Questions: Is this a good idea? Any other ideas about what to do when strong intuitions meet weak arguments?
Related Post: Kaj Sotala's Intuitive differences: when to agree to disagree, which is about a similar problem, but mainly from the participant's perspective instead of the bystander's.
How can we get more and better LW contrarians?
I'm worried that LW doesn't have enough good contrarians and skeptics, people who disagree with us or like to find fault in every idea they see, but do so in a way that is often right and can change our minds when they are. I fear that when contrarians/skeptics join us but aren't "good enough", we tend to drive them away instead of improving them.
For example, I know a couple of people who occasionally had interesting ideas that were contrary to the local LW consensus, but were (or appeared to be) too confident in their ideas, both good and bad. Both people ended up being repeatedly downvoted and left our community a few months after they arrived. This must have happened more often than I have noticed (partly evidenced by the large number of comments/posts now marked as written by [deleted], sometimes with whole threads written entirely by deleted accounts). I feel that this is a waste that we should try to prevent (or at least think about how we might). So here are some ideas:
- Try to "fix" them by telling them that they are overconfident and give them hints about how to get LW to take their ideas seriously. Unfortunately, from their perspective such advice must appear to come from someone who is themselves overconfident and wrong, so they're not likely to be very inclined to accept the advice.
- Create a separate section with different social norms, where people are not expected to maintain the "proper" level of confidence and niceness (on pain of being downvoted), and direct overconfident newcomers to it. Perhaps through no-holds-barred debate we can convince them that we're not as crazy and wrong as they thought, and then give them the above-mentioned advice and move them to the main sections.
- Give newcomers some sort of honeymoon period (marked by color-coding of their usernames or something like that), where we ignore their overconfidence and associated social transgressions (or just be extra nice and tolerant towards them), and take their ideas on their own merits. Maybe if they see us take their ideas seriously, that will cause them to reciprocate and take us more seriously when we point out that they may be wrong or overconfident.
OTOH, I don’t think group think is a big problem. Criticism by folks like Will Newsome, Vladimir Slepnev and especially Wei Dai is often upvoted. (I upvote almost every comment of Dai or Newsome if I don’t forget it. Dai makes always very good points and Newsome is often wrong but also hilariously funny or just brilliant and right.) Of course, folks like this Dymytry guy are often downvoted, but IMO with good reason.
Reframing the Problem of AI Progress
"Fascinating! You should definitely look into this. Fortunately, my own research has no chance of producing a super intelligent AGI, so I'll continue. Good luck son! The government should give you more money."
Stuart Armstrong paraphrasing a typical AI researcher
I forgot to mention in my last post why "AI risk" might be a bad phrase even to denote the problem of UFAI. It brings to mind analogies like physics catastrophes or astronomical disasters, and lets AI researchers think that their work is ok as long as they have little chance of immediately destroying Earth. But the real problem we face is how to build or become a superintelligence that shares our values, and given that this seems very difficult, any progress that doesn't contribute to the solution but brings forward the date by which we must solve it (or be stuck with something very suboptimal even if it doesn't kill us), is bad. The word "risk" connotes a small chance of something bad suddenly happening, but slow steady progress towards losing the future is just as worrisome.
The usual way of stating the problem also invites lots of debate that are largely beside the point (as far as determining how serious the problem is), like whether intelligence explosion is possible, or whether a superintelligence can have arbitrary goals, or how sure we are that a non-Friendly superintelligence will destroy human civilization. If someone wants to question the importance of facing this problem, they really instead need to argue that a superintelligence isn't possible (not even a modest one), or that the future will turn out to be close to the best possible just by everyone pushing forward their own research without any concern for the big picture, or perhaps that we really don't care very much about the far future and distant strangers and should pursue AI progress just for the immediate benefits.
(This is an expanded version of a previous comment.)
against "AI risk"
Why does SI/LW focus so much on AI-FOOM disaster, with apparently much less concern for things like
- bio/nano-tech disaster
- Malthusian upload scenario
- highly destructive war
- bad memes/philosophies spreading among humans or posthumans and overriding our values
- upload singleton ossifying into a suboptimal form compared to the kind of superintelligence that our universe could support
Why, for example, is lukeprog's strategy sequence titled "AI Risk and Opportunity", instead of "The Singularity, Risks and Opportunities"? Doesn't it seem strange to assume that both the risks and opportunities must be AI related, before the analysis even begins? Given our current state of knowledge, I don't see how we can make such conclusions with any confidence even after a thorough analysis.
SI/LW sometimes gives the impression of being a doomsday cult, and it would help if we didn't concentrate so much on a particular doomsday scenario. (Are there any doomsday cults that say "doom is probably coming, we're not sure how but here are some likely possibilities"?)
Modest Superintelligences
I'm skeptical about trying to build FAI, but not about trying to influence the Singularity in a positive direction. Some people may be skeptical even of the latter because they don't think the possibility of an intelligence explosion is a very likely one. I suggest that even if intelligence explosion turns out to be impossible, we can still reach a positive Singularity by building what I'll call "modest superintelligences", that is, superintelligent entities, capable of taking over the universe and preventing existential risks and Malthusian outcomes, whose construction does not require fast recursive self-improvement or other questionable assumptions about the nature of intelligence. This helps to establish a lower bound on the benefits of an organization that aims to strategically influence the outcome of the Singularity.
- MSI-1: 105 biologically cloned humans of von Neumann-level intelligence, highly educated and indoctrinated from birth to work collaboratively towards some goal, such as building MSI-2 (or equivalent)
- MSI-2: 1010 whole brain emulations of von Neumann, each running at ten times human speed, with WBE-enabled institutional controls that increase group coherence/rationality (or equivalent)
- MSI-3: 1020 copies of von Neumann WBE, each running at a thousand times human speed, with more advanced (to be invented) institutional controls and collaboration tools (or equivalent)
(To recall what the actual von Neumann, who we might call MSI-0, accomplished, open his Wikipedia page and scroll through the "known for" sidebar.)
Building a MSI-1 seems to require a total cost on the order of $100 billion (assuming $10 million for each clone), which is comparable to the Apollo project, and about 0.25% of the annual Gross World Product. (For further comparison, note that Apple has a market capitalization of $561 billion, and annual profit of $25 billion.) In exchange for that cost, any nation that undertakes the project has a reasonable chance of obtaining an insurmountable lead in whatever technologies end up driving the Singularity, and with that a large measure of control over its outcome. If no better strategic options come along, lobbying a government to build MSI-1 and/or influencing its design and aims seems to be the least that a Singularitarian organization could do.
A Problem About Bargaining and Logical Uncertainty
Suppose you wake up as a paperclip maximizer. Omega says "I calculated the millionth digit of pi, and it's odd. If it had been even, I would have made the universe capable of producing either 1020 paperclips or 1010 staples, and given control of it to a staples maximizer. But since it was odd, I made the universe capable of producing 1010 paperclips or 1020 staples, and gave you control." You double check Omega's pi computation and your internal calculator gives the same answer.
Then a staples maximizer comes to you and says, "You should give me control of the universe, because before you knew the millionth digit of pi, you would have wanted to pre-commit to a deal where each of us would give the other control of the universe, since that gives you 1/2 probability of 1020 paperclips instead of 1/2 probability of 1010 paperclips."
Is the staples maximizer right? If so, the general principle seems to be that we should act as if we had precommited to a deal we would have made in ignorance of logical facts we actually possess. But how far are we supposed to push this? What deal would you have made if you didn't know that the first digit of pi was odd, or if you didn't know that 1+1=2?
On the other hand, suppose the staples maximizer is wrong. Does that mean you also shouldn't agree to exchange control of the universe before you knew the millionth digit of pi?
To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they're jointly building. They have a lot of ignorance about the relevant logical facts, like how smart/powerful the AI will turn out to be and how efficient it will be in implementing each of their goals. They could negotiate a solution now in the form of a weighted average of their utility functions, but the weights they choose now will likely turn out to be "wrong" in full view of the relevant logical facts (e.g., the actual shape of the utility-possibility frontier). Or they could program their utility functions into the AI separately, and let the AI determine the weights later using some formal bargaining solution when it has more knowledge about the relevant logical facts. Which is the right thing to do? Or should they follow the staples maximizer's reasoning and bargain under the pretense that they know even less than they actually do?
Other Related Posts: Counterfactual Mugging and Logical Uncertainty, If you don't know the name of the game, just tell me what I mean to you
Where do selfish values come from?
Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision theoretic sense? Do humans actually have selfish values?) Current theory suggest 3 possible ways to design a selfish agent:
- have a perception-determined utility function (like AIXI)
- have a static (unchanging) world-determined utility function (like UDT) with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation
- have a world-determined utility function that changes ("learns") as the agent makes observations (for concreteness, let's assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation)
Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.
But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.
So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)