The case for value learning
This post is mainly fumbling around trying to define a reasonable research direction for contributing to FAI research. I've found that laying out what success looks like in the greatest possible detail is a personal motivational necessity. Criticism is strongly encouraged.
The power and intelligence of machines have been gradually and consistently increasing, and it seems likely that at some point machine intelligence will surpass that of humans. Before that point, it is important that humanity manages to direct these powerful optimizers towards a target that humans find desirable.
This is difficult because humans as a general rule have a fairly fuzzy conception of their own values, and it seems unlikely that the millennia of argument surrounding what precisely constitutes eudaimonia are going to be satisfactorily wrapped up before the machines get smart. The most obvious solution is to try to leverage some of the novel intelligence of the machines to help resolve the issue before it is too late.
Lots of people regard using a machine to help you understand human values as a chicken and egg problem. They think that a machine capable of helping us understand what humans value must also necessarily be smart enough to do AI programming, manipulate humans, and generally take over the world. I am not sure that I fully understand why people believe this.
Part of it seems to be inherent in the idea of AGI, or artificial general intelligence. There seems to be a belief that once an AI crosses a certain threshold of smarts, it will be capable of understanding literally everything. I have even heard people describe certain problems as "AI-complete", making an explicit comparison to ideas like Turing-completeness. If a Turing machine is a universal computer, why wouldn't there also be a universal intelligence?
To address the question of universality, we need to make a distinction between intelligence and problem solving ability. Problem solving ability is typically described as a function of both intelligence and resources, and just throwing resources at a problem seems to be capable of compensating for a lot of cleverness. But if problem-solving ability is tied to resources, then intelligent agents are in some respects very different from Turing machines, since Turing machines are all explicitly operating with an infinite amount of tape. Many of the existential risk scenarios revolve around the idea of the intelligence explosion, when an AI starts to do things that increase the intelligence of the AI so quickly that these resource restrictions become irrelevant. This is conceptually clean, in the same way that Turing machines are, but navigating these hard take-off scenarios well implies getting things absolutely right the first time, which seems like a less than ideal project requirement.
If an AI that knows a lot about AI results in an intelligence explosion, but we also want an AI that's smart enough to understand human values, is it possible to create an AI that can understand human values, but not AI programming? In principle it seems like this should be possible. Resources useful for understanding human values don't necessarily translate into resources useful for understanding AI programming. The history of AI development is full of tasks that were supposed to be solvable only by a machine smart enough to possess general intelligence, where significant progress was made in understanding and pre-digesting the task, allowing problems in the domain to be solved by much less intelligent AIs.
If this is possible, then the best route forward is focusing on value learning. The path to victory is working on building limited AI systems that are capable of learning and understanding human values, and then disseminating that information. This effectively softens the AI take-off curve in the most useful possible way, and allows us to practice building AI with human values before handing them too much power. Even if AI research is easy compared to the complexity of human values, a specialist AI might find thinking about human values easier than reprogramming itself, in the same way that humans find complicated visual/verbal tasks much easier than much simpler tasks like arithmetic. The human intelligence learning algorithm is trained on visual object recognition and verbal memory tasks, and it uses those tools to perform addition. A similarly specialized AI might be capable of rapidly understanding human values, but find AI programming as difficult as humans find determining whether 1007 is prime. As an additional incentive, value learning has an enormous potential for improving human rationality and the effectiveness of human institutions even without the creation of a superintelligence. A system that helped people better understand the mapping between values and actions would be a potent weapon in the struggle with Moloch.
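As a concrete aside, the 1007 example checks out: a few lines of trial division, sketched here purely for illustration, settle in microseconds what most humans struggle to do in their heads. (1007 = 19 × 53, so it is composite.)

```python
def is_prime(n: int) -> bool:
    """Trial division: test divisors up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print(is_prime(1007))  # False: 1007 = 19 * 53
```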
Building a relatively unintelligent AI and giving it lots of human values resources to help it solve the human values problem seems like a reasonable course of action, if it's possible. There are some difficulties with this approach. One of these difficulties is that after a certain point, no amount of additional resources compensates for a lack of intelligence. A simple reflex agent like a thermostat doesn't learn from data and throwing resources at it won't improve its performance. To some extent you can make up for intelligence with data, but only to some extent. An AI capable of learning human values is going to be capable of learning lots of other things. It's going to need to build models of the world, and it's going to have to have internal feedback mechanisms to correct and refine those models.
If the plan is to create an AI and primarily feed it data on how to understand human values, and not feed it data on how to do AI programming and self-modify, that plan is complicated by the fact that inasmuch as the AI is capable of self-observation, it has access to sophisticated AI programming. I'm not clear on how much this access really means. My own introspection hasn't allowed me anything like hardware-level access to my brain. While it seems possible to create an AI that can refactor its own code or create successors, it isn't obvious that AIs created for other purposes will have this ability by accident.
This discussion focuses on intelligence amplification as the example path to superintelligence, but other paths do exist. An AI with a sophisticated enough world model, even if somehow prevented from understanding AI, could still potentially increase its own power to threatening levels. Value learning is only the optimal way forward if human values are emergent, if they can be understood without a molecular-level model of humans and the human environment. If the only way to understand human values is through physics, then human values aren't a meaningful category of knowledge with their own structure, and there is no way to create a machine that is capable of understanding human values but not capable of taking over the world.
In the fairy tale version of this story, a research community focused on value learning manages to use specialized learning software to make the human value program portable, instead of only running on human hardware. Having a large number of humans involved in the process helps us avoid lots of potential pitfalls, especially the research overfitting to the values of the researchers via the typical mind fallacy. Partially automating introspection helps raise the sanity waterline. Humans practice coding the human value program, in whole or in part, into different automated systems. Once we're comfortable that our self-driving cars have a good grasp on the trolley problem, we use that experience to safely pursue higher risk research on recursive systems likely to start an intelligence explosion. FAI gets created and everyone lives happily ever after.
Whether value learning is worth focusing on seems to depend on the likelihood of the following claims. Please share your probability estimates (and explanations) with me because I need data points that originated outside of my own head.
- There is regular structure in human values that can be learned without requiring detailed knowledge of physics, anatomy, or AI programming. [poll:probability]
- Human values are so fragile that it would require a superintelligence to capture them with anything close to adequate fidelity. [poll:probability]
- Humans are capable of pre-digesting parts of the human values problem domain. [poll:probability]
- Successful techniques for value discovery of non-humans, (e.g. artificial agents, non-human animals, human institutions) would meaningfully translate into tools for learning human values. [poll:probability]
- Value learning isn't adequately being researched by commercial interests who want to use it to sell you things. [poll:probability]
- Practice teaching non-superintelligent machines to respect human values will improve our ability to specify a Friendly utility function for any potential superintelligence. [poll:probability]
- Something other than AI will cause human extinction sometime in the next 100 years. [poll:probability]
- All other things being equal, an additional researcher working on value learning is more valuable than one working on corrigibility, Vingean reflection, or some other portion of the FAI problem. [poll:probability]
Could a digital intelligence be bad at math?
One of the enduring traits that I see in most characterizations of artificial intelligences is the idea that an AI would have all of the skills that computers have. It's often taken for granted that a general artificial intelligence would be able to perfectly recall information, instantly multiply and divide 5-digit numbers, and handily defeat Garry Kasparov at chess. For whatever reason, the capabilities of a digital intelligence are always seen as encompassing the entire current skill set of digital machines.
But this belief is profoundly strange. Consider how much humans struggle to learn arithmetic. Basic arithmetic is really simple. You can build a bare bones electronic calculator/arithmetic logic unit on a breadboard in a weekend. Yet humans commonly spend years learning how to perform those same simple operations. And the mental arithmetic equipment humans assemble at the end of this is still relatively terrible: slow, labor intensive, and prone to frequent mistakes.
It is not totally clear why humans are this bad at math. It is almost certainly unrelated to brains computing with neurons instead of transistors. Based on personal experience and a cursory literature review, counting seems to rely primarily on identifying repeated structures in a linked list, and seems to be stored as verbal memory. When we first learn the most basic arithmetic we rely on visual pattern matching, and as we do more math, basic operations get stored in a look-up table in verbal memory. This is an absolutely bonkers way to implement arithmetic.
While humans may be generally intelligent, that general intelligence seems to be accomplished using some fairly inelegant kludges. We seem to have a preferred framework for understanding built on our visual and verbal systems, and we tend to shoehorn everything else into that framework. But there's nothing uniquely human about that problem. It seems to be characteristic of learning algorithms in general, and so if our artificial learner started off by learning skills unrelated to math, it might learn arithmetic via a similarly convoluted process. While current digital machines do arithmetic via a very efficient process, a digital mind that has to learn those patterns may arrive at a solution as slow and convoluted as the one humans rely on.
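To make the kludge concrete, here is a toy sketch (purely illustrative; the name `kludge_add` and the structure are invented for this post) of arithmetic built the way the text describes: a memorized look-up table of single-digit facts, composed digit by digit with carries, rather than a native adder.

```python
# "Memorized" single-digit addition facts, standing in for the verbal
# look-up table humans seem to use.
ADD_FACTS = {(a, b): a + b for a in range(10) for b in range(10)}

def kludge_add(x: int, y: int) -> int:
    """Multi-digit addition composed from memorized facts plus carries."""
    digits_x = [int(c) for c in str(x)][::-1]  # least significant first
    digits_y = [int(c) for c in str(y)][::-1]
    result, carry = [], 0
    for i in range(max(len(digits_x), len(digits_y))):
        a = digits_x[i] if i < len(digits_x) else 0
        b = digits_y[i] if i < len(digits_y) else 0
        s = ADD_FACTS[(a, b)] + carry  # recall a fact, then adjust
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return int("".join(map(str, result[::-1])))

print(kludge_add(478, 256))  # 734
```

The point of the sketch is the shape of the solution: many memory lookups and a serial carrying procedure, slow and error-prone compared to an adder circuit, even though both compute the same function.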
Conflicting advice on altruism
As far as I can tell, rather than having a single well-defined set of preferences or utility function, my actions more closely reflect the outcome of a set of competing internal drives. One of my internal drives is strongly oriented towards a utilitarian altruism. While the altruist drive doesn't dominate my day-to-day life, compared to the influence of more basic drives like the desires for food, fun, and social validation, I have traditionally been very willing to drop whatever I'm doing and help someone who asks for, or appears to need, help. This altruistic drive has an even more significant degree of influence on my long-term planning, since my drives for food, fun, etc. are indifferent among the many possible futures in which they can be well-satisfied.
I'm not totally sure to what extent strong internal drives are genetic or learned or controllable, but I've had a fairly strong impulse towards altruism for well over a decade. Unfortunately, even over fairly long time frames it isn't clear to me that I've been a particularly "effective" altruist. This discussion attempts to understand some of the beliefs and behaviors that contributed to my personal failure/success as an altruist, and may also be helpful to other people looking to engage in or encourage similar prosocial habits.
Game Theory Model
Imagine a perfect altruist competing in a Prisoner's Dilemma style game. The altruist in this model is by definition a pure utilitarian who wants to maximize the average utility, but is completely insensitive to the distribution of the utility.1 A trivial real world example similar to this would be something like picking up litter in a public place. If the options are Pick up (Cooperate) and Litter (Defect) then an altruist might choose to pick up litter even though they themselves don't capture enough of the value to justify the action. Even if you're skeptical that unselfish pure utilitarians exist, the payoff matrix and much of this analysis applies to a broader range of prosocial behaviors where it's difficult for a single actor to capture the value he or she generates.
The prisoner's dilemma payoff matrix for the game in which the altruist is competing looks something like this:
| Agent A \ Agent B | Cooperate | Defect |
| --- | --- | --- |
| Cooperate | 2,2 | -2,4 |
| Defect | 4,-2 | -1,-1 |
Other examples with altered payoff ratios are possible, but this particular payoff matrix creates an interesting inversion of the typical strategy for the prisoner's dilemma. If we label the altruist Agent A (A for Altruist), then A's dominant strategy is Cooperate. Just as in the traditional prisoner's dilemma, A prefers if B also cooperates, but A will cooperate regardless of what B does. The iterated prisoner's dilemma is even more interesting. If A and B are allowed to communicate before and between rounds, A may threaten to employ a tit-for-tat-like strategy and to defect in the future against defectors, but this threat is somewhat hollow, since regardless of threats, A's dominant strategy in any given round is still to cooperate.
A population of naive altruists is somewhat unstable for the same reason that a population of naive cooperators is unstable: it's vulnerable to infiltration by defectors. The obvious meta-strategies for individual altruists and altruist populations are either to become proficient at identifying defectors and ignore/avoid them, or to successfully threaten defectors into cooperating. Both the identify/avoid and the threaten/punish tactics have costs associated with them, and which approach is a better strategy depends on how much players are expected to change over the course of a series of games. Incorrigible defectors cannot be threatened/punished and must be avoided, while more malleable defectors may be threatened into cooperation.
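The dominance claim above is easy to verify mechanically. A minimal sketch, assuming (as the model states) that the altruist maximizes the average of the two payoffs in the first matrix:

```python
# Payoffs from the first matrix, keyed by (A's move, B's move) -> (A, B).
PAYOFFS = {
    ("C", "C"): (2, 2),  ("C", "D"): (-2, 4),
    ("D", "C"): (4, -2), ("D", "D"): (-1, -1),
}

def altruist_utility(a_move: str, b_move: str) -> float:
    """A pure utilitarian values the AVERAGE of both players' payoffs."""
    pa, pb = PAYOFFS[(a_move, b_move)]
    return (pa + pb) / 2

# Cooperate strictly dominates iff it beats Defect against every B move.
for b in ("C", "D"):
    assert altruist_utility("C", b) > altruist_utility("D", b)
print("Cooperate strictly dominates for the altruist")
```

Against a cooperator the altruist's average is 2 vs. 1 for defecting; against a defector it is 1 vs. -1, so Cooperate wins in both columns, which is exactly why the threat to defect is hollow.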
If we assume that agent B is selfish and we express the asymmetry in the agent values in terms of our payoff matrix, then the symmetric payoff matrix above is equivalent to the top portion of a new payoff matrix given by
| Agent A \ Agent B | Cooperate | Defect |
| --- | --- | --- |
| Cooperate | 2,2 | 1,4 |
| Defect | 1,-2 | -1,-1 |
| Avoid | 0,0 | 0,0 |
The only difference between the two matrices is that in this latter case we've given the altruist an Avoid option. There is no simple way to include the Threaten option, since threatening relies on trying to convince Agent B that Agent A is either unreasonable or not an altruist, and including that sort of bluff in the formal model makes it difficult to create payoff matrices that are both simple and reasonable. However, we can still make a few improvements to our formal model before we're forced to abandon it and talk about the real world.
Adding Complexity
The relatively simple payoff matrices in the previous section can easily be made more realistic and more complicated. In the iterated version of the game, if the total number of times A can cooperate is limited, then for each game in which she cooperates she incurs an opportunity cost equal to the difference between her received payoff and her ideal payoff. Under this construction, an altruist who cooperates with a defector receives a negative utility as long as games with other cooperators are available.
| Agent A \ Agent B | Cooperate | Defect |
| --- | --- | --- |
| Cooperate | 2,2 | -1,4 |
| Defect | -1,-2 | -3,-1 |
| Avoid | 0,0 | 0,0 |
In this instance, A no longer has a dominant strategy. A should cooperate with B if she thinks that B will cooperate, but A should avoid B if she thinks that B will defect. A thus has a strong incentive to build a sophisticated model of B, which can be used either to convince B to cooperate or at the very least correctly predict B's defection. For a perfect altruist, more information and judgment of agent B leads to better average outcomes.
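A's belief-dependent choice can be made precise. Reading the first entry of each cell in the matrix above as A's utility, and setting aside the Defect row (which is dominated by Avoid), a short sketch shows A should cooperate exactly when her estimated probability that B cooperates exceeds 1/3:

```python
# A's utilities from the opportunity-cost matrix, by A's move and B's move.
A_UTILITY = {
    "Cooperate": {"C": 2, "D": -1},
    "Avoid":     {"C": 0, "D": 0},
}

def best_response(p: float) -> str:
    """p = A's estimated probability that B cooperates."""
    expected = {
        move: p * u["C"] + (1 - p) * u["D"]
        for move, u in A_UTILITY.items()
    }
    return max(expected, key=expected.get)

# Cooperating pays iff 2p - (1 - p) > 0, i.e. p > 1/3.
print(best_response(0.5))  # Cooperate
print(best_response(0.2))  # Avoid
```

The 1/3 threshold is why modeling B is so valuable to A: every bit of evidence that shifts her estimate of B across that line changes her optimal play.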
The popularity of gatekeeper organizations like GiveWell and Charity Navigator in altruist communities makes a lot of sense if those communities are aware of their vulnerability to defectors. Because charitable dollars are so fungible, giving money to a charity is an instance where opportunity costs play a significant role. While meta-charities offer some other advantages, a significant part of their appeal, especially for organizations like Charity Navigator, is helping people avoid "bad" charities.
Interestingly, with this addition, A's behavior may start to look less and less like pure altruism. Even if A is totally indifferent to the distribution of utility, if A can reliably identify some other altruists then she will preferentially cooperate with them and avoid games with unknown agents in which there is a risk of defection. The benefits of cooperation could then disproportionately accrue within the altruist in-group, even if none of the altruists intend that outcome.
An observer who had access only to the results of the games and not the underlying utility functions of the players would be unlikely to conclude that the clique of A-like agents that exhibited strong internal cooperation and avoided games with all other players had a purely altruistic utility function. Their actions pattern-match much more readily to something more selfish and more like typical human tribal behavior, suggesting either a self-serving or an "us versus them" utility function instead of one that has increasing the average payoff as its goal. If we include the threaten/punish option, the altruist population may look even less like a population of altruists.
That erroneous pattern match isn't a huge issue for the perfectly rational pure altruist in our game theory model. Unfortunately, human beings are often neither of those things. A significant amount of research suggests that people's beliefs are strongly influenced by their actions, and by what they think those actions say about them. An actual human who started with the purely altruistic utility function of Agent A in this section, and who rationally cooperated with a set of other easily identified altruists, might very well alter his utility function to seem more consistent with his actions. The game-theoretic model, in which the values of the agent are independent of the agent's choices, starts to break down.
While very few individuals are perfect altruists/pure utilitarians as defined here, a much larger fraction of the population nominally considers the altruist value system to be an ethical ideal. The ideal that people have approximately equal value may not always be reflected in how most people live, but many people espouse such a belief and even want to believe it. We see this idea under all sorts of labels: altruism, being a utilitarian, trying to "love your neighbor as yourself", believing in the spiritual unity of humankind, or even just an innate sense of fairness.
Someone who is trying to be an altruist may have altruism or a similar ethical injunction as one of many of their internal drives, and the drive for altruism may be relatively weak compared to their desires for personal companionship, increased social status, greater material wealth, etc. For this individual, the primary threat to the effectiveness of their prosocial behavior is not the possibility that they might cooperate with a defector; it is instead the possibility that their selfish drives might overwhelm their desire to act altruistically, and they themselves might not cooperate.
Received Wisdom on Altruism
Much of the cultural wisdom in my native culture that addresses how to be a good altruist is geared towards people who are trying to be altruists, rather than towards altruists who are trying to be effective. The best course of action in the two situations is often very different, but it took me a considerable amount of time to realize the distinction.
For people trying to be altruists, focusing on the opportunity costs of their altruism is exactly the wrong thing to do. Imagining all the other things that they could buy with their money instead of giving it to a homeless person or donating it to the AMF will make it very unlikely they will give the money away. Judging the motivations of others often provides ample excuses for not helping someone. Seeking out similar cooperators can quickly turn into self-serving tribalism and indifference towards people unlike the tribe. Most people have really stringent criteria for helping others, and so, given the chance to help, most people don't.
The cultural advice I received on altruism tended to focus on avoiding these pitfalls. It stressed ideas like, "Do whatever good you can, wherever you are", and emphasized not to judge or condemn others, but to give second chances, to try and believe in the fundamental goodness of people, and to try to cooperate and value non-tribe members and even enemies.
When I was trying to be an altruist, I took much of this cultural how-to advice on altruism very seriously and for much of my life helped/cooperated with anyone who asked, regardless of whether the other person was likely to defect. Even when people literally robbed me, I would rationalize that whoever stole my bike must have really needed a bike, and so even my involuntary "cooperation" with the thief was probably a net positive from a utilitarian standpoint.
Effective Altruism
I don't think I've been particularly effective as an altruist because I haven't been judgmental enough, and because I've been too focused on doing whatever good I could where I was instead of finding the places I could do the most good and moving myself there. I'm now trying to spend nearly as much energy identifying opportunities to do good as I spend actively trying to improve the world.
At the same time, I'm still profoundly wary of the instinct not to help, or of thinking, "This isn't my best opportunity to do good," because I know that it's very easy to get in the habit of not helping people. I'm trying to move away from reactively helping anyone who asks and towards something that looks more like proactive planning, but I'm not at all convinced that most other people should be trying to move in that same direction.
As with achieving any goal, success requires a balance between insufficient planning and analysis paralysis. I think for altruism in particular, this balance was and is difficult to strike in part because of the large potential for motivated selfish reasoning, but also because most of my (our?) cultural wisdom emphasizes convenient immediate action as the correct form of altruism. Long term altruistic planning is typically not much mentioned or discussed, possibly because most people just aren't that strongly oriented towards utilitarian values.
Conclusion
If helping others is something that you're committed enough to that a significant limitation on your effectiveness is that you often help the wrong people, then diverting energy into judging who you help and consciously considering opportunity costs is probably a good idea. If helping others is something you'd like to do, but you rarely find yourself actually doing, the opposite advice may be apropos.
1. In idealized formulations of game theory, "utility" is intended to describe not just physical or monetary gain, but to include effects like desire for fairness, moral beliefs, etc. Symmetric games are fairly unrealistic under that assumption, and such a definition of utility would preclude our altruist from many games altogether. Utility in this first example is defined only in terms of personal gain, and explicitly does not include the effects of moral satisfaction, desire for fairness, etc.
Dumbing Down Human Values
I want to preface everything here by acknowledging my own ignorance. I have relatively little formal training in any of the subjects this post will touch upon, and this chain of reasoning is very much a work in progress.
I think the question of how to encode human values into non-human decision makers is a really important research question. Whether or not one accepts the rather eschatological arguments about the intelligence explosion, the coming singularity, etc. there seems to be tremendous interest in the creation of software and other artificial agents that are capable of making sophisticated decisions. Inasmuch as the decisions of these agents have significant potential impacts, we want those decisions to be made with some sort of moral guidance. Our approach towards the problem of creating machines that preserve human values thus far has primarily relied on a series of hard-coded heuristics, e.g. saws that stop spinning if they come into contact with human skin. For very simple machines, these sorts of heuristics are typically sufficient, but they constitute a very crude representation of human values.
We're at the border, in many ways, of creating machines where these sorts of crude representations are probably not sufficient. As a specific example, IBM's Watson is now designing treatment programs for lung cancer patients. The design of a treatment program implies striking a balance between treatment cost, patient comfort, aggressiveness of targeting the primary disease, short and long-term side effects, secondary infections, etc. It isn't totally clear how those trade-offs are being managed, although there's still a substantial amount of human oversight/intervention at this point.
The use of algorithms to discover human preferences is already widespread. While these typically operate in restricted domains such as entertainment recommendations, it seems at least in principle possible that with the correct algorithm and a sufficiently large corpus of data, a system not dramatically more advanced than existing technology could learn some reasonable facsimile of human values. This is probably worth doing.
The goal would be to have a sufficient representation of human values using as dumb a machine as possible. This putative value-learning machine could be dumb in the way that Deep Blue was dumb, by being a hyper-specialist in the problem domain of chess/learning human values and having very little optimization power outside of that domain. It could also be dumb in the way that evolution is dumb, obtaining satisfactory results more through an abundance of data and resources than through any particular brilliance.
Computer chess benefited immensely from five decades of work before Deep Blue managed to win a game against Kasparov. While many of the algorithms developed for computer chess have found applications outside of that domain, some of them are domain-specific. A specialist human value learning system may also require substantial effort on domain-specific problems. The history, competitive nature, and established ranking system of chess made it an attractive problem for computer scientists because it was relatively easy to measure progress. Perhaps the goal for a program designed to understand human values would be that it plays a convincing game of "Would you rather?", although as far as I know no one has devised an Elo system for it.
Similarly, a relatively dumb but more general AI may require relatively large, preferably somewhat homogeneous data sets to come to conclusions that are even acceptable. Having successive generations of AI train on the same or similar data sets could provide a useful feedback mechanism for tracking how successful various research efforts are.
The benefit of this research approach is that not only is it a relatively safe path towards a possible AGI, but even if the speculative futures of mind-uploads and superintelligences do not take place, there's still substantial utility in having devised a system that is capable of making correct moral decisions in limited domains. I want my self-driving car to make a much larger effort to avoid a child in the road than a plastic bag. I'd be even happier if it could distinguish between an opossum and someone's cat.
When I design research projects, one of the things I try to ensure is that if some of my assumptions are wrong, the project fails gracefully. Obviously it's easy to love the Pascal's Wager-like impact statement of FAI, but if I were writing it up for an NSF grant I'd put substantially more emphasis on the importance of my research even if fully human-level AI isn't invented for another 200 years. When I give the elevator pitch version of FAI, I've found that placing a strong emphasis on the near future and referencing things people have encountered before, such as computers playing Jeopardy! or self-driving cars, makes them much more receptive to the idea of AI safety and allows me to discuss things like the potential for an unfriendly superintelligence without coming across as a crazed prophet of the end times.
I'm also just really, really curious to see how well something like Watson would perform if I gave it a bunch of sociology data and asked whether a human would rather find 5 dollars or stub a toe. There doesn't seem to be a huge categorical difference between being able to answer the Daily Double and reasoning about human preferences, but I've been totally wrong about intuitive jumps that seemed much smaller than that one in the past, so it's hard to be too confident.
How probable is Molecular Nanotech?
About a week ago I posted asking whether bringing up molecular nanotechnology (MNT) as a possible threat avenue for an unfriendly artificial intelligence made FAI research seem less credible, because MNT seemed to me to be not obviously possible. I was told, to some extent, to put up and address the science of MNT or shut up. A couple of people also expressed an interest in seeing a more fact- and less PR-oriented discussion, so I got the ball rolling, and you all have no one to blame but yourselves. I should note before starting that I do not personally have a strong opinion on whether Drexler-style MNT is possible. This isn't something I've researched previously, and I'm open to being convinced one way or the other. If MNT turns out to be likely at the end of this investigation, then hopefully this discussion can provide a good resource for LW/FAI on the topic for people like myself not yet convinced that MNT is the way of the future. As far as I'm concerned, at this point all paths lead to victory.
Nanosystems was the canonical reference mentioned in the last conversation. I ordered it, but about two-thirds of the way through Engines of Creation I decided EoC was giving me enough to work with and cancelled the order. If the science in Nanosystems is really much better than in EoC I can reorder it, but I figured we'd get started for free; 50 bucks is a lot of money to spend on an internet argument.
Before I begin I would like to post the following disclaimers.
1. I am not an expert in many of the fields that border on MNT. I did work at a nanotechnology center for a year, but that experience was essentially nothing like what Drexler describes. More relevantly, I am in the process of completing a Ph.D. in physics, and my thesis work is on computational modeling of novel materials. I don't really like squishy things, so I'm very much out of my depth when it comes to discussions of what ribosomes can and cannot accomplish, and I'll happily defer to other authorities on the more biological subjects. With that said, several of my colleagues run MD simulations of protein folding all day every day, and if a biology issue is particularly important, I can shoot some emails around the department and try to get a more expert opinion.
2. There are several difficulties in precisely addressing Drexler's arguments, because it's not always clear, to me at least, exactly what his arguments are. I've been going through Engines of Creation and several of his other works, and I'll present my best-guess outline here. If other people would like to contribute specific claims about molecular nanotech, I'll be happy to add them to the list and do my best to address them.
3. This discussion is intended to be scientific. As was pointed out previously, Drexler et al. have made many claims about timetables of when things might be invented. Judging the accuracy of these claims is difficult because of the definitional issues mentioned in the previous paragraph. I'm not interested in having this discussion encompass Drexler's general prediction accuracy. Nature is the only authority I'm interested in consulting in this thread. If someone wants to make a thread on Drexler's prediction accuracy, they're welcome to do so.
4. If you have any questions about the science underlying anything I say, don't hesitate to ask. This is a fairly technical topic, and I'm happy to bring anyone up to speed on basic physics/chemistry terms and concepts.
Discussion
I'll begin by providing some background and highlighting why exactly I am not already convinced that MNT, and especially AI-assisted rapid MNT, is the future, and then I'll try to address some specific claims made by Drexler in various publications.
Conservation of energy:
Modelling is hard:
Solving the Schrödinger equation is essentially impossible. We can solve it more or less exactly for the hydrogen atom, but things get very, very difficult from there. This is because we don't have a closed-form solution for the three-body problem, much less the n-body problem. Roughly, the difficulty is that because each electron interacts with every other electron, to determine the forces on electron 1 you need to know the positions of electrons 2 through N, but the position of each of those electrons depends in turn on electron 1. We have some tricks and approximations to get around this problem, but they're only justified empirically: the only way we know which approximations are good is by testing them against experiments. Experiments are difficult and expensive, and if the AI is using MNT to gain infrastructure, then we can assume it doesn't already have the infrastructure to run its own physics lab.
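To make the scaling problem concrete, here is a toy Python calculation, with deliberately coarse, made-up grid sizes of my own choosing, counting the complex amplitudes a brute-force many-electron wavefunction would need:

```python
# Toy illustration: the many-body wavefunction lives on the product of
# every electron's configuration space, so its size grows exponentially
# with the number of electrons. (Grid size below is illustrative only.)
grid_points_per_axis = 10
one_electron_states = grid_points_per_axis ** 3  # 1,000 grid points in 3D

for n_electrons in (1, 2, 3, 10):
    amplitudes = one_electron_states ** n_electrons
    print(f"{n_electrons:>2} electrons -> {amplitudes:.0e} amplitudes")
```

Even ignoring spin and antisymmetry, ten electrons on this laughably coarse grid already require 10^30 numbers, which is why practical methods lean on empirically validated approximations rather than brute force.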
A factory isn't the right analogy:
The discussion of nanotechnology places an enormous emphasis on assemblers, or nanofactories, but a factory doesn't run unless it has a steady supply of raw materials and energy, both arriving at the correct time. The evocation of a factory calls to mind the rigid regularity of an assembly line, but a factory only works because it's situated in the larger, more chaotic world of the economy. Designing new nanofactories isn't just a problem of building the factory; it's a problem of designing an entire economy. There has to be a source of raw material, an energy source, and a means of transporting material and energy from place to place. And with a microscopic factory, Brownian motion may have moved the factory by the time the delivery van gets there. This makes the modelling problem orders of magnitude more difficult. Drexler makes a big deal about how his rigid positional world isn't like the chaotic world of the chemists, but the chaos is still there; building a factory doesn't get rid of the logistics problem.
Chaos
The reason we can't solve the n-body problem, and lots of other problems such as the double pendulum and the weather, is that many systems turn out, rather unfortunately, to have a very sensitive dependence on initial conditions. This means that ANY error, any unaccounted-for variable, can perturb a system in dramatic ways. Since there will always be some error (at the bare minimum h/4π, from the uncertainty principle), our AI is going to have to run Monte Carlo simulations like the rest of us schmucks and try to eliminate as many degrees of freedom as possible.
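For a self-contained picture of what sensitive dependence means, here is a standard textbook example (the logistic map, nothing specific to MNT): a perturbation of one part in 10^10 grows to order one within a few dozen iterations.

```python
# Two trajectories of the chaotic logistic map x -> 4x(1-x),
# started 1e-10 apart; the gap roughly doubles on each step.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x, y = 0.4, 0.4 + 1e-10
max_gap = 0.0
for _ in range(60):
    x, y = logistic(x), logistic(y)
    max_gap = max(max_gap, abs(x - y))
print(max_gap)  # order one: the initial error has been amplified completely
```

Sixty iterations of simple arithmetic are enough to destroy ten digits of precision; no amount of intelligence recovers information that the dynamics have erased.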
The laws of physics hold
I didn't think it would be necessary to mention this, but I believe that the laws of physics are pretty much the laws of physics we know right now. I would direct anyone who suggests that an AI has a shot at powering MNT with cold fusion, tachyons, or other physical phenomena not predicted by the Standard Model to this post. I am not saying there is no new physics, but we understand quantum mechanics really well, and the Standard Model has been confirmed to enough decimal places that anyone who suggests something the Standard Model says can't happen is almost certainly wrong, even if they have experimental evidence that is supposedly 99.9999% correct.
Specific Claims
Drexler's claims about what we can currently do in materials science are generally true. This should be unsurprising; it is not particularly difficult to predict the past. Here are six claims he makes about things we can't currently accomplish, which I'll try to evaluate:
- Building "gear-like" nanostructures is possible (Toward Integrated Nanosystems)
- Predicting crystal structures from first principles is possible (Toward Integrated Nanosystems)
- Genetic engineering is a superior form of chemical synthesis to traditional chemical plants. (EoC 6)
- "Biochemical engineers, then, will construct new enzymes to assemble new patterns of atoms. For example, they might make an enzyme-like machine which will add carbon atoms to a small spot, layer on layer. If bonded correctly, the atoms will build up to form a fine, flexible diamond fiber having over fifty times as much strength as the same weight of aluminum." (EoC 10)
- Proteins can make and break diamond bonds (EoC 11)
- Proteins are "programmable" (EoC 11)
2. Plausible. This isn't possible yet, but it should be. I might even work on it after I graduate, if I don't go into a hedge fund or AI research.
3. Not wrong, but misleading. The statement "Genetic engineers have now programmed bacteria to make proteins ranging from human growth hormone to rennin, an enzyme used in making cheese." is true in the same sense that copying and pasting someone else's code constitutes programming. Splicing a gene into a plasmid is sweet, but "genetic programming" implies more control than we have. Similarly, the statement "Whereas engineers running a chemical plant must work with vats of reacting chemicals (which often misarrange atoms and make noxious byproducts), engineers working with bacteria can make them absorb chemicals, carefully rearrange the atoms, and store a product or release it into the fluid around them." implies that bacterial synthesis leads to better yields (false), that bacteria are careful (meaningless), and that we have greater control over genetically modified E. coli than we actually do.
4a. False. Flexible diamond doesn't make any sense. Diamond is sp3-bonded carbon, and those bonds are highly directional; they're not going to flex. Metals are flexible because metallic bonds, unlike covalent bonds, don't confine the electrons in space. Whatever this purported carbon fiber is, either it won't be flexible or it won't be diamond.
4b. False. It isn't clear that this is even remotely possible. Enzymes don't work like this. Enzymes are catalysts for existing reactions, and there is no existing reaction that results in a single free carbon atom; that's an enormously energetically unfavorable state. Breaking a carbon-carbon double bond requires something like 636 kJ/mol (6.5 eV) of energy, roughly the free energy of twenty ATP hydrolyses delivered all at once. How? How do you get all that energy into the right place at the right time? How does your enzyme manage to hold on to the carbons strongly enough to pull them apart?
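The per-bond figure can be checked with a two-line conversion (the constants below are the standard CODATA-style values):

```python
# Sanity check: convert the quoted 636 kJ/mol C=C bond energy
# into the energy of a single bond, expressed in electron-volts.
AVOGADRO = 6.022e23   # molecules per mole
EV_IN_J = 1.602e-19   # joules per electron-volt

bond_kj_per_mol = 636.0
per_bond_ev = bond_kj_per_mol * 1e3 / AVOGADRO / EV_IN_J
print(f"{per_bond_ev:.1f} eV per bond")  # about 6.6 eV
```

For scale, visible-light photons carry about 2 to 3 eV, so even photochemistry delivers only a fraction of this energy in a single event.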
5. "A flexible, programmable protein machine will grasp a large molecule (the workpiece) while bringing a small molecule up against it in just the right place. Like an enzyme, it will then bond the molecules together. By bonding molecule after molecule to the workpiece, the machine will assemble a larger and larger structure while keeping complete control of how its atoms are arranged. This is the key ability that chemists have lacked." I'm no biologist, but this isn't how proteins work. Proteins aren't Turing machines; you don't set the state and ignore them. The conformation of a protein depends intimately on its environment, and the really difficult part here is that the thing it's holding, the nanopart you're trying to assemble, is a big part of the protein's environment. Drexler complains about how proteins are no good because they're soft and squishy, but then he claims they're strong enough to assemble diamond and metal parts. If the stiff nanopart you're assembling has a dangling carbon bond waiting to be filled, it's just going to cannibalize the squishy protein that's holding it. What can a protein held together by van der Waals bonds do to a diamond? How can it control the shape it takes well enough to build a fiber?
6. All of these tiny machines are repeatedly described as programmable, but that doesn't make any sense. What programs are they capable of accepting or executing? What set of instructions can a collection of 50 carbon atoms accept and execute? How are those instructions being delivered? This gets back to my factory-versus-economy complaint. If nothing else, this seems like an enormously sloppy use of language.
Some things that are possible
I think we have or will have the technology to build some interesting artificial inorganic structures in very small quantities, primarily using ultra-cold, ultra-high-vacuum laser traps. It's even possible that eventually we could create some functional objects this way, though I can't see any practical way to scale that production up.
"Nanorobots" will be small pieces of metal or dielectric material that we manipulate with lasers or sophisticated magnetic fields, possibly attached to some sort of organic ligand. This isn't much of a prediction; we pretty much do this already. The nanoworld will continue to be statistical and messy.
We will gain some inorganic control over organics like proteins and DNA (though not organic control over inorganics). This hasn't really been done yet that I'm aware of, but stronger bonds beating weaker bonds makes sense. I think there are people trying to read DNA and proteins by pushing the strands through tiny silicon windows; I feel like I heard a seminar along those lines, though I'm pretty sure I slept through it.
That brings me through the first 12 pages of EoC or so. More to follow. Let me know if the links don't work or the formatting is terrible or I said something confusing. Also, please contribute any specific MNT claims you'd like evaluated, and any resources or publications you think are relevant. Thank you.
For FAI: Is "Molecular Nanotechnology" putting our best foot forward?
Molecular nanotechnology, or MNT for those of you who love acronyms, seems to be a fairly common trope on LW and in related literature. It's not really clear to me why. In many of the examples of "how could AIs help us" or "how could AIs rise to power", phrases like "cracks protein folding" or "making a block of diamond is just as easy as making a block of coal" are thrown about in ways that make me very, very uncomfortable. Maybe it's all true; maybe I'm just late to the transhumanist party and the obviousness of this information was included with my invitation, which got lost in the mail. But seeing all the physics swept under the rug like that sets off every crackpot alarm I have.
I must post the disclaimer that I have done a little bit of materials science, so maybe I'm just annoyed that you're making me obsolete, but I don't see why this particular possible future gets so much attention. Let us assume that a smarter-than-human AI will be very difficult to control and represents a large positive or negative utility for the entirety of the human race. Even given that assumption, it's still not clear to me that MNT is a likely element of the future, or even that MNT is physically practical. I don't doubt that very clever metastable arrangements of atoms with novel properties can be dreamed up; indeed, that's my day job. But I have a hard time believing that the only reason you can't make a nanoassembler capable of arbitrary manipulations out of a handful of bottles you ordered from Sigma-Aldrich is that we're just not smart enough. Manipulating individual atoms means climbing huge binding-energy curves; it's an enormously steep, enormously complicated energy landscape, and the Schrödinger equation scales very, very poorly as you add particles and degrees of freedom. Building molecular nanotechnology seems to me roughly equivalent to making arbitrary Lego structures by shaking a large bin of Lego in a particular way while blindfolded. Maybe a superhuman intelligence is capable of doing so, but it's not at all clear to me that it's even possible.
I assume MNT gets added to discussions of AI because we're trying to make the future sound more plausible by adding burdensome details. I understand that "AI and MNT" is less probable than AI or MNT alone, even though the conjunction is supposed to sound more plausible. This is precisely where I have difficulty. I would estimate the probability of molecular nanotechnology (in the form of programmable replicators, grey goo, and the like) as lower than the probability of human-level or superhuman AI; I can think of all sorts of objections to the former, but very few to the latter. Including MNT as a consequence of AI, especially including it without addressing any of the fundamental difficulties of MNT, harms the credibility of AI researchers, I would argue. It makes me nervous about sharing FAI literature with people I work with, and it continues to bother me.
I am particularly bothered by this because it seems irrelevant to FAI. I'm fully convinced that a smarter-than-human AI could take control of the Earth via less magical means, using time-tested methods such as manipulating humans, rigging elections, making friends, killing its enemies, and generally being only marginally more clever and motivated than a typical human leader. A smarter-than-human AI could out-manipulate human institutions and out-plan human opponents with the sort of ruthless efficiency with which modern computers beat humans at chess. I don't think convincing people that smarter-than-human AIs have enormous potential for good and evil is particularly difficult, once you can get them to concede that smarter-than-human AIs are possible. I do think that waving your hands and saying "superintelligence" at things that may be physically impossible makes the whole endeavor seem less serious. If I had read the chain of reasoning smart computer->nanobots before I had built up a store of goodwill from reading the Sequences, I would have almost immediately dismissed the whole FAI movement as a bunch of soft science fiction, and it would have been very difficult to get me to take a second look.
Put in LW parlance, suggesting things not known to be possible by modern physics without detailed explanations puts you in the reference class "people on the internet who have their own ideas about physics". It didn't help, in my particular case, that one of my first interactions on LW was in fact with someone who appears to have their own view about a continuous version of quantum mechanics.
And maybe it's just me. Maybe this did not bother anyone else; maybe it's an incredible shortcut for getting people to realize just how different a future a greater-than-human intelligence makes possible, and there is no better example. It does alarm me, though, because I think that physicists, and the kind of people who notice and get uncomfortable when you start invoking magic in your explanations, may be exactly the kind of people FAI is trying to attract.