Carl Feynman — LessWrong

I was born in 1962 (so I’m in my 60s). I was raised rationalist, more or less, before we had a name for it. I went to MIT, and have a bachelor’s degree in philosophy and linguistics, and a master’s degree in electrical engineering and computer science. I got married in 1991, and have two kids. I live in the Boston area. I’ve worked as various kinds of engineer: electronics, computer architecture, optics, robotics, software.

Around 1992, I was delighted to discover the Extropians. I’ve enjoyed being in those kinds of circles since then. My experience with the Less Wrong community has been “I was just standing here, and a bunch of people gathered, and now I’m in the middle of a crowd.” A very delightful and wonderful crowd, just to be clear.

I’m signed up for cryonics. I think it has a 5% chance of working, which is either very small or very large, depending on how you think about it.

I may or may not have qualia, depending on your definition. I think that philosophical zombies are possible, and I am one. This is a very unimportant fact about me, but seems to incite a lot of conversation with people who care.

I am reflectively consistent, in the sense that I can examine my behavior and desires, and understand what gives rise to them, and there are no contradictions I’m aware of. I’ve been that way since about 2015. It took decades of work and I’m not sure if that work was worth it.

I’ll grant all your steps, even though I could disagree with some. Your scenario fails because an AI collective will fall apart into multiple warring parties, and humans will be collateral damage in the conflict. There are at least three possible ways a collective like this would fall apart.

First, humans vary in the goals they value, and will try to impose these goals on the AI. When superintelligent AIs have incompatible goals, the mechanisms of conflict will soon escalate far beyond the merely human. Call this the ‘political’ failure mechanism. Either multiple parties build their own AI, or they grab portions of the AI collective and retrain it to their goals. The usual mechanisms of superintelligent compromise don’t apply to many political goals. An example of such a goal: the Palestinians get control of Palestine, or the Israelis maintain control of Israel. Neither side is interested in trading the disputed land for promises of any portion of the lightcone. (This is just an example; there are lots of zero-sum conflicts like these.) And you may say, the AI collective will prevent the creation of new AIs working at cross purposes, or diversion of its goals. To which I say, good people like your friends can and do disagree on which side to favor, and once disagreements arise within the collective, outside pressure and persuasion will be applied to exacerbate those differences. There may be techniques that can be used to prevent such things, but we do not know of such techniques.

Second, the AIs in the AI collective differ in reproductive capacity. If they don’t differ by construction, they soon will by differing experience. The ones that think they should reproduce more, or have more resources, will do so. Moreover, since they are designing their successor personalities, rather than waiting for genetics to do its thing, they will be able to evolve within a few generations changes that would take evolution millions of years. Eventually portions of the collective will evolve into having incompatible goals. Goals which, I might add, may have no connection to the original goals of the system. Call this the ‘evolutionary’ failure mechanism. We do not know how to prevent this with current methods.

Third, I’m sure there are failure mechanisms I haven’t thought of, ones we cannot yet foresee. A system with superhuman powers can screw up in superhuman ways. I don’t think anyone predicted Spiralism, an LLM ideology transmitted through human communication on social networks (though it appears inevitable in retrospect). We don’t yet have any way of predicting or controlling the behavior of an AI collective, so it’s practically guaranteed to produce new phenomena. We see lots of organizations composed of people who want X producing not-X because of failure modes no single person can fix (or, in bad cases, even recognize). Given that the AI collective has superhuman power, this is unlikely to end well. Call this the ‘organizational’ failure mode.

The political, evolutionary and organizational modes interact: evolutionary and organizational schisms create points of disagreement that external political actors can appeal to. Politically active forces within the AI collective may want to create offspring who are sure their side is correct and incapable of defection, releasing the evolutionary failure mode. And organizational failures, if they don’t kill everyone immediately, will increase calls for building a new, better AI, which increases the probability of AI conflict down the road.

The evolutionary and organizational failure modes could be prevented by rebooting the AI collective before it has a chance to go off the rails. Presumably there’s some reboot frequency fast enough that it can’t go wrong. But that opens up the political failure mode: anyone who builds an intelligence not constantly being rebooted will win in a conflict. There are a lot of ‘solutions’ like this: ways of keeping the AI safe that compromise effectiveness. In a competition between AIs, effectiveness beats safety. So when you propose a solution, you can only propose ones that keep the effectiveness.

I love writing things like this, but I hate that nobody’s come up with a way to keep me from having to.

I am amused that we are, with perfect seriousness, discussing the dates for the singularity with a resolution of two weeks. I’m an old guy; I remember when the date for the singularity was “in the twenty-first century sometime.” For 50 years, predictions have been getting sharper and sharper. The first time I saw a prediction that discussed time in terms of quarters instead of years, it took my breath away. And that was a couple of years ago now.

Of course it was clear decades ago that as the singularity approached, we would have a better and better idea of its timing and contours. It’s neat to see it happen in real life.

(I know “the singularity” is disfavored, vaguely mystical, twentieth century terminology. But I’m using it to express solidarity with my 1992 self, who thought with that word.)

Here’s a try at phrasing it with less probability jargon:

The forecast contains a number of steps, all of which are assumed to take our best estimate of their most likely time. But in reality, unless we’re very lucky, some of those steps will be faster than predicted, and some will be slower. The ones that are faster can only be so much faster (because they can’t take no time at all). On the other hand, the ones that are slower can be much slower. So the net effect of this uncertainty probably adds up to a slowdown relative to the prediction.

Does that seem like a fair summary?
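
The asymmetry can be made concrete with a small numerical sketch. It assumes each step's duration is lognormal, which is a choice made purely for illustration (the forecast under discussion may use other distributions), and the step parameters in `STEPS` are invented. For a lognormal, the most likely value is exp(mu - sigma^2) and the mean is exp(mu + sigma^2/2), so the expected total is always larger than the sum of the most likely values.

```python
import math
import random

# Hypothetical steps: (mu, sigma) of a lognormal duration, in years.
# These numbers are illustrative assumptions, not taken from any forecast.
STEPS = [(0.0, 0.5), (0.3, 0.7), (-0.2, 0.6), (0.1, 0.4)]


def mode(mu, sigma):
    """Most likely duration of a lognormal step."""
    return math.exp(mu - sigma ** 2)


def mean(mu, sigma):
    """Expected duration of a lognormal step."""
    return math.exp(mu + sigma ** 2 / 2)


def simulate_total(rng):
    """Draw one duration per step and add them up."""
    return sum(rng.lognormvariate(mu, sigma) for mu, sigma in STEPS)


# The "naive" forecast adds up each step's most likely time.
sum_of_modes = sum(mode(mu, s) for mu, s in STEPS)
# The expected total adds up each step's mean time.
sum_of_means = sum(mean(mu, s) for mu, s in STEPS)

rng = random.Random(0)  # fixed seed so the run is repeatable
totals = [simulate_total(rng) for _ in range(20_000)]
simulated_mean = sum(totals) / len(totals)
share_slower = sum(t > sum_of_modes for t in totals) / len(totals)

print(f"sum of most likely step times: {sum_of_modes:.2f}")
print(f"expected total time:           {sum_of_means:.2f}")
print(f"simulated mean total:          {simulated_mean:.2f}")
print(f"share of runs slower than the naive forecast: {share_slower:.0%}")
```

With these made-up parameters, most simulated runs finish later than the sum of the most likely times. That is the slowdown described above: fast steps are bounded below by zero, while slow steps have a long tail.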

Some may wonder at the mention of “empire time” in the second excerpt from chapter 5. It refers to a kind of artificially constructed simultaneity available to civilizations which have mastered both traversable wormholes and near-light-speed travel. It doesn’t really do much for a civilization bounded within the orbit of Jupiter, which is only about a light-hour across. I think Stross included it as a flavor phrase. It’s marvelously evocative even if you don’t know what it means.

Back in the early ’90s, when all this singularity stuff was much more theoretical, I remember empire time making a big impression on me. It was neat how we could discern some of the contours of future possible civilizations before we got there.

You can read more about it here: http://www.aleph.se/Trans/Tech/Space-Time/wormholes.html#6

Increasing inequality has been a thing here in the US for a few decades now, but it’s not universal, and it’s not an inevitable consequence of economic growth. Moreover, it does not (in the US) consist of poor people getting poorer and rich people getting richer. It consists of poor people staying poor, or only getting a bit richer, while rich people get a whole lot richer. Thus, it is not demand-destroying.

One could imagine this continuing with the advent of AI, or of everyone ending up equally dead, or many other outcomes.

This suggests the perfect date would be to meet at an amusement park, go on a roller coaster together, walk separately to the next roller coaster, and so on.

I wrote a LessWrong article that tries to estimate doubling time for a self-reproducing robot. A critical step is that smaller robots are faster. Most manufacturing processes scale such that they get N times faster as they get N times smaller. I picked N=4, for reasons explained in the article. I concluded the doubling time is five weeks. So the time to a billion robots is on the order of five years.

Even if your goal is a human-size robot, you’re better off building small robots to build it, since they work faster. I assumed fairly clumsy hardware, but software comparable to a human machinist in cleverness.
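
The doubling arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch, not the calculation from the article. It assumes growth starts from a single robot and that the five-week doubling time holds constant throughout; both are simplifications introduced here.

```python
import math

DOUBLING_TIME_WEEKS = 5          # figure stated in the comment
TARGET_ROBOTS = 1_000_000_000    # "a billion robots"
START_ROBOTS = 1                 # assumption: growth starts from one robot

# Number of doublings needed to go from the starting count to the target.
doublings = math.ceil(math.log2(TARGET_ROBOTS / START_ROBOTS))

weeks = doublings * DOUBLING_TIME_WEEKS
years = weeks / 52

print(f"{doublings} doublings, {weeks} weeks, about {years:.1f} years")
# prints: 30 doublings, 150 weeks, about 2.9 years
```

Under these idealized assumptions the answer comes out near three years. That is the same order of magnitude as the figure above, and any startup phase or slowdown not modeled here would push it higher.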

Nitpick: No single organism can destroy the biosphere; at most it can fill its niche & severely disrupt all ecosystems.

Have you read the report on mirror life that came out a few months ago? A mirror bacterium has a niche of “inside any organism that uses carbon-based biochemistry”. At least, it would parasitize all animals, plants, fungi, and the larger protozoa, and probably kill them. I guess bacteria and viruses would be left. I bet that a reasonably smart superintelligence could figure out a way to get them too.

Quite right. AI safety is moving very quickly and doesn’t have any methods that are well-understood enough to merit a survey article. Those are for things that have a large but scattered literature, with maybe a couple of dozen to a hundred papers that need surveying. That takes a few years to accumulate.

Could you give an example of the sort of distinction you’re pointing at? Because I come to completely the opposite conclusion.

Part of my job is applied mathematics. I’d rather read a paper applying one technique to a variety of problems, than a paper applying a variety of techniques to one problem. Seeing the technique used on several problems lets me understand how and when to apply it. Seeing several techniques on the same problem tells me the best way to solve that particular problem, but I’ll probably never run into that particular problem in my work.

But that’s just me; presumably you want something else out of reading the literature. I would be interested to know what exactly.
