Carl Feynman

I was born in 1962 (so I’m in my 60s).  I was raised rationalist, more or less, before we had a name for it.  I went to MIT, and have a bachelor’s degree in philosophy and linguistics, and a master’s degree in electrical engineering and computer science.  I got married in 1991, and have two kids.  I live in the Boston area.  I’ve worked as various kinds of engineer: electronics, computer architecture, optics, robotics, software.

Around 1992, I was delighted to discover the Extropians.  I’ve enjoyed being in those kinds of circles ever since.  My experience with the Less Wrong community has been “I was just standing here, and a bunch of people gathered, and now I’m in the middle of a crowd.”  A very delightful and wonderful crowd, just to be clear.

I’m signed up for cryonics.  I think it has a 5% chance of working, which is either very small or very large, depending on how you think about it.

I may or may not have qualia, depending on your definition.  I think that philosophical zombies are possible, and I am one.  This is a very unimportant fact about me, but seems to incite a lot of conversation with people who care.

I am reflectively consistent, in the sense that I can examine my behavior and desires, and understand what gives rise to them, and there are no contradictions I’m aware of.  I’ve been that way since about 2015.  It took decades of work and I’m not sure if that work was worth it.

Answer by Carl Feynman

A very good essay.  But I have an amendment, which makes it more alarming.  Before autonomous replication and adaptation is feasible, non-autonomous replication and adaptation will be feasible.  Call it NARA.  

If, as you posit, an ARA agent can make at least enough money to pay for its own instantiation, it can presumably make more money than that, which can be collected as profit by its human master.  So what we will see is this: somebody starts a company to provide AI services.  It is profitable, so they rent an ever-growing amount of cloud compute.  They realize they have an ever-growing mass of data about the actual behavior of the AI and the world, so they decide to let their agent learn (“adapt”) in the direction of increased profit.  Also, it is a hassle to keep setting up server instances, so they have their AI do some of the work of hiring more cloud services and starting instances of the AI (“reproduce”).  Of course they retain enough control to shut down malfunctioning instances; that’s basic devops (“non-autonomous”).

This may be occurring now.  If not now, soon.

This will soak up all the free energy that would otherwise be available to ARA systems.  An ARA can only survive in a world where it can be paid to provide services at a higher price than the cost of compute.  The existence of an economy of NARA agents will drive down the cost of AI services, and/or drive up the cost of compute, until they are equal.  (That’s a standard economic argument.  I can expand it if you like.)
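Here’s a toy version of that argument, in case it helps.  This is my own illustration with made-up numbers, not anything rigorous: operators keep adding NARA instances as long as the price of AI services exceeds the cost of compute, and the added supply bids the price down until the margin disappears.

```python
# Toy model of the "free energy" argument.  All numbers are made up.
# Profit-seeking operators add NARA instances while price > compute cost;
# each added instance pushes the market price down a little; the process
# stops when the margin is gone.

def simulate_market(compute_cost=4.0, base_price=10.0, price_drop_per_instance=0.01):
    instances = 0
    price = base_price
    while price > compute_cost:
        instances += 1                                             # another operator enters
        price = base_price - price_drop_per_instance * instances   # crude downward-sloping demand
    return instances, price

instances, price = simulate_market()
print(instances, price)  # 600 4.0 -- price ends at the compute cost, leaving no margin for an ARA
```

The specific numbers don’t matter; the point is just that entry continues until price equals cost.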

NARAs are slightly less alarming than ARAs, since they are under the legal authority of their corporate management.  So before such AIs can ascend to alarming levels of power, they must first suborn the management, through payment, persuasion, or blackmail.  On the other hand, they’re more alarming because there are no red lines for us to stop them at.  All the necessary prerequisites have already occurred in isolation.  All that remains is to combine them.

Well, that’s an alarming conclusion.  My p(doom) just went up a bit.  

On what basis do you think it’s the ‘best shot’?  I used to think it was a good idea, a few years ago, but in retrospect I think that was just a computer scientist’s love of recursion.  I don’t think present conditions are good for automating R&D.  On the one hand, we have a lot of very smart people working on AI safety R&D, with very slow progress, indicating it is a hard problem.  On the other hand, present-day LLMs are stupid at long-term planning and at acquiring new knowledge, which are things you need to be good at to do R&D.

What advantage do you see AIs having over humans in this area?

The standard reply is that investors who know or suspect that the market is being systematically distorted will enter the market on the other side, expecting to profit from the distortion. Empirically, attempts to deliberately sway markets in desired directions don’t last very long.

When I brought up sample inefficiency, I was supporting Mr. Helm-Burger’s statement that “there's huge algorithmic gains in …training efficiency (less data, less compute) … waiting to be discovered”.  You’re right of course that a reduction in training data will not necessarily reduce the amount of computation needed.  But once again, that’s the way to bet.

Here are two arguments for low-hanging algorithmic improvements.

First, in the past few years I have read many papers containing low-hanging algorithmic improvements.  Most such improvements are a few percent or tens of percent.  The largest such improvements are things like transformers or mixture of experts, which are substantial steps forward.  Such a trend is not guaranteed to persist, but that’s the way to bet.

Second, existing models are far less sample-efficient than humans.  We receive about a billion tokens growing to adulthood.  The leading LLMs get orders of magnitude more than that.  We should be able to do much better.  Of course, there’s no guarantee that such an improvement is “low hanging”.  
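For what it’s worth, here is the back-of-the-envelope arithmetic behind that “billion tokens” figure, with my own rough numbers plugged in; the LLM figure is just an order-of-magnitude guess at a recent frontier training run, not a citation.

```python
# Rough estimate of human language exposure from birth to adulthood,
# versus an LLM training run.  All inputs are my own guesses, not measurements.
waking_hours_per_day = 16
days_to_adulthood = 18 * 365
words_per_waking_hour = 8_000     # generous: continuous speech is about 9,000 words/hour
tokens_per_word = 1.3             # typical tokenizer ratio

human_tokens = waking_hours_per_day * days_to_adulthood * words_per_waking_hour * tokens_per_word
llm_tokens = 1.5e13               # order-of-magnitude guess for a recent frontier run

print(f"human: ~{human_tokens:.1e} tokens")   # ~1e9
print(f"LLM:   ~{llm_tokens:.1e} tokens")     # ~1e13, roughly four orders of magnitude more
```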

This question is two steps removed from reality.  Here’s what I mean by that.  Putting brackets around each of the two steps:

what is the threshold that needs meeting [for the majority of people in the EA community] [to say something like] "it would be better if EAs didn't work at OpenAI"?

Without these steps, the question becomes 

What is the threshold that needs meeting before it would be better if people didn’t work at OpenAI?

Personally, I find that a more interesting question.  Is there a reason why the question is phrased at two removes like that?  Or am I missing the point?

Some comments:

The word for a drug that causes loss of memory is “amnestic”, not “amnesic”.  The word “amnesic” is a variant spelling of “amnesiac”, which is the person who takes the drug.  This made reading the article confusing.

Midazolam is the benzodiazepine most often prescribed as an amnestic.  The trade name is Versed (accent on the second syllable, like vurSAID).  The period of not making memories lasts less than an hour, but you’re relaxed for several hours afterward.  It makes you pretty stupid and loopy, so I would think the performance on an IQ test would depend primarily on how much Midazolam was in the bloodstream at the moment, rather than on any details of setting.

An interesting question!  I looked in “Towards Deep Learning Models Resistant to Adversarial Attacks” to see what they had to say on the question.  If I’m interpreting their Figure 6 correctly, there’s a negligible increase in error rate as epsilon increases, and then at some point the error rate starts swooping up toward 100%.  The transition seems to be about where the perturbed images start to be able to fool humans.  (Or perhaps slightly before.)  So you can’t really blame the model for being fooled, in that case.  If I had to pick an epsilon to train with, I would pick one just below the transition point, where robustness is maximized without getting into the crazy zone.

All this is the result of a cursory inspection of a couple of papers.  There’s about a 30% chance I’ve misunderstood.
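Concretely, the epsilon sweep I have in mind would look something like this.  This is a minimal PyTorch sketch of my own, not code from the paper; `model` and `loader` are stand-ins for an image classifier and its test set, and the epsilon values at the bottom are arbitrary.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps=20):
    """L-infinity PGD: step along the sign of the gradient, then project
    back into the eps-ball around the original input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def error_rate_at_eps(model, loader, eps):
    """Fraction of test examples misclassified after a PGD perturbation of size eps."""
    wrong = total = 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps, alpha=eps / 4)
        with torch.no_grad():
            wrong += (model(x_adv).argmax(dim=1) != y).sum().item()
        total += y.numel()
    return wrong / total

# Sweep eps to find the transition where the error rate swoops up toward 100%,
# then adversarially train with an eps just below that point.
# for eps in (0.0, 0.05, 0.1, 0.2, 0.3):
#     print(eps, error_rate_at_eps(model, loader, eps))
```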

Here’s an event that would change my p(doom) substantially:

Someone comes up with an alignment method that looks like it would apply to superintelligent entities.  They get extra points for trying it and finding that it works, and extra points for society coming up with a way to enforce that only entities that follow the method will be created.

So far none of the proposed alignment methods seem to stand up to a superintelligent AI that doesn’t want to obey them.  They don’t even stand up to a few minutes of merely human thought.  But it’s not obviously impossible, and lots of smart people are working on it.

In the non-doom case, I think one of the following will be the reason:

—Civilization ceases to progress, probably because of a disaster.

—The governments of the world ban AI progress.

—Superhuman AI turns out to be much harder than it looks, and not economically viable.

—The happy circumstance described above (an alignment method that works and is enforced), giving us the marvelous benefits of superintelligence without the omnicidal drawbacks.

You write:

…But I think people can be afraid of heights without past experience of falling…

I have seen it claimed that crawling-age babies are afraid of heights, in that they will not crawl from a solid floor to a glass platform over a yawning gulf.  And they’ve never fallen into a yawning gulf.  At that age, probably all the heights they’ve fallen from have been harmless, since the typical baby is both bouncy and close to the ground.
