Richard Hollerith. 15 miles north of San Francisco. hruvulum@gmail.com
My probability that AI research will end all human life is .92. It went up drastically when Eliezer started going public with his pessimistic assessment in April 2022. Until then, my confidence in MIRI (and the knowledge that MIRI has enough funding to employ many researchers) was keeping my probability down to about .4. (I am glad I found out about Eliezer's assessment.)
Currently I am willing to meet with almost anyone on the subject of AI extinction risk.
Last updated 26 Sep 2023.
All 3 of the other replies to your question overlook the crispest consideration: namely, that it is not possible to ensure the proper functioning of even something as simple as a circuit for division (such as we might find inside a CPU) through testing alone. There are too many possible inputs (too many pairs of possible 64-bit divisors and dividends) to test in one lifetime, even if you make a million perfect copies of the circuit and test them in parallel.
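To put a rough number on that (the testing rate below is a generous assumption of mine, not a measurement):

```python
# Back-of-the-envelope arithmetic for exhaustively testing a 64-bit divider.
pairs = 2**64 * 2**64        # every (dividend, divisor) pair: about 3.4e38
copies = 10**6               # a million perfect copies of the circuit
tests_per_second = 10**9     # assume each copy checks a billion pairs per second
seconds = pairs / (copies * tests_per_second)
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1e} years")  # about 1.1e16 years, roughly a million times the age of the universe
```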
Let us consider very briefly what else besides testing an engineer might do to ensure (or "verify", as the engineer would probably say) the proper operation of a circuit for dividing. The circuit is composed of 64 sub-circuits, each responsible for producing one bit of the output (i.e., the quotient to be calculated), and an engineer will know enough about arithmetic to know that the sub-circuit for calculating bit N should bear a close resemblance to the one for bit N+1: it might not be exactly identical, but any differences will be simple enough for a digital-design engineer to understand. Usually, that is: in 1994, a bug was found in the floating-point division circuit of the Intel Pentium CPU, precipitating a product recall that cost Intel about $475 million. After that, Intel switched to a more reliable, but much more ponderous, technique for verifying its CPUs, called "formal verification".
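To make "formal verification" slightly more concrete, here is a minimal sketch of the flavor of the technique, assuming the z3-solver Python package; this is my own illustration, not Intel's actual methodology. Instead of feeding the divider test inputs one at a time, one asks a solver to prove that a property holds for every possible input at once:

```python
# Minimal flavor of formal verification, using the Z3 SMT solver (pip install z3-solver).
# Illustrative only: a real effort would check the gate-level circuit against a spec like
# this, and would do so at the full 64-bit width (shrunk to 16 bits here so the proofs
# finish quickly).
from z3 import BitVec, UDiv, URem, ULT, Implies, prove

x = BitVec("x", 16)  # dividend
y = BitVec("y", 16)  # divisor

# For every nonzero divisor: quotient * divisor + remainder reconstructs the dividend,
# and the remainder is strictly smaller than the divisor.
prove(Implies(y != 0, UDiv(x, y) * y + URem(x, y) == x))
prove(Implies(y != 0, ULT(URem(x, y), y)))
# Both calls print "proved": the property is established for all 2**32 input pairs at once.
```

In real hardware verification the thing being checked would be the gate-level implementation against a specification like this one, but the flavor is the same: the solver covers every input pair symbolically rather than one at a time.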
My point is that the question you are asking is sort of a low-stakes question (if you don't mind my saying so) because there is a sharp limit to how useful testing can be: testing can reveal that the designers need to go back to the drawing board, but human designers can't go back to the drawing board billions of times (there is not enough time; human designers are not that fast), so most of the many tens or hundreds of bits of human-applied optimization pressure that will be required for any successful alignment effort will need to come from processes other than testing. Discussion of these other processes is more pressing than any discussion of testing.
Eliezer's "Einstein's Arrogance is directly applicable here although I see that that post uses "bits of evidence" and "bits of entanglement" instead of "bits of optimization pressure".
Another important consideration is that there is probably no safe way to run most of the tests we would want to run on an AI much more powerful than we are.
"Let me reassure you that there’s more than enough protein available in plant-based foods. For example, here’s how much grams of protein there is in 100 gram of meat [...]"
That is misleading because most foods are mostly water, including the (cooked) meats you list, but the first 4 of the plant foods you list have had their water artificially removed: soy protein isolate; egg white, dried; spirulina algae, dried; baker’s yeast.
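To illustrate the arithmetic with rough figures of my own (approximate numbers, not the ones from the list I am replying to): put both foods on a dry-matter basis and most of the apparent gap disappears.

```python
# Rough, illustrative figures (approximate; not the numbers from the list being discussed).
foods = {
    # name: (grams of protein per 100 g as eaten, approximate water fraction)
    "chicken breast, roasted": (31, 0.65),
    "soy protein isolate":     (88, 0.05),
}
for name, (protein_g, water_fraction) in foods.items():
    dry_matter_g = 100 * (1 - water_fraction)
    per_100g_dry = 100 * protein_g / dry_matter_g
    print(f"{name}: {protein_g} g protein per 100 g as eaten, "
          f"~{per_100g_dry:.0f} g per 100 g of dry matter")
# chicken breast, roasted: 31 g protein per 100 g as eaten, ~89 g per 100 g of dry matter
# soy protein isolate: 88 g protein per 100 g as eaten, ~93 g per 100 g of dry matter
```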
Moreover, the human gut digests and absorbs a larger fraction of animal protein than of plant protein. Part of the reason for this is that plant protein includes more fragments that are impervious to digestive enzymes in the human gut and more fragments (e.g., lectins) that interfere with human physiology.
Moreover, there are many people who can and do eat 1 or even 2 lb of cooked meat every day without obvious short-term consequences, whereas most people who tried to eat 1 lb of spirulina (dry weight) or baker's yeast (dry weight) in a day would probably experience acute gut distress before the end of the day, even if the spirulina or yeast was mixed with plenty of other food containing plenty of water, fiber, etc. Or at least that would be my guess (having eaten small amounts of both things): has anyone made the experiment?
The very short answer is that the people with the most experience in alignment research (Eliezer and Nate Soares) say that without an AI pause lasting many decades the alignment project is essentially hopeless because there is not enough time. Sure, it is possible the alignment project succeeds in time, but the probability is really low.
Eliezer has said that AIs based on the deep-learning paradigm are probably particularly hard to align, so it would probably help to get a ban or a long pause on that paradigm even if research in other paradigms continues, but good luck getting even that, because almost all of the value currently being provided by AI-based services comes from deep-learning AIs.
One would think it would be reassuring to know that the people running the labs are really smart and obviously want to survive (and have their children survive), but it is only reassuring until one listens to what they say and reads what they write about their plans for preventing human extinction and other catastrophic risks. (The plans are all quite inadequate.)
I'm going to use "goal system" instead of "goals" because a list of goals is underspecified without some method for choosing which goal prevails when two goals "disagree" on the value of some outcome.
"wouldn’t we then want ai to improve its own goals to achieve new ones that have increased effectiveness and improving the value of the world?"
That is contradictory: the AI's goal system is the single source of truth about what counts as effectiveness and about how much of an improvement any change in the world is.
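A toy sketch of the circularity (purely illustrative; this is not anyone's proposed agent design): the only yardstick the agent has for judging whether a change to its goals would be an "improvement" is the goal system it already has.

```python
# Toy illustration of why "the AI improving its own goals" is circular.

def current_goal_system(outcome: dict) -> float:
    # The arbitration rule: a weighted sum decides which goal prevails
    # when two goals disagree about an outcome.
    return 2.0 * outcome.get("paperclips", 0) + 1.0 * outcome.get("staples", 0)

# Predicted outcomes of keeping the current goals vs. adopting supposedly "better" ones.
outcome_if_kept    = {"paperclips": 50, "staples": 10}
outcome_if_changed = {"human_flourishing": 100}

# Both futures are scored by the goal system the agent has *now*,
# so the change never looks like an improvement to the agent.
print(current_goal_system(outcome_if_kept))     # 110.0
print(current_goal_system(outcome_if_changed))  # 0.0
```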
I would need a definition of AGI before I could sensibly answer those questions.
ChatGPT is already an artificial general intelligence by the definition I have been using for the last 25 years.
I think the leaders of the labs have enough private doubts about the safety of their enterprise that if an effective alignment method were available to them, they would probably adopt it (especially if the group that devised the method does not seem to care particularly who gets credit for having devised it). I.e., my guess is that almost all of the difficulty is in devising an effective alignment method, not in getting the leading lab to adopt it. (Making 100% sure that the leading lab adopts it is almost impossible, but acting in such a way that the leading lab will adopt it with p = .6 is easy, and the current situation is so dire that we should jump at any intervention with a .6 chance of a good outcome.)
Eliezer stated recently (during an interview on video) that the deep-learning paradigm seems particularly hard to align, so it would be nice to get the labs to focus on a different paradigm (even if we do not yet have a way to align that other paradigm), but that seems almost impossible unless and until the other paradigm has been developed to the extent that it can create models that are approximately as capable as deep-learning models.
The big picture is that the alignment project seems almost completely hopeless IMHO because of the difficulty of aligning the kinds of designs the labs are using and the difficulty of inducing the labs to switch to easier-to-align designs.
Your question would have been better without the dig at theists and non-vegans.
However, whereas the concept of an unaligned general intelligence has the advantage of being a powerful, general abstraction, the HMS ("hyperintelligent machine sociopath") concept has the advantage of being much easier to explain to non-experts.
The trouble with the choice of phrase "hyperintelligent machine sociopath" is that it gives the other side of the argument an easy rebuttal, namely, "But that's not what we are trying to do: we're not trying to create a sociopath". In contrast, if the accusation is that (many of) the AI labs are trying to create a machine smarter than people, then the other side cannot truthfully use the same easy rebuttal. Then our side can continue with, "and they don't have a plan for how to control this machine, at least not any plan that stands up to scrutiny". The phrase "unaligned superintelligence" is an extremely condensed version of the argument I just outlined (where the verb "control" has been replaced with "align" to head off the objection that control would not even be desirable because people are not wise enough and not ethical enough to be given control over something so powerful).
Maybe "motto" is the wrong word. I meant words / concepts to use in a comment or in a conversation.
"Those companies that created ChatGPT, etc? If allowed to continue operating without strict regulation, they will cause an intelligence explosion."