Oleg Trott — LessWrong

LESSWRONG
LW

Replying toHow unusual is the fact that there is no AI monopoly?

How unusual is the fact that there is no AI monopoly?

"why didn't the first person to come up with the idea of using computers to predict the next element in a sequence patent that idea, in full generality"

Patents are valid for about 20 years. But Bengio et al used NNs to predict the next word back in 2000:

https://papers.nips.cc/paper_files/paper/2000/file/728f206c2a01bf572b5940d7d9a8fa4c-Paper.pdf

So this idea is old. Only some specific architectural aspects are new.

Replying toDoes VETLM solve AI superalignment?

Oleg Trott2y

Does VETLM solve AI superalignment?

I suspect this labeling and using the labels is still harder that you think though, since individual tokens don't have truth values.

Why should they?

You could label each paragraph, for example. Then, when the LM is trained, the correct label could come before each paragraph, as a special token: <true>, <false>, <unknown> and perhaps <mixed>.

Then, during generation, you'd feed it <true> as part of the prompt, and when it generates paragraph breaks.

Similarly, you could do this on a per-sentence basis.

Replying toDoes VETLM solve AI superalignment?

Oleg Trott2y

Does VETLM solve AI superalignment?

The idea that we're going to produce a similar amount of perfectly labeled data doesn't seem plausible.

That's not at all the idea. Allow me to quote myself:

Here’s what I think we could do. Internet text is vast – on the order of a trillion words. But we could label some of it as “true” and “false”. The rest will be “unknown”.

You must have missed the words "some of" in it. I'm not suggesting labeling all of the text, or even a large fraction of it. Just enough to teach the model the concept of right and wrong.

It shouldn't take long, especially since I'm assuming a human-level ML algorithm here, that is, one with data efficiency comparable to that of humans.

Replying toDoes VETLM solve AI superalignment?

Oleg TrottAug 08, 2024

Does VETLM solve AI superalignment?

Carlson's interview, BTW. It discusses LessWrong in the first half of the video. Between X and YouTube, the interview got 4M views -- possibly the most high-profile exposure of this site?

I'm kind of curious about the factual accuracy: "debugging" / struggle sessions, polycules, and the 2017 psychosis -- Did that happen?

-1

Replying toDoes VETLM solve AI superalignment?

Oleg Trott2y

Does VETLM solve AI superalignment?

What do VELM and VETLM offer which those other implementable proposals don't? And what problems do VELM and VETLM not solve?

VETLM solves superalignment, I believe. It's implementable (unlike CEV), and it should not be susceptible to wireheading (unlike RLHF, instruction following, etc) Most importantly, it's intended to work with an arbitrarily good ML algorithm -- the stronger the better.

So, will it self-improve, self-replace, escape, let you turn it off, etc.? Yes, if it thinks that this is what its creators would have wanted.

Will it be transparent? To the point where it can self-introspect and, again if it thinks that being transparent is what its creators would have wanted. If it thinks that this is a worthy goal to pursue, it will self-replace with increasingly transparent and introspective systems.

Replying toDoes VETLM solve AI superalignment?

Oleg Trott2y

Does VETLM solve AI superalignment?

New proposals are useful mainly insofar as they overcome some subset of barriers which stopped other solutions.

CEV was stopped by being unimplementable, and possibly divergent:

The main problems with CEV include, firstly, the great difficulty of implementing such a program - “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Secondly, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004.

VELM and VETLM are easily implementable (on top of a superior ML algorithm). So does this fit the bill?

Does VETLM solve AI superalignment?

Oleg Trott

Eliezer Yudkowsky’s main message to his Twitter fans is:

Aligning human-level or superhuman AI with its creators’ objectives is also called “superalignment”. And a month ago, I proposed a solution to that. One might call it Volition Extrapolated by Language Models (VELM).

Apparently, the idea was novel (not the “extrapolated volition” part):

But it suffers from the fact that language models are trained on large bodies of Internet text. And this includes falsehoods. So even in the case of a superior learning algorithm^[1], a language model using it on Internet text would be prone to generating falsehoods, mimicking those who generated the training data.

So a week later, I proposed a solution to that problem too.... (read 229 more words →)

-1

Replying toNew Blog Post Against AI Doom

Oleg Trott2y

New Blog Post Against AI Doom

That post was completely ignored here: 0 comments and 0 upvotes during the first 24 hours.

I don't know if it's the timing or the content.

On HN, which is where I saw it, it was ranked #1 briefly, as I recall. But then it got "flagged", apparently.

Replying toAI existential risk probabilities are too unreliable to inform policy

Oleg Trott2y

AI existential risk probabilities are too unreliable to inform policy

Machine Learning Street Talk interview of one of the authors:

AI existential risk probabilities are too unreliable to inform policy

Oleg Trott

Melanie Mitchell called the article "excellent". It also got a bit of discussion on HN.

(This is an FYI. I don't necessarily approve or endorse the links I post.)

Replying toThe Assassination of Trump's Ear is Evidence for Time-Travel

Oleg Trott2y

The Assassination of Trump's Ear is Evidence for Time-Travel

There was an article in New Scientist recently about "sending particles back in time". I was a physics major, but I might have skipped the time travel class, so I don't have an opinion on this. But Sabine Hossenfelder posted a video, arguing that New Scientist misrepresented the actual research.

Replying toThe $100B plan with "70% risk of killing us all" w Stephen Fry [video]

Oleg Trott2y

The $100B plan with "70% risk of killing us all" w Stephen Fry [video]

Side note: the link didn't make it to the front page of HN, despite early upvotes. Other links with worse stats (votes at a certain age) rose to the very top. Anyways, it's currently ranked 78. I guess I don't really understand how HN ranks things. I hope someone will explain this to me. Does the source "youtube" vs "nytimes" matter? Do flag-votes count as silent mega-downvotes? Does the algorithm punish posts with numbers in them?

The $100B plan with "70% risk of killing us all" w Stephen Fry [video]

Oleg Trott

A high production value 16-minute video that summarizes the popular safety concerns, featuring Hinton, Russell and Claude 3.5.

Recursion in AI is scary. But let’s talk solutions.

Oleg Trott

(Originally on substack)

Right now, we have Moore’s law. Every couple of years, computers get twice better. It’s an empirical observation that’s held for many decades and across many technology generations. Whenever it runs into some physical limit, a newer paradigm replaces the old. Superconductors, multiple layers and molecular electronics may sweep aside the upcoming hurdles.

But imagine if people who are doing R&D themselves started working twice faster every couple of years, as if time moved slower for them? We’d get doubly exponential improvement.

And this is what we’ll see once AI is doing the R&D instead of people.

Infinitely powerful technology may give us limitless possibilities: revolutionizing medicine, increasing longevity, and bringing universal prosperity.... (read 519 more words →)

Alignment: "Do what I would have wanted you to do"

Oleg Trott

Yoshua Bengio writes^[1]:

nobody currently knows how such an AGI or ASI could be made to behave morally, or at least behave as intended by its developers and not turn against humans

I think I do^[2]. I believe that the difficulties of alignment arise from trying to control something that can manipulate you. And I think you shouldn't try.

Suppose you have a good ML algorithm (Not the stuff we have today that needs 1000x more data than humans), and you train it as a LM.

There is a way to turn a (very good) LM into a goal-driven chatbot via prompt engineering alone, which I'll assume the readers can figure out. You give it a... (read 154 more words →)

Fix simple mistakes in ARC-AGI, etc.

Oleg Trott

ARC-AGI is a diverse artificial dataset that aims to test general intelligence. It's sort of like an IQ test that's played out on rectangular grids.

Last month, @ryan_greenblatt proposed an approach that used GPT-4o to generate about 8000 Python programs per task. It then selected the programs that worked on the "training" examples given, and ran them to actually solve the "test" query.

His approach achieved 72% accuracy on the part of the benchmark that humans have been measured to get 85% accuracy on.

I have an idea for an improvement, on top of this approach. It should be relatively cheap. I don't have time to work on this myself, but I hope someone else... (read 165 more words →)

I'm a bit skeptical of AlphaFold 3

Oleg Trott

(also on https://olegtrott.substack.com)

So this happened: DeepMind (with 48 authors, including a new member of the British nobility) decided to compete with me. Or rather, with some of my work from 10+ years ago.

Apparently, AlphaFold 3 can now predict how a given drug-like molecule will bind to its target protein. And it does so better than AutoDock Vina (the most cited molecular docking program, which I built at Scripps Research):

On top of this, it doesn’t even need a 3D structure of the target. It predicts it too!

But I’m a bit skeptical. I’ll try to explain why.

Consider a hypothetical scientific dataset where all data is duplicated: Perhaps the scientists had trust issues and tried... (read 492 more words →)