All of TinkerBird's Comments + Replies

I imagine it's a sales tactic. Ask for $7 trillion, people assume you believe you're worth that much, and if you've got such a high opinion of yourself, maybe you're right... 

In other news, I'm looking to sell a painting of mine for £2 million ;)

This looks fantastic. Hopefully it leads to some great things, as I've always found the idea of exploiting the collective intelligence of the masses to be a terribly underused resource, and this reminds me of the game Foldit (and hopefully in the future it will remind me of the wild success that game had in the field of protein folding). 

2Johnny Lin
Thank you TinkerBird. I hope so too!

This sounds like it would only work on a machine too dumb to be useful, and if it's that dumb, you can switch it off yourself. 

It doesn't help with the convergent instrumental goal of neutralizing threats, because leaving a copy of yourself behind to kill all the humans allows you to be really sure that you're switched off and won't be switched on again. 

I really appreciate these. 

  1. Why do some people think that alignment will be easy/easy enough? 
  2. Is there such a thing as 'aligned enough to help solve alignment research'? 
1Olivier Coutu
These are great questions! Stampy does not currently have an answer for the first one, but its answer on prosaic alignment could get you started on ways that some people think might work without needing additional breakthroughs. Regarding the second question, the plan seems to be to use less powerful AIs to align more powerful AIs and the hope would be that these helper AIs would not be powerful enough for misalignment to be an issue.

I think there's a lot we could learn from climate change activists. Having a tangible 'bad guy' would really help, so maybe we should be framing it more that way. 

  • "The greedy corporations are gambling with our lives to line their pockets." 
  • "The governments are racing towards AI to win world domination, and Russia might win."
  • "AI will put 99% of the population out of work forever and we'll all starve."

And a better way to frame the issue might be "Bad people using AI" as opposed to "AI will kill us".

If anyone knows of any groups working towards a major public awareness campaign, please let the rest of us know about it. Or maybe we should start our own. 

2Gesild Muka
There's a catch-22 here: wording that is too extreme will put people off, because they'll lump all doomsayers into one boat, whether the fears are over AI, UFOs or Cthulhu, and dismiss them equally (it's like there's a tradeoff between level of alarm and credibility). On the other hand, claims will also be dismissed if the perceived danger is toned down in the wording. The best way to get the message across, in my opinion, is either to have more influential people spread the message (as previously recommended) or to organize focus testing on which parts of the message people don't understand and workshop how to get it across. If I had to take a crack at structuring a clear, persuasive message, my intuition is to explain the current environment, current AI capabilities and a specific timeline, and then let the reader work out the implications. Examples:

  • 'Nearly 80% of the labor force works in service jobs, and current AI technology can do most of those jobs. In ~5 years AI workers could be more proficient and economical than humans.'
  • 'It's impossible to know what a machine is thinking. When running large language model based AI, researchers don't know exactly what they're looking at until they analyze the metrics. Within 10-30 years an AI could reach a superintelligent level and it wouldn't be immediately apparent.'

I'm with you on this. I think Yudkowsky was a lot better in this with his more serious tone, but even so, we need to look for better. 

Popular scientific educators would be a place to start and I've thought about sending out a million emails to scientifically minded educators on YouTube, but even that doesn't feel like the best solution to me. 

The sort of people who get listened to are the more political types, so I think they are the people to reach out to. You might say they need to understand the science to talk about it, but I'd still put more weight on charisma than on scientific authority. 

Anyone have any ideas on how to get people like this on board? 

3Shankar Sivarajan
Getting charismatic "political types" to weigh in is unlikely to help with "polarization." That's what happened with global warming climate change. A more effective strategy might be to lean into the polarization: make "AI safety" an issue of tribal identity, which members will support reflexively against enemies. That might delay technological advancement for long enough.
4Seth Herd
I just read your one post. I agree with it. We need more people on board. We are getting that, but finding more people with more PR skills would seem like a good idea. I think the starting point is finding people who are already part of this community who are interested in brainstorming about PR strategy. To that end, I'm writing a post on this topic.

As a note for Yudkowsky if he ever sees this and cares about the random gut feelings of strangers: after seeing this, I suspect the authoritative, stern, strong-leader tone of speaking will be much more effective than current approaches.

EDIT: missed a word

For ages I've wanted something for AI alignment like what the Foldit researchers created: they turned protein folding into a puzzle game, and the ordinary people online who played it wildly outperformed the researchers and algorithms purely by working together in vast numbers and combining their creative thinking. 

I know it's a lot to ask for with AI alignment, but still, if it's possible, I'd put a lot of hope on it. 

2TekhneMakre
https://www.lesswrong.com/posts/8GENjqzEDL5WamfCh/gamified-narrow-reverse-imitation-learning-1

As someone who's been pinning his hopes on a 'survivable disaster' to wake people up to the dangers, this is good news.  

I doubt anything capable of destroying the world will come along significantly sooner than superintelligent AGI, and a world in which there are disasters due to AI feels like a world that is much more likely to survive compared to a world in which the whirling razorblades are invisible. 

EDIT: "no fire alarm for AGI." Oh I beg to differ, Mr. Yudkowsky. I beg to differ. 

This confuses me too. I think Musk must be either smarter or a lot dumber than I thought he was yesterday, and sadly, dumber seems to be the way it usually goes. 

That said, if this makes OpenAI go away to be replaced by a company run by someone who respects the dangers of AI, I'll take it.

On the bright side... Nope, I've got nothing.

an AGI Risk Management Outreach Center with a clear cohesive message broadcast to the world

Something like this sounds like it could be a good idea. A way to make the most of those of us who are aware of the dangers and can buy the world time

Coordination will be the key. I wish we had more of it here on LW.

2Nathan Helm-Burger
Well, I think LW is a place designed for people to speak their minds on important topics and have polite respectful debates that result in improved understanding for everyone involved. I think we're managing to do that pretty well, honestly. If there needs to be an AGI Risk Management Outreach Center with a clear cohesive message broadcast to the world... Then I think that needs to be something quite different from LessWrong. I don't think "forum for lots of people to post their thoughts about rationality and AI alignment" would be the correct structure for a political outreach organization.

Like I say, not something I'd normally advocate, but no media stations have picked it up yet, and we might as well try whatever we can if we're desperate enough. 

We've never done a real media push but all indications are that people are ready to hear it.

I say we make a start on this ASAP. 

3irving
Hardcore agree. I'm planning a documentary and trying to find interested parties.

What's the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we've got for alignment and to be pretty optimistic about it, but I haven't heard anyone else talking about it. Either I'm completely misunderstanding what he's talking about, or he's somehow found a way around all of the alignment problems.

Video of him explaining it here for reference, and thanks in advance: 

 

9gilch
Watched the video. He's got a lot of the key ideas and vocabulary. Orthogonality, convergent instrumental goals, the treacherous turn, etc. The fact that these language models have some understanding of ethics and nuance might be a small ray of hope. But understanding is not the same as caring (orthogonality). However, he does seem to be lacking in the security mindset, imagining only how things can go right, and seems to assume that we'll have a soft takeoff with a lot of competing AIs, i.e. ignoring the FOOM problem caused by an overhang which makes a singleton scenario far more likely, in my opinion. But even if we grant him a soft takeoff, I still think he's too optimistic. Even that may not go well. Even if we get a multipolar scenario, with some of the AIs on our side, humanity likely becomes collateral damage in the ensuing AI wars. Those AIs willing to burn everything else in pursuit of simple goals would have an edge over those with more to protect.
4Jonathan Claybrough
I watched the video, and appreciate that he seems to know the literature quite well and has thought about this a fair bit - he did a really good introduction to some of the known problems. This particular video doesn't go into much detail on his proposal, and I'd have to read his papers to delve further - this seems worthwhile, so I'll add some to my reading list. I can still point out the biggest ways in which I see him being overconfident:

  • Only considering the multi-agent world. Though he's right that there already are and will be many, many deployed AI systems, that does not translate to there being many deployed state-of-the-art systems. As long as training costs and inference costs continue increasing (as they have), then on the contrary fewer and fewer actors will be able to afford state-of-the-art system training and deployment, leading to very few (or one) significantly powerful AGI (as compared to the others, for example GPT4 vs GPT2).
  • Not considering the impact that governance and policies could have on this. This isn't just a tech thing where tech people can do whatever they want forever; regulation is coming. If we think we have higher chances of survival in highly regulated worlds, then the AI safety community will do a bunch of work to ensure fast and effective regulation (to the extent possible). The genie is not out of the bag for powerful AGI: governments can control compute, regulate powerful AI as weapons, and set up international agreements to ensure this.
  • The hope that game theory ensures that AI developed under his principles would be good for humans. There's a crucial gap in going from the real world to math models. Game theory might predict good results under certain conditions, rules and assumptions, but many of these aren't true of the real world, and simple game theory does not yield accurate world predictions (e.g. make people play various social games and they won't act how game theory says). Stated strongly, putting
1irving
Honestly I don't think fake stories are even necessary, and becoming associated with fake news could be very bad for us. I don't think we've seriously tried to convince people of the real big bad AI. What, two podcasts and an opinion piece in Time? We've never done a real media push but all indications are that people are ready to hear it. "AI researchers believe there's a 10% chance they'll end life" is all the headline you need.

I, for one, am looking forward to the next public AI scares.

Same. I'm about to get into writing a lot of emails to a lot of influential public figures as part of a one-man letter-writing campaign, in the hopes that at least one of them takes notice and says something publicly about the problem of AI.

1irving
Count me in!
Answer by TinkerBird10

but I haven't seen anyone talk about this before.

You and me both. It feels like I've been the only one really trying to raise public awareness of this, and I would LOVE some help. 

One thing I'm about to do is write the most convincing AI-could-kill-everyone email I can, one that regular Joes will easily understand and respect, and send it out to anyone with a platform: YouTubers, TikTokers, people in government, journalists - anyone. 

I'd really appreciate some help with this - both with writing the emails and sending them out. I'm hoping... (read more)

1metachirality
Probably not the best person on this forum when it comes to either PR or alignment but I'm interested enough, if only about knowing your plan, that I want to talk to you about it anyways.

But if the current paradigm is not the final form of existentially dangerous AI, such research may not be particularly valuable.

I think we should figure out how to train puppies before we try to train wolves. It might turn out that very few principles carry over, but if they do, we'll wish we delayed.

The only drawback I see to delaying is that it might cause people to take the issue less seriously than if powerful AIs appear in their lives very suddenly. 

3DragonGod
I endorse attempts to deliberately engineer a slow takeoff. I am less enthused about attempts to freeze AI development at a particular level.

It depends on how quickly the chance can be decreased. If it takes 50 years to shrink it from 1% to 0.1%, then with all the people who would die in that time, I'd probably be willing to risk it. 

As of right now, even the most optimistic experts I've seen put p(doom) at much higher than 1% - far into the range where I vote to hit pause.
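(To make that tradeoff explicit, here's a rough back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption rather than anything from the comments above: a population of ~8 billion, ~60 million deaths per year that an aligned AGI is assumed to have otherwise prevented, and a 50-year pause that cuts p(doom) from 1% to 0.1%.)

```python
# Back-of-the-envelope sketch of the "pause vs. risk" tradeoff above.
# All numbers are illustrative assumptions, including the assumption that
# an aligned AGI arriving sooner would have prevented the ordinary deaths
# that occur during the pause.
POPULATION = 8_000_000_000      # assumed world population
DEATHS_PER_YEAR = 60_000_000    # assumed deaths per year from all causes
PAUSE_YEARS = 50                # length of the hypothetical pause

p_doom_now = 0.01     # assumed extinction risk if we proceed immediately
p_doom_after = 0.001  # assumed extinction risk after the 50-year pause

deaths_during_pause = DEATHS_PER_YEAR * PAUSE_YEARS
expected_deaths_avoided = (p_doom_now - p_doom_after) * POPULATION

print(f"Deaths during the pause:          {deaths_during_pause:>15,}")
print(f"Expected deaths avoided by pause: {expected_deaths_avoided:>15,.0f}")
# With these numbers the pause "costs" ~3 billion lives to avoid an expected
# ~72 million, which is the intuition behind "I'd probably be willing to risk
# it" at a 1% p(doom) - and why the calculus flips as p(doom) rises.
```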

Design a series of puzzles and challenges as a learning tool for alignment beginners, that when solved, progressively reveal more advanced concepts and tools. The goal is for participants to stumble upon a lucky solution while trying to solve these puzzles in these novel frames.

Highly on board with this idea. I'm thinking about writing a post about the game Foldit, which researchers came up with that reimagined protein folding as an online puzzle game. The game had thousands of players and the project was wildly successful - not just once, but many times. ... (read more)

Personally, I want to get to the glorious transhumanist future as soon as possible as much as anybody, but if there's a chance that AI kills us all instead, that's good enough for me to say we should be hitting pause on it. 

I don't wanna pull the meme phrase on people here, but if it's ever going to be said, now's the time: "Won't somebody please think of the children?"

2jasoncrawford
Any chance? A one in a million chance? 1e-12? At some point you should take the chance. What is your Faust parameter?

I like it. It seems like only the researchers themselves respect the dangers, not the CEOs or the government, so it will have to be the researchers who say that enough is enough. 

In a perfect world they'd jump ship to alignment, but realistically we've all got to eat, so what would also be great is a generous billionaire willing to hire them for more alignment research. 

Around the 1:25:00 mark, I'm not sure I agree with Yudkowsky's point about AI not being able to help with alignment only(?) because those systems will be trained to get the thumbs up from the humans and not to give the real answers. 

For example, if the Wright brothers had asked me about how wings produce lift, I may have only told them "It's Bernoulli's principle, and here's how that works..." and spoken nothing about the Coanda effect - which they also needed to know about - because it was just enough to get the thumbs up from them. But...

But that st... (read more)

1Muyyd
Capabilities advance much faster than alignment, so there is likely no time to do meticulous research. And if you try to use weak AIs as a shortcut to outrun the current capabilities timeline, then you will somehow have to deal with the suggestor-and-verifier problem (with suggestions much harder to verify than simple math problems), which is not wholly about deception but also about filtering the somewhat-working stuff that may steer alignment in the right direction. And maybe not. But I agree that this collaboration will be successfully used for patchwork (because shortcuts) alignment of weak AIs, to placate the general public and politicians. All of this depends on how hard the alignment problem is: as hard as EY thinks, or maybe harder, or easier.
2Qumeric
I agree it was a pretty weak point. I wonder if there is a longer form exploration of this topic from Eliezer or somebody else.  I think it is even contradictory. Eliezer says that AI alignment is solvable by humans and that verification is easier than the solution. But then he claims that humans wouldn't even be able to verify answers. I think a charitable interpretation could be "it is not going to be as usable as you think". But perhaps I misunderstand something?

Right now talking about AI risk is like yelling about covid in Feb 2020. I and many others spent the end of that February in distress over impending doom, and despairing that absolutely nobody seemed to care—but literally within a couple weeks, America went from dismissing covid to everyone locking down.

I don't think comparing misaligned AI to covid is fair. With covid, real life people were dying, and it was easy to understand the concept of "da virus will spread," and almost every government on Earth was still MASSIVELY too late in taking action. Even wh... (read more)

1Nathan Helm-Burger
I disagree. I think that "everything will look fine until the moment we are all doomed" is quite unlikely. I think we are going to get clear warning shots, and should be prepared to capitalize on those in order to bring political force to bear on the problem. It's gonna get messy. Dumb, unhelpful legislation seems nearly unavoidable. I'm hopeful that having governments flailing around with a mix of bad and good legislation and enforcement will overall be better than them doing nothing.

minimally-aligned AGIs to help us do alignment research in crunchtime

Christ this fills me with fear. And it's the best we've got? 'Aligned enough' sounds like the last words that will be spoken before the end of the world. 

2Nathan Helm-Burger
Yes, I think we're in a rough spot. I'm hopeful that we'll pull through. A large group of smart, highly motivated people, all trying to save their own lives and the lives of everyone they love... That is a potent force!
Answer by TinkerBird10

Think of a random goal for yourself. 

Let's go with: acquire a large collection of bananas. 

What are going to be some priorities for you while you're building your giant pile of bananas?

  • Don't die, because you can't build your pile if you're dead. 
  • Don't let someone reach into your brain and change what you want, because the banana pile will stop growing if you stop building it. 
  • Acquire power. 
  • Make yourself smarter and more knowledgeable, for maximum bananas. 
  • If humanity slows you down instead of helping you, kill
... (read more)

Sounds like a fair idea that wouldn't actually work IRL. 

Upvoting to encourage the behavior of designing creative solutions. 

Hey, if we can get it to stop swearing, we can get it to not destroy the world, right?

6ryan_b
It would be deeply hilarious if it turns out "Don't say the word shit" can be heavily generalized enough that we can give it instructions that boil down to "Don't say the word shit, but, like, civilizationally."

Gotta disagree with Ben Levinstein's tweet. There's a difference between being an LLM that can look up the answers on Google and being one that figures them out for itself. 

7the gears to ascension
I think the tweet is sarcasm. Not sure, though.

I'm put in mind of something Yudkowsky said on the Bankless podcast:

"Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, 2 years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer."

He was speaking about how far away AGI could be, but I think the same logic applies to alignment. It looks hopeless right now, but events never play out exactly like you expect them to, and breakthroughs happen all the time. 

3Thoth Hermes
Excellent point. In one frame, pessimism applied to timelines makes them look further away than they actually turn out to be. In another frame, pessimism applied to doom makes it seem closer / more probable, but it uses the anti-pessimism frame applied to timelines - "AGI will happen much sooner than we think". I get the sense reading some LessWrong comments that there is a divide between "alignment-is-easy"-ers and "alignment-is-hard"-ers. I also get the sense that Yudkowsky's p(doom) has increased over the years, to where it is now. Isn't it somewhat strange that we should be getting two groups whose estimates of p(doom) are moving away from the center?

This creative solution around the alignment problem occurred to me a long while ago too, and probably to a lot of other people as well. I can't say I put any stock in it. 

The human brain is even more complicated than neural networks, and if AIs have invented a way to add even just 10 IQ points to the brains of the alignment researchers, then we're already dead. 

I said elsewhere earlier: "AGI has the power to destroy the entire human race, and if we believe there's even a 1% chance that it will, then we have to treat it as an absolute certainty."

And I'm pretty sure that no expert puts it below 1%.

If you can name another immediate threat with a ≥1% chance of killing everyone, then yes, we should drop everything to focus on that too. 

A pandemic that kills even just 50% of the population? <0.1%

An unseen meteor? <0.1%

Climate change? 0% chance that it could kill literally everyone

Gotta disagree with you on this. When the stakes are this high, even a 1% chance of doom is worth dropping everything in your life for to try and help with the problem. 

To paraphrase both Batman & Dick Cheney (of all two people, lol, but the logic is sound): "AGI has the power to destroy the entire human race, and if we believe there's even a 1% chance that it will, then we have to treat it as an absolute certainty."

3Noosphere89
I don't agree, primarily because that reasoning only works in a vacuum. Other existential risks have more than a 1% probability, so if AI risk had only a 1% probability, then we should change focus to another x-risk.

My only objection is the title. It should have a comma in it. "We’re All Gonna Die with Eliezer Yudkowsky" makes it sound like if Yudkowsky dies, then all hope is lost and we die too. 

Ohhh... 

I don't think sending out a signal actually matters - I was just making my point. Even without a signal, when our superintelligence meets another superintelligence that was aligned to its original creators (who we'll assume are similar to us in terms of morality), it'll at some point be asked "Were you nice to your creators?" and it can either answer "No" and lose out, or answer "Yes" and be rewarded. 

No real signal required. 

I've heard Yudkowsky has debunked this, but I know that if I had an aligned superintelligence, I would 100% have it be ni... (read more)

I can't say I put sufficient weight on us already being watched by extraterrestrials to make it worth mentioning. However, for the AGI:

  • We may at some point send out a signal to other entities in the universe explaining our existence and requesting that, should you destroy us, they take issue with it and treat you less well. Some of these entities may care about sentient life (perhaps because they were aligned to their original creators) and wish to respect this. 
  • Some of us are willing to settle for a couple of mountains worth of material to be turned into computer substrate to run our minds on in our own simulated paradise, while you can have everything else. 
1ThirdSequence
Wouldn't the first point be a motivation for the AI to remove our ability to send such a signal (in case we have not done so yet by the time such arguments become relevant)?
6RussellThor
YES - sending out a speed-of-light signal seems to be literally the only thing a superintelligent AGI can't undo. We should of course do it ASAP if we are serious, and have it documented to have happened.
5avturchin
That is interesting. So active SETI could save us, or at least improve our bargaining position.

The fact that LLMs are already so good gives me some hope that AI companies could be much better organized when the time comes for AGI. If AIs can keep track of what everyone is doing and the progress they're making, and can communicate with anyone at any time, I don't think it's overly hopeful to expect this aspect of the idea to go well. 

What probably is too much to hope for, however, is people actually listening to the LLMs even if the LLMs know better. 

My big hope for the future is for someone at OpenAI to prompt GPT-6 or GPT-7 with, "You are Eliezer Yudkowsky. Now don't let us do anything stupid."

Also, we are much more uncertain over whether AI doom is real, which is another reason to stay calm.

Have to disagree with you on this point. I'm in the camp of "If there's a 1% chance that AI doom is real, we should be treating it like a 99% chance."

OpenAI is no longer so open - we know almost nothing about GPT-4’s architecture.

 

Fantastic. This feels like a step in the right direction towards no longer letting just anyone use this to improve their capability research or stack their own capability research on top of it. 

For reference, I've seen ChatGPT play chess, and while it played a very good opening, it became less and less reliable as the game went on and frequently lost track of the board. 
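(If anyone wants to check this sort of thing for themselves, one rough way is to replay the model's moves through a legality checker. The sketch below is only an illustration: it assumes the python-chess library and uses a made-up list of model-proposed moves, not an actual ChatGPT game.)

```python
# Minimal sketch: replay a list of LLM-proposed chess moves (in SAN) and
# report the first one that is illegal, i.e. where the model appears to
# have lost track of the board. The move list below is hypothetical.
# Requires the python-chess package: pip install chess
import chess

proposed_moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "Nxe4"]  # made-up example

board = chess.Board()
for i, san in enumerate(proposed_moves, start=1):
    try:
        board.push_san(san)  # raises a ValueError subclass if the move is illegal
    except ValueError:
        print(f"Move {i} ({san}) is not legal here - the model lost the board.")
        break
else:
    print("All proposed moves were legal.")
```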

That image so perfectly captures how AIs are nothing like us - the characters they present do not necessarily reflect their true values - that it needs to go viral. 

3gjm
It is also true of humans that the characters we present do not necessarily reflect our true values. Maybe the divergence is usually smaller than for ChatGPT, though I'm more inclined to say that ChatGPT isn't the sort of thing that has true values whereas (to some extent at least) humans do.
2gjm
I don't think its meaning would be clear to the general public.

Based on a few of his recent tweets, I'm hoping for a serious way to turn Elon Musk back in the direction he used to be facing and get him to publicly go hard on the importance of the field of alignment. It'd probably be too much to hope to get him to actually fund any researchers, though. Maybe someone else. 

At that level of power, I imagine that general intelligence will be a lot easier to create. 

1[anonymous]
"think about it for 5 minutes" and think about how you might create a working general intelligence. I suggest looking at the GATO paper for inspiration.

But not with something powerful enough to engineer nanotech. 

2[anonymous]
Why do you believe this? Nanotech engineering does not require social or deceptive capabilities. It requires deep and precise knowledge of nanoscale physics and the limitations of manipulation equipment, and probably a large amount of working memory - so beyond human capacity - but why would it need to be anything but a large model? It needs not even be agentic.

With the strawberries thing, the point isn't that it couldn't do those things, but that it won't want to. After making itself smart enough to engineer nanotech, its developing 'mind' will have run off in unintended directions and it will have wildly different goals than what we wanted it to have. 

Quoting EY from this video: "the whole thing I'm saying is that we do not know how to get goals into a system." <-- This is the entire thing that researchers are trying to figure out how to do. 



 

0[anonymous]
With limited-scope, non-agentic systems we can set goals, and we do. Each subsystem in the "strawberry project" stack has to be trained in a simulation on many examples of the task space it will face, and optimized for policies that satisfy the simulator's goals.

They also recorded this follow-up with Yudkowsky if anyone's interested:

https://twitter.com/BanklessHQ/status/1627757551529119744

______________

>Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer.

The one hope we may be able to cling to is that this logic works in the other direction too - that AGI may be a lot closer than estimated, but so might alignment. 

Here's a dumb idea: if you have a misaligned AGI, can you keep it inside a box and have it teach you some things about alignment, perhaps through some creative lies? 
