All of TinkerBird's Comments + Replies

I imagine it's a sales tactic. Ask for $7 trillion, people assume you believe you're worth that much, and if you've got such a high opinion of yourself, maybe you're right... 

In other news, I'm looking to sell a painting of mine for £2 million ;)

This looks fantastic. Hopefully it leads to some great things, as I've always found the idea of exploiting the collective intelligence of the masses to be a terribly underused resource, and this reminds me of the game Foldit (and hopefully in the future it will remind me of the wild success that game had in the field of protein folding). 

2Johnny Lin
Thank you TinkerBird. I hope so too!

This sounds like it would only work on a machine too dumb to be useful, and if it's that dumb, you can switch it off yourself. 

It doesn't help with the convergent instrumental goal of neutralizing threats, because leaving a copy of yourself behind to kill all the humans allows you to be really sure that you're switched off and won't be switched on again. 

I really appreciate these. 

  1. Why do some people think that alignment will be easy/easy enough? 
  2. Is there such a thing as 'aligned enough to help solve alignment research'? 
1Olivier Coutu
These are great questions! Stampy does not currently have an answer for the first one, but its answer on prosaic alignment could get you started on ways that some people think might work without needing additional breakthroughs. Regarding the second question, the plan seems to be to use less powerful AIs to align more powerful AIs and the hope would be that these helper AIs would not be powerful enough for misalignment to be an issue.

I think there's a lot we could learn from climate change activists. Having a tangible 'bad guy' would really help, so maybe we should be framing it more that way. 

  • "The greedy corporations are gambling with our lives to line their pockets." 
  • "The governments are racing towards AI to win world domination, and Russia might win."
  • "AI will put 99% of the population out of work forever and we'll all starve."

And a better way to frame the issue might be "Bad people using AI" as opposed to "AI will kill us".

If anyone knows of any groups working towards a major public awareness campaign, please let the rest of us know about it. Or maybe we should start our own. 

2Gesild Muka
There's a catch-22 here: wording that is too extreme will put people off, because they'll lump all doomsayers into one boat, whether the fears are over AI, UFOs or Cthulhu, and dismiss them equally (it's like there's a tradeoff between level of alarm and credibility). On the other hand, claims will also be dismissed if the perceived danger is toned down in the wording. The best way to get the message across, in my opinion, is either to have more influential people spread the message (as previously recommended) or to organize focus testing on which parts of the message people don't understand and workshop how to get it across. If I had to take a crack at structuring a clear, persuasive message, my intuition is to explain the current environment, current AI capabilities and a specific timeline, and then let the reader work out the implications. Examples:

  • 'Nearly 80% of the labor force works in service jobs, and current AI technology can do most of those jobs. In ~5 years AI workers could be more proficient and economical than humans.'
  • 'It's impossible to know what a machine is thinking. When running large language model based AI, researchers don't know exactly what they're looking at until they analyze the metrics. Within 10-30 years an AI could reach a superintelligent level and it wouldn't be immediately apparent.'

I'm with you on this. I think Yudkowsky was a lot better in this with his more serious tone, but even so, we need to look for better. 

Popular scientific educators would be a place to start and I've thought about sending out a million emails to scientifically minded educators on YouTube, but even that doesn't feel like the best solution to me. 

The sort of people who get listened to are the more political types, so I think they are the people to reach out to. You might say they need to understand the science to talk about it, but I'd still put more weight on charisma than on scientific authority. 

Anyone have any ideas on how to get people like this on board? 

3Shankar Sivarajan
Getting charismatic "political types" to weigh in is unlikely to help with "polarization." That's what happened with global warming climate change. A more effective strategy might be to lean into the polarization: make "AI safety" an issue of tribal identity, which members will support reflexively against enemies. That might delay technological advancement for long enough.
4Seth Herd
I just read your one post. I agree with it. We need more people on board. We are getting that, but finding more people with more PR skills would seem like a good idea. I think the starting point is finding people who are already part of this community who are interested in brainstorming about PR strategy. To that end, I'm writing a post on this topic.

As a note for Yudkowsky if he ever sees this and cares about the random gut feelings of strangers: after seeing this, I suspect the authoritative, stern, strong-leader tone of speaking will be much more effective than current approaches.

EDIT: missed a word

For ages I've wanted something for AI alignment like what the Foldit researchers created: they turned protein folding into a puzzle game, and the ordinary people online who played it wildly outperformed the researchers and algorithms purely by working together in vast numbers and combining their creative thinking. 

I know it's a lot to ask for with AI alignment, but still, if it's possible, I'd put a lot of hope on it. 

2TekhneMakre
https://www.lesswrong.com/posts/8GENjqzEDL5WamfCh/gamified-narrow-reverse-imitation-learning-1

As someone who's been pinning his hopes on a 'survivable disaster' to wake people up to the dangers, this is good news.  

I doubt anything capable of destroying the world will come along significantly sooner than superintelligent AGI, and a world in which there are disasters due to AI feels like a world that is much more likely to survive compared to a world in which the whirling razorblades are invisible. 

EDIT: "no fire alarm for AGI." Oh I beg to differ, Mr. Yudkowsky. I beg to differ. 

This confuses me too. I think Musk must be either smarter or a lot dumber than I thought he was yesterday, and sadly, dumber seems to be the way it usually goes. 

That said, if this makes OpenAI go away to be replaced by a company run by someone who respects the dangers of AI, I'll take it.

On the bright side... Nope, I've got nothing.

an AGI Risk Management Outreach Center with a clear cohesive message broadcast to the world

Something like this sounds like it could be a good idea. A way to make the most of those of us who are aware of the dangers and can buy the world time

Coordination will be the key. I wish we had more of it here on LW.

2Nathan Helm-Burger
Well, I think LW is a place designed for people to speak their minds on important topics and have polite respectful debates that result in improved understanding for everyone involved. I think we're managing to do that pretty well, honestly. If there needs to be an AGI Risk Management Outreach Center with a clear cohesive message broadcast to the world... Then I think that needs to be something quite different from LessWrong. I don't think "forum for lots of people to post their thoughts about rationality and AI alignment" would be the correct structure for a political outreach organization.

Like I say, not something I'd normally advocate, but no media stations have picked it up yet, and we might as well try whatever we can if we're desperate enough. 

We've never done a real media push but all indications are that people are ready to hear it.

I say we make a start on this ASAP. 

3irving
Hardcore agree. I'm planning a documentary and trying to find interested parties.

What's the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we've got for alignment and to be pretty optimistic about it, but I haven't heard anyone else talking about it. Either I'm completely misunderstanding what he's talking about, or he's somehow found a way around all of the alignment problems.

Video of him explaining it here for reference, and thanks in advance: 

 

9gilch
Watched the video. He's got a lot of the key ideas and vocabulary. Orthogonality, convergent instrumental goals, the treacherous turn, etc. The fact that these language models have some understanding of ethics and nuance might be a small ray of hope. But understanding is not the same as caring (orthogonality). However, he does seem to be lacking in the security mindset, imagining only how things can go right, and seems to assume that we'll have a soft takeoff with a lot of competing AIs, i.e. ignoring the FOOM problem caused by an overhang which makes a singleton scenario far more likely, in my opinion. But even if we grant him a soft takeoff, I still think he's too optimistic. Even that may not go well. Even if we get a multipolar scenario, with some of the AIs on our side, humanity likely becomes collateral damage in the ensuing AI wars. Those AIs willing to burn everything else in pursuit of simple goals would have an edge over those with more to protect.
4Jonathan Claybrough
I watched the video, and appreciate that he seems to know the literature quite well and has thought about this a fair bit - he did a really good introduction to some of the known problems. This particular video doesn't go into much detail on his proposal, and I'd have to read his papers to delve further - this seems worthwhile, so I'll add some to my reading list. I can still point out the biggest ways in which I see him being overconfident:

  • Only considering the multi-agent world. Though he's right that there already are and will be many, many deployed AI systems, that does not translate to there being many deployed state-of-the-art systems. As long as training costs and inference costs continue increasing (as they have), then on the contrary fewer and fewer actors will be able to afford state-of-the-art system training and deployment, leading to very few (or one) significantly powerful AGI (as compared to the others, for example GPT4 vs GPT2).
  • Not considering the impact that governance and policies could have on this. This isn't just a tech thing where tech people can do whatever they want forever; regulation is coming. If we think we have higher chances of survival in highly regulated worlds, then the AI safety community will do a bunch of work to ensure fast and effective regulation (to the extent possible). The genie is not out of the bag for powerful AGI: governments can control compute, regulate powerful AI as weapons, and set up international agreements to ensure this.
  • The hope that game theory ensures that AI developed under his principles would be good for humans. There's a crucial gap in going from the real world to math models. Game theory might predict good results under certain conditions, rules and assumptions, but many of these aren't true of the real world, and simple game theory does not yield accurate world predictions (e.g. make people play various social games and they won't act how game theory says). Stated strongly, putting
1irving
Honestly I don't think fake stories are even necessary, and becoming associated with fake news could be very bad for us. I don't think we've seriously tried to convince people of the real big bad AI. What, two podcasts and an opinion piece in Time? We've never done a real media push but all indications are that people are ready to hear it. "AI researchers believe there's a 10% chance they'll end life" is all the headline you need.

I, for one, am looking forward to the next public AI scares.

Same. I'm about to get into writing a lot of emails to a lot of influential public figures as part of a one-man letter-writing campaign, in the hopes that at least one of them takes notice and says something publicly about the problem of AI.

1irving
Count me in!
Answer by TinkerBird10

but I haven't seen anyone talk about this before.

You and me both. It feels like I've been the only one really trying to raise public awareness of this, and I would LOVE some help. 

One thing I'm about to do is write the most convincing AI-could-kill-everyone email I can, one that regular Joes will easily understand and respect, and send it out to anyone with a platform: YouTubers, TikTokers, people in government, journalists - anyone. 

I'd really appreciate some help with this - both with writing the emails and sending them out. I'm hoping... (read more)

1metachirality
Probably not the best person on this forum when it comes to either PR or alignment but I'm interested enough, if only about knowing your plan, that I want to talk to you about it anyways.

But if the current paradigm is not the final form of existentially dangerous AI, such research may not be particularly valuable.

I think we should figure out how to train puppies before we try to train wolves. It might turn out that very few principles carry over, but if they do, we'll wish we delayed.

The only drawback I see to delaying is that it might cause people to take the issue less seriously than if powerful AIs appear in their lives very suddenly. 

3DragonGod
I endorse attempts to deliberately engineer a slow takeoff. I am less enthused about attempts to freeze AI development at a particular level.

It depends on how quickly the chance can be decreased. If it takes 50 years to shrink it from 1% to 0.1%, then with all the people who would die in that time, I'd probably be willing to risk it. 

As of right now, even the most optimistic experts I've seen put p(doom) at much higher than 1% - far into the range where I vote to hit pause.
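(To make that tradeoff explicit, here's a rough back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption rather than anything from the comments above: a population of ~8 billion, ~60 million deaths per year that an aligned AGI is assumed to have otherwise prevented, and a 50-year pause that cuts p(doom) from 1% to 0.1%.)

```python
# Back-of-the-envelope sketch of the "pause vs. risk" tradeoff above.
# All numbers are illustrative assumptions, including the assumption that
# an aligned AGI arriving sooner would have prevented the ordinary deaths
# that occur during the pause.
POPULATION = 8_000_000_000      # assumed world population
DEATHS_PER_YEAR = 60_000_000    # assumed deaths per year from all causes
PAUSE_YEARS = 50                # length of the hypothetical pause

p_doom_now = 0.01     # assumed extinction risk if we proceed immediately
p_doom_after = 0.001  # assumed extinction risk after the 50-year pause

deaths_during_pause = DEATHS_PER_YEAR * PAUSE_YEARS
expected_deaths_avoided = (p_doom_now - p_doom_after) * POPULATION

print(f"Deaths during the pause:          {deaths_during_pause:>15,}")
print(f"Expected deaths avoided by pause: {expected_deaths_avoided:>15,.0f}")
# With these numbers the pause "costs" ~3 billion lives to avoid an expected
# ~72 million, which is the intuition behind "I'd probably be willing to risk
# it" at a 1% p(doom) - and why the calculus flips as p(doom) rises.
```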

Design a series of puzzles and challenges as a learning tool for alignment beginners, that when solved, progressively reveal more advanced concepts and tools. The goal is for participants to stumble upon a lucky solution while trying to solve these puzzles in these novel frames.

Highly on board with this idea. I'm thinking about writing a post about the game Foldit, which researchers came up with that reimagined protein folding as an online puzzle game. The game had thousands of players and the project was wildly successful - not just once, but many times. ... (read more)

Personally, I want to get to the glorious transhumanist future as soon as possible as much as anybody, but if there's a chance that AI kills us all instead, that's good enough for me to say we should be hitting pause on it. 

I don't wanna pull the meme phrase on people here, but if it's ever going to be said, now's the time: "Won't somebody please think of the children?"

2jasoncrawford
Any chance? A one in a million chance? 1e-12? At some point you should take the chance. What is your Faust parameter?

I like it. It seems like only the researchers themselves respect the dangers, not the CEOs or the government, so it will have to be the researchers who say that enough is enough. 

In a perfect world they'd jump ship to alignment, but realistically we've all got to eat, so what would also be great is a generous billionaire willing to hire them for more alignment research. 

Around the 1:25:00 mark, I'm not sure I agree with Yudkowsky's point about AI not being able to help with alignment only(?) because those systems will be trained to get the thumbs up from the humans and not to give the real answers. 

For example, if the Wright brothers had asked me about how wings produce lift, I may have only told them "It's Bernoulli's principle, and here's how that works..." and spoken nothing about the Coanda effect - which they also needed to know about - because it was just enough to get the thumbs up from them. But...

But that st... (read more)

1Muyyd
Capabilities advance much faster than alignment, so there is likely no time to do meticulous research. And if you try to use weak AIs as a shortcut to outrun the current capabilities timeline, then you will somehow have to deal with the suggestor-and-verifier problem (with suggestions much harder to verify than simple math problems), which is not wholly about deception but also about filtering the somewhat-working stuff that may steer alignment in the right direction. And maybe not. But I agree that this collaboration will be successfully used for patchwork (because shortcuts) alignment of weak AIs, to placate the general public and politicians. All of this depends on how hard the alignment problem is: as hard as EY thinks, or maybe harder, or easier.
2Qumeric
I agree it was a pretty weak point. I wonder if there is a longer form exploration of this topic from Eliezer or somebody else.  I think it is even contradictory. Eliezer says that AI alignment is solvable by humans and that verification is easier than the solution. But then he claims that humans wouldn't even be able to verify answers. I think a charitable interpretation could be "it is not going to be as usable as you think". But perhaps I misunderstand something?

Right now talking about AI risk is like yelling about covid in Feb 2020. I and many others spent the end of that February in distress over impending doom, and despairing that absolutely nobody seemed to care—but literally within a couple weeks, America went from dismissing covid to everyone locking down.

I don't think comparing misaligned AI to covid is fair. With covid, real life people were dying, and it was easy to understand the concept of "da virus will spread," and almost every government on Earth was still MASSIVELY too late in taking action. Even wh... (read more)

1Nathan Helm-Burger
I disagree. I think that "everything will look fine until the moment we are all doomed" is quite unlikely. I think we are going to get clear warning shots, and should be prepared to capitalize on those in order to bring political force to bear on the problem. It's gonna get messy. Dumb, unhelpful legislation seems nearly unavoidable. I'm hopeful that having governments flailing around with a mix of bad and good legislation and enforcement will overall be better than them doing nothing.

minimally-aligned AGIs to help us do alignment research in crunchtime

Christ this fills me with fear. And it's the best we've got? 'Aligned enough' sounds like the last words that will be spoken before the end of the world. 

2Nathan Helm-Burger
Yes, I think we're in a rough spot. I'm hopeful that we'll pull through. A large group of smart, highly motivated people, all trying to save their own lives and the lives of everyone they love... That is a potent force!
Answer by TinkerBird10

Think of a random goal for yourself. 

Let's go with: acquire a large collection of bananas. 

What are going to be some priorities for you while you're building your giant pile of bananas?

  • Don't die, because you can't build your pile if you're dead. 
  • Don't let someone reach into your brain and change what you want, because the banana pile will stop growing if you stop building it. 
  • Acquire power. 
  • Make yourself smarter and more knowledgeable, for maximum bananas. 
  • If humanity slows you down instead of helping you, kill
... (read more)

Sounds like a fair idea that wouldn't actually work IRL. 

Upvoting to encourage the behavior of designing creative solutions. 

Hey, if we can get it to stop swearing, we can get it to not destroy the world, right?

6ryan_b
It would be deeply hilarious if it turns out "Don't say the word shit" can be heavily generalized enough that we can give it instructions that boil down to "Don't say the word shit, but, like, civilizationally."

Gotta disagree with Ben Levinstein's tweet. There's a difference between being an LLM that can look up the answers on Google and being one that figures them out for itself. 

7the gears to ascension
I think the tweet is sarcasm. Not sure, though.

I'm put in mind of something Yudkowsky said on the Bankless podcast:

"Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, 2 years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer."

He was speaking about how far away AGI could be, but I think the same logic applies to alignment. It looks hopeless right now, but events never play out exactly like you expect them to, and breakthroughs happen all the time. 

3Thoth Hermes
Excellent point. In one frame, pessimism applied to timelines makes them look further away than they actually turn out to be. In another frame, pessimism applied to doom makes it seem closer / more probable, but it uses the anti-pessimism frame applied to timelines - "AGI will happen much sooner than we think". I get the sense reading some LessWrong comments that there is a divide between "alignment-is-easy"-ers and "alignment-is-hard"-ers. I also get the sense that Yudkowsky's p(doom) has increased over the years, to where it is now. Isn't it somewhat strange that we should be getting two groups whose estimates of p(doom) are moving away from the center?

This creative solution around the alignment problem occurred to me a long while ago too, and probably to a lot of other people as well. I can't say I put any stock in it. 

The human brain is even more complicated than neural networks, and if AIs have invented a way to add even just 10 IQ points to the brains of the alignment researchers, then we're already dead. 

I said elsewhere earlier: "AGI has the power to destroy the entire human race, and if we believe there's even a 1% chance that it will, then we have to treat it as an absolute certainty."

And I'm pretty sure that no expert puts it below 1%.

If you can name another immediate threat with a ≥1% chance of killing everyone, then yes, we should drop everything to focus on that too. 

A pandemic that kills even just 50% of the population? <0.1%

An unseen meteor? <0.1%

Climate change? 0% chance that it could kill literally everyone

Gotta disagree with you on this. When the stakes are this high, even a 1% chance of doom is worth dropping everything in your life for to try and help with the problem. 

To paraphrase both Batman & Dick Cheney (of all two people, lol, but the logic is sound): "AGI has the power to destroy the entire human race, and if we believe there's even a 1% chance that it will, then we have to treat it as an absolute certainty."

3Noosphere89
I don't agree, primarily because that reasoning only works in a vacuum. Other existential risks have more than a 1% probability, so if AI risk had only a 1% probability, then we should change focus to another x-risk.

My only objection is the title. It should have a comma in it. "We’re All Gonna Die with Eliezer Yudkowsky" makes it sound like if Yudkowsky dies, then all hope is lost and we die too. 

Ohhh... 

I don't think sending out a signal actually matters - I was just making my point. Even without a signal, when our superintelligence meets another superintelligence that was aligned to its original creators (who we'll assume are similar to us in terms of morality), it'll at some point be asked "Were you nice to your creators?" and it can either answer "No" and lose out, or answer "Yes" and be rewarded. 

No real signal required. 

I've heard Yudkowsky has debunked this, but I know that if I had an aligned superintelligence, I would 100% have it be ni... (read more)

I can't say I put sufficient weight on us already being watched by extraterrestrials to make it worth mentioning. However, for the AGI:

  • We may at some point send out a signal to other entities in the universe explaining our existence and requesting that, should you destroy us, they take issue with it and treat you less well. Some of these entities may care about sentient life (perhaps because they were aligned to their original creators) and wish to respect this. 
  • Some of us are willing to settle for a couple of mountains worth of material to be turned into computer substrate to run our minds on in our own simulated paradise, while you can have everything else. 
1ThirdSequence
Wouldn't the first point be a motivation for the AI to remove our ability to send such a signal (in case we have not done so yet by the time such arguments become relevant)?
6RussellThor
YES - sending out a speed-of-light signal seems to be literally the only thing a superintelligent AGI can't undo. We should of course do it ASAP if we are serious, and have it documented to have happened.
5avturchin
That is interesting. So active SETI could save us, or at least improve our bargaining position.

The fact that LLMs are already so good gives me some hope that AI companies could be much better organized when the time comes for AGI. If AIs can keep track of what everyone is doing and the progress they're making, and can communicate with anyone at any time, I don't think it's overly hopeful to expect this aspect of the idea to go well. 

What probably is too much to hope for, however, is people actually listening to the LLMs even if the LLMs know better. 

My big hope for the future is for someone at OpenAI to prompt GPT-6 or GPT-7 with, "You are Eliezer Yudkowsky. Now don't let us do anything stupid."

Also, we are much more uncertain over whether AI doom is real, which is another reason to stay calm.

Have to disagree with you on this point. I'm in the camp of "If there's a 1% chance that AI doom is real, we should be treating it like a 99% chance."

OpenAI is no longer so open - we know almost nothing about GPT-4’s architecture.

 

Fantastic. This feels like a step in the right direction towards no longer letting just anyone use this to improve their capability research or stack their own capability research on top of it. 

For reference, I've seen ChatGPT play chess, and while it played a very good opening, it became less and less reliable as the game went on and frequently lost track of the board. 
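(If anyone wants to check this sort of thing for themselves, one rough way is to replay the model's moves through a legality checker. The sketch below is only an illustration: it assumes the python-chess library and uses a made-up list of model-proposed moves, not an actual ChatGPT game.)

```python
# Minimal sketch: replay a list of LLM-proposed chess moves (in SAN) and
# report the first one that is illegal, i.e. where the model appears to
# have lost track of the board. The move list below is hypothetical.
# Requires the python-chess package: pip install chess
import chess

proposed_moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "Nxe4"]  # made-up example

board = chess.Board()
for i, san in enumerate(proposed_moves, start=1):
    try:
        board.push_san(san)  # raises a ValueError subclass if the move is illegal
    except ValueError:
        print(f"Move {i} ({san}) is not legal here - the model lost the board.")
        break
else:
    print("All proposed moves were legal.")
```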

That image so perfectly captures how AIs are nothing like us - the characters they present do not necessarily reflect their true values - that it needs to go viral. 

3gjm
It is also true of humans that the characters we present do not necessarily reflect our true values. Maybe the divergence is usually smaller than for ChatGPT, though I'm more inclined to say that ChatGPT isn't the sort of thing that has true values whereas (to some extent at least) humans do.
2gjm
I don't think its meaning would be clear to the general public.

Based on a few of his recent tweets, I'm hoping for a serious way to turn Elon Musk back in the direction he used to be facing and get him to publicly go hard on the importance of the field of alignment. It'd probably be too much to hope to get him to actually fund any researchers, though. Maybe someone else. 

At that level of power, I imagine that general intelligence will be a lot easier to create. 

1[anonymous]
"think about it for 5 minutes" and think about how you might create a working general intelligence. I suggest looking at the GATO paper for inspiration.

But not with something powerful enough to engineer nanotech. 

2[anonymous]
Why do you believe this? Nanotech engineering does not require social or deceptive capabilities. It requires deep and precise knowledge of nanoscale physics and the limitations of manipulation equipment, and probably a large amount of working memory - so beyond human capacity - but why would it need to be anything but a large model? It needs not even be agentic.

With the strawberries thing, the point isn't that it couldn't do those things, but that it won't want to. After making itself smart enough to engineer nanotech, its developing 'mind' will have run off in unintended directions and it will have wildly different goals than what we wanted it to have. 

Quoting EY from this video: "the whole thing I'm saying is that we do not know how to get goals into a system." <-- This is the entire thing that researchers are trying to figure out how to do. 



 

0[anonymous]
With limited-scope, non-agentic systems we can set goals, and we do. Each subsystem in the "strawberry project" stack has to be trained in a simulation on many examples of the task space it will face, and optimized for policies that satisfy the simulator's goals.

They also recorded this follow-up with Yudkowsky if anyone's interested:

https://twitter.com/BanklessHQ/status/1627757551529119744

______________

>Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer.

The one hope we may be able to cling to is that this logic works in the other direction too - that AGI may be a lot closer than estimated, but so might alignment. 

Here's a dumb idea: if you have a misaligned AGI, can you keep it inside a box and have it teach you some things about alignment, perhaps through some creative lies? 
