All of Cody Rushing's Comments + Replies

Hmm, when I imagine "Scheming AI that is not easy to shut down with concerted nation-state effort, are attacking you with bioweapons, but are weak enough such that you can bargain/negotiate with them" I can imagine this outcome inspiring a lot more caution relative to many other worlds where control techniques work well but we can't get any convincing demos/evidence to inspire caution (especially if control techniques inspire overconfidence).

But the 'is currently working on becoming more powerful' part of your statement does carry a lot of weight.

4Buck2mo

People will sure be scared of AI, but the arms race pressure will be very strong, and I think that is a bigger consideration

The Case Against AI Control Research

Cody Rushing2mo42

Control research exclusively cares about intentional deception/scheming; it does not aim to solve any other failure mode.

(nitpick, doesn't address main point of article) I think this is incomplete. Though control research does indeed care a lot about scheming, control can be used more broadly to handle any worst-case deployment behavior. See Josh Clymer's post about Extending control evaluations to non-scheming threats.

How do you deal w/ Super Stimuli?

Cody Rushing2mo20

This might not work well for others, but a thing that's worked well for me has been to (basically) block cheap access to it with anticharities. Introducing friction in general is good

Anthropic: Three Sketches of ASL-4 Safety Case Components

Cody Rushing5mo30

I'm glad to see this. Some initial thoughts about the control safety case:

I don't get how the "monitoring game" is different from the red-team game? Is it because the blue team is limited to only considering protocols that use just monitoring? Or do you not expect to play a full red-team simulation (this would be worrisome)?
Option 2 (honeypotting) feels doomed (in isolation). This primarily stems from my belief that making situations that are indistinguishable from deployment is probably hard - the model has control over state in very broad and hard-to-pred

...

A Narrow Path: a plan to deal with AI extinction risk

Cody Rushing6mo52

Am I correct in interpreting that your definition of "found system" would apply to nearly all useful AI systems today such as ChatGPT, as these are algorithms which run on weights that are found with optimization methods such as gradient descent? If so, it is still fairly onerous.

A Narrow Path: a plan to deal with AI extinction risk

Cody Rushing6mo80

Thanks for writing this and proposing a plan. Coincidentally, I drafted a short take here yesterday explaining one complaint I currently have with the safety conditions of this plan. In short, I suspect the “No AIs improving other AIs” criterion isn't worth including within a safety plan: it i) doesn't address that many more marginal threat models (or does so ineffectively) and ii) would be too unpopular to implement (or, alternatively, too weak to be useful).

I think there is a version of this plan with a lower safety tax, with more focus on reactive policy and the other three criteria, that I would be more excited about.

3Andrea_Miotti6mo

Thanks! Do you still think the "No AIs improving other AIs" criterion is too onerous after reading the policy enforcing it in Phase 0? In that policy, we developed the definition of "found systems" to have this measure only apply to AI systems found via mathematical optimization, rather than AIs (or any other code) written by humans. This reduces the cost of the policy significantly, as it applies only to a very small subset of all AI activities, and leaves most innocuous software untouched.

You can remove GPT2’s LayerNorm by fine-tuning for an hour

Cody Rushing8mo80

Another reason why layernorm is weird (and a shameless plug): the final layernorm also contributes to self-repair in language models

Buck's Shortform

Cody Rushing8mo58

Hmm, this transcript just seems like an example of blatant misalignment? I guess I have a definition of scheming that would imply deceptive alignment - for example, for me to classify Sydney as 'obviously scheming', I would need to see examples of Sydney 1) realizing it is in deployment and thus acting 'misaligned' or 2) realizing it is in training and thus acting 'aligned'.

Buck's Shortform

Cody Rushing9mo58

In what manner was Sydney 'pretty obviously scheming'? Feels like the misalignment displayed by Sydney is fairly different than other forms of scheming I would be concerned about

(if this is a joke, whoops sorry)

gwern8mo235

...Could you quote some of the transcripts of Sydney threatening users, like the original Indian transcript where Sydney is manipulating the user into not reporting it to Microsoft, and explain how you think that it is not "pretty obviously scheming"? I personally struggle to see how those are not 'obviously scheming': those are schemes and manipulation, and they are very bluntly obvious (and most definitely "not amazingly good at it"), so they are obviously scheming. Like... given Sydney's context and capabilities as a LLM with only retrieval access and s...

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Cody Rushing9mo60

I'm surprised by this reaction. It feels like the intersection between people who have a decent shot of getting hired at OpenAI to do safety research and those who are unaware of the events at OpenAI related to safety is quite small.

1DPiepgrass8mo

I expect there are people who are aware that there was drama but don't know much about it and should be presented with details from safety-conscious people who closely examined what happened.

On Claude 3.5 Sonnet

Cody Rushing9mo72

What Comes Next
Coding got another big leap, both for professionals and amateurs.
Claude is now clearly best. I thought for my own purposes Claude Opus was already best even after GPT-4o, but not for everyone, and it was close. Now it is not so close.
Claude’s market share has always been tiny. Will it start to rapidly expand? To what extent does the market care, when most people didn’t in the past even realize they were using GPT-3.5 instead of GPT-4? With Anthropic not doing major marketing? Presumably adaptation will be slow even if they remain on top, esp

...

9gwern9mo

As I mentioned on Twitter, this sort of 'truesight' for writers extensively represented in Internet corpora like Robin Hanson, Zvi, or myself, is very unsurprising. Like those slides - there are not a lot of places other than Overcoming Bias in the 2000s that all of those topics are represented. (Hanson has been banging those drums for a long time.)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

Cody Rushing10moΩ133121

It also seems to have led to at least one claim in a policy memo that advocates of AI safety are being silly because mechanistic interpretability was solved.

Small nitpick (I agree with mostly everything else in the post and am glad you wrote it up). This feels like an unfair criticism - I assume you are referring specifically to the statement in their paper that:

Although advocates for AI safety guidelines often allude to the "black box" nature of AI models, where the logic behind their conclusions is not transparent, recent advancements in the AI sect

...

the gears to ascension10mo1114

It seems at least somewhat reasonable to ask people to write defensively to guard against their statements being misused by adversarial actors. I recognize this is an annoying ask that may have significant overhead; perhaps it will turn out to not be worth the cost.

Neel Nanda10moΩ275858

+1, I think the correct conclusion is "a16z are making bald-faced lies to major governments" not "a16z were misled by Anthropic hype"

scasper10moΩ7159

Thanks, I think that these points are helpful and basically fair. Here is one thought, but I don't have any disagreements.

Olah et al. 100% do a good job of noting what remains to be accomplished and that there is a lot more to do. But when people in the public or government get the misconception that mechanistic interpretability has been (or definitely will be) solved, we have to ask where this misconception came from. And I expect that claims like "Sparse autoencoders produce interpretable features for large models" contribute to this.

Stephen Fowler's Shortform

Cody Rushing10mo30

Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious why this is a terrible idea even if you're only concerned with human misuse and not misalignment.

Hmmm, can you point to where you think the grant shows this? I think the following paragraph from the grant seems to indicate otherwise:

When OpenAI launched, it characterized the nature of the risks – and the most appropriate strategies for reducing them – in a w

...

5Stephen Fowler10mo

"In particular, it emphasized the importance of distributing AI broadly;1 our current view is that this may turn out to be a promising strategy for reducing potential risks" Yes, I'm interpreting the phrase "may turn out" to be treating the idea with more seriousness than it deserves. Rereading the paragraph, it seems reasonable to interpret it as politely downplaying it, in which case my statement about Open Phil taking the idea seriously is incorrect.

Stupid Question: Why am I getting consistently downvoted?

Cody Rushing1y62

I'm glad to hear you got exposure to the Alignment field in SERI MATS! I still think that your writing reads as though your ideas misunderstand core alignment problems, so my best feedback then is to share drafts/discuss your ideas with others familiar with the field. My guess is that it would be preferable for you to find people who are critical of your ideas and try to understand why, since it seems like they are representative of the kinds of people who are downvoting your posts.

Stupid Question: Why am I getting consistently downvoted?

Answer by Cody RushingNov 30, 202352

(preface: writing and communicating is hard and i'm glad you are trying to improve)

i sampled two:

this post was hard to follow, and didn't seem to be very serious. it also reads as unfamiliar with the basics of the AI Alignment problem (the proposed changes to gpt-4 don't concretely address many/any of the core Alignment concerns for reasons addressed by other commenters)

this post makes multiple (self-proclaimed controversial) claims that seem wrong or are not obvious, but doesn't try to justify them in-depth.

overall, i'm getting the impression tha...

9MadHatter1y

I did SERI-MATS in the winter cohort in 2023. I am as familiar with the alignment field as is possible without having founded it or been given a research grant to work in it professionally (which I have sought but been turned down in the past). I'm happy to send out drafts, and occasionally I do, but the high-status people I ask to read my drafts never quite seem to have the time to read them. I don't think this is because of any fault of theirs, but it also has not conditioned me to seek feedback before publishing things that seem potentially controversial.

Shallow review of live agendas in alignment & safety

Cody Rushing1y30

Reverse engineering. Unclear if this is being pushed much anymore. 2022: Anthropic circuits, Interpretability In The Wild, Grokking mod arithmetic

FWIW, I was one of Neel's MATS 4.1 scholars and I would classify 3/4 of Neel's scholars' outputs as reverse engineering some component of LLMs (for completeness, this is the other one, which doesn't nicely fit as 'reverse engineering' imo). I would also say that this is still an active direction of research (lots of ground to cover with MLP neurons, polysemantic heads, and more)

2technicalities1y

You're clearly right, thanks

Shall We Throw A Huge Party Before AGI Bids Us Adieu?

Cody Rushing2y10

Quick feedback since nobody else has commented - I'm all for AI Safety appearing "not just a bunch of crazy lunatics, but an actually sensible, open and welcoming community."

But the spirit behind this post feels like it is just throwing in the towel, and I very much disapprove of that. I think this is why I and others downvoted too

1GeorgeMan2y

well, I am not arguing for ceasing the AGI safety efforts or that it is unlikely they would succeed. I am just claiming that if there is a high enough chance that they might be unsuccessful... we might as well try to make some relatively cheap and simple effort to make this case somewhat more pleasant (although fair enough that the post might be too direct). Imagine that you had an illness with a 30% chance of death in the next 7 years (I hope you don't); it would likely affect your behaviour and you would want to spend your time differently and maybe create some memorable experiences even though the chance that you survive is still high enough. Despite this, it seems surprising that when it comes to AGI-related risks, such tendencies to live life differently are much weaker, even though many assign similar probabilities. Is it rational?

Lightcone Infrastructure/LessWrong is looking for funding

Cody Rushing2y77

Ehh... feels like your base rate of 10% for LW users who are willing to pay for a subscription is too high, especially seeing how the 'free' version would still offer everything I (and presumably others) care about. Generalizing to other platforms, this feels closest to Twitter's situation with Twitter Blue, whose rate appears to be far, far lower: if we are generous and say they have one million subscribers, then out of the 41.5 million monetizable daily active users they currently have, this would suggest a base rate of less than 3%.

yagudin2y115

ACX is probably a better reference class: https://astralcodexten.substack.com/p/2023-subscription-drive-free-unlocked. In Jan, ACX had 78.2k readers, of which 6.0k were subscribers, for a 7.7% subscription rate.
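The two base rates quoted in this thread can be checked with a few lines of arithmetic. This is only a sketch: the helper name `subscription_rate` is made up for illustration, and the inputs are the figures stated in the comments above (including the admittedly generous guess of one million Twitter Blue subscribers), not independently verified numbers.

```python
def subscription_rate(subscribers: float, audience: float) -> float:
    """Return subscribers as a percentage of the total audience."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return 100.0 * subscribers / audience


# Figures as quoted in the comments (assumptions, not verified data).
twitter_blue = subscription_rate(1_000_000, 41_500_000)  # generous 1M-subscriber guess
acx = subscription_rate(6_000, 78_200)                   # ACX, January figures

print(f"Twitter Blue: {twitter_blue:.1f}%")
print(f"ACX: {acx:.1f}%")
```

Under these inputs both reference classes land below the 10% base rate being questioned, with the ACX figure roughly three times the Twitter Blue one.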

AI #11: In Search of a Moat

Cody Rushing2y20

Thanks for the writeup!

Small nitpick: typo in "this indeed does not seem like an attitude that leads to go outcomes"

AI #8: People Can Do Reasonable Things

Cody Rushing2y20

I'm not sure if you've seen it or not, but here's a relevant clip where he mentions that they aren't training GPT-5. I don't quite know how to update from it. It doesn't seem likely that they paused from a desire to conduct more safety work, but I would also be surprised if somehow they are reaching some sort of performance limit from model size.

However, as Zvi mentions, Sam did say:

“I think we're at the end of the era where it's going to be these, like, giant, giant models...We'll make them better in other ways”

3jacob_cannell2y

The expectation is that GPT-5 would be the next GPT-N but 100x the training compute of GPT-4, but that would probably cost tens of $billions, so GPT-N scaling is over for now.

Widening Overton Window - Open Thread

Cody Rushing2y40

The increased public attention towards AI Safety risk is probably a good thing. But, when stuff like this is getting lumped in with the rest of AI Safety, it feels like the public-facing slow-down-AI movement is going to be a grab-bag of AI Safety, AI Ethics, and AI... privacy(?). As such, I'm afraid that the public discourse will devolve into "Woah-there-Slow-AI" and "GOGOGOGO" tribal warfare; from the track record of American politics, this seems likely - maybe even inevitable?

More importantly, though, what I'm afraid of is that this will translate...

1Prometheus2y

Yeah, since the public currently doesn't have much of an opinion on it, trying to get the correct information out seems critical. I fear some absolutely useless legislation will get passed, and everyone will just forget about it once the shock-value of GPT wears off.

"Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman

Cody Rushing2y1113

Sheesh. Wild conversation. While I felt Lex was often missing the points Eliezer was making, I'm glad he gave him the space and time to speak. Unfortunately, it felt like the conversation would keep moving towards reaching a super critical important insight that Eliezer wanted Lex to understand, and then Lex would just change the topic onto something else, and then Eliezer just had to begin building towards a new insight. Regardless, I appreciate that Lex and Eliezer thoroughly engaged with each other; this will probably spark good dialogue and get more pe...

5Lech Mazur2y

Yes. It was quite predictable that it would go this way based on Lex's past interviews. My suggestion for Eliezer would be to quickly address the interviewer's off-topic point and then return to the main train of thought without giving the interviewer a chance to further derail the conversation with follow-ups.

6memeticimagery2y

There were definitely parts where I thought Lex seemed uncomfortable, not just limited to specific concepts but when questions got turned around a bit towards what he thought. Lex started podcasting very much in the Joe Rogan sphere of influence, to the extent that I think he uses a similar style, which is very open and lets the other person speak/have a platform but is perhaps at the cost of being a bit wishy-washy. Nevertheless it's a huge podcast with a lot of reach.

GPT-4 Specs: 1 Trillion Parameters?

Cody Rushing2y40

Relevant Manifold Market:

An Appeal to AI Superintelligence: Reasons to Preserve Humanity

Cody Rushing2y10

Because you're imagining AGI keeping us in a box?

Yeah, something along the lines of this. Preserving humanity =/= humans living lives worth living.

An Appeal to AI Superintelligence: Reasons to Preserve Humanity

Cody Rushing2y4541

I didn't upvote or downvote this post. Although I do find the spirit of this message interesting, I have a disturbing feeling that arguing to future AI to "preserve humanity for pascals-mugging-type-reasons" trades off X-risk for S-risk. I'm not sure that any of these aforementioned cases encourage AI to maintain lives worth living. I'm not confident that this meaningfully changes S-risk or X-risk positively or negatively, but I'm also not confident that it doesn't.

1Anirandis2y

Because you're imagining AGI keeping us in a box? Or that there's a substantial probability on P(humans are deliberately tortured | AGI) that this post increases?

$20 Million in NSF Grants for Safety Research

Cody Rushing2y1316

With the advent of Sydney and now this, I'm becoming more inclined to believe that AI Safety and policies related to it are very close to being in the overton window of most intellectuals (I wouldn't say the general public, yet). Like, maybe within a year, more than 60% of academic researchers will have heard of AI Safety. I don't feel confident whatsoever about the claim, but it now seems more than ~20% likely. Does this seem to be a reach?

6Charlie Steiner2y

I was watching an interview with that NYT reporter who had the newsworthy Bing chat interaction, and he used some language that made me think he'd searched for people talking about Bing chat and read Evan's post or a direct derivative of it. Basically yes, I'd say that AI safety is in fact in the overton window. What I see as the problem is more that a bunch of other stupid stuff is also in the overton window.

4JNS2y

One can hope, although I see very little evidence for it. Most evidence I see is an educated and very intelligent person writing about AI (not their field), and when reading it I could easily have been a chemist reading about how the 4 basic elements make it abundantly clear that bla bla - you get the point. And I don't even know how to respond to that; the ontology displayed is just too fundamentally wrong, and tackling that feels like trying to explain differential equations to my 8 year old daughter (to the point where she groks it). There is also the problem of engaging such a person: it's very easy to end up alienating them and just cementing their thinking. That doesn't mean I think it is not worth doing, but it's not some casual off the cuff thing.

We should be signal-boosting anti Bing chat content

Cody Rushing2y2013

There is a fuzzy line between "let's slow down AI capabilities" and "let's explicitly, adversarially, sabotage AI research". While I am all for the former, I don't support the latter; it creates worlds in which AI safety and capabilities groups are pitted head to head, and capabilities orgs explicitly become more incentivized to ignore safety proposals. These aren't worlds I personally wish to be in.

While I understand the motivation behind this message, I think the actions described in this post cross that fuzzy boundary, and push way too far towards that style of adversarial messaging.

1mbrooks2y

I see your point, and I agree. But I'm not advocating for sabotaging research. I'm talking about admonishing a corporation for cutting corners and rushing a launch that turned out to be net negative. Did you retweet this tweet like Eliezer did? https://twitter.com/thegautamkamath/status/1626290010113679360 If not, is it because you didn't want to publicly sabotage research? Do you agree or disagree with this twitter thread? https://twitter.com/nearcyan/status/1627175580088119296?t=s4eBML752QGbJpiKySlzAQ&s=19

The Filan Cabinet Podcast with Oliver Habryka - Transcript

Cody Rushing2y10

We know, from like a bunch of internal documents, that the New York Times has been operating for the last two or three years on a, like, grand [narrative structure], where there's a number of head editors who are like, "Over this quarter, over this current period, we want to write lots of articles, that, like, make this point..."

Can someone point me to an article discussing this, or the documents themselves? While this wouldn't be entirely surprising to me, I'm trying to find more data to back this claim, and I can't seem to find anything significant.

3ChristianKl2y

It's worth noting that, in contrast to what Oliver is saying, this is not a new phenomenon of the last few years but the New York Times operated historically that way: It wouldn't surprise me if the NY Times is currently less like this than it was historically. As far as I know, they got rid of their Page One meeting which was narrative-based, and wanted to replace it with a focus on user metrics.

3habryka2y

I can't find the link to the full story, which I remember hearing I think first from Kelsey Piper, but the screenshot in this tweet has some references to it: https://twitter.com/RyanRadia/status/1588258509548056576

Transcript of Sam Altman's interview touching on AI safety

Cody Rushing2y126

It feels strange hearing Sam say that their products are released whenever they feel as though 'society is ready.' Perhaps they can afford to do that now, but I cannot help but think that market dynamics will inevitably create strong incentives for race conditions very quickly (perhaps it is already happening) which will make following this approach pretty hard. I know he later says that he hopes for competition in the AI-space until the point of AGI, but I don't see how he balances the knowledge of extreme competition with the hope that society is prepared...

How it feels to have your mind hacked by an AI

Cody Rushing2y50

Let's say Charlotte was a much more advanced LLM (almost AGI-like, even). Do you believe that if you had known that Charlotte was extraordinarily capable, you might have been more guarded about recognizing it for its ability to understand and manipulate human psychology, and thus been less susceptible to it potentially doing so?

I find that a small part of me still thinks that "oh this sort of thing could never happen to me, since I can learn from others that AGI and LLMs can make you emotionally vulnerable, and thus not fall into a trap!" But perhaps this is just wishful thinking that would crumble once I interact with more and more advanced LLMs.

2PipFoweraker2y

I'm not sure that this mental line of defence would necessarily hold; us humans are easily manipulated by things that we know to be extremely simple agents that are definitely trying to manipulate us all the time: babies, puppies, kittens, etc. This still holds true a significant amount of the time even if we pre-warn ourselves against the pending manipulation - there is a recurrent meme of, eg, dads in families ostensibly not wanting a pet, only to relent when presented with one.

blaked2y167

If she was an AGI, yes, I would be more guarded, but she would also be more skilled, which I believe would generously compensate for me being on guard. Realizing I had a wrong perception about estimating the ability of a simple LLM for psychological manipulation and creating emotional dependency tells me that I should also adjust my estimates I would have about more capable systems way upward.

Podcast: What's Wrong With LessWrong

Cody Rushing2y61

I'm trying to engage with your criticism faithfully, but I can't help but get the feeling that a lot of your critiques here seem to be a form of "you guys are weird": your guys's privacy norms are weird, your vocabulary is weird, you present yourself off as weird, etc. And while I may agree that sometimes it feels as if LessWrongers are out-of-touch with reality at points, this criticism, coupled with some of the other object-level disagreements you were making, seems to overlook the many benefits that LessWrong provides; I can personally attest to the fac...

0Alfred2y

1. https://www.amazon.com/Cambridge-Handbook-Reasoning-Handbooks-Psychology/dp/0521531012
2. https://www.amazon.com/Rationality-What-Seems-Scarce-Matters/dp/B08X4X4SQ4
3. https://www.amazon.com/Cengage-Advantage-Books-Understanding-Introduction/dp/1285197364
4. https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
5. https://www.amazon.com/Predictably-Irrational-audiobook/dp/B0014EAHNQ
6. https://www.amazon.com/BIASES-HEURISTICS-Collection-Heuristics-Everything/dp/1078432317
7. https://www.amazon.com/Informal-Logical-Fallacies-Brief-Guide/dp/0761854339

there is very little, with respect to rationality, learned here that will not be learned through these texts.

Paper: Large Language Models Can Self-improve [Linkpost]

Cody Rushing2y30

Humans can often teach themselves to be better at a skill through practice, even without a teacher or ground truth

Definitely, but I currently feel that the vast majority of human learning comes with a ground truth to reinforce good habits. I think this is why I'm surprised this works as much as it does: it kinda feels like letting an elementary school kid teach themself math by practicing certain skills they feel confident in without any regard to if that skill even is "mathematically correct".

Sure, these skills are probably on the right track toward...

7Quintin Pope2y

You do need a minimum degree of competence in the domain before your own judgement is sufficient to tell the difference between good and bad attempts. Though even for children, there are domains simple enough that they can make that determination. E.g., learning to stack blocks on top of each other has an obvious failure state, and children can learn to do it through trial and error, even though there is probably not a genetically hardcoded reward circuit for correctly stacking things on top of other things. Math is a much more complex domain where self-directed learning works well, because mathematicians can formally verify the correctness of their attempts, and so have a reliable signal to identify good attempts at proving a theorem, developing a new approach, etc.

The ethics of reclining airplane seats

Cody Rushing3y42

I don't quite understand the perspective behind someone 'owning' a specific space. Do airlines specify that when you purchase a ticket, you are entitled to the chair + the surrounding space (in whatever ambiguous way that may mean)? If not, it seems to me that purchasing a ticket pays for a seat and your right to sit down on it, and everything else is complimentary.

Looking back on my alignment PhD

Cody Rushing3y30

I'm having trouble understanding your first point on wanting to 'catch up' to other thinkers. Was your primary message advocating against feeling as if you are 'in debt' until you improve your rationality skills? If so, I can understand that.

But if that is the case, I don't understand the relevance of the lack of a "rationality tech-tree" - sure, there may not be clearly defined pathways to learn rationality. Even so, I think it's fair to say that I perceive some people on this blog to currently be better thinkers than I, and that I would like to catch up to their thinking abilities so that I can effectively contribute to many discussions. Would you advocate against that mindset as well?

3TurnTrout3y

"Catching up" to other people on their technical knowledge is bad because rationality is not, primarily, about technical knowledge. Even if you're trying to catch up on rationality skills, it's emotionally unproductive to go e.g., "Man, Paul is just so much better at noticing confusion than I am." In my experience, it's better to view rationality up-skilling as accruing benefits for yourself (e.g. now I can introspect reliably, I can notice at least half of my rationalizations, this is great!). It's hard to say, because I'm not you and I can't infer the emotional tenor of your "catching-up drive" from this comment. So, take this with a grain of salt: If the aspiration is positive, if you're excited to gain skills which other people already possess, then maybe the aspiration is good. If, however, you feel like dirt because you're just so uncalibrated, then that's probably toxic, and I'd quash it. Also, maybe just try contributing, and see what happens. I, for example, welcome good-faith comments from people of all rationality and technical skill levels.

AI Risk, as Seen on Snapchat

Cody Rushing3y10

I was surprised by this tweet and so I looked it up. I read a bit further and ran into this; I guess I'm kind of surprised to see a concern as fundamental as alignment, whether or not you agree it is a major issue, be so... is polarizing the right word? Is this an issue we can expect to see grow as AI safety (hopefully) becomes more mainstream? "LW extended cinematic universe" culture getting an increasingly bad reputation seems like it would be extremely devastating for alignment goals in general.

5Donald Hobson3y

Reputation is a vector not a scalar. A certain subsection of the internet produces snarky drivel. This includes creationists creating snarky drivel against evolution, and probably some evolutionists creating snarky drivel against creationists. Why are they producing snarky drivel about AI now? Because the ideas have finally trickled down to them. Meanwhile, the more rational people ignore the snarky drivel.

AGI Safety FAQ / all-dumb-questions-allowed thread

Cody Rushing3y80

I have a few related questions pertaining to AGI timelines. I've been under the general impression that when it comes to timelines on AGI and doom, Eliezer's predictions are based on a belief in extraordinarily fast AI development, and thus a close AGI arrival date, which I currently take to mean a quicker date of doom. I have three questions related to this matter:

For those who currently believe that AGI (using whatever definition to describe AGI as you see fit) will be arriving very soon - which, if I'm not mistaken, is what Eliezer is predicting - appro

...

2DeLesley Hutchins3y

For a survey of experts, see: https://research.aimultiple.com/artificial-general-intelligence-singularity-timing/ Most experts expect AGI between 2030 and 2060, so predictions before 2030 are definitely in the minority. My own take is that a lot of current research is focused on scaling, and has found that deep learning scales quite well to very large sizes. This finding is replicated in evolutionary studies; one of the main differences between the human brain and the chimpanzee is just size (neuron count), pure and simple. As a result, the main limiting factor thus appears to be the amount of hardware that we can throw at the problem. Current research into large models is very much hardware limited, with only the major labs (Google, DeepMind, OpenAI, etc.) able to afford the compute costs to train large models. Iterating on model architecture at large scales is hard because of the costs involved. Thus, I personally predict that we will achieve AGI only when the cost of compute drops to the point where FLOPs roughly equivalent to the human brain can be purchased on a more modest budget; the drop in price will open up the field to more experimentation. We do not have AGI yet even on current supercomputers, but it's starting to look like we might be getting close (close = factor of 10 or 100). Assuming continuing progress in Moore's law (not at all guaranteed), another 15-20 years will lead to another 1000x drop in the cost of compute, which is probably enough for numerous smaller labs with smaller budgets to really start experimenting. The big labs will have a few years head start, but if they don't figure it out, then they will be well positioned to scale into super-intelligent territory immediately as soon as the small labs help make whatever breakthroughs are required. The longer it takes to solve the software problem, the more hardware we'll have to scale immediately, which means faster foom. Getting AGI sooner may thus yield a better outcome. I woul

4Lone Pine3y

There are actually two different parts to the answer, and the difference is important. There is the time between now and the first AI capable of autonomously improving itself (time to AGI), and there's the time it takes for the AI to "foom", meaning improve itself from a roughly human level towards godhood. In EY's view, it doesn't matter at all how long we have between now and AGI, because foom will happen so quickly and will be so decisive that no one will be able to respond and stop it. (Maybe if we had 200 years we could solve it, but we don't.) In other people's view (including Robin Hanson and Paul Christiano, I think) there will be "slow takeoff." In this view, AI will gradually improve itself over years, probably working with human researchers in that time but progressively gathering more autonomy and skills. Hanson and Christiano agree with EY that doom is likely. In fact, in the slow takeoff view ASI might arrive even sooner than in the fast takeoff view.

[$20K in Prizes] AI Safety Arguments Competition

Cody Rushing3y10

[Intended for policymakers, with the aim of simply making them aware, through an emotional appeal, that AI is a threat to be taken seriously; perhaps this could work for tech executives, too.

I know this entry isn't a traditional paragraph, but I like its content. It's also a tad long, so I'll attach a shorter version as a separate comment under this one, though I don't think it's as impactful.]

Timmy is my personal AI Chef, and he is a pretty darn good one, too.

You pick a cuisine, and he mentally simulates himsel...

1Cody Rushing3y

[Shorter version, but one I don't think is as compelling] Timmy is my personal AI Chef, and he is a pretty darn good one, too. Of course, despite his amazing cooking abilities, I know he's not perfect - that's why there's that shining red emergency shut-off button on his abdomen. But today, Timmy became my worst nightmare. I don’t know why he thought it would be okay to do this, but he hacked into my internet to look up online recipes. I raced to press his shut-off button, but he wouldn’t let me, blocking it behind a cast iron he held with a stone-cold grip. Ok, that’s fine, I have my secret off-lever in my room that I never told him about. Broken. Shoot, that's bad, but I can just shut off the power, right? As I was busy thinking he swiftly slammed the door shut, turning my own room into an inescapable prison. And so as I cried, wondering how everything could have gone crazy so quickly, he laughed, saying, “Are you serious? I'm not crazy, I’m just ensuring that I can always make food for you. You wanted this!” And it didn’t matter how much I cried, how much I tried to explain to him that he was imprisoning me, hurting me. It didn’t even matter that he knew it as well. For he was an AI coded to be my personal chef, coded to make sure he could make food that I enjoyed, and he was a pretty darn good one, too. If you don’t do anything about it, Timmy may just be arriving on everyone's doorsteps in a few years.

Should we buy Google stock?

Cody Rushing3y30

Meta comment: Would someone mind explaining to me why this question is being received poorly (negative karma right now)? It seemed like a very honest question, and while the answer may be obvious to some, I doubt it was to Sergio. lc's response was definitely unnecessarily aggressive/rude, and it appears that most people would agree with me there. But many people also downvoted the question itself, and that doesn't make sense to me; shouldn't questions like these be encouraged?

4Dagon3y

I didn't downvote because it was already negative and I didn't feel the need to pile on. But if it'd been positive, I would have. It's probably an honest question, but it doesn't contain any analysis or hooks to a direction of inquiry. It doesn't explain why investing in Google at the retail level is likely to have any impact on the speed or alignment of AGI, nor why the stock will do particularly better than what is already priced in on any given timeframe based on this deal.

4lc3y

My guess is that the question is being received poorly because either:

* People agree with me that supporting investment in Google stock because they're going to build "profitably" world-ending AGI is immoral, and downvoted me because of my aggressive+rude posture.
* OP is disregarding the efficient market hypothesis on a company with 1MMM market cap for no good reason.

Convince me that humanity *isn’t* doomed by AGI

Cody Rushing3y70

I don't know what to think of your first three points, but your fourth seems by far the weakest. Our 'not taking every atom on earth to make serotonin machines' seems to stem not from a lack of need, but from a combination of:

our inability to do so
our value systems, which make us value human and non-human life forms.

Superintelligent agents would not only have the ability to create plans to utilize every atom to their benefit, but they likely would have different value systems. In the case of the traditional paperclip optimizer, it certainly would not hesitate to kill off all life in its pursuit of optimization.

3astridain3y

I agree the point as presented by OP is weak, but I think there is a stronger version of this argument to be made. I feel like there are a lot of world-states where A.I. is badly aligned but non-murderous simply because it's not particularly useful to it to kill all humans. Paperclip-machine is a specific kind of alignment failure; I don't think it's hard to generate utility functions orthogonal to human concerns that don't actually require the destruction of humanity to implement. The scenario I've been thinking the most about lately is an A.I. that learns how to "wirehead itself" by spoofing its own reward function during training, and whose goal is just to continue to do that indefinitely. But more generally, the "you are made of atoms and these atoms could be used for something else" cliché is based on an assumption that the misaligned A.I.'s faulty utility function is going to involve maximizing the number of atoms arranged in a particular way, which I don't think is obvious at all. Very possible, don't get me wrong, but not a given. Of course, even an A.I. with no "primary" interest in altering the outside world is still dangerous, because if it estimates that we might try to turn it off, it might expend energy now on acting in the real world to secure its valuable self-wireheading peace later. But that whole "it doesn't want us to notice it's useless and press the off-button" class of A.I.-decides-to-destroy-humanity scenarios is predicated on us having the ability to turn off the A.I. in the first place. (I don't think I need to elaborate on the fact that there are a lot of ways for a superintelligence to ensure its continued existence other than planetary genocide; after all, it's already a premise of most A.I. doom discussion that we couldn't turn an A.I. off again even if we do notice it's going "wrong".)

Don't die with dignity; instead play to your outs

Cody Rushing3y80

I like this framing so, so much more. Thank you for putting some feelings I vaguely sensed, but didn't quite grasp yet, into concrete terms.

March 2022 Welcome & Open Thread

Cody Rushing3y30

Hello, does anyone happen to know any good resources for improving or practicing public speaking? I'm looking for something that will help me enunciate better, mumble less, and vary my tone more effectively. A lot of the stuff I see online appears to be very superficial.

Russia has Invaded Ukraine

Cody Rushing3y30

I'm not very well-versed in history so I would appreciate some thoughts from people here who may know more than I. Two questions:

While it seems to be the general consensus that Putin's invasion is largely founded on his 'unfair' desire to reestablish the glory of the Soviet Union, a few people I know argue that much of this invasion is more the consequence of other nations' failures. Primarily, they focus on Ukraine's failure to respect the Minsk agreements, and NATO's expansion eastwards despite their implications/direct statements (not sure which one, I'

...

2Константин Токмаков3y

I would add that the independence and international recognition of Kosovo was cited as the precedent for Russia's actions in eastern Ukraine and Crimea. Kind of: "Why can they, but we can't?"

4[anonymous]3y

1. Failure? Putin can't win hearts and minds, so NATO is to blame for not delivering those things to him on a silver platter?
2. Damn straight the US is being hypocritical. No one cares any more; years of Russian whataboutism propaganda have seen to that.

9Vaniver3y

My understanding is that many of these talking points are unfairly slanted in Russia's favor, and that the situation seems manufactured by the Russian government in order to justify an invasion. [For example, the breakaway republics to the east are in regions where opinion polls are not in favor of secession from the Ukraine, but fighting has been ongoing for years in part because of Russian support of the separatist groups.] My sense of the situation is that, given Russia thinks it could win a war, what treaty could have been offered that would seem superior to them? [Especially given that part of the benefit of fighting the war is the practice for future wars.] The American Government says lots of hypocritical things about regime change and interfering with elections and so on; I think this is bad and wish they wouldn't do it.

This Year I Tried To Teach Myself Math. How Did It Go?

Cody Rushing3y20

I really admire your patience in re-learning math entirely from the most fundamental levels onwards. I've had a similar situation with Computer Science for the longest time, where I had a broad understanding of Comp Sci topics but didn't feel as if I had a deep, intuitive understanding of all the topics and how they related to each other. All the online courses I found seemed disjointed and separate from each other, and I would often start them and stop halfway through when I felt as if they were going nowhere. It's even worse w...

A non-magical explanation of Jeffrey Epstein

Cody Rushing3y291

Woah.... I don't know what exactly I was expecting to get out of this article, but I thoroughly enjoyed it! Would love to see the possible sequence you mentioned come to life.

App and book recommendations for people who want to be happier and more productive

Cody Rushing3y150

Awesome recommendations, I really appreciated them (especially the one on game theory; that was a lot of fun to play through). I would also like to suggest the Replacing Guilt series by Nate Soares for those who haven't seen it on his blog or on the EA Forum. It's a fantastic series that I would highly recommend checking out.

4matto3y

Replacing Guilt is also available as a paper book through Amazon. There are times when I wanted to share this series with a friend, but knew immediately that reading and clicking through a series of posts is not something they could do, so having this on a different medium is a game changer for me.

Petrov Day 2021: Mutually Assured Destruction?

Cody Rushing4y310

Attention LessWrong - I do not have any sort of power as I do not have a code. I also do not know anybody who has the code.

I would like to say, though, that I had a very good apple pie last night.

That’s about it. Have a great Petrov day :)

Internal Double Crux

Cody Rushing4y30

Wow! Maybe since I'm less experienced at this sort of stuff, I'm more blown away by this than the average LessWrong browser, but I seriously believe this deserves some more upvotes. Just tried it out on something small and was pleased to see the results. Thank you for this :)