All of milanrosko's Comments + Replies

Question: How does this idea guarantee that the contamination did not happen either on purpose or accidentally through articles like this? (Not speaking for the companies, since I am quite sure that they don't care... Just a practical consideration.)

Jozdien

This doesn't guarantee it, you're right. But the obvious way to filter canaried data out is to simply remove any document that has the string inside it. Further filtering for articles that only talk about the canary, rather than intending to use it, seems like a needlessly expensive task, given how few such articles there would be.

Further, given that one use of the canary is to tell whether contamination has happened by checking if the model knows the string, not filtering such articles out would still not be great.
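For concreteness, the "obvious" filter above is about this simple. This is a hypothetical sketch; the canary string here is a placeholder, not the real one:

```python
# Hypothetical sketch of the document-level filter described above: drop any
# training document that contains the canary string. The string below is a
# placeholder, not the actual canary.
CANARY = "BENCHMARK CANARY STRING (placeholder)"

def filter_canaried(documents):
    """Return only the documents that do not contain the canary string."""
    return [doc for doc in documents if CANARY not in doc]

# Example:
docs = ["ordinary web page", f"benchmark file containing {CANARY}"]
print(filter_canaried(docs))  # -> ['ordinary web page']
```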

All that said however, I think the fact that... (read more)

I am currently working on a similar post that comes from an eliminative perspective.

I wouldn't say that the presented "counting argument" is a "central reason". The central reason is the a priori notion that if x can be achieved by scheming, someone who wants x will scheme.

You forgot to mention the VASOR136, a very versatile trace vapor detection unit. The VASOR 136 holds 36 bees in cartridges, all ready to detect the presence of a target vapor in the air.

This is no BS or joke.

quiet_NaN
See here. Of course, that article is a bit light on information about detection thresholds, false-positive rates and so on, as compared to dogs, mass spectrometry or chemical detection methods. I will also note that humans have 10-20M olfactory receptor neurons, while bees have 1M neurons in total. Probably bees are under more evolutionary pressure to make optimal use of their olfactory neurons, though.

The Argument goes like this:

At some point, resistance from advanced AI will cause significant damage, which can be used to change the trend of unregulated AI development. It would be better to actively pursue such an outcome than to wait for a "treacherous turn" scenario.

Premise 1: It is unlikely that regulators will hinder humans from creating AGI. Evidence: Current trends in technological advancement and regulatory behavior suggest minimal interference.

Premise 2: Due to instrumental convergence, human extinction is likely if AGI is developed unchecked. Evidence:... (read more)

You highlight a very important issue: S-Risk scenarios could emerge even in early AGI systems, particularly given the persuasive capabilities demonstrated by large language models.

While I don't believe that gradient descent would ever manifest "vengefulness" or other emotional attributes—since these traits are products of natural selection—it is plausible that an AGI could employ highly convincing strategies. For instance, it might threaten to create a secondary AI with S-Risk as a terminal goal and send it to the moon, where it could assemble the resource... (read more)

milanrosko

I have no idea what this is about, but it seems to me that you are making a confidential conversation about Teresa <redacted> public, possibly without her consent. Maybe because she is homeless. Can someone explain to me like I am five why this is on LessWrong?

I see the point all of you are making, thank you. I agree that a last name muddies things--I deleted it from the post.

Stephen Fowler
I agree that it is certainly morally wrong to post this if that is the person's real full name. It is less bad, but still dubious, to post someone's traumatic life story on the internet even under a pseudonym.
Ben Millwood
I suggest you redact the name from this comment, and then OP redacts the name from the post too, keeping everything else. I think there's some rhetorical value in presenting the information like this, but I think the potential downsides outweigh the upsides here.
the gears to ascension
I think you have somewhat of a point, but. Like. On one hand, it was in public, there's no indication it was meant to be confidential at the time. None of this is anything anyone could use to track her down. It's a humanizing, empathetic story about people having a bad time towards an end of building empathy. But the part where I maybe agree a bit is like, she may or may not have agreed to people sharing her story. My guess is she probably would; but we can't trivially ask her. It might not be too terribly hard for Declan to find her again and get her consent to share it, and my guess is she'd give it eagerly; if it's achievable for him to do that I'd suggest he should. But if he can't, I'd default to share. Maybe not with her full name though.
habryka

(I see no indication in this story that the conversation was confidential)

But I realise we're talking at cross purposes. This is about an approach or a concept (not a policy, as I emphasized at the beginning) for how to reduce X-Risk in an unconventional way. In this example, a utilitarian principle is combined with the fact that a "Treacherous Turn" and the "Shutdown Problem" cannot dwell side by side.

Gordon Seidoh Worley
What is an approach but a policy about what ideas are worth exploring? However you frame it, we could work on this or not in favor of something else. Having ideas is nice, but they only matter if put into action, and since we have limited resources for putting ideas into action, we must implicitly also consider whether or not an idea is worth investing effort into. My original comment is to say that you didn't convince me this is a good idea to explore, which seems potentially useful for you to know, since I expect many readers will feel the same way and bounce off your idea because they don't see why they should care about it. I think you can easily address this by spending time making a case for why such an approach might be useful at all and then also relatively useful compared to alternatives, and I think this is especially important given the tradeoffs your proposal suggests we make (sacrificing people's lives in the name of learning what we need to know to build safe AI).

So what other policies are there that are less likely to result in people dying?

Gordon Seidoh Worley
For one, advocating policies to pause or slow down capabilities development until we have sufficient theoretical understanding to not need to risk misaligned AI causing harm.

I might be dumb but at least I have introspection.

This is how my brain does multiplication: I categorise each fact based on the level of cognitive effort it requires, ranging from intuitive to "yikes".

  • Intuitive: Immediate and automatic, without needing further thought.
  • Stored: Retained in memory, feeling slightly less reliable than "intuitive", but has some sort of intuition in it. If challenged, I would not reconsider.
  • Memorised: Also retained, but "less securely". It has to be remembered without intuition. If challenged, I might briefly reconsider. One co
... (read more)

As an eliminative nominalist, I claim there are no abstractions.

because it's quite limited... it's a joke btw.

milanrosko

This does not seem to be rational thinking to me.

AhmedNeedsATherapist
I would agree that the vibe is off.

When it comes to contraceptives, the placebo effect is quite limited.

RamblinDash
If it did work, you might call it "immaculate contraception"!
kithpendragon
I don't recall having ever read any claim that the placebo effect extends to contraceptives.
NeroWolfe
Probably even negatively correlated. If you think you're protected, you're going to engage in sex more often without real protection than you would if you knew you were just 15 minutes away from being a parent.

Good job. Thank you and have a nice week.

Corrected to agentic and changed the part where it derails a bit. Thank you.

Thanks to the mod for the deus ex machina.

I've been a LessWrong lurker (without an account) for around ten years, ever since the Roko's Basilisk "thing", so... This comic isn't targeted at the LessWrong community but was created by it.

The unusual style, gaming-related language, and iconography typical of manga and comics help bypass the bias known as "mortality salience." I'm trying to convey this message more indirectly, aiming to engage people who might not usually be interested in these topics or who would typically engage in "worldview defense".

Anyway... (read more)

ilm
I liked your post. Tangential, but I'd be interested to hear/read more about this. I have similar feelings but thoughts around this are very disorganized and blurry, rather than well laid out and clear.
habryka
You're welcome! Sorry for the variance in the karma system. It does take quite a while for things to settle.

Yeah, I will close this attempt. I mean, currently it has -1, while some other dumb thread about how NYC is has like 21. Nah... Fuck this.

oumuamua

Don't do this, please. Just wait and see. This community is forgiving about changing one's mind.

Yes, I totally agree, and I will definitely do that. Actually, I have already started working on it.

Thank you for your advice. I will definitely consider the short-form in the future for most such things...
However, I still believe that there is something to this "ontological mess" thing, but the form is lacking, as you point out.

I like this community a lot because of people like you. Have a nice weekend!

I apologize if my previous tone was not as polite as your detailed response deserved. I want to acknowledge your comment and express my appreciation for your constructive feedback.

"Statement B is a 'is' statement too. 'Is it good or bad' is by definition an ought statement."

Yes, obviously, but it is "more concerned" with "ought". It is hard to give formulaic examples because it is also a grammar thing.

"Who argues what? Some argue that the earth's flat. This doesn't make it worth discussing."

Sorry, but this argument is very often regurgitated everywhere by "smart people" in this form. It is a bit baffling to me why you think otherwise.

"This would not amount to an 'ought' statement."

Okay this again. It is quite common in discussions to "... (read more)

"You're asking for flaws in the idea, but more often posts are downvoted for being confusing, boring, or just not particularly helpful for people who use this site."

Well said.

Seth Herd
Just to put this another way: upvotes and downvotes aren't for being right or wrong. They're for making the reader less wrong. People typically upvote when they feel they've learned something or gained a new perspective on a topic they care about. Doing that is hard, because the average LW reader has read a lot of very good material.

If you have material you want to share without subjecting it to that high bar, the "short form" post attached to your account might be a good way to do that. People are usually much more supportive and forgiving if you're not asking them for their time and attention by making a top-level post.

Good job asking for feedback! It can be really frustrating to put a bunch of work into a post and then have it downvoted below zero, implying that everyone wishes you hadn't bothered. I've had it happen, and it sucks. Writing well for a particular audience is really hard, but also really valuable, so good job sticking with it instead of just giving up on this community like a lot of people do when their first post is downvoted.

Haha, I didn't expect to read something like this today! I love where this is going.

Circular definitions are a problem if the set of problems contains circular definitions.

"The implication is that the AI doesn't want to learn anything new."

At first, I was confused by this statement, but then I had an epiphany. It's because understanding gradient estimation methods can be challenging, and that's totally okay. Your input is valuable because it highlights how unfamiliar this topic is for many people; most are even less familiar with it.

Here's the short answer: You (or neural networks ig) do not "learn" terminal goals. You can't learn not to like boobs if that's what you are into. (Well, something like that can happen but it's because they... (read more)

Seth Herd
The comic still doesn't make sense to me. I'm not getting from your explanation why that robot killed that human. I can come up with multiple interpretations, but I still don't know yours. Which means that your average reader is going to find it completely confusing. And since that's at the center of that comic (which again, is really cool art, writing, and layout, so I hope people like and share it!) they're not going to "get it" and won't really get anything out of it other than "this guy thinks AI will take over for some weird reason I can't understand".

As it happens, I've spent 23 years in a research lab centered on learning in the brain, doing everything from variations on backpropagation, reinforcement learning, and other types of learning, to figuring out how that learning leads to human behavior. Since then I've been applying that thinking to AGI and alignment. So if I'm not getting it, even after taking multiple tries at it, it's pretty certain that your non-expert audience won't get it either.

I went back and looked at your first explanation of the metaphor of the button. I found it confusing, despite having been through similar explanations a thousand times. I suggest you polish your own thinking by trying to condense it into plain English that someone without any background would understand (since I assume your comic is meant for everyone). Then put that into the crucial spot in the comic.

Thank you very much! This means a lot to me. Okay regarding the button...

The “button” is a metaphor or placeholder for what's opposite to the machine’s intrinsic terminal goal or the loss function that guides its decision-making and actions. It's not a tangible button anymore but a remnant.

Imagine training an AI by pressing a button to initiate backpropagation. When released into the wild, the AI continues to operate as if the button is still present, constantly seeking to fulfill the goal it once associated with that action.
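To make this concrete, here is a minimal toy sketch. It is purely my own illustration with a single made-up parameter, nothing like the comic's actual setup: during training, every "button press" triggers a gradient step toward some goal; once deployed, the updates stop, but the frozen policy keeps steering toward that goal.

```python
# Purely illustrative toy: the "button" stands in for the training-time
# signal. Each press triggers a gradient-descent update; at deployment the
# button is gone, but the frozen policy still pursues the goal it
# internalised during training.

GOAL = 5.0    # the goal the AI comes to associate with the button (made up)
theta = 0.0   # a single policy parameter: "where do I steer?"
lr = 0.1

def loss(pos):
    # Training-time loss: squared distance from the button-associated goal.
    return (pos - GOAL) ** 2

# Training phase: every "button press" runs one gradient-descent step.
for press in range(200):
    grad = 2.0 * (theta - GOAL)   # d(loss)/d(theta)
    theta -= lr * grad

print(f"training loss after the last button press: {loss(theta):.6f}")

# Deployment phase: no more presses, no more updates, yet the agent still
# steers toward the goal it learned, as if the button were still there.
print(f"deployed policy steers toward {theta:.2f} (button-goal was {GOAL})")
```

Of course, a real network has billions of parameters and no explicit GOAL variable; the sketch only shows the persistence, not the mechanism.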

This is similar to how many hum... (read more)

Seth Herd
Okay, so the comic concludes with the solar system being a "giant button-defense system". If the button is a metaphor for backpropagation, the implication is that the AI doesn't want to learn anything new. But that's not an intuitive result of training; most training is totally neutral on whether you want to do more training. For instance, if I'm wrong about some question of fact, I want to learn the truth so that I can navigate the world better, and so better accomplish my (current) goal(s).

The exception is if you're talking about updating your goal(s). That's why we think an AGI would by default resist the pressing of a shutdown button, or any other "button" leading to changing its goals. If that second part is what you're referring to in the comic, I think it's unclear, and likely to be confusing to almost any audience. It was even confusing to me despite working on this topic full-time, so having seen many different variants of logic for how AI would resist change or new training.

So I think you want to clarify what's going on. I suspect there's a way to do that just in the dialogue boxes around the critical turn, where the robot kills the human scientist. But I'm not sure what it is.