Quick Takes

habryka

Is it OK for LW admins to look at DM metadata for spam prevention reasons? 

Sometimes new users show up and spam a bunch of other users in DMs (in particular high-profile users). We can't limit DM usage to only users with activity on the site, because many valuable DMs get sent by people who don't want to post publicly. We have some basic rate limits for DMs, but of course those can't capture many forms of harassment or spam. 

Right now, admins can only see how many DMs users have sent, and not who users have messaged, without making a whole manual...

yc

Could this be made a report-based system? If a user reports potential spam, ask for their reasons in the submission process, and ask for consent to look over the messages (between the reporter and the alleged spammer); if multiple people report the same person, it becomes obvious that the account is spamming via DM.

edit: just saw previous comment on this too

eukaryote
Yeah, agree. (Also agree with Dagon in not having an existing expectation of strong privacy in LW DMs. Weak privacy, yes, like that mods wouldn't read messages as a matter of course.) Here's how I would think to implement this unintrusively: a little ℹ️-type icon in a top corner of the DM interface (or to the side of the "Conversation with XYZ" header, or something). When you click on that icon, it toggles a writeup about the circumstances in which information from the message might be sent to someone else (what information, and to whom).
Dagon
This is going the wrong direction. If privacy from admins is important (I argue that it's not for LW messages, but that's a separate discussion), then breaches of privacy should be exceptions for specific purposes, not allowed unless "really secret contents". Don't make this filter-in for privacy. Make it filter-out - if it's detected as likely-spam, THEN take more intrusive measures. Privacy-preserving measures include quarantining or asking a few recipients if they consider it harmful before delivering (or not) the rest, automated content filters, etc. This infrastructure requires a fair bit of data-handling work to get it right, and a mitigation process where a sender can find out they're blocked and explicitly ask the moderator(s) to allow it.
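To make the filter-out shape concrete, here is a minimal sketch of such a pipeline; all names, thresholds, and heuristics are hypothetical illustrations, not LessWrong's actual moderation code:

```python
# Hypothetical sketch of a "filter-out" DM pipeline: messages flow freely
# unless cheap metadata signals fire; only then do more intrusive steps run.
# Thresholds and scoring are illustrative, not LessWrong's actual rules.
from dataclasses import dataclass, field

@dataclass
class Sender:
    user_id: str
    recent_recipients: set = field(default_factory=set)
    reports_received: int = 0

def spam_score(sender: Sender) -> float:
    """Metadata-only heuristics; message content is never read here."""
    score = 0.0
    if len(sender.recent_recipients) > 20:  # fan-out to many distinct users
        score += 0.5
    if sender.reports_received >= 2:        # multiple independent reports
        score += 0.5
    return score

def handle_dm(sender: Sender, recipient_id: str) -> str:
    sender.recent_recipients.add(recipient_id)
    if spam_score(sender) >= 1.0:
        # Only now take intrusive measures: quarantine for moderator review,
        # or ask a few recipients whether the message is unwanted.
        return "quarantined"
    return "delivered"
```

The point of the ordering is that content inspection (if any) happens only after cheap metadata signals fire, which matches the exception-based privacy model described above.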

Could mech interp ever be as good as chain of thought?

Suppose there is 10 years of monumental progress in mechanistic interpretability. We can roughly - not exactly - explain any output that comes out of a neural network. We can do experiments where we put our AIs in interesting situations and make a very good guess at their hidden reasons for doing the things they do.

Doesn't this sound a bit like where we currently are with models that operate with a hidden chain of thought? If you don't think an AGI built with the current fingers-crossed-it's-faithful paradigm would be safe, what percentage would mech interp have to hit to beat that?

Seems like 99%+ to me.

I very much agree. Do we really think we're going to track a human-level AGI's, let alone a superintelligence's, every thought, and do it in ways it can't dodge if it decides to?

I strongly support mechinterp as a lie detector, and it would be nice to have more of it, as long as we don't use it and control methods to replace actual alignment work and careful thinking. The amount of effort going into interp relative to its theory of impact seems a bit strange to me.

nikola

You should say "timelines" instead of "your timelines".

One thing I notice in AI safety career and strategy discussions is that there is a lot of epistemic helplessness in regard to AGI timelines. People often talk about "your timelines" instead of "timelines" when giving advice, even if they disagree strongly with the timelines. I think this habit causes people to ignore disagreements in unhelpful ways.

Here's one such conversation:

Bob: Should I do X if my timelines are 10 years?

Alice (who has 4 year timelines): I think X makes sense if your timelines are l...

Guive

In general, it is difficult to give advice if whether the advice is good depends on background facts that giver and recipient disagree about. I think the most honest approach is to explicitly state what your advice depends on when you think the recipient is likely to disagree. E.g. "I think living at high altitude is bad for human health, so in my opinion you shouldn't retire in Santa Fe."

If I think AGI will arrive around 2055, and you think it will arrive in 2028, what is achieved by you saying "given timelines, I don't think your mechinterp project will ...

Dagon
Hmm. I think there are two dimensions to the advice (what is a reasonable distribution of timelines to have, vs what should I actually do).  It's perfectly fine to have some humility about one while still giving opinions on the other.  "If you believe Y, then it's reasonable to do X" can be a useful piece of advice.  I'd normally mention that I don't believe Y, but for a lot of conversations, we've already had that conversation, and it's not helpful to repeat it.  
mako yass
Timelines are a result of a person's intuitions about a technical milestone being reached in the future; it is super obviously impossible for us to have a consensus about that kind of thing. Talking only synchronises beliefs if you have enough time to share all of the relevant information, and with technical matters, you usually don't.

Given the OpenAI o3 results making it clear that you can pour in more compute to solve problems, I'd like to announce that I will be mentoring at SPAR for an automated interpretability research project using AIs with inference-time compute.

I truly believe that the AI safety community is dropping the ball on this angle of technical AI safety and that this work will be a strong precursor of what's to come.

Note that this work is a small part of a larger organization focused on automated AI safety that I'm currently attempting to build.

Here’s the link: https://airtable.com/ap...

links 12/20/2024: https://roamresearch.com/#/app/srcpublic/page/12-20-2024

Physical object.

I might (20%) make a run of buttons that display how long it's been since you pressed them, e.g. so I can push the button in the morning when I've put in my anti-baldness hair stuff and then not have to wonder whether I did.

Would you be interested in buying such a thing?

Perhaps they could have a dry-wipe section so you can write what the button is for.

If you would, please upvote the attached comment.
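The core logic such a button would need is tiny; here is a hypothetical sketch (Python for readability, not actual firmware):

```python
# Sketch of the button's core behavior: one press stores "now"; the display
# shows elapsed time since that press. All names here are hypothetical.
import time

last_press = None  # monotonic timestamp of the most recent press

def on_button_press():
    global last_press
    last_press = time.monotonic()

def display_text():
    if last_press is None:
        return "never pressed"
    hours = (time.monotonic() - last_press) / 3600
    return f"{hours:.1f}h ago"
```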


I use daily checklists, in spreadsheet form, for this.

Dagon
Probably not for me.  I had a few projects using AWS IoT buttons (no display, but arbitrary code run for click, double-click, or long-click of a small battery-powered wifi button), but the value wasn't really there, and I presume adding a display wouldn't quite be enough to devote the counter space.  Amusingly, it turns out the AWS version was EOL'd today - Learn about AWS IoT legacy services - AWS IoT Core
Shankar Sivarajan
I think this would be missing the point. If it were "smart" like you describe, I definitely wouldn't buy it, and I wouldn't use it even if I got it for free: I'd just get an app on my phone. What I want from such an object is infallibility, and the dumber it is, the closer it's likely to get to that ideal.
Raemon

I’d like to hire cognitive assistants and tutors more often. This could (potentially) be you, or people you know. Please let me know if you’re interested or have recommendations.

By “cognitive assistant” I mean a range of things, but the core thing is “sit next to me, and notice when I seem like I’m not doing the optimal thing, and check in with me.” I’m interested in advanced versions who have particular skills (like coding, or Applied Quantitivity, or good writing, or research taste) who can also be tutoring me as we go.

I’d like a large rolodex of such pe...

leogao

I decided to conduct an experiment at neurips this year: I randomly surveyed people walking around in the conference hall to ask whether they had heard of AGI

I found that out of 38 respondents, only 24 could tell me what AGI stands for (63%)

we live in a bubble

(https://x.com/nabla_theta/status/1869144832595431553)

Eli Tyre
Was this possibly a language thing? Are there Chinese or Indian machine learning researchers who would use a different term than AGI in their native language?
leogao

I'd be surprised if this were the case. next neurips I can survey some non native English speakers to see how many ML terms they know in English vs in their native language. I'm confident in my ability to administer this experiment on Chinese, French, and German speakers, which won't be an unbiased sample of non-native speakers, but hopefully still provides some signal.

leogao
only 2 people walked away without answering (after saying yes initially); they were not counted as yes or no. another several people refused to even answer, but this was also quite rare. the no responders seemed genuinely confused, as opposed to dismissive. feel free to replicate this experiment at ICML or ICLR or next neurips.
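As a rough sense of how much signal 38 respondents gives, here is a quick sketch (mine, not from the thread) of a normal-approximation 95% confidence interval on the 24/38 figure:

```python
# Normal-approximation 95% CI on the survey proportion (24 of 38).
import math

n, k = 38, 24
p = k / n                               # 0.63
se = math.sqrt(p * (1 - p) / n)         # standard error of a proportion
lo, hi = p - 1.96 * se, p + 1.96 * se
print(f"{p:.0%} (95% CI {lo:.0%}-{hi:.0%})")  # 63% (95% CI 48%-78%)
```

Even at the low end of the interval, roughly half of respondents at a top ML conference couldn't expand the acronym.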

Should LessWrong have an anonymous mode? When reading a post or comments, is it useful to have the username or does that introduce bias?

I had this thought after reading this review of LessWrong: https://nathanpmyoung.substack.com/p/lesswrong-expectations-vs-reality

Dagon
I vote no. An option for READERS to hide the names of posters/commenters might be nice, but an option to post something that you're unwilling to have a name on (not even your real name, just a tag with some history and karma) does not improve things.
jbash
There is an option for readers to hide names. It's in the account preferences. The names don't show up unless you roll over them. I use it, to supplement my long-cultivated habit of always trying to read the content before the author name on every site[1].

As for anonymous posts, I don't agree with your blanket dismissal. I've seen them work against groupthink on some forums (while often at the same time increasing the number of low-value posts you have to wade through). Admittedly Less Wrong doesn't seem to have too much of a groupthink problem[2]. Anyway, there could always be an option for readers to hide anonymous posts.

[1] Actually I'm not sure I had to cultivate it. Back in the days of Usenet, I had to learn to actually ever look at posters' names to begin with. I do not think that I am normal in this.

[2] ...which actually surprises me, because at least some people do seem to buy into the "karma" gamification.

Gemini 2.0 Flash Thinking is claimed to 'transparently show its thought process' (in contrast to o1, which only shows a summary): https://x.com/denny_zhou/status/1869815229078745152. This might be at least a bit helpful for studying how faithful (vs. steganographic, etc.) the Chains of Thought are.

Other recent models that show (at least purportedly) the full CoT:

Seth Herd
Huge if true! Faithful Chain of Thought may be a key factor in whether the promise of LLMs as ideal for alignment pays off, or not. I am increasingly concerned that OpenAI isn't showing us o1's CoT because it's using lots of jargon that's heading toward a private language. I hope it's merely that they didn't want to show its unaligned "thoughts", and to prevent competitors from training on its useful chains of thought.
Noosphere89
IMO, most of the reason why they are not releasing the CoT for o1 is PR/competitive reasons, or this reason in a nutshell:

[this is a draft. I strongly welcome comments]

The Latent Military Realities of the Coming Taiwan Crisis

A blockade of Taiwan seems significantly more likely than a full-scale invasion. The US's non-intervention in Ukraine suggests similar restraint might occur with Taiwan. 

Nevertheless, Metaculus predicts a 65% chance of US military response to a Chinese invasion, and separately gives 20-50% for some kind of Chinese military intervention by 2035. Let us imagine that the worst comes to pass and China and the United States are engaged in a hot war.

China's...

Matthias Dellago
Great write-up, Alex! I wonder how well the transparent battlefield translates to the naval setting.

1. Detection and communication through water is significantly harder than through air, requiring shorter distances.
2. Surveilling a volume scales worse than surveilling a surface.

Am I missing something, or do you think drones will just scale anyway?
Alexander Gietelink Oldenziel
Great to hear this post had ≥ 1 readers, hah.

* Both the US and China are already deploying a number of surface and underwater drones. Ukraine has had a lot of success with surface suicide drones, sinking several Russian ships iirc, damaging bridges, etc. Outside of Ukraine and Russia, maybe Israel, nobody is really on the ball when it comes to military competitiveness. To hit home this point, consider that the US military employs about 10,000 drones of all sizes, while Ukraine, with an economy 1/5 that of the Netherlands, now produces 1-4 million drones a year alone. [ofc drones vary widely in size and capability, so this is a little misleading] It should be strongly suspected that when faced with a real peer opponent, warring powers will quickly realize they need to massively up production of drones.
* There is an interesting acoustic phenomenon where a confluence of environmental factors (like sea depth, temperature, range, etc.) creates 'sonar deadzones' where submarines are basically invisible. The exact nature of these deadzones is a closely-held state secret, as is the exact design of submarines to make them as silent as possible. As stated, my understanding is that this is one of a few remaining areas where the US has a large technological advantage over her Chinese counterparts. You can't hit something you can't see, so this advantage is potentially very large. As mentioned, a single torpedo hit will sink a ship; a ballistic missile hit is a mission kill; both attack submarines and ballistic missile submarines are lethal.
* Although submarines can dive fairly deep, there are various constraints on how deep they typically dive, e.g. they probably want to stay in these sonar deadzones. -> There was an incident a while back where a (russian? english? french?) submarine hit another submarine (russian? english? french?) by accident. It underscores how silent submarines are, and how there are probably preferred regions underwater where submarines are much more likely t...

Damn! Dark forest vibes, very cool stuff!
Reference for the sub collision: https://en.wikipedia.org/wiki/HMS_Vanguard_and_Le_Triomphant_submarine_collision

And here's another one!
https://en.wikipedia.org/wiki/Submarine_incident_off_Kildin_Island

Might as well start equipping them with fenders at this point.


And 2050 basically means post-AGI at this point. ;)

I *despise* the category of "social construction." "Race is a social construct." "Gender is a social construct."

It confusingly conflates these four:

- Socially Constituted: Something that only exists because we say it does conceptually.

- Socially Caused: A thing in the world that only exists because we chose to make it exist.

- Socially Chunked: Something that exists as a spectrum in the real world that we—with some arbitrariness—chunk into discrete categories.

- Socially Charged: Something that exists in the real world that we decide to assign social importa...

nikola

I recently stopped using a sleep mask and blackout curtains and went from needing 9 hours of sleep to needing 7.5 hours of sleep without a noticeable drop in productivity. Consider experimenting with stuff like this.


This is convincing me to buy a sleep mask and blackout curtains. One man's modus ponens is another man's modus tollens, as they say.

nikola
Time in bed
sliu
I notice the same effect for blackout curtains. I required 8h15m of sleep with no blackout curtains, and require 9h of sleep with blackout curtains.

Ok, I really don't get why my post announcing my book got downvoted (ignored is one thing, downvoted quite another)...

Update: when I made this post the original post was on 5 Karma and out of the frontpage. Now it's 15 Karma, which is about what I expected it to get, given that it's not a core topic of LW and doesn't have a lot of information (though I now added some more information at the end), so I'm happy. Just a bit of a bummer that I had to make this post to get the original post out of the pit of obscurity it was pushed into by noise.

Viliam

I would need more data to form an opinion on this.

At first sight, it seems to me like having a rule "if your total karma is less than 100, you are not allowed to downvote an article or a comment if doing so would push it under zero" would be good.

But I have no idea how often that happens in real life. Do we actually have many readers with karma below 100 who bother to vote?
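Stated as code, the proposed rule is simple (a sketch assuming unit-weight votes; hypothetical, not actual LessWrong logic):

```python
# Sketch of the proposed rule: low-karma users may not downvote a post or
# comment below zero. Assumes unit-weight votes; not actual LW code.
KARMA_THRESHOLD = 100

def may_downvote(voter_karma: int, current_score: int) -> bool:
    if voter_karma >= KARMA_THRESHOLD:
        return True
    return current_score - 1 >= 0  # downvote must not push it under zero
```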

By the way, I didn't vote on your article, but... you announced that you were writing a book, i.e. it is not even finished, you didn't provide a free chapter or something... so what exact...

M. Y. Zuo
Sometimes politics IS the core issue, or at least an important underlying cause of the core issue, so a blanket ban on discussing it is a very crude tool. It effectively bans any substantial discussion of a wide range of topics, and instead replaces it, at best, with a huge pile of euphemisms and seemingly bizarre back-and-forths. And at worst, with nothing at all. So user competence as a factor is unlikely to be completely separate. Or to look at it from the other angle: in an ideal world with ideal forum participants, there would very likely be a different prevailing norm.
Seth Herd
It's not a blanket ban. Of course user competence isn't entirely separate, just mostly. In a world with ideal forum participants, we wouldn't be having this conversation :)

How to Poison the Water?

I think we've all heard the saying about the fish and the water (the joke goes, an old fish asks the young fish about the water, and the young fish ask "what's water?").

I'm curious about the key failure modes or methods that tend to "poison the water", i.e. destroy or negatively alter an organization's or scene's culture and norms. Are there major patterns that communities tend to fall into as they self-destruct?

Would love for anyone to share resources or general reflections on this - I'm currently part of an (unrelated) community where I see th...

Viliam
Yes. A bit more cynically, sometimes you have a community with no infighting and you think "that's because we are nice people", but the right answer happens to be "that's because infighting isn't profitable yet". And I think this is much more likely to happen over money rather than prestige; prestige is just a possible way to get funding. Prestige itself is less fungible and less zero-sum.

For example, imagine that the two of us start an artistic web project together: we buy a web domain, install some web publishing software, and then each of us posts two or three nice pictures each week. We keep doing it for a few months, and we acquire a group of fans. And suppose that I happen to be the one of us who has the admin password to the software, and also the web domain is registered to my name. It didn't seem important at the beginning; we didn't expect our relationship to go bad, we probably didn't really even expect the project to succeed, and I just happened to be the person with better tech skills or maybe just more free time at the moment. Anyway, the situation is such that I could remove you from the project by clicking a button, should I choose to do so. At first, you just never thought about it, and probably neither did I. (Though it seems to me that some people have the right instincts, and always try to get this kind of a role, just in case.)

So, I could remove you by a click of a button, but why would I do that? I am happy to have a partner. A website with twice as many pictures is more likely to get popular. The effect is probably superlinear, because posting a picture every day will make the fans develop a habit of checking out the website first thing every morning. Also, we have slightly different styles; some fans prefer my art, some prefer your art. And if I kicked you out, you could just start your own website, and your fans would follow you there.

Three years later, we get so popular that some art grant agency notices us, and decides to give us
halinaeth
This is so fascinating! Your "competent villain" example definitely resonates with me - I also had to learn the hard way to be assertive when it comes to tiny things like domain ownership, which could have huge power-dynamic impacts down the line. Yeah. To your founder point, it's very, very possible, as they are VC-backed and even the VCs' interests aren't very well aligned with the community. In terms of a coup, given the VC-backed nature + other factors, it's nearly impossible to take over. But an ideological split/fork might certainly be possible! Now I'm curious about the history of successful coups. Would the leader usually have to be a prominent member of the old faction as well? Or is it possible for someone with minor power/influence in the old regime to lead a successful coup? I definitely need to study my history, thanks for the food for thought.
Viliam

In my experience, I only remember one example of a successful "coup". It was a private company that started small, and then became wildly successful. Two key employees were savvy enough to realize that this was not necessarily good news for them. The founders will definitely become rich. But a rich company will hire more employees, which means that the relative importance of each one of them will decrease. And the position of the founders towards those two will probably become something like: "okay guys, you spent a decade working hard to make all of t...

I just saw a post from AI Digest on a Self-Awareness benchmark and thought, "holy fuck, I'm so happy someone is on top of this".

I noticed a deep gratitude toward the alignment community for taking this problem so seriously. I personally see many good futures, but that's to some extent built on the trust I have in this community. I'm generally incredibly impressed by the rigorous standards of thinking, and the amount of work that's been produced.

When I was a teenager I wanted to join a community of people who worked their ass off in order to make sure hu...

Yes, there are problems; yes, people are being really stupid; yes, inner alignment and all of its cousins are really hard to solve. We're generally a bit fucked, I agree. The brick wall is so high we can't see the edge, and we have to bash out each brick one at a time, and it is hard, really hard.

I get it, people, and yet we've got a shot, don't we? The probability distribution over all potential futures is being dragged towards better futures because of the work you put in, and I'm very grateful for that.

Like, I don't know how much credit to give LW and the alignment com...

Parable of the Purple Creatures

Some Medieval townsfolk thought witches were poisoning their wells. Witches, of course, are people—often ugly—who are in league with Satan, can do magic, and have an affinity for broomsticks. These villagers wanted to catch the witches so they could punish them. Everyone in the town felt that witches should be punished. So they set up a commission to investigate the matter.

Well, it turned out little purple alien creatures were poisoning their wells. 

They *weren’t* human. 

They *couldn’t* do magic. 

They *weren’t*...


Same. It's especially true because if a knight saw a T-Rex he wouldn't hesitate to call it a dragon. He wouldn't be like, "What this is is ambiguous." 

Sam Rosen
I just say "parable of the purple creatures" when I talk to friends who I've talked about this with before and who understand the concept. It came up yesterday for me when I was talking to a friend, and he was like, "If God made some rules that he would punish us for not following, but the rules weren't intrinsically motivating, should we call that moral realism?" And I was like, parable of the purple creatures, bro.
Noosphere89
This is related to conceptual fragmentation, and one of the reasons why jargon is more useful than people think.

Most murder mysteries on TV tend to have a small number of suspects, and the trick is to find which one did it. I get the feeling that with real-life murders, the police either have absolutely no idea who did it, or know exactly who did it and just need to prove it to the satisfaction of a court of law.

That explains why forensic tests (e.g. fingerprints) are used despite being pretty suspect. They convince the jury that the guilty guy did it, which is all that matters.

See https://issues.org/mnookin-fingerprints-evidence/ for more on fingerprints.
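To make "pretty suspect" concrete, a toy Bayesian update (all numbers invented for illustration) shows why a forensic match is persuasive when police already have a suspect, and much weaker in a cold search:

```python
# Toy Bayesian update for a forensic match; all numbers are illustrative.
def posterior_guilt(prior, tpr=0.95, fpr=0.01):
    """P(guilty | match), given P(match | guilty)=tpr and P(match | innocent)=fpr."""
    return (tpr * prior) / (tpr * prior + fpr * (1 - prior))

print(posterior_guilt(0.5))    # strong prior suspect: ~0.990
print(posterior_guilt(0.001))  # cold database search:  ~0.087
```

With a strong prior, the match "proves" it to a jury; with a near-zero prior, the same match leaves the probability of guilt under 10%, which fits the observation that forensic tests mostly confirm suspects the police already have.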
