AI safety is one of the most critical issues of our time, and sometimes the most innovative ideas come from unorthodox or even "crazy" thinking. I’d love to hear bold, unconventional, half-baked or well-developed ideas for improving AI safety. You can also share ideas you heard from others.

Let’s throw out all the ideas—big and small—and see where we can take them together.

Feel free to share as many as you want! No idea is too wild, and this could be a great opportunity for collaborative development. We might just find the next breakthrough by exploring ideas we’ve been hesitant to share.

A quick request: Let’s keep this space constructive—downvote only if there’s clear trolling or spam, and be supportive of half-baked ideas. The goal is to unlock creativity, not judge premature thoughts.

Looking forward to hearing your thoughts and ideas!

P.S. AI is moving fast; the last similar discussion was a month ago and was well received, so let's try again and see how the ideas have changed.

Answer by ank

I think there is a major flaw in the setup of ForumMagnum (the forum engine that LessWrong and the EA Forum use) that causes us to lose, or scare away, many great safety researchers and authors:

  1. It's actually ridiculous: you can double-downvote multiple new posts even WITHOUT opening them! For example, here (I tried it on one post and then removed the double downvote; please be careful, or maybe just believe me: it's sadly possible to ruin multiple new posts each day like that). UPDATE: My bad, that was actually double-downvoting a particular tag to remove it from the post, but the problem is still there: sadistic people (or malicious bots/AI agents) can open new posts and double-downvote them en masse without reading them at all! https://www.lesswrong.com/w/ai?sortedBy=new

  2. If someone in a bad mood gives your new post a "double downvote" because of a typo in the first paragraph, or because a cat stepped on a mouse, then even if you solved alignment, people will ignore the post at "-1" karma; we'll scare that genius away and probably create a supervillain instead.

  3. Why not at least ask people why they downvote? It would really help authors improve their posts. I think some people downvote without reading because of a bad title or another easy-to-fix thing.

  4. Sadly, most people ignore posts with "-1" karma.

  5. If someone downvotes (especially "double downvotes"), the UI should ask: "Why?" Maybe offer some common reasons and the ability to send anonymous or public feedback (a comment); see the sketch after this list.

  6. It sometimes feels like some people get sadistic pleasure out of downvoting everything and everyone.

  7. For example, X allows "downvoting" in a more civilized way: commenting, unfollowing, muting, flagging if it's really naughty, etc.
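A minimal sketch of what the "ask why" idea (item 5 above) could look like as a data-model change, assuming a TypeScript stack like ForumMagnum's; the type and field names here are hypothetical, not the real schema:

```typescript
// Hypothetical extension of a vote record (NOT the real ForumMagnum
// schema): a downvote may carry an optional, possibly anonymous, reason.
type DownvoteReason =
  | "spam"
  | "typo"
  | "bad-title"
  | "too-many-tags"
  | "other";

interface VotePayload {
  postId: string;
  voterId: string;
  power: 1 | 2 | -1 | -2;    // ordinary vs. strong ("double") vote
  reason?: DownvoteReason;   // requested only for downvotes
  feedback?: string;         // optional free-text note to the author
  anonymous?: boolean;       // hide the voter's identity from the author
}

// Keep ordinary voting one click: only prompt for a reason when the
// vote would push the post's karma below zero.
function shouldPromptForReason(vote: VotePayload, currentKarma: number): boolean {
  return vote.power < 0 && currentKarma + vote.power < 0;
}
```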

I, for one, almost stopped writing here because of anonymous double downvotes on long articles that took days to write (if you randomly get an early downvote instead of an upvote, your post sits at "-1" karma and no one else will open or read it). I have no idea what most of those anonymous double downvoters didn't like.

Some of my articles take 40 minutes to read, so it could be anything; downvotes give me zero information and just demotivate me more and more.

I suspect it's often something in the title or the first paragraph. It was like that with one of my posts where I politely asked downvoters to at least comment on why they downvoted (the post somehow got 20 downvotes from 7 people, because a commenter catastrophized that my polite request would destroy the voting system, and his followers rage-downvoted me :-) It's not his fault and it wasn't his intention, but it's strange and majorly demotivating, as you can imagine).

Thank you for reading!

Garrett Baker

> if you randomly get an early downvote instead of an upvote, your post sits at "-1" karma and no one else will open or read it

I will say that I often do read posts downvoted to -1. I will also say that much of the time the downvote is deserved, however noisy a signal it may be.

> Some of my articles take 40 minutes to read, so it could be anything; downvotes give me zero information and just demotivate me more and more.

I think you should try writing shorter posts, both for your sake (so you get more targeted information) and for the readers' sake.

ank
Thank you for responding, and for reading -1 posts, Garrett! It's important. The long post was actually a blockbuster for me: it got 16 upvotes before I messed up the title, and it went back down to 13) After that I wrote shorter posts, but without the long context the things I write are very counterintuitive. So they got ruined) I think the right approach was to snowball it: to write each next post as a longer and longer book. I'm writing it now, but outside of the website.
Garrett Baker
This sounds like a rationalization. It seems much more likely that the ideas just aren't that high-quality if you need a whole hour for a single argument that couldn't possibly be broken up into smaller pieces that don't suck. Edit: Since if the long post is disliked, you can say "well, they just didn't read it", and if the short post is disliked, you can say "well, it just sucks because it's small". Meanwhile, it should in fact be pretty surprising if your whole 40-minute post contains no interesting, novel, or useful insight that could be explained in a blog post of reasonable length.
ank
It's a combination of factors. I got some comments on my posts, so I have the general idea:

  1. My writing style is peculiar; I'm not a native speaker.

  2. The ideas I convey took 3 years of modeling. I basically Xerox PARCed (attempted, and got some results) the ultimate future (billions of years from now). So when I write, it's like some Big Bang: ideas flow in all directions and I never have enough space for them)

  3. One commenter recommended changing the title and removing some tags; I did it.

  4. If I use ChatGPT to organize my writing, it removes and garbles things. If I edit it myself, I like having parentheses within parentheses.

  5. I'm writing a book to solve those problems, but mainly human and AI alignment (we'd better stop AI agents; it's suicidal to make them) toward the best possible future, to prevent dystopias. It'll be organized this way: I'll start with the "Ethical Big Bang" (physics can be modeled as a subset of ethics); chronologically describe a binary-tree model of the evolution of inequality (it models freedoms, choices, and quantum paths; the model is simple and ethicophysical, so those things are the same in it), from hydrogen getting trapped in the first stars to hunter-gatherers getting enslaved by agriculturalists; finish with the direct democratic simulated multiverse vs. a dystopia where an AI agent has grabbed all our freedoms; and include a list of hundreds of AI safety ideas to consider.

habryka

> sadistic people (or malicious bots/AI agents) can open new posts and double-downvote them en masse without reading them at all!

We do alt-account detection and mass-voting detection. I am quite confident we would reliably catch any attempts at this, and that this hasn't been happening so far.

> Why not at least ask people why they downvote? It would really help authors improve their posts. I think some people downvote without reading because of a bad title or another easy-to-fix thing.

Because this would cause people to basically not downvote things, drastically reducing the signal-to-noise ratio of the site.

ank
A UI proposal to address your concern that downvoting would become harder, and the problem of demotivating authors. It keeps the downvote buttons, teaches writers how to improve their posts, and should actually increase the site's signal-to-noise ratio, because both authors and readers get information about why a post was downvoted:

  1. It's important to ask a downvoter for a reason when the downvote would move the post below zero karma. The author may have been writing for months; a downvoter who finds the post important enough to downvote can spend an extra moment choosing among a few popular reasons. Otherwise the writer will have no clue, will rage-quit, and will become Sam Altman instead. More seriously now:

  2. On desktop we have a lot of space and can show buttons like Spam (potentially a bigger offense than one downvote), Typo (which probably shouldn't sink the post as much as a full downvote), and Too Many Tags: basically the popular reasons people downvote. These buttons can appear on hover over the downvote button, or always, so people still click once to downvote as before.

  3. Especially when a downvote puts the post at negative karma, show a bubble in a corner for 30 seconds saying something like: "Please choose one of these popular reasons, or hover here to type a word or two about why you double-downvoted; it'll help the writer improve."

  4. Downvoters can also hover over the downvote button and have it capture the typing cursor, so they can quickly type a short reason.
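As a rough illustration of the proposal above (assuming a React/TypeScript front end like ForumMagnum's; every component, prop, and label below is made up), the quick-reason buttons and the 30-second feedback bubble might look like:

```tsx
import React, { useState } from "react";

// Hypothetical quick-reason picker attached to the downvote button.
const QUICK_REASONS = ["Spam", "Typo", "Bad title", "Too many tags"] as const;

function DownvoteWithReasons(props: { onDownvote: (reason?: string) => void }) {
  const [showBubble, setShowBubble] = useState(false);

  const vote = (reason?: string) => {
    props.onDownvote(reason);
    if (!reason) {
      // No quick reason chosen: show a dismissible feedback bubble
      // for 30 seconds, as proposed above.
      setShowBubble(true);
      setTimeout(() => setShowBubble(false), 30_000);
    }
  };

  return (
    <span>
      {/* Plain downvote stays one click, as before. */}
      <button onClick={() => vote()}>▼</button>
      {/* Quick-reason buttons; a real UI might only reveal these on hover. */}
      {QUICK_REASONS.map((r) => (
        <button key={r} onClick={() => vote(r)}>
          {r}
        </button>
      ))}
      {showBubble && (
        <span role="status">
          Please pick a reason or type a word or two about why you downvoted;
          it helps the writer improve.
        </span>
      )}
    </span>
  );
}

export default DownvoteWithReasons;
```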
ank
Thank you for responding, habryka! It's great that we have alt-account and mass-voting detection. We can have a list of popular reasons for downvoting, like typos, bad titles, spammy tags; I'm not sure which are the most popular. The problem as I see it is the demotivation of new authors, or even experienced ones who have tentative ideas that sound counterintuitive at first. We live in unreasonable times, and possibly the holy-grail alignment solutions we are looking for will sound unreasonable at first. X doesn't have downvotes at first glance, but it has similar functionality: quick short responses, flagging of spam and other things, muting, blocking, etc. It lets you quickly see what people think by reading the responses. Readers will adjust: posts that had -1 will now have 3 upvotes; posts that had 20 upvotes will now have 30. And there is a way to "inflate" older posts so they'll be comparable with new ones.
ank
Some more thoughts:

  1. We can prevent people from double-downvoting if they opened the post and instantly double-downvoted (spent almost zero time on the page). Those are most likely the ones who didn't read anything except the title.

  2. Maybe it's better for them to flag the post instead, if it was spam or another violation, or to ask the author to change the title. It's unfair to the writer and to other readers for authors to get double-downvoted just because of a bad title or some typo.

  3. We have the ability to comment on and downvote paragraphs. This feature is great. Maybe we can aggregate those signals; they'll be more precise.

  4. Going below zero is especially demotivating. So maybe we can ask people to give some feedback, at least as a dismissible bubble in a corner after the downvote, and maybe on some "Please give feedback for some of your downvotes to motivate writers to improve" page.

  5. We may want to teach authors why others "don't like their posts", so that the cycle of downvotes (after initial success, almost every post I wrote was downvoted and I had no idea why; I thought they were too short, making the context hard to get, and my ideas are counterintuitive and exploratory) doesn't become perpetual until the author abandons the whole thing.

  6. We can have the great website we have now plus a "school for newbies learning to become great thinkers, writers, and safety researchers" through feedback. Or we can become more and more like an elitist club where only established users are welcome and double-upvote each other, while double-downvoting the newbies and those whose ideas are counterintuitive, too new, or not written in some perfect journalistic style.

Thank you for considering it! The rational community is great, kind, and important. LessWrong is great, kind, and important. Great website engines and UIs can become even greater. Thank you for the work!
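For idea 1 in the list above, a minimal server-side guard might look like the sketch below (assumed names and thresholds, not ForumMagnum code):

```typescript
// Hypothetical server-side guard: a strong downvote cast almost
// immediately after opening the post probably means the voter read
// only the title, so nudge them toward flagging instead.
const MIN_SECONDS_ON_PAGE = 15; // assumed threshold; would need tuning

interface VoteAttempt {
  power: number;         // e.g. -2 for a strong ("double") downvote
  secondsOnPage: number; // time between opening the post and voting
}

function classifyVote(attempt: VoteAttempt): "accept" | "suggest-flag" {
  if (attempt.power <= -2 && attempt.secondsOnPage < MIN_SECONDS_ON_PAGE) {
    // Don't count it silently; ask for a reason or a flag instead.
    return "suggest-flag";
  }
  return "accept";
}
```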

Answer by ank

Some ideas; please steelman them:

  1. The elephant in the room: even if the current major AI companies align their AIs, there will be hackers (who can create viruses with an agentic AI component to steal money), rogue states (which can decide to use AI agents to spread propaganda and to spy), and militaries (AI agents in drones and for hacking infrastructure). So we need to align the world, not just the models.

  2. Imagine an agentic AI botnet starting to spread across user computers and GPUs. I call this the agentic explosion, and it will probably happen before the "intelligence-agency" explosion (intelligence on its own cannot explode; an LLM is a static geometric shape, a bunch of vectors, without GPUs). Right now we are hopelessly unprepared. We won't have time to create "agentic AI antiviruses".

  3. Force GPU and OS providers to update their firmware and software to at least maintain robust, updatable blacklists of bad (agentic?) AI models, and robust whitelists in case unaligned models become so numerous that blacklists turn useless (see the sketch after this list).

  4. We can force NVIDIA to replace agentic GPUs with non-agentic ones. Ideally those non-agentic GPUs are like sandboxes that run an LLM internally and can only emit text or images as safe output. They probably shouldn't connect to the Internet or use tools, or at least we should be able to limit that if we need to.

  5. This way NVIDIA will have skin in the game and be directly responsible for the safety of the AI models that run on its GPUs.

  6. In the same way, Apple feels responsible for the App Store and the apps in it, and doesn't let viruses through.

  7. NVIDIA will want this because, like the App Store, it could potentially take a 15-30% cut from OpenAI and other commercial models, while free models remain free (like the free apps in the App Store).

  8. Replacing GPUs could double NVIDIA's business, so they might even lobby for this themselves. All companies and CEOs want money and have obligations to shareholders to increase the company's market capitalization. We must make AI safety something that is profitable; companies that don't promote AI safety should go bankrupt or be outlawed.
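To make idea 3 slightly more concrete, here is a minimal sketch of the kind of whitelist check a driver or firmware layer could run before loading model weights. It's written in TypeScript purely for readability (real firmware would not be TypeScript), and every name in it is invented for illustration:

```typescript
import { createHash } from "node:crypto";

// Hypothetical signed whitelist of approved model-weight hashes,
// distributed via firmware/driver updates (per idea 3 above).
const APPROVED_MODEL_HASHES = new Set<string>([
  // entries would be published and signed by the GPU vendor
]);

function sha256(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Illustrative gatekeeper: hash the weights before mapping them into
// GPU memory and refuse to load models that aren't on the whitelist.
function loadModel(weights: Buffer): void {
  const digest = sha256(weights);
  if (!APPROVED_MODEL_HASHES.has(digest)) {
    throw new Error(`Model ${digest.slice(0, 12)}... is not whitelisted`);
  }
  // ...proceed to upload the weights to the GPU (not shown)
}
```

The hard parts, of course, are key management and tamper-resistance: such a list would have to be signed, updated frequently, and enforced below the level an end user can patch.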
