ank

We were modelling the ultimate best future of humanity (billions of years from now) for 3+ years, Xerox PARC-style, and we got very exciting results, including AI safety results. x.com/tonykarev

ank*20

Yes, I want humans to be the superpowerful “ASI agents”, while the ASI itself will be the direct democratic simulated static places (with non-agentic simple algorithms doing the dirty, non-fun work, the way it works in GTA 3-5). It's hard to explain without writing a book, and it's counterintuitive) But I'm convinced it will work if the effort is applied. All knowledge can be represented as static geometry; no agents are needed for that except us.

ank10

Yes, we may want the ability to have some agency (especially human-initiated, for less than an hour, so a person can supervise), but probably not letting agents roam free for days, weeks, or years unsupervised; no one will monitor them, people cannot monitor for that long. So we'd better have some limits, and tools to impose those limits in every GPU.

ank10

Thank you for your analysis, Winston! Sadly I have to write fast here because many of my posts get little attention or get downvoted)

Here is a draft continuation you may find interesting (or not ;):

In unreasonable times, the solution to the AI problem will sound unreasonable at first, even though it's probably the only reasonable and workable solution.

Imagine that in a year we solved alignment, and even hackers and rogue states cannot unleash AI agents on us. How did we do it?

  1. The most radical solution (unrealistic and undesirable): international cooperation to destroy all the GPUs and never make them again. Basically returning to roughly 1990s computing: no 3D video games, but everything else is similar. It's unrealistic and probably stifles innovation too much.

  2. Less radical: keep GPUs so people can have video games and simulations, but internationally outlaw all AI and replace GPUs with ones that simply don't support AI. They could even burn out and call the FBI if a person tries to run some AI on them (that's a joke). So like returning to 2020 computing: no AI, but everything else the same.

  3. Less radical still: have whitelists of models right on the GPU. A GPU becomes a secure computer that only works while connected to the main server (run by some international agency, not NVIDIA, because we want all GPU makers, not just NVIDIA, to be forced to make non-agentic GPUs). NVIDIA and other GPU providers approve models a bit like Apple approves apps in its App Store, or like Nintendo approves games for its Nintendo Switch (a rough sketch of such a whitelist check is below). So no agentic models; we'd have the non-agentic tool AIs Max Tegmark recommends: task-specific (without broad intelligence), they can be chatbots, fold proteins, and do everything else without replacing people. And place AIs that let you be the agent and explore the model like a 3D game. This is a good solution that keeps our world the way it is now but 100% safe.
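
To make point 3 concrete, here is a minimal sketch of what such an on-GPU whitelist check could look like. All of the names (APPROVED_MODEL_HASHES, firmware_allows_load, and so on) are hypothetical illustrations, not any real NVIDIA or driver API: the idea is only that the firmware hashes the model it is asked to run and refuses anything not on a signed, regularly refreshed list.

```python
# Illustrative sketch only: a firmware-level whitelist check, as described above.
# None of these names correspond to a real NVIDIA or driver interface.
import hashlib

# In the proposal this list would be signed and refreshed from the approving
# authority's servers, not hard-coded; the entries below are placeholders.
APPROVED_MODEL_HASHES = {
    "placeholder-hash-of-approved-chatbot",
    "placeholder-hash-of-protein-folding-tool",
}

def model_fingerprint(model_bytes: bytes) -> str:
    """Content hash of the model weights the user is trying to load."""
    return hashlib.sha256(model_bytes).hexdigest()

def firmware_allows_load(model_bytes: bytes) -> bool:
    """Allow loading only models whose hash is on the current whitelist."""
    return model_fingerprint(model_bytes) in APPROVED_MODEL_HASHES

if __name__ == "__main__":
    unknown_model = b"\x00" * 1024              # stand-in for unapproved weights
    print(firmware_allows_load(unknown_model))  # -> False: not whitelisted
```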

And NVIDIA will be happy with this world, because it will double its business: NVIDIA will be able to replace all the GPUs, so people bring in theirs and get some money for them, then buy a new non-agentic sandboxed GPU with an updatable whitelist (from then on you'll probably need an internet connection to use GPUs, especially if you haven't updated the whitelist of AI models for more than a few days).

And NVIDIA will be able to take up to a 15-30% commission from the paid AI model providers (like OpenAI). Smaller developers will make models; they will be registered more strictly than in Apple's App Store, more in the fashion of Nintendo developers. Basically we'll want to know they are good people and won't run evil AI models or agents while pretending to develop something benign. So we just need to spread the word, and especially convince politicians of the dangers and of this solution: we need to make GPU makers the gatekeepers, with skin in the game, to keep all AI models safe.

We'll give deadlines to GPU owners: first we'll update their GPUs with blacklists and whitelists. There will be a deadline to replace GPUs; after it, the old ones will stop working (they will be remotely bricked, and all OSes and AI tools will have a list of those bricked GPUs and will refuse to work with them) and law enforcement will take possession of them.
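
A minimal sketch of this deadline-and-revocation mechanism, under assumed names (REPLACEMENT_DEADLINE, REVOKED_GPU_SERIALS): after the deadline, OSes and AI tools consult a shared list of revoked GPU serial numbers and refuse to use them. The date and serial formats are invented for illustration.

```python
# Illustrative sketch: an OS-side check against a distributed list of "bricked"
# GPUs once the replacement deadline has passed. Names, dates and serial
# formats are assumptions, not a real driver interface.
from datetime import datetime, timezone

REPLACEMENT_DEADLINE = datetime(2030, 1, 1, tzinfo=timezone.utc)  # hypothetical date
REVOKED_GPU_SERIALS = {"GPU-OLD-0001", "GPU-OLD-0002"}            # distributed list

def os_allows_gpu(serial: str, now: datetime | None = None) -> bool:
    """Refuse revoked GPUs once the deadline has passed; allow them before it."""
    now = now or datetime.now(timezone.utc)
    if now < REPLACEMENT_DEADLINE:
        return True                              # grace period: old GPUs still work
    return serial not in REVOKED_GPU_SERIALS

if __name__ == "__main__":
    after_deadline = datetime(2031, 1, 1, tzinfo=timezone.utc)
    print(os_allows_gpu("GPU-OLD-0001", now=after_deadline))  # -> False (revoked)
    print(os_allows_gpu("GPU-NEW-7777", now=after_deadline))  # -> True
```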

This way we'll sanitize our world of the insecure, unsafe GPUs we have now. Only whitelisted models will run inside the sandboxed GPU, and it will only emit safe text or picture output.

Having a few GPU companies to control is much easier than having countless insecure, unsafe GPUs in the hands of hackers, militaries, and rogue states everywhere.

At least we can have politicians (in order to improve defense and national security) make NVIDIA and other GPU manufacturers sell those non-agentic GPUs to foreign countries, so a bigger and bigger share of GPUs will be non-agentic (or allow only some very limited agency, if it's mathematically proven safe). The same way we try to keep fewer countries from having nuclear weapons, we can replace their GPUs (their "nukes", their potentially uncontrollable and autonomous weapons) with safe non-agentic GPUs (= conventional, non-military civilian tech).

ank10

Yes, Buck, thank you for responding! A robust whitelist (especially at the hardware level, where each GPU can become a computer that secures itself) potentially solves it. Of course, against state-level actors it can potentially be broken, but at least millions of consumer GPUs will be protected. Each GPU is a battleground: we want to raise the current 0% security above zero on as many GPUs as possible, first in firmware (and at the OS level), because updating online is easy, then in hardware (which can bring much better security).

In the safest possible implementation, I imagine it like the Apple App Store (or Nintendo's online game shop): AI models become a bit like apps, they run on the GPU internally, and NVIDIA looks after them (they ping NVIDIA's servers constantly, or at least every few days, to recheck the lists and update the security).
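
As one concrete reading of the "recheck every few days" requirement, here is a sketch in which the GPU tracks when it last refreshed its whitelist and refuses to run models once that timestamp goes stale. The three-day threshold and all names are assumptions for illustration, not an existing mechanism.

```python
# Sketch of a staleness check: inference is allowed only while the signed
# whitelist has been refreshed recently. Threshold and names are assumptions.
import time

MAX_LIST_AGE_SECONDS = 3 * 24 * 3600  # "a few days", an arbitrary illustrative value

class WhitelistState:
    def __init__(self) -> None:
        self.last_refresh: float | None = None  # epoch seconds of last server check-in

    def record_refresh(self, when: float | None = None) -> None:
        self.last_refresh = when if when is not None else time.time()

    def inference_allowed(self, now: float | None = None) -> bool:
        """Allow running models only while the whitelist is reasonably fresh."""
        if self.last_refresh is None:
            return False                        # never synced: refuse to run
        now = now if now is not None else time.time()
        return (now - self.last_refresh) <= MAX_LIST_AGE_SECONDS

# A GPU that has not phoned home for over three days would refuse to work:
state = WhitelistState()
state.record_refresh(when=0.0)
print(state.inference_allowed(now=4 * 24 * 3600))  # -> False (list is stale)
```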

NVIDIA can be super motivated to have robust safety: they'll be able to get old hardware for cheap and sell new non-agentic GPUs (so they'll double their business), and take commissions like Apple does (so every GPU becomes a service business for NVIDIA, with constant cash flow; of course there will be free models, like free apps in the App Store, but each developer will at least be registered and so not some anonymous North Korean hacker). They'll find a way to make things very secure.

The ultimate test is this: can NVIDIA sell their non-agentic super-secure GPUs to North Korea without any risks? I think it's possible to have even some simple self-destruct mechanism in case of attempted tampering.

But let's not make the perfect the enemy of the good. Right now we have nukes in each computer (GPUs) that are 100% unprotected. At least blacklists will already be better than nothing, and with new secure hardware they can really slow down AI agents from spreading. So we can be, say, 50% sure we'll have 99% security in most cases, and it can become better and better (the same way the first computers were buggy and completely insecure, but we kept making them more and more secure, at least gradually).

Let's not give up because we're not 100% sure we'll have 100% security) We'll probably never have that; we can only have a path towards it that seems reasonable enough. We need rich allies, and incentives that are aligned with us and with safety.

ank-2-11

The only complete and comprehensive solution that can make AIs 100% safe: in a nutshell, we need to at least lobby politicians to make GPU manufacturers (NVIDIA and others) build robust blacklists of bad AI models (and whitelists, and new non-agentic hardware, please read on) and update GPU firmware with them. It's not the full solution: please steelman it and read the rest to learn how to make it much safer and why it will work (NVIDIA and other GPU makers will want to do it because it'll double their business and all future cash flows; governments will want it because it removes AI threats from China, hackers, terrorists, and rogue states):

  1. The elephant in the room: even if the current major AI companies align their AIs, there will be hackers (who can create viruses with an agentic AI component to steal money), rogue states (which can decide to use AI agents to spread propaganda and to spy) and militaries (AI agents in drones, and for hacking infrastructure). So we need to align the world, not just the models:
  2. Imagine an agentic AI botnet starting to spread on user computers and GPUs. GPUs are like nukes waiting to be taken; they are not protected from running bad AI models at all. I call it the agentic explosion; it's probably going to happen before the "intelligence-agency" explosion (intelligence on its own cannot explode; an LLM is a static geometric shape - a bunch of vectors - without GPUs). Right now we are hopelessly unprepared. We won't have time to create "agentic AI antiviruses".
  3. We need to force GPU and OS providers to update their firmware and software to at least have robust, updatable blacklists of bad (agentic) AI models. And to have robust whitelists, in case there are so many unaligned models that blacklists become useless.
  4. We can force NVIDIA to replace agentic GPUs with non-agentic ones. Ideally those non-agentic GPUs are like sandboxes that run an LLM internally and can only emit text or images as safe output (a rough sketch of such an output filter follows this list). They probably shouldn't connect to the Internet or use tools, or we should at least be able to limit that in case we need to.
  5. This way NVIDIA will have skin in the game and be directly responsible for the safety of the AI models that run on its GPUs.
  6. The same way Apple feels responsible for the App Store and the apps in it, and doesn't let viruses happen.
  7. NVIDIA will want it because, like the App Store, it can potentially take a 15-30% cut from OpenAI and other commercial models, while free models remain free (like the free apps in the App Store).
  8. Replacing GPUs can double NVIDIA's business, so they may even lobby for those things themselves. All companies and CEOs want money and have obligations to shareholders to increase the company's market capitalization. We must make AI safety something that is profitable. Companies that don't promote AI safety should go bankrupt or be outlawed.
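
For point 4, here is a rough sketch of the output filter a sandboxed GPU could apply: only text and image outputs pass, while anything that would act on the world (tool calls, network requests) is dropped. The types and names here are illustrative assumptions, not an existing interface.

```python
# Illustrative sketch of a "text or image only" sandbox filter.
from dataclasses import dataclass
from enum import Enum, auto

class OutputKind(Enum):
    TEXT = auto()
    IMAGE = auto()
    TOOL_CALL = auto()        # e.g. "browse the web", "run code"
    NETWORK_REQUEST = auto()

ALLOWED_KINDS = {OutputKind.TEXT, OutputKind.IMAGE}

@dataclass
class ModelOutput:
    kind: OutputKind
    payload: bytes

def sandbox_filter(output: ModelOutput) -> ModelOutput | None:
    """Pass through text and images; drop anything that would act on the world."""
    return output if output.kind in ALLOWED_KINDS else None

print(sandbox_filter(ModelOutput(OutputKind.TEXT, b"hello")))          # passes
print(sandbox_filter(ModelOutput(OutputKind.TOOL_CALL, b"curl ...")))  # -> None
```
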
ank*32

 

UI proposal to address your concern that it'll be harder to downvote, and the problem of demotivating authors. It keeps the downvote buttons, helps teach writers how to improve their posts (so it makes all posts better), and increases the signal-to-noise ratio on the site, because both authors and readers will know why a post was downvoted:
  • It's important to ask a downvoter for a reason if the downvote will move the post below zero karma. The author may have spent months writing something; if it's important enough to downvote, the downvoter can spend an extra moment choosing among some popular reasons to downvote (we have a lot of space on desktop, so we can show the most popular reasons as buttons like Spam, Bad Title, Too Many Tags, Typo...) or pick a reason later on a special page. Otherwise the writer will have no clue, will rage-quit, and become Sam Altman instead.
  • More seriously now: on desktop we have a lot of space and can show buttons like these: Spam (this can potentially be a bigger offense than one downvote), Typo (it probably shouldn't lower the post as much as a full downvote), Too Many Tags - basically the popular reasons people downvote. We can make those buttons appear when hovering over the downvote button, or show them always. This way people still click once to downvote, like before.
  • Especially if a downvote takes the post into negative karma, we show a bubble in a corner for 30 seconds that says something like: "Please choose one of these popular reasons people downvote, or hover here to type a word or two about why you double downvoted; it'll help the writer improve." (A rough sketch of this flow follows the list.)
  • Downvoters can hover over the downvote button; it'll capture their typing cursor so they can quickly type a word or two about why they downvote and press Enter to downvote. Again, a very elegant UI.
  • If they just click downvote (you can still keep this button), show a balloon for 30 seconds where the downvoter can choose a popular reason for downvoting, or again hover and instantly type a word or two and press Enter to downvote.
  • A “Leave feedback for your downvotes” page: we show downvotes that ruined articles first, then double downvotes, then the rest. We can say it honestly in our UI: these are new authors, and your downvote most likely demotivated them because the article has negative karma now, so please write how they can improve. And we can maybe note that feedback for popular articles (with comments, from established writers and/or with above-zero karma) is not as important, so you don't need to bother.
  • The phone UI is similar, but we don't have hovering. So we can still show popular reasons to downvote on top, maybe as icons: Spam, Typo, etc., and a "Type and Downvote" button that looks like a typing cursor plus a downvote icon - tapping it brings up the keyboard automatically, you enter a word or two, and press Return to downvote. So again, one tap to downvote, like before, in most cases. We've just given downvoters more ways to express themselves.
  • As I said before, we need this at least when the downvote would ruin the post - put it into negative-karma territory. This way we prioritize teaching people how to write better and write more, instead of scaring them away from our website. Everybody wins.
  • Thank you for reading and making the website work!
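
A rough sketch of the flow described above, in plain Python rather than real LessWrong/ForumMagnum code: a downvote that would push a post below zero karma is held back until the voter picks one of the popular reasons (or types a short note). All names and the reason list are illustrative.

```python
# Sketch: require a reason only when the downvote would take the post below zero.
from dataclasses import dataclass, field

POPULAR_REASONS = ["Spam", "Bad title", "Too many tags", "Typos"]  # illustrative

@dataclass
class Post:
    karma: int = 0
    downvote_reasons: list[str] = field(default_factory=list)

def downvote(post: Post, strength: int = 1, reason: str | None = None) -> bool:
    """Apply a downvote; ask for a reason only when it pushes the post below zero."""
    would_go_negative = post.karma - strength < 0
    if would_go_negative and not reason:
        # In the UI, this is where the hover prompt / 30-second bubble appears.
        return False
    post.karma -= strength
    if reason:
        post.downvote_reasons.append(reason)
    return True

post = Post(karma=1)
print(downvote(post, strength=2))                      # False: a reason is requested
print(downvote(post, strength=2, reason="Bad title"))  # True: recorded with feedback
```
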
ank*10

It's a combination of factors; I got some comments on my posts, so I have the general idea:

  1. My writing style is peculiar; I'm not a native speaker.
  2. The ideas I convey took 3 years of modeling. I basically Xerox PARCed (attempted, and got some results on) the ultimate future (billions of years from now). So when I write it's like some Big Bang: ideas flow in all directions and I never have enough space for them)
  3. One commenter recommended changing the title and removing some tags; I did it.
  4. If I use ChatGPT to organize my writing, it removes and garbles things. When I edit it myself, I like having parentheses within parentheses.
  5. I'm writing a book to solve those problems, but mainly human and AI alignment (we had better stop AI agents; it's suicidal to make them) towards the best possible future, to prevent dystopias. It'll be organized this way:
  • I'll start with the “Ethical Big Bang” (physics can be modeled as a subset of ethics),
  • will chronologically describe and show a binary tree model (it models freedoms, choices, and quantum paths; the model is simple and ethicophysical, so those things are the same in it) of the evolution of inequality, from hydrogen getting trapped in the first stars to
  • hunter-gatherers getting enslaved by agriculturalists, and
  • finish with the direct democratic simulated multiverse versus a dystopia where an AI agent grabbed all our freedoms.
  • And it will have a list of hundreds of AI safety ideas to consider.
ank*10

Some more thoughts:

  1. We can prevent people from double downvoting if they opened the post and instantly double downvoted (spent almost zero time on the page). They are most likely the ones who didn't read anything except the title (a sketch of this check follows the list).

  2. Maybe it's better for them to flag it instead, if it was spam or another violation, or to ask the author to change the title. It's unfair to the writer and to other readers for authors to get double downvoted just because of a bad title or some typo.

  3. We have the ability to comment on and downvote paragraphs. This feature is great. Maybe we can aggregate those, and they'll be more precise.

  4. Going below zero is especially demotivating. So maybe we can ask people to give some feedback (at least as a bubble in a corner after you downvote - you can ignore this bubble). So you can double downvote someone below zero, and then a bubble will appear for 30 seconds, and maybe also on some “Please give feedback for some of your downvotes to motivate writers to improve” page.

  5. We may want to teach authors why others “don't like their posts”, so this cycle of downvotes (after initial success, almost every post I wrote was downvoted and I had no idea why; I thought they were too short and so it was hard to get the context, and my ideas are counterintuitive and exploratory) won't become perpetual until the author abandons the whole thing.

  6. We can have the great website we have now plus a “school where newbies learn to become great thinkers, writers, and safety researchers” by getting feedback, or we can become more and more like an elitist club where only established users are welcome and double upvote each other, while double downvoting the newbies and those whose ideas are counterintuitive, too new, or not written in some perfect journalistic style.
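
A small sketch of the check from point 1 above: a strong (double) downvote cast almost immediately after the page was opened is not counted, since the voter most likely only read the title. The threshold and names are assumptions for illustration.

```python
# Sketch: gate strong downvotes on a minimal time-on-page.
MIN_SECONDS_ON_PAGE = 15.0  # illustrative threshold, not a real site setting

def accept_strong_downvote(seconds_on_page: float) -> bool:
    """Count a strong (double) downvote only after a minimal reading time."""
    return seconds_on_page >= MIN_SECONDS_ON_PAGE

print(accept_strong_downvote(2.0))   # -> False: almost certainly title-only
print(accept_strong_downvote(90.0))  # -> True
```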

Thank you for considering it! The rational community is great, kind and important. LessWrong is great, kind and important. Great website engines and UIs can become even greater. Thank you for the work you do!
