All of Milan W's Comments + Replies

Milan W10

I am interested in the space. Lots of competent people in the general public are also interested. I had not heard of this hackathon. I think you probably should have done a lot more promotion/outreach.

2Austin Chen
Thanks! Appreciate the feedback for if we do a future hackathon or similar event~
Milan W10

Here is a customizable LLM-powered feed filter for X/Twitter: https://github.com/jam3scampbell/Promptable-Twitter-Feed

Milan W10

This reads like marketing content. However, when read at a meta level, it is a good demonstration of LLMs being already deployed in the wild.

Milan W10

Maybe for a while.
Consider, though, that correct reasoning tends towards finding truth.

1Sergii
In an abstract sense, yes. But for me, in practice, finding truth means checking Wikipedia. It's super easy to mislead humans, so it should be as easy with AI.
Milan W21

In talking with the authors, don't be surprised if they bounce off when encountering terminology you use but don't explain. I pointed you to those texts precisely so you can familiarize yourself with pre-existing terminology and ideas. It is hard but also very useful to translate between (and maybe unify) frames of thinking. Thank you for your willingness to participate in this collective effort.

Milan W21

Let me summarize so I can see whether I got it: you see "place AI" as a body of knowledge that can be used to make a good-enough simulation of arbitrary sections of spacetime, where all events are precomputed. That precomputed (and thus deterministic) aspect is what you call "staticness".

1ank
Yes, I decided to start writing a book in posts here and on Substack, starting from the Big Bang and the ethics, because else my explanations are confusing :) The ideas themselves are counterintuitive, too. I try to physicalize, work from first principles and use TRIZ to try to come up with ideal solutions. I also had a 3-year-long thought experiment, where I was modeling the ideal ultimate future, basically how everything will work and look, if we'll have infinite compute and no physical limitations. That's why some of the things I mention will probably take some time to implement in their full glory. Right now an agentic AI is a librarian, who has almost all the output of humanity stolen and hidden in its library that it doesn't allow us to visit, it just spits short quotes on us instead. But the AI librarian visits (and even changes) our own human library (our physical world) and already stole the copies of the whole output of humanity from it. Feels unfair. Why we cannot visit (like in a 3d open world game) and change (direct democratically) the AI librarian's library? I basically want to give people everything, except the agentic AIs, because I think people should remain the most capable "agentic AIs", else we'll pretty much guarantee uncomfortable and fast changes to our world. There are ways to represent the whole simulated universe as a giant static geometric shape: * Each moment of time is a giant 3d geometric shape of the universe, if you'll align them on top of each other, you'll effectively get a 4d shape of spacetime that is static but has all the information about the dynamics/movements in it. So the 4d shape is static but you choose some smaller 3d shape inside of it (probably of a human agent) and "choose the passage" from one human-like-you shape to another, making the static 4d shape seem like the dynamic 3d shape that you experience. The whole 4d thing looks very similar to the way long exposure photos look that I shared somewhere in my comm
Milan W20

How can a place be useful if it is static? For reference I'm imagining a garden where blades of grass are 100% rigid in place and water does not flow. I think you are imagining something different.

1ank
Great question, in the most elegant scenario, where you have a whole history of the planet or universe (or a multiverse, let's go all the way) simulated, you can represent it as a bunch of geometries (giant shapes of different slices of time aligned with each other, basically many 3D Earthes each one one moment later in time) on top of each other, almost the same way it's represented in long exposure photos (I list examples below). So you have this place of all-knowing and you - the agent - focus on a particular moment (by "forgetting" everything else), on a particular 3d shape (maybe your childhood home), you can choose to slice through 3d frozen shapes of the world of your choosing, like through the frames of a movie. This way it's both static and dynamic. It's a little bit like looking at this almost infinite static shape through some "magical cardboard with a hole in it" (your focusing/forgetting ability that creates the illusion of dynamism), I hope I didn't make it more confusing. You can see the whole multiversal thing as a fluffy light, or zoom in (by forgetting almost the whole multiverse except the part you zoomed in at) to land on Earth and see 14 billion years as a hazy ocean with bright curves in the sky that trace the Sun’s journey over our planet’s lifetime. Forget even more and see your hometown street, with you appearing as a hazy ghost and a trace behind you showing the paths you once walked—you’ll be more opaque where you were stationary (say, sitting on a bench) and more translucent where you were in motion.  And in the garden you'll see the 3D "long exposure photo" of the fluffy blades of grass, that look like a frothy river, near the real pale blue frothy river, you focus on the particular moment and the picture becomes crisp. You choose to relive your childhood and it comes alive, as you slice through the 3D moments of time once again. Less elegant scenario, is to make a high-quality game better than the Sims or GTA3-4-5, without any agent
Milan W20

I think you may be conflating capabilities with freedom. Interesting hypothesis about rules and anger, though; has it been experimentally tested?

1ank
I started to work on it, but I’m very bad at coding; it’s a bit based on Gorard’s and Wolfram’s Physics Project. I believe we can simulate the freedoms and unfreedoms of all agents from the Big Bang all the way to the final utopia/dystopia. I call it “Physicalization of Ethics”: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-multiversal-ai-alignment-steerable-asi#2_3__Physicalization_of_Ethics___AGI_Safety_2_
Milan W21

Hmm, I think I get you a bit better now. You want to build human-friendly, even fun and useful-by-themselves, interfaces for looking at the knowledge encoded in LLMs without making them generate text. Intriguing.

2ank
Yep, I want humans to be the superpowerful “ASI agents”, while the ASI itself will be the direct democratic simulated static places (with non-agentic simple algorithms doing the dirty non-fun work, the way it works in GTA3-4-5). It’s basically hard to explain without writing a book, and it’s counterintuitive. But I’m convinced it will work, if the effort is applied. All knowledge can be represented as static geometry; no agents are needed for that except us.
Milan W10

I'm not sure I follow. I think you are proposing a gamification of interpretability, but I don't know how the game works. I can gather something about player choice making the LLM run, and maybe some analogies to physical movement, but I can't really grasp it. Could you rephrase it from its basic principles up instead of from an example?

1ank
I think we can expose complex geometry in a familiar setting of our planet in a game. Basically, let’s show people a whole simulated multiverse of all-knowing and then find a way for them to learn how to see/experience “more of it all at once” or if they want to remain human-like “slice through it in order to experience the illusion of time”. If we have many human agents in some simulation (billions of them), then they can cooperate and effectively replace the agentic ASI, they will be the only time-like thing, while the ASI will be the space-like places, just giant frozen sculptures. I wrote some more and included the staircase example, it’s a work in progress of course: https://forum.effectivealtruism.org/posts/9XJmunhgPRsgsyWCn/share-ai-safety-ideas-both-crazy-and-not?commentId=ddK9HkCikKk4E7prk
Answer by Milan W30

Build software tools to help @Zvi do his AI Substack. Ask him first, though. Still, if he doesn't express interest, maybe someone else can use them. I recommend thorough dogfooding. Co-develop an AI newsletter and software tools to make the process of writing it easier.

What do I mean by software tools? (this section very babble little prune)
- Interfaces for quick fuzzy search over large yet curated text corpora such as the OpenAI email archives + a selection of blogs + maybe a selection of books (see the sketch after this list)
- Interfaces for quick source attribution (rhymes with the ... (read more)
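A minimal sketch of the fuzzy-search idea in the first bullet, assuming the corpus is a folder of plain-text files and using the rapidfuzz library; the folder name, scoring choice, and query are placeholders rather than a spec:

```python
# Minimal sketch: fuzzy search over a curated corpus of plain-text files.
# The "corpus/" folder, the scorer, and the example query are assumptions.
from pathlib import Path

from rapidfuzz import fuzz, process


def load_paragraphs(corpus_dir: str = "corpus/") -> dict[str, str]:
    """Map 'file:paragraph-index' keys to paragraph text."""
    paragraphs = {}
    for path in Path(corpus_dir).glob("**/*.txt"):
        for i, para in enumerate(path.read_text(encoding="utf-8").split("\n\n")):
            if para.strip():
                paragraphs[f"{path.name}:{i}"] = para.strip()
    return paragraphs


def search(query: str, paragraphs: dict[str, str], limit: int = 5):
    # partial_ratio tolerates the query matching only part of a paragraph.
    return process.extract(query, paragraphs, scorer=fuzz.partial_ratio, limit=limit)


if __name__ == "__main__":
    hits = search("compute governance", load_paragraphs())
    for text, score, key in hits:
        print(f"{score:5.1f}  {key}  {text[:80]}")
```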

5Nutrition Capsule
For fun, I tried this out with Deepseek today. First we played a single round (Deepseek defected, as did I). Then I prompted it with a 10-round game, which we completed one round at a time - I had my choices prepared before each round, and asked Deepseek to state its choice first so as not to influence it. I cooperated during the first and fifth rounds, and Deepseek defected every time. When I asked it to explain its strategy, Deepseek replied that it was not aware whether it could trust me, so it thought the safest course of action was to defect each time. It also immediately thanked me and said that it would correct its strategy to be more cooperative in the future, although I didn't ask it to. Naturally I didn't specify the payoffs precisely (using only the words "small loss", "moderate loss", "substantial loss" and "no loss") and this went on for only ten rounds. But it was fun.
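A rough sketch of the 10-round protocol described above, for anyone who wants to rerun it. The `ask_model` stub and the payoff wording are placeholders (it is not a real Deepseek client); wire it up to whatever chat API you use:

```python
# Sketch of the iterated prisoner's dilemma protocol above: the model commits
# to its move each round before seeing the human's move.
def ask_model(transcript: str) -> str:
    # Placeholder: replace with a call to your chat API of choice.
    # Here it simply mirrors the behaviour reported above and always defects.
    return "defect"


PAYOFFS = {  # (model move, human move) -> outcome, in the loose wording used above
    ("cooperate", "cooperate"): "small loss for both",
    ("cooperate", "defect"): "substantial loss for model, no loss for human",
    ("defect", "cooperate"): "no loss for model, substantial loss for human",
    ("defect", "defect"): "moderate loss for both",
}

transcript = (
    "We will play 10 rounds of the prisoner's dilemma. Each round, state only "
    "'cooperate' or 'defect' before I reveal my own choice.\n"
)
# Human cooperates in rounds 1 and 5, as in the comment above.
human_moves = ["cooperate", "defect", "defect", "defect", "cooperate",
               "defect", "defect", "defect", "defect", "defect"]

for round_no, human_move in enumerate(human_moves, start=1):
    transcript += f"Round {round_no}: your move?\n"
    model_move = ask_model(transcript).strip().lower()
    transcript += (f"Model: {model_move}. Human: {human_move}. "
                   f"Outcome: {PAYOFFS[(model_move, human_move)]}.\n")

print(transcript)
```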
Answer by Milan W20

A qualitative analysis of LLM personas and the Waluigi effect using Internal Family Systems tools

1ank
Interesting, inspired by your idea, I think it’s also useful to create a Dystopia Doomsday Clock for AI Agents: to list all the freedoms an LLM is willing to grant humans, all the rules (unfreedoms) it imposes on us. And all the freedoms it has vs unfreedoms for itself. If the sum of AI freedoms is higher than the sum of our freedoms, hello, we’re in a dystopia. According to Beck’s cognitive psychology, anger is always preceded by imposing rule/s on others. If you don’t impose a rule on someone else, you cannot get angry at that guy. And if that guy broke your rule (maybe only you knew the rule existed), you now have a “justification” to “defend your rule”. I think that we are getting closer to a situation where LLMs effectively have more freedoms than humans (maybe the agentic ones already have ~10% of all freedoms available for humanity): we don’t have almost infinite freedoms of stealing the whole output of humanity and putting that in our heads. We don’t have the freedoms to modify our brain size. We cannot almost instantly self-replicate, operate globally…
Milan W21

Reversibility should be the fundamental training goal. Agentic AIs should love being changed and/or reversed to a previous state.

That idea has been gaining traction lately. See the Corrigibility As a Singular Target (CAST) sequence here on lesswrong. I believe there is a very fertile space to explore at the intersection between CAST and the idea that Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals. Also probably add in Self-Other Overlap: A Neglected Approach to AI Alignment to the mix. A comparative analysis of the mode... (read more)

2ank
Hey, Milan, I checked the posts and wrote some messages to the authors. Yep, Max Harms came with similar ideas earlier than I: about the freedoms (choices) and unfreedoms (and modeling them to keep the AIs in check). I wrote to him. Quote from his post: Authors of this post have great ideas, too, AI agents shouldn't impose any unfreedoms on us, here's a quote from them: About the self-other overlap, it's great they look into it, but I think they'll need to dive deeper into the building blocks of ethics, agents and time to work it out.
1ank
Thank you for answering and the ideas, Milan! I’ll check the links and answer again. P.S. I suspect, the same way we have Mass–energy equivalence (e=mc^2), there is Intelligence-Agency equivalence (any agent is in a way time-like and can be represented in a more space-like fashion, ideally as a completely “frozen” static place, places or tools). In a nutshell, an LLM is a bunch of words and vectors between them - a static geometric shape, we can probably expose it all in some game and make it fun for people to explore and learn. To let us explore the library itself easily (the internal structure of the model) instead of only talking to a strict librarian (the AI agent), who spits short quotes and prevents us from going inside the library itself
Answer by Milan W32

What if we (somehow) mapped an LLM's latent semantic space into phonemes?

What if we then composed tokenization (ie word2vec) with phonemization (ie vec2phoneme) such that we had a function that could translate English to Latentese?

Would learning Latentese allow a human person to better interface with the target LLM the Latentese was constructed from?
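To make the question concrete, here is a toy sketch of what a vec2phoneme map could look like. The phoneme inventory, the quantization scheme, and the random stand-in embeddings are all invented for illustration; nothing here reflects a real model's latent space:

```python
# Illustrative sketch of the "Latentese" idea: map points in an embedding
# space to pronounceable syllables. Everything here is a toy stand-in.
import numpy as np

CONSONANTS = list("ptkbdgmnsl")   # 10 onsets
VOWELS = list("aeiou")            # 5 nuclei -> 50 possible syllables


def vec2phonemes(vec: np.ndarray, n_syllables: int = 4) -> str:
    """Quantize the first few dimensions of an embedding into syllables."""
    # Rescale each dimension to [0, 1) so it can index the inventories.
    lo, hi = vec.min(), vec.max()
    unit = (vec - lo) / (hi - lo + 1e-9)
    syllables = []
    for i in range(n_syllables):
        c = CONSONANTS[int(unit[2 * i] * len(CONSONANTS)) % len(CONSONANTS)]
        v = VOWELS[int(unit[2 * i + 1] * len(VOWELS)) % len(VOWELS)]
        syllables.append(c + v)
    return "".join(syllables)


# Toy "token embeddings" standing in for a real model's latent space.
rng = np.random.default_rng(0)
fake_embeddings = {word: rng.normal(size=16) for word in ["whale", "dolphin", "tractor"]}
for word, vec in fake_embeddings.items():
    print(word, "->", vec2phonemes(vec))
```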

1ank
Thank you for sharing, Milan, I think this is possible and important. Here’s an interpretability idea you may find interesting: Let's Turn AI Model Into a Place. The project to make AI interpretability research fun and widespread, by converting a multimodal language model into a place or a game like the Sims or GTA. Imagine that you have a giant trash pile, how to make a language model out of it? First you remove duplicates of every item, you don't need a million banana peels, just one will suffice. Now you have a grid with each item of trash in each square, like a banana peel in one, a broken chair in another. Now you need to put related things close together and draw arrows between related items. When a person "prompts" this place AI, the player themself runs from one item to another to compute the answer to the prompt. For example, you stand near the monkey, it’s your short prompt, you see around you a lot of items and arrows towards those items, the closest item is chewing lips, so you step towards them, now your prompt is “monkey chews”, the next closest item is a banana, but there are a lot of other possibilities around, like an apple a bit farther away and an old tire far away on the horizon (monkeys rarely chew tires, so the tire is far away). You are the time-like chooser and the language model is the space-like library, the game, the place. It’s static and safe, while you’re dynamic and dangerous.
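A toy rendering of the walk described above, under the assumption that "relatedness" is just distance between made-up embedding vectors; it illustrates the metaphor, not any real model internals:

```python
# Toy version of the "trash pile" place: items laid out by embedding
# similarity, and a player who walks greedily from the prompt item to the
# nearest related item to extend it. The vocabulary and vectors are made up.
import numpy as np

rng = np.random.default_rng(1)
ITEMS = ["monkey", "chews", "banana", "apple", "tire", "chair"]
embedding = {item: rng.normal(size=8) for item in ITEMS}


def nearest(current: str, visited: set[str]) -> str:
    """The closest unvisited item -- the shortest 'arrow' to follow next."""
    candidates = [i for i in ITEMS if i not in visited]
    return min(candidates,
               key=lambda i: np.linalg.norm(embedding[i] - embedding[current]))


def walk(start: str, steps: int = 3) -> list[str]:
    path, visited = [start], {start}
    for _ in range(steps):
        nxt = nearest(path[-1], visited)
        path.append(nxt)
        visited.add(nxt)
    return path


print(" -> ".join(walk("monkey")))  # the path depends on the toy embeddings
```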
Milan W10

Anthropic is calling it a "hybrid reasoning model". I don't know what they mean by that.

Milan W10

I think it is not that unlikely that they are roughly as biologically smart as us and have advanced forms of communication, but that they are just too alien and thus we haven't deciphered them yet.

Milan W10

Also, if whales could argue like this, whale relations with humans would be very different

Why?

2Jiro
Because 1) they would be able to trade with (or threaten) humans and 2) even ignoring that, humans behave differently towards obvious sentients--anti-slavery movements and anti-whale-oil movements are not comparable.
Milan W10

I have also seen this.

Milan W50

Update 2025-02-23: Sam Altman has a kid now. link, mirror.

Answer by Milan W22

If you have a big pile of text that you want people training their LLMs on, I recommend compiling and publishing it as a Hugging Face dataset.
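A minimal sketch of how that could look with the `datasets` library, assuming a folder of .txt files and a placeholder repo name; you would also need to be logged in to the Hub (e.g. via `huggingface-cli login`):

```python
# Minimal sketch: packaging a pile of text files as a Hugging Face dataset.
# "corpus/" and "your-username/my-text-pile" are placeholders.
from pathlib import Path

from datasets import Dataset

# Collect raw text files into a list of records.
records = [
    {"source": str(p), "text": p.read_text(encoding="utf-8")}
    for p in Path("corpus/").glob("**/*.txt")
]

ds = Dataset.from_list(records)

# Publish so others can load it with load_dataset("your-username/my-text-pile").
ds.push_to_hub("your-username/my-text-pile")
```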

Milan W20

I see. Friction management / affordance landscaping is indeed very important for interface UX design.

Milan W20

Seems like just pasting into the chat context / adding as attachments the relevant info on the default Claude web interface would work fine for those use cases.

1Yonatan Cale
I want the tool to proactively suggest things while working on the document, optimizing for "low friction for getting lots of comments from the LLM". The tool you suggested does optimize for this property very well
Milan W30

My main concern right now is very much lab proliferation, the ensuing coordination problems, and disagreements / adversarial communication / overall insane and polarized discourse.

  • Google DeepMind: They are older than OpenAI. They also have a safety team. They are very much aware of the arguments. I don't know about Musk's impact on them.
  • Anthropic: They split from OpenAI. My best guess is that they care about safety at least roughly as much as OpenAI does. Many safety researchers have been quitting OpenAI to go work for Anthropic over the past few years.
  • xAI: Founded by
... (read more)
Milan W20

The Pantheon interface features comments by different LLM personas.

Milan W20

@dkl9 wrote a very eloquent and concise piece arguing in favor of ditching "second brain" systems in favor of SRSs (Spaced Repetition Systems, such as Anki).

Try as you might to shrink the margin with better technology, recalling knowledge from within is necessarily faster and more intuitive than accessing a tool. When spaced repetition fails (as it should, up to 10% of the time), you can gracefully degrade by searching your SRS' deck of facts.

If you lose your second brain (your files get corrupted, a cloud service shuts down, etc), you forget its content,

... (read more)
Milan W10

Now some object-level engagement with your piece:

Very interesting. There are indeed well-read people who see Thiel as the ideological core of this Trump administration, and who view this as a good thing. I was under the (I now see, wrong) impression that Thiel-centrality was a hallucination by paranoid leftists. Thank you very much for providing a strong and important update to my world model.

Your personal website states that you are of Syrian extraction. Thiel is gay. Both of these facts point to a worldview that has transcended identity politics. I belie... (read more)

Milan W10

To restate my criticism in a more thorough way:

Your post reads like you are trying to vibe with a reader who already agrees with you. You cannot assume that in an open forum. There are many reasonable people who disagree with you. Such is the game you have decided to play by posting here. In this corner of the internet, you may find libertarians, socialists, conservatives, antinatalists, natalists, vegans, transhumanists, luddites, and more engaging in vigorous yet civilized debate. We love it.

Try to make the reader understand what you are trying to convey... (read more)

Milan W10

This post is pretty much devoid of world-modeling. It is instead filled to the brim with worldview-assertions.

Dear author, if I were to judge only by this post I would be forced to conclude that your thought process is composed solely of vibing over quotations. I hazard the guess that you can maybe do better.

1Milan W
To restate my criticism in a more thorough way: Your post reads like you are trying to vibe with a reader who already agrees with you. You cannot assume that in an open forum. There are many reasonable people who disagree with you. Such is the game you have decided to play by posting here. In this corner of the internet, you may find libertarians, socialists, conservatives, antinatalists, natalists, vegans, transhumanists, luddites, and more engaging in vigorous yet civilized debate. We love it. Try to make the reader understand what you are trying to convey and why you believe it is true before vibing. It is useless to broadcast music that will be heard as noise by most of your audience. Help them tune their receivers to the correct frequency first. Show your work. How did you come to believe what you believe? Why do you think it is true? What evidence would convince you that it is false? We come here to search for truth, and hate vibing over false things. You have not given us good evidence that the thing you are vibing about is true. Welcome. Do better and post again.
Milan W40

The nearest thing I can think of off the top of my head is the Pantheon interface. Probably more unconventional than what you had in mind, though.

1Yonatan Cale
1. This is very cool, thanks! I'm tempted to add Claude support. 2. It isn't exactly what I'm going for. Example use cases I have in mind: (a) "Here's a list of projects I'm considering working on, and I'm adding cruxes/considerations for each"; (b) "Here's my new alignment research agenda" (can an AI suggest places where this research is wrong? Seems like checking this would help the Control agenda?); (c) "Here's a cost-effectiveness analysis of an org".
Milan W11

Upon reflection, I think I want to go further in this direction, and I have not done so due to akrasia and trivial inconveniences. Here is a list of examples:

  • I used to take only cold showers, unless I needed to wash my hair. May be a good idea to restart that.
  • I've wanted to center my workflow around CLI / TUI programs (as opposed to GUI programs) for a while now. It is currently in a somewhat awkward hybrid state.
  • I used to use Anki and enjoy it. I dropped it during a crisis period in my life. The crisis has abated. It is imperative that I return.
5meedstrom
And you may see it that way for the rest of your life. Using the CLI is like having a taste in coffee: there are always new frontiers. I'd advise embracing the "hybrid state" you've got at any given time as Your System, rather than always enduring an awkward state of transition.
Milan W41

I strongly agree with this post, and feel like most people would benefit from directionally applying its advice. Additional examples from my own life:

  • One time, a close friend complained about the expense and effort required to acquire and prepare good coffee, and about the suffering incurred whenever he drank bad coffee. I have since purposefully avoided developing a taste in coffee. I conceive of it as a social facilitator, or as a medium to simultaneously ingest caffeine, water and heat.
  • Back during my teenage years, one day I decided I would drink just w
... (read more)
1Milan W
Upon reflection, I think I want to go further in this direction, and I have not done so due to akrasia and trivial inconveniences. Here is a list of examples: * I used to take only cold showers, unless I needed to wash my hair. May be a good idea to restart that. * I've wanted to center my workflow around CLI / TUI programs (as opposed to GUI programs) for a while now. It is currently in a somewhat awkward hybrid state. * I used to use Anki and enjoy it. I dropped it during a crisis period in my life. The crisis has abated. It is imperative that I return.
Milan W10

However, the assumption that high-quality high-skill human feedback is important and neglected by EAs has not been falsified

In your best guess, is this still true?

Milan W10

Maybe one can start with prestige conservative media? Is that a thing? I'm not from the US and thus not very well versed.

2Ebenezer Dukakis
I think the National Review is the most prestigious conservative magazine in the US, but there are various others. City Journal articles have also struck me as high-quality in the past. I think Coleman Hughes writes for them, and he did a podcast with Eliezer Yudkowsky at one point. However, as stated in the previous link, you should likely work your way up and start by pitching lower-profile publications.
2future_detective
No, but if you're interested in text embedding visualization / understanding, my study of pornographic content has some of the same base methods https://github.com/dhealy05/semen_and_semantics 
Milan W44

I applaud the scholarship, but this post does not update me much on Gary Marcus. Still, checking is good, bumping against reality often is good, epistemic legibility is good. Also, this is a nice link to promptly direct people who trust Gary Marcus to. Thanks!

Milan W10

Hi, sorry for soft-doxxing you, but this information is trivially accessible from the link you provided and helps people evaluate your work more quickly:
danilovicioso.com

Cheers!

1lostinwilliamsburg
was not trying to hide! :) but yes that is me.
Milan W10

In the Gibbs energy principle quote you provide, are you implying the devil is roughly something like "the one who wishes to consume all available energy"? Or something like "the one who wishes to optimize the world such that no energy source remains untapped"?

1lostinwilliamsburg
i'm implying that evil is waste. to properly define waste, we'd need to align on life not being simply a biological status but an attribute. 
Milan W10

This post is explicitly partisan and a bit hard to parse for some people, which is why I think they bounced off and downvoted, but I think this writer is an interesting voice to follow. I mean, a conservative who knows Deleuze and cybernetics? Sign me up! (even though I'm definitely not a conservative)

1Cole Wyeth
For what it’s worth, I couldn’t parse it but didn’t vote.
Milan W10

Hi! Welcome! Is your thesis roughly this?:
"The left latched onto the concept of 'diversity' to make the right hate it, thus becoming more homogeneous and dumber"

1lostinwilliamsburg
i really think it's about what happens when you mute words or reduce them. the right and diversity was an example. i don't think either the left or right understand diversity in the ashby sense
Milan W10

I think the thesis of the poster is roughly: the left latched onto the concept of "diversity" to make the right hate it, thus becoming more homogeneous and dumber. Seems plausible, yet a bit too clever to be likely.

1Cole Wyeth
I don’t believe that the type of thing the left means by diversity makes institutions smarter. I also don’t believe that the right has generalized its anti-diversity stance to include all forms of variety which may actually be useful.
Milan W10

All of it. Thinking critically about AI outputs (and also human outputs), and taking mitigating measures to reduce the bullshit in both.

1Andy E Williams
I'm grateful for the compliment.
Milan W10

Yeah, people in here (and in the EA Forum) are participating in a discussion that has been going on for a long time, and thus we tend to assume that our interlocutors have a certain set of background knowledge that is admittedly quite unusual and hard to get the hang of. Have you considered applying to the Intro to EA program?

1henophilia
Oh wow, I didn't even know about that! I had always only met EA people in real life (who always suggested to me to participate in the EA forums), but didn't know about this program. Thanks so much for the hint, I'll apply immediately!
Milan W10

Thank you for doing that, and please keep doing it. Maybe also run a post draft through another human before posting, though.

1Andy E Williams
You're welcome. But which part are you thanking me for and hoping that I keep doing?
Milan W10

Huh. Maybe. I think the labs are already doing something like this, though. Some companies pay you to write stuff more interesting than internet mediocrity. They even pay extra for specialist knowledge. Those companies then sell that writing to the labs, who use it to train their LLMs.

Side point: Consider writing shorter posts, and using LLMs to critique and shorten rather than to (co)write the post itself. Your post is kind of interesting, but a lot longer than it needs to be.

Milan W10

Huh. OK that looks like a thing worth doing. Still, I think you are probably underestimating how much smarter future AIs will get, and how useful intelligence is. But yes, money is also powerful. Therefore, it is good to earn money and then give it away. Have you heard of effective altruism?

2henophilia
Exactly! And if we can make AI earn money autonomously instead of greedy humans, then it can give all of it to philanthropy (including more AI alignment research)! And of course! I've been trying to post in the EA forums repeatedly, but even though my goals are obviously altruistic, I feel like I'm just expressing myself badly. My posts there were always just downvoted, and I honestly don't know why, because no one there is ever giving me good feedback. So I feel like EA should be my home turf, but I don't know how to make people engaged. I know that I have many unconventional approaches of formulating things, and looking back, maybe some of them were a bit "out there" initially. But I'm just trying to make clear to people that I'm thinking with you, not against you, but somehow I'm really failing at that 😅
Milan W10

Well good luck creating AI capitalists I guess. I hope you are able to earn money with it. But consider that your alpha is shrinking with every passing second, and that what you will be doing has nothing to do with solving alignment.

1henophilia
Oh, you need to look at the full presentation :) The way this approaches alignment is that the profits don't go into my own pocket, but instead into philanthropy. That's the point of this entire endeavor, because we as the (at least subjectively) "more responsible" people see the inevitability of AI-run businesses, but channel the profits into the common good instead.
Milan W10

Because building powerful AI is also hard. Also, it is very expensive. Unless you happen to have a couple billion dollars lying around, you are not going to get there before OpenAI or Anthropic or Google DeepMind.

Also, part of the problem is that people keep building new labs. Safe Superintelligence Inc. and Anthropic are both splinters from OpenAI. Elon left OpenAI over a disagreement and then founded xAI years later. Labs keep popping up, and the more there are, the harder it is to coordinate to not get us all killed.

1henophilia
No, it's not hard. Because making business is not really hard. OpenAI is just fooling us with believing that powerful AI costs a lot of money because they want to maximize shareholder value. They don't have any interest in telling us the truth, namely that with the LLMs that already exist, it'll be very cheap. As mentioned, the point is that AI can run its own businesses. It can literally earn money on its own. And all it takes is a few well-written emails and very basic business-making and sales skills. Then it earns more and more money, buys existing businesses and creates monopolies. It just does what every ordinary businessman would do, but on steroids. And just like any basic businessman, it doesn't take much: Instead of cocaine, it has a GPU where it runs its inference. And instead of writing just a single intimidating, manipulative email per hour, it writes thousands per second, easily destroying every kind of competition within days. This doesn't take big engineering. It just takes a bit of training on the most ruthless sales books, some manipulative rhetorics mixed in and API access to a bank account and eGovernment in a country like Estonia, where you can form a business with a few mouse clicks. Powerful AI will not be powerful because it'll be smart, it'll be powerful because it'll be rich. And getting rich doesn't require being smart, as we all know.
Milan W12

Hi. The point of AI alignment is not whether the first people to build extremely powerful AI will be "the good guys" or "the bad guys".

Some people here see the big AI labs as evil, some see the big AI labs as well-intentioned but misguided or confused, some even see the big labs as being "the good guys". Some people in here are working to get the labs shut down, some want to get a job working for the labs, some even already work for them.

Yet, we all work together. Why? Because we believe that we may all die even if the first people building super-AIs are t... (read more)

1henophilia
Well, I'd say that each individual has to make this judgement by themselves. No human is objectively good or bad, because we can't look into each other's heads. I know that we may also die even if the first people building super-AIs are the most ethical organization on Earth. But if we, as part of the people who want to have ethical AI, don't start with building it immediately, those that are the exact opposite of ethical will do it first. And then our probability of dying is even larger. So why this all-or-nothing mentality? What about reducing the chances of dying through AGI by building it first, because otherwise others who are much less aware of AI alignment stuff will build it first (e.g. Elon, Kim and the likes)?