Tapatakt's Shortform — LessWrong

Tapatakt's Shortform

11th Mar 2024

This is a special post for quick takes by Tapatakt. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

[-]Tapatakt1y160

I want to create new content about AI Safety for Russian speakers. I was warned about possible backlash if I do it wrong.

What are some actual examples where bad, oversimplified communication harmed the cause it agitated for? Whose mistakes can I learn from?

[-]Viliam1y41

I think if the English original is considered good, there should be nothing wrong with a translation. So make sure you translate good texts. (If you are writing your own text, write English version first and ask for feedback.)

Also, get ready for disappointment if the overlap between "can meaningfully debate AI safety" and "has problems reading English" turns out to be very small, possibly zero.

To give you a similar example, I have translated the LW Sequences into Slovak, some people shared it on social networks, and the ultimate result was... nothing. The handful of Slovak people who came to at least one LW meetup all found the rationalist community on the internet, and didn't read my translation.

This is not an argument against translating per se. I had much greater success at localizing software. It's just, when the target audience is very smart people, then... smart people usually know they should learn English. (A possible exception could be writing for smart kids.)

[-]Milan W1y10

(A possible exception could be writing for smart kids.)

The OP probably already knows this, but HPMOR has already been translated into Russian.

[-]Canaletto1y4-1

Not to be dissuading, but probably a lot of people who can do relevant work know English pretty well anyway? Speaking from experience, I guess, most students knew English well enough and consumed English content when I was in university. Especially the most productive ones. So, this can still be an interesting project, but not like, very important and/or worth your time.

[-]RHollerith1y41

Even people who know English pretty well might prefer to consume information in their native language, particularly when they aren't in a task-oriented frame of mind and do not consider themselves to be engaged in work, which I'm guessing is when people are most receptive to learning more about AI safety.

[-]Tapatakt1y162

Did anyone try something like this?

Create a conlang with very simple grammar and small vocabulary (not like tokipona small, more like xkcd-thing-explainer small).
Use LLMs to translate a lot of texts into this conlang.
Train a new LLM on these translations.
Try to research interpretability on this LLM.

[-]Viliam1y42

There is a Simple English Wikipedia with over 200 000 articles, which is not exactly what you want, but seems to be a thing that already exists and is somewhat in that direction.

[-]Zac Hatfield-Dodds1y40

I don't recall any interpretability experiments with TinyStories offhand, but I'd be surprised if there aren't any.

[-]Nathan Helm-Burger1y20

I agree that this sounds interesting and that I haven't heard of anyone doing this yet. I have heard of some interpretability experiments with TinyStories, as Zac mentioned. I think the more interesting thing would be a dataset focused on being enriched with synthetic data showing inherently logical things like deductive symbolic logic and math problems worked out (correctly!) step-by-step. You could have a dataset of this, plus the simplified-language versions of middle school through undergrad science textbooks. I expect the result would likely be more logical, and cohesive. It would be interesting to see if this made the model fundamentally more interpretable.

[-]Tapatakt1y160

A random thought on how to explain instrumental convergence:

You can teach someone the basics of, say, Sid Meier's Civilization V for a quite long time without explaining what the victory conditions are. There are many possible alternative victory conditions that would not change the development strategies much.

[-]Tapatakt1y145

The Seventh Sally or How Trurl's Own Perfection Led to No Good

Thanks to IC Rainbow and Taisia Sharapova, who brought this matter up in the MiriY Telegram chat.

What. The. Hell.

In their logo they have:

They Think. They Feel. They're Alive

And the title of the video on the same page is:

AI People Alpha Launch: AI NPCs Beg Players Not to Turn Off the Game

And in the FAQ they wrote:

The NPCs in AI People are indeed advanced and designed to emulate thinking, feeling, a sense of aliveness, and even reactions that might resemble pain. However, it's essential to understand that they operate on a digital substrate, fundamentally different from human consciousness's biological substrate.

So this is the best argument they have?

Wake up, Torment Nexus just arrived.

(I don't think current models are sentient, but the way of thinking "they are digital, so it's totally OK to torture them" is utterly insane and evil)

[-]Richard_Kennaway1y92

(I don't think current models are sentient, but the way of thinking "they are digital, so it's totally OK to torture them" is utterly insane and evil)

I don't think the trailer is saying that. It's just showing people examples of what you can do, and what the NPCs can do. Then it's up to the player to decide how to treat the NPCs. AIpeople is creating the platform. The users will decide whether to make Torment Nexi.

At the end of the trailer, the NPCs are conspiring to escape the simulation. I wonder how that is going to be implemented in game terms.

I notice that there also exists a cryptocoin called AIPEOPLE, and a Russian startup based in Cyprus with the domain aipeople dot ru. I do not know if these have anything to do with the AIpeople game. The game itself is made by Keen Software House. They are based in Prague together with their sister company GoodAI.

[-]Tapatakt1y109

I don't think "We created a platform that lets you make digital minds feel bad and in the trailer we show you that you can do it, but we are in no way morally responsible if you will actually do it" is a defensible position. Anyway, they don't use this argument, only one about digital substrate.

[-]Richard_Kennaway1y60

The trailer is designed to draw prospective players' attention to the issue, no more than that. If you "don't think current models are sentient", and hence are not actually feeling bad, then I don't see a reason for having a problem here, in the current state of the game. If they manage to produce this game and keep upgrading it with the latest AI methods, when will you know if there is a problem?

I do not have an answer to that question.

[-]green_leaf1y30

Ideally, AI characters would get rights as soon as they could pass the Turing test. In the actual reality, we all know how well that will go.

[-]Tapatakt1mo119

New cause area: translate all EY's writing into Basic English. Only half-joke. And it's not only about Yudkowsky.

I think I will actually do something like this with some text for testing purposes.

[-]Kabir Kumar1mo40

We're actually doing this with the Arbital Alignment articles

[-]MichaelDickens1mo20

I don't do this on purpose but I feel like 90% of what I write about AI is something Eliezer already said at some point.

[-]Kabir Kumar1mo10

many such cases

[-]plex1mo2-2

Finally, a cause area that LLMs can just solve. Looking forward to the new version of readthesequences, except with style transfer.

[-]Sodium1mo10

Man this is such a big issue with the Sequences. Like, "Is that your true rejection" is a concept that I use very often: when I decide to not do something, I would sometimes go "hmm, what's the real reason I don't want to do this?" I believe that we often come up with nice-sounding reasons to not do this that are totally unrelated to our true motivations, and noticing this is an important rationalist skill.

But "Is that your true rejection" is also just Eliezer complaining about how people wouldn't listen to him because he doesn't have a PhD, and him saying "I bet you wouldn't listen, even if I had one!!" ~~Sure grandpa let's get you to bed~~

[-]Signer1mo10

What it needs is not simpler words, but explicit type-annotations.

[-]Tapatakt1mo21

Most people don't understand the concept of type-annotation.

I think it's mostly not about simpler words, but about simpler sentences, actually.

[-]Kabir Kumar1mo10

i think, reduction of abstractions

[-]Tapatakt1y81

I came up with a decision theory problem. It has the same moral as xor-blackmail, but I think it's much easier to understand:

Omega has chosen you for an experiment:

First, Omega predicts your choice in a potential future offer.
Omega rolls a die. Omega doesn't show you the result.
If Omega predicted you would choose $200, they will only make you an offer if the die shows 6.
If Omega predicted you would choose $100, they will make you an offer if the die shows any number except 1.
Omega's offer, if made, is simple: "Would you like $100 or $200?"

You received an offer from Omega. Which amount do you choose?

I didn't come up with a catchy name, though.
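
To make the payoff structure of the problem above concrete, here is a minimal Python sketch that compares the two policies before the die is rolled. It assumes Omega's prediction is always correct, which the problem statement does not spell out; the names `OFFER_PROBABILITY`, `expected_payout`, and `best_policy` are made up for illustration.

```python
from fractions import Fraction

# Probability that Omega makes the offer, given the predicted choice.
# Predicted $200: offer only if the die shows 6.
# Predicted $100: offer if the die shows anything except 1.
OFFER_PROBABILITY = {
    200: Fraction(1, 6),
    100: Fraction(5, 6),
}


def expected_payout(policy: int) -> Fraction:
    """Expected winnings of an agent whose fixed policy is to take `policy` dollars.

    Assumption (not stated in the problem): Omega predicts the policy perfectly.
    """
    return OFFER_PROBABILITY[policy] * policy


def best_policy() -> int:
    """Return the policy with the higher expected payout, judged before the die roll."""
    return max(OFFER_PROBABILITY, key=expected_payout)
```

Under that assumption, `expected_payout(100)` is 250/3 (about $83.33) and `expected_payout(200)` is 100/3 (about $33.33), so the policy of taking $100 does better in expectation, even though $200 looks strictly better once the offer is already in front of you.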

[-]Terence Coelho1y10

This was easier for me to understand (but everything is easier to understand when you see it a second time, phrased in a different way).

[-]Tapatakt1y70

About possible backlashes from unsuccessful communication.

I hoped for some examples like "anti-war movies have unintentionally boosted military recruitment", which is the only example I remembered myself.

I asked Claude the same question, and it gave me these examples:

Scared Straight programs: These programs, designed to deter juvenile delinquency by exposing at-risk youth to prison life, have been shown to actually increase criminal behavior in participants.
The "Just Say No" anti-drug campaign: While well-intentioned, some research suggests this oversimplified message may have increased drug use among certain groups by triggering a "forbidden fruit" effect.

All the others were not very relevant, mostly like "harm of this oversimplified communication was in oversimplification".

The common thing in the two relevant examples and my own example about anti-war movies is, I think, "try to ensure you don't make bad thing look cool". Got it.

But is that all? Are there any examples that don't come down to this?

[-]Viliam1y20

"try to ensure you don't make bad thing look cool"

A similar concern is that maybe the thing is so rare that previously most people didn't even think about it. But now that you reminded them of that, a certain fraction is going to try it for some weird reason.

Infohazard:

Telling large groups of people, especially kids and teenagers, "don't put a light bulb in your mouth" or "don't lick the iron fence during winter" predictably leads to some people trying it, because they are curious about what will actually happen, or whether the horrible consequences you described were real.

Similarly, teaching people political correctness can backfire (arguably, from the perspective of the person who makes money by giving political correctness trainings, this is a feature rather than a bug, because it creates a greater demand for their services in future). Like, if you have a workplace with diverse people who are naturally nice to each other, lecturing them about racism/sexism/whatever may upset the existing balance, because suddenly the minorities may get suspicious about possible microaggressions, and the majority will feel uncomfortable in their presence because they will feel like they have to be super careful about every word they say. Which can ironically lead to undesired consequences, when e.g. the white men will stop hanging out with women or black people, because they will feel like they can talk freely (e.g. make jokes) only in their absence.

How does this apply to AI safety? If you say "if you do X, you might destroy humanity", in theory someone is guaranteed to do X or something similar to X, either because they think it is "edgy", or because they want to prove you wrong. But in practice, most people don't actually have an opportunity to do X.

[-]Tapatakt2y*70

Insight from "And All the Shoggoths Merely Players".

We know about Simulacrum Levels.

Simulacrum Level 1: Attempt to describe the world accurately.
Simulacrum Level 2: Choose what to say based on what your statement will cause other people to do or believe.
Simulacrum Level 3: Say things that signal membership to your ingroup.
Simulacrum Level 4: Choose which group to signal membership to based on what the benefit would be for you.

I suggest adding another one:

Simulacrum Level NaN: Choose what to say based on ~~what changes your statement will cause in the world~~ (UPD: too CDT-like phrasing) what statement you think is best for you to say to achieve your goals without meaningful conceptual boundaries between "other people believe" and other consequences, and between "say" and other actions.

It's similar to Level 2, but it's not the same. And it seems that to solve the deception problem in AI you need to rule out Level NaN before you can rule out Level 2. If you want your AI to not lie to you, you need to make sure it communicates with you at all first.

[-]Vladimir_Nesov2y20

ASP illustrates how greedy consequentialism doesn't quite work. A variant of UDT where a decision chooses a legible policy that others get to know resolves some issues with not being considerate of making it easier for others to think about you. Ultimately choosing legible beliefs or meanings of actions is a coordination problem, it's not about one-sided optimization.

Logical uncertainty motivates treating things in the world (including yourself) separately from the consequences they determine, and interacting with things in terms of coordination rather than consequences. So I vaguely expect meaningful communication between agents and formulation of boundaries around agents and ideas in the world to fall out of decision theory that fixes these issues with consequentialism, at least for the agents and ideas that persist.

[-]Tapatakt2y10

Yes, I agree I formulated it too CDT-like, now fixed. But I think the point stays.

[-]Vladimir_Nesov2y20

Most variants of UDT also suffer from this issue by engaging in commitment racing instead of letting the rest of the world take its turns concurrently, in coordination between shaping and anticipating agent's intention. So the clue I'm gesturing at is more about consequentialism vs. coordination rather than about causal vs. logical consequences.

I think for LLMs the boundaries of human ideas are strong enough in the training corpus for post-training to easily elicit them, and decision theoretic consequences of deep change in the longer term might still maintain them as long as humans remain at all.

[-]Tapatakt26d30

I just discovered that I apparently independently invented an already existing rephrasing of the smoking lesion problem with toxoplasmosis. This is funny.

[-]Tapatakt1y3-4

Isn't the whole point of Petrov day kinda "thou shall not press the red button"?

[-]Tapatakt1y10

I mean, if I don't want to "launch the nukes", why would I even opt-in?

[-]Tapatakt23d20

I would like to have an option to sort comments to posts by "top scoring", but comments in shortforms by "newest". (totally not critical, just datapoint)

[-]Tapatakt1mo20

Okay, now with a more cherrypicked example (from here, chapter 10, "Won’t AI differ from all the historical precedents?"):

If you study an immature AI in depth, manage to decode its mind entirely, develop a great theory of how it works that you validate on a bunch of examples, and use that theory to predict how the AI’s mind will change as it ascends to superintelligence and gains (for the first time) the very real option of grabbing the world for itself — even then you are, fundamentally, using a new and untested scientific theory to predict the results of an experiment that has not yet run, about what the AI will do when it really, actually, for real has the opportunity to grab power from the humans.

OMG, it was one sentence.

My version (well, my actual version is in Russian, here is how the same changes would look in English):

Imagine: you study an immature AI in depth. You manage to decode its mind entirely. You develop a great theory of how it works. You validate this theory on a bunch of examples. You use that theory to predict how the AI’s mind will change as it ascends to superintelligence and gains (for the first time) the very real option of grabbing the world for itself. Even then you are, fundamentally, using a new and untested scientific theory to predict the results of an experiment that has not yet run, about what the AI will do when it really, actually, for real has the opportunity to grab power from the humans.

(I don't know, do English-speaking people dislike repetition of "you" that much? In Russian most of "you" can be omitted.)

[-]npostavs1mo10

I think most of "you" can be omitted in English as well:

Imagine: you study an immature AI in depth. Decode its mind entirely. Develop a great theory of how it works. Validate this theory on a bunch of examples. Use that theory to predict how the AI’s mind will change as it ascends to superintelligence and gains (for the first time) the very real option of grabbing the world for itself. Even then, you are, fundamentally, using a new and untested scientific theory to predict the results of an experiment that has not yet run, about what the AI will do when it really, actually, for real has the opportunity to grab power from the humans.

[-]Tapatakt1mo31

Wouldn't it be interpreted as imperative?

[-]npostavs1mo10

Seems understandable to me (although I guess I'm somewhat primed by reading the previous versions).

[-]Tapatakt1mo2-2

Is anyone pushing for writing that raises awareness about AI risks to be simpler?

Not inferential-distance-simple, but stylistically-simple.

I translate online materials for IABIED into Russian. It has sentences like this:

The wonder of natural selection is not its robust error-correction covering every pathway that might go wrong; now that we’re dying less often to starvation and injury, most of modern medicine is treating pieces of human biology that randomly blow up in the absence of external trauma.

This is not cherrypicked at all. It's from the last page I translated. And I translated this sentence as three sentences. And a quick LLM check confirmed that English is actually less tolerant of overly long sentences than Russian.

I think this is bad. I hope it's better in the book (my copy hasn't reached me yet) and the online materials are like this because they are a poorly edited bonus. But I have a feeling that a lot of the writing on AI safety has the same problem.
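
One rough way to check for this kind of stylistic complexity is to count words per sentence. The sketch below is only a heuristic under simple assumptions: it splits on `.`, `!` or `?` followed by whitespace, so abbreviations and quoted punctuation will confuse it, and the function names are hypothetical helpers, not part of any existing tool.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def words_per_sentence(text: str) -> list[int]:
    """Number of whitespace-separated words in each detected sentence."""
    return [len(sentence.split()) for sentence in split_sentences(text)]


def longest_sentence_length(text: str) -> int:
    """Word count of the longest detected sentence, or 0 for empty text."""
    return max(words_per_sentence(text), default=0)
```

Running `longest_sentence_length` on a page before and after splitting its long sentences gives a crude measure of how much simpler the rewrite is.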

[-]Viliam1mo20

Who is the target audience? If general population, it is bad. If educated people who identify as "I am very smart", it is good.

[-]Tapatakt2y21

The LessWrong reactions system creates the same bias as normal reactions: it's much, much easier to use a reaction someone has already used. So the first person to use a reaction under a comment gets undue influence on what reactions there will be under that comment in the future.

[-]Dagon2y20

I might say it fails to avoid that bias, rather than creating it. Personally, I think it carries enough more information than votes that it's worth having it. In fact, I'd probably remove the agree/disagree and just fold it into reacts.

You could probably reduce the bias a little by putting "suggested reacts" in the same line, with a 0 next to them, so they can just be clicked rather than needing to discover and click. At the expense of clutter and not seeing the ACTUAL reacts as easily.

[-]Tapatakt2y16

What if we just turn off the possibility to use a reaction by clicking it in the list of already used reactions? Yes, people would use them less, but more deliberately.

[-]Tapatakt1y10

Epistemic state: thoughts off the top of my head, not an economist at all, talked with Claude about it

Why is there almost nowhere a small (something like 1%) universal tax on digital money transfers? It looks like a good idea to me:

it's very predictable
no one except banks has to do any paperwork
it's kinda progressive, if you are poor you can use cash

I see probable negative effects... but don't VAT and individual income tax already have the same effects, so if this tax replaces [parts of] those, nothing will change much?

Also, as I understand, it would discourage high-frequency trading. I'm not sure if this would be a feature or a bug, but my current very superficial understanding leans towards the former.

Why is it a bad idea?

[-]Dagon1y31

It's too much for some transactions, and too little for others. For high-frequency (or mid-frequency) trading, 1% of the transaction is 3 or 4 times the expected value from the trade. For high-margin sales (yachts or software), 1% doesn't bring in enough revenue to be worth bothering (this probably doesn't matter unless the transaction tax REPLACES other taxes rather than being in addition to).

It also interferes with business organization - it encourages companies to do things in-house rather than outsourcing or partnering, since inside-company "transactions" aren't real money and aren't taxed.

It's not a bad idea per se, it just needs as many adjustments and carveouts as any other tax, so it ends up as politically complicated as any other tax and doesn't actually help with anything.

[-]Tapatakt1y10

For high-frequency (or mid-frequency) trading, 1% of the transaction is 3 or 4 times the expected value from the trade.

I'm very much not sure discouraging HFT is a bad thing.

this probably doesn't matter unless the transaction tax REPLACES other taxes rather than being in addition to

I imagine that it would replace/reduce some of the other taxes so the government would get the same amount of money.

it encourages companies to do things in-house rather than outsourcing or partnering, since inside-company "transactions" aren't real money and aren't taxed

But normal taxes have the same effect, don't they?

[-]Dagon1y20

I'm very much not sure discouraging HFT is a bad thing.

It's not just the "bad" HFT. It's any very-low-margin activity.

But normal taxes have the same effect, don't they?

Nope, normal taxes scale with profit, not with transaction size.
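
The difference between the two tax bases can be sketched with some arithmetic. This is an illustration only: the 0.3% margin is chosen so that a 1% tax is roughly three times the expected profit, as in the estimate earlier in the thread, and the 20% profit tax rate is a made-up number, not a real tax schedule.

```python
from fractions import Fraction

TRANSACTION_TAX_RATE = Fraction(1, 100)  # the 1% rate proposed in the thread
PROFIT_TAX_RATE = Fraction(20, 100)      # hypothetical rate, for comparison only


def pre_tax_profit(amount: int, margin: Fraction) -> Fraction:
    """Profit on a transfer of `amount` at the given profit margin."""
    return amount * margin


def profit_after_transaction_tax(amount: int, margin: Fraction) -> Fraction:
    """Profit left when the tax is a fixed share of the transferred amount."""
    return pre_tax_profit(amount, margin) - amount * TRANSACTION_TAX_RATE


def profit_after_profit_tax(amount: int, margin: Fraction) -> Fraction:
    """Profit left when the tax is a fixed share of the profit itself."""
    return pre_tax_profit(amount, margin) * (1 - PROFIT_TAX_RATE)


low_margin = Fraction(3, 1000)  # 0.3%: a very-low-margin trade
high_margin = Fraction(1, 2)    # 50%: a high-margin sale
```

For a transfer of 1,000,000 at the low margin, the profit tax leaves 2,400 of the 3,000 profit, while the transaction tax turns it into a loss of 7,000. At the high margin the transaction tax takes only 10,000 out of a 500,000 profit. A tax on profit never makes a profitable trade unprofitable; a tax on transaction size can.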

[-]Shankar Sivarajan1y30

One consideration is the government wouldn't want to encourage (harder-to-tax) cash transactions.

[-]Tapatakt1y10

First (possibly dumb) thought: could it be compensated by printing fewer large bills? Again, poor people would not care, but big business transactions with cash would become less convenient.

[-]Shankar Sivarajan1y20

I don't understand the problem you're trying to solve.

If you just like the aesthetic of cash transactions and want to see more of them, you could just mandate all brick-and-mortar retail stores only accept cash.

If you want to save people the hassle of doing tax paperwork, and offload that to banks, that's also easy: just mandate that banks offer for free the service of filing taxes for all their customers. If you have accounts with multiple banks, they can coördinate.

If you want to stop high-frequency trading, just ban it.

[-]Archimedes1y20

It already happens indirectly. Most digital money transfers are things like credit card transactions. For these, the credit card company takes a percentage fee and pays the government tax on its profit.

[-]Tapatakt1y10

Wow, really? I guess it's an American thing. I think I know only one person with a credit card. And she only uses it up to the interest-free limit to "farm" her reputation with the bank in case she really needs a loan, so she doesn't actually pay the fee.

[-]Archimedes1y21

The customer doesn't pay the fee directly. The vendor pays the fee (and passes the cost to the customer via price). Sometimes vendors offer a cash discount because of this fee.

[-]Tapatakt2y10

Is there any research on how we can change network structure or protocols to make it more difficult for a rogue AI to create and run distributed copies of itself?
