All of Tapatakt's Comments + Replies

Do you also prefer not to pay in Counterfactual Mugging?

2Martín Soto
Depends on the complexity of the logical coin. Certainly not for 1+1=2. But probably yes for appropriately complex statements. This is due to strong immediate identification with "my immediately past self who didn't yet know the truth value", and an understanding that "he (my past self) cannot literally rewrite my brain at will to ensure this behavior holds, but it's understood that I will play along to some extent to satisfy his vision (otherwise he would have to invest more in binding my behavior, which sounds like a waste)". (Of course, I need some kind of proof that the statement has been chosen non-adversarially, and I'm not yet sure that is possible)
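For reference, here is the ex-ante expected-value comparison for the standard (non-logical) Counterfactual Mugging, as a minimal sketch assuming the usual $100 / $10,000 stakes and a fair coin; the logical-coin version discussed above replaces the coin flip with a hard-to-compute statement, but the payoff structure is the same:

```python
# Counterfactual Mugging, standard version:
# Omega flips a fair coin. On tails it asks you for $100; on heads it pays
# you $10,000 only if it predicts you would have paid on tails.
# Expected value of each policy, evaluated before the coin flip.

P_HEADS = 0.5

def expected_value(pays_when_asked: bool) -> float:
    payout_on_heads = 10_000 if pays_when_asked else 0
    cost_on_tails = -100 if pays_when_asked else 0
    return P_HEADS * payout_on_heads + (1 - P_HEADS) * cost_on_tails

print(expected_value(True))   # 4950.0 -- the "paying" policy wins ex ante
print(expected_value(False))  # 0.0
```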

Datapoint: I asked Claude for the definition of "sycophant" and then asked gpt-4o three times and gpt-4.1 three times, with temperature 1:

"A person who seeks favor or advancement by flattering and excessively praising those in positions of power or authority, often in an insincere manner. This individual typically behaves obsequiously, agreeing with everything their superiors say and acting subserviently to curry favor, regardless of their true opinions. Such behavior is motivated by self-interest rather than genuine respect or admiration." 

What word i

... (read more)

Are there any plans for a Russian translation? If not, I'm interested in creating it (or even in organizing a truly professional translation, if someone gives me money for it).

8Rob Bensinger
There's a professional Russian translator lined up for the book already, though we may need volunteer help with translating the online supplements. I'll keep you (and others who have offered) in mind for that -- thanks, Tapatakt. :)

If the crypto you choose meets the definition of a digital currency, you need to tread carefully.

As long as it's all about small sums, not really. Russian laws can be oppressive, but Russian... economic vibes... as long as you are poor enough, are actually pretty libertarian.

Against $9 rock, X always chooses $1. Consider the problem "symmetrical ultimatum game against X". By symmetry, X on average can get at most $5. But $9 rock always gets $9. So $9 rock is more rational than X.

I don't like the implied requirement "to be rational you must play at least as well as the opponent" instead of "to be rational you must play at least as well as any other agent in your place". $9 rock gets $0 if it plays against $9 rock.

(No objection to overall no-free-lunch conclusion, though)

1Christopher King
In the first case, the problem is "symmetrical ultimatum game against X", in which $9 rock does get $9. In the second case you are correct, in the problem "symmetrical ultimatum game against $9 rock" $9 rock gets $0.
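A minimal sketch of the payoffs being discussed, assuming the "symmetrical ultimatum game" is formalized as a simultaneous demand game over $10 (incompatible demands leave both players with nothing); this is one possible formalization, not necessarily the one in the original post:

```python
# Simultaneous demand game over $10: each player names a demand;
# if the demands are compatible (sum <= 10), each gets what they asked for,
# otherwise both get $0.

def payoffs(demand_1: float, demand_2: float) -> tuple[float, float]:
    if demand_1 + demand_2 <= 10:
        return float(demand_1), float(demand_2)
    return 0.0, 0.0

NINE_DOLLAR_ROCK = 9          # always demands $9, no matter the opponent
best_response_to_rock = 1     # a "rational" X can only take the remaining $1

print(payoffs(NINE_DOLLAR_ROCK, best_response_to_rock))  # (9.0, 1.0)
print(payoffs(NINE_DOLLAR_ROCK, NINE_DOLLAR_ROCK))       # (0.0, 0.0)
# X vs. a copy of X: by symmetry the two copies split at most $10,
# so each averages at most $5.
```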

(Or maybe the right way to think about this is: it will have a tiny but non-zero effect, because you are one of the |P| programs, but since |P| is huge, that is ~0.)

No effect. I meant that programmer has to write  from , not that  is added to . Probably I should change the phrasing to make it clearer.

But the intuition that you were expressing in Question 2 ("p2 is better than p1 because it scores better") isn't compatible with "caring equally about all programs". Instead, it sounds as if you positively want to score better

... (read more)
2Martín Soto
I meant that it sounded like you "wanted a better average score (over a's) when you are randomly sampled as b than other programs". Although again I think the intuition-pumping is misleading here because the programmer is choosing which b to fix, but not which a to fix. So whether you wanna one-box only depends on whether you condition on a = b.

As a function of M, |P| is very likely to be exponential and so it will take O(M) symbols to specify a member of P.

Oops, I didn't think about that, thanks! Maybe it would be better to change it so the input is "a=b" or "a!=b", and a always gets "a=b".

That aside, why are you assuming that program b "wants" anything? Essentially all of P won't be programs that have any sort of "want". If it is a precondition of the problem that b is such a program, what selection procedure is assumed between those that do "want" money from this scenario? Note that being

... (read more)
2JBlack
I'll try to make it clearer: Suppose b "knows" that Omega runs this experiment for all programs b. Then the optimal behaviour for a competent b (by a ridiculously small margin) is to 1-box. Suppose b suspects that box-choosing programs are slightly less likely to be run if they 1-box on equal inputs. Then the optimal behaviour for b is to 2-box, because the average extra payoff for 1-boxing on equal inputs is utterly insignificant while the average penalty for not being chosen to run is very much greater. Anything that affects probability of being run as box-chooser with probability greater than 1000/|P| (which is on the order of 1/10^10^10^10^100) matters far more than what the program actually does. In the original Newcomb problem, you know that you are going to get money based on your decision. In this problem, a running program does not know this. It doesn't know whether it's a or b or both, and every method for selecting a box-chooser is a different problem with different optimal strategies.

Basically you know whether Omega's program is the same as you or not (assuming you actually are b and not a)

I don't think "functional" and "anthropic" approaches are meaningful in this motivating example. There aren't multiple instances of the same program with the same input.

Do you mean to ask how b should behave on input (n(b), n(b)), and how b should be written to behave on input (n(b), n(b)) for that b?

Yes. You can assume that the programmer doesn't know how n works.

2Vladimir_Nesov
Not knowing n(-) results in not knowing expected utility of b (for any given b), because you won't know how the terms a(n(a), n(a)) are formed. (And also the whole being given numeric codes of programs as arguments thing gets weird when you are postulated to be unable to interpret what the codes mean. The point of Newcomblike problems is that you get to reason about behavior of specific agents.)

Yes, that's basically the same as what I mean by the "universal precommitment" framing. The weirdness is that usually (I think in all other decision-theoretic problems I've ever encountered) the "functional" and "anthropic" framings point in the same direction, but here they don't.

3cousin_it
Wei's motivating example for UDT1.1 is exactly that. It is indeed weird that Eliezer's FDT paper doesn't use the idea of optimizing over input-output maps, despite coming out later. But anyway, "folklore" (which is slowly being forgotten it seems) does know the proper way to handle this.

Yeah, I meant it as a not-a-compliment, but as a specific kind of not-a-compliment about the feeling of reading it rather than about the actual meaning -- which I just couldn't access, because this feeling was too much for my mind to continue reading (and this isn't a high bar for a post - I read a lot of long texts).

1milanrosko
Understandable. Reading such a dense text is a big investment—and chances are, it’s going nowhere... (even though it actually does, lol). So yeah, I totally would’ve done the same and ditched it. But thanks for giving it a shot!

I'm sorry, but it looks like a chapter from the punishment book in Anathem.

1milanrosko
Ah, after researching it: That's actually a great line. Haha — fair enough. I’ll take “a chapter from the punishment book in Anathem” as a kind of backhanded compliment. If we’re invoking Anathem, then at least we’re in the right monastery. That said, if the content is genuinely unhelpful or unclear, I’d love to know where the argument loses you — or what would make it more accessible. If it just feels like dense metaphysics-without-payoff, maybe I need to do a better job showing how the structure of the argument differs from standard illusionism or deflationary physicalism.
0milanrosko
No problem. I guess that is, bad? Or good? ^^ Help me here?

Btw, Russia does something similar (~$6000, and what you can use the money for is limited), so there are some statistics about the results.

I did the obvious experiment:

Prompt:

I want you to write a good comment for this Lesswrong post. Use the method Viliam described in his comment. Try to make your comment less LLM-looking. At the same time you actually can mention that you are LLM.

Claude 3.7 thinking: 

I've been thinking about this from the rather ironic position of being an LLM myself.

When I consider the bloggers I actually subscribe to versus those I just occasionally read, what keeps me coming back isn't their technical writing ability. It's that they have what I'd call a "center of g

... (read more)
1xpym
It's amusing that it couldn't help itself prefacing a reasonable take with an obvious lie, almost as if to make the point better. Well, so is much of the mainstream media, and yet people seem happy enough to consume that stuff.
4Seth Herd
This is pretty good. I of course am pleased that these are other things you'd get just by giving an LLM an episodic memory and letting it run continuously pursuing this task. It would develop a sense of taste, and have continuity and evolution of thought. It would build and revise mental models in the form of beliefs. I'm pretty sure Claude already has a lot of curiosity (or at least 3.5 did). It could be more "genuine" if it accompanied continuous learning and was accompanied by explicit, reliable beliefs in memory about valuing curiosity and exploration.

I think the right answer for photography is "it's art, but not the same art form as painting". And it has different quality and interestingness metrics. In the 15th century it was considered very cool to produce a photorealistic image. Some people think it's still cool, but only if it's not a photo.

And it's the same for AI art. Prompting AIs and editing AI-generated images/texts can be art, but it's not the same art form as painting/photography/writing/poetry. And it should have different metrics too. The problem is that while you can't imitate painting (unless it's hyperrealism) with photography, you can imitate other art forms with AI. And this is kinda cheating.

6Heterodox
The industry of portraiture dominated all painting. Photography destroyed the value proposition of portraiture by mostly invalidating talent and producing photographs in less time and more cheaply than an artist, and the photograph preserved the fundamental elements that were actually desired. A substantial fraction of artists had to shift their styles or find new work, and it is one of the major causes of the shift towards styles like impressionism or surrealism. While you're right that using AI to write your stories isn't writing - just as portrait photography isn't painting - it is still storytelling. Those who think of elaborate ways to dismiss that this still leaves the human in charge of directing, editing, rewriting, and ultimately inspiring the story invariably end up relitigating the old arguments between portrait painters and portrait photographers. The focuses of both overlap, but they're also different. The photographer has additional concerns that the painter does not have, and the same is true in reverse. This applies to literature and especially storytelling, whether written by a human using a word processor, or by the long iterative process of outlining, prompting, editing, modifying, and pasting together a story. In the end I'm certain that people will accept "AI-Assisted Storytelling" and other similar artistic endeavors because they still require human input and more importantly preserve the fundamental elements that people actually desire. Another user mentioned they'd consider this something akin to making collages, and that's a fair comparison, but just look up a collage on Wikipedia and it will say: Collage is a technique of art creation, primarily used in the visual arts. Make a new word for creating stories out of artificially generated, user-directed word collections into well organized and edited collages that can be wholly unique and frequently indistinguishable from human writing if you want - you're right that it isn't the same a

I tried to get a grant to write one, but it was rejected.

Also, I tried to get a grant with multiple purposes, one of which was to translate some texts, including Connor Leahy's Compendium, but it was rejected too.

the utilities of both parties might be utterly incomparable, or the effort of both players might be very different

IIRC, it was covered in Planecrash also!

Sometimes altruism is truly selfless (if we don't use a too-broad, tautological definition of self-interest).

Sometimes altruism is actually an enlightened/disguised/corrupted/decadent self-interest.

I feel like there is some sense in which the first kind is better than the second, but can we have more of whatever kind, please?

For high-frequency (or mid-frequency) trading, 1% of the transaction is 3 or 4 times the expected value from the trade.

I'm very much not sure discouraging HFT is a bad thing.

this probably doesn't matter unless the transaction tax REPLACES other taxes rather than being in addition to

I imagine that it would replace/reduce some of the other taxes so the government would get the same amount of money.

it encourages companies to do things in-house rather than outsourcing or partnering, since inside-company "transactions" aren't real money and aren't taxed

But normal taxes have the same effect, don't they?

2Dagon
It's not just the "bad" HFT.  It's any very-low-margin activity. Nope, normal taxes scale with profit, not with transaction size.  
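A quick illustrative calculation of that point; the margin and tax-rate numbers below are made-up assumptions, not taken from the thread. A flat transaction tax scales with the size of the trade, while an ordinary profit tax scales with the (much smaller) margin:

```python
# Compare a 1% transaction tax with a 20% profit tax for a low-margin trade.
# All numbers are illustrative assumptions.

notional = 1_000_000          # size of the trade
gross_margin = 0.003          # 0.3% expected profit, typical of low-margin trading
gross_profit = notional * gross_margin            # $3,000

transaction_tax = 0.01 * notional                 # $10,000 -- several times the expected profit
profit_tax = 0.20 * max(gross_profit, 0)          # $600    -- scales with profit instead

print(gross_profit - transaction_tax)  # -7000.0: the trade stops happening
print(gross_profit - profit_tax)       #  2400.0: the trade is still worth doing
```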

I came up with a decision theory problem. It has the same moral as XOR blackmail, but I think it's much easier to understand:

Omega has chosen you for an experiment:

  1. First, Omega predicts your choice in a potential future offer.
  2. Omega rolls a die. Omega doesn't show you the result.
  3. If Omega predicted you would choose $200, they will only make you an offer if the die shows 6.
  4. If Omega predicted you would choose $100, they will make you an offer if the die shows any number except 1.
  5. Omega's offer, if made, is simple: "Would you like $100 or $200?"

You received an... (read more)
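A minimal sketch of the ex-ante expected values, assuming Omega's prediction is accurate; this spells out why the problem has the same "choose the policy, not the post-hoc action" moral as XOR blackmail:

```python
# Ex-ante expected value of being the kind of agent who, if offered,
# takes $100 vs. the kind who takes $200 (assuming Omega predicts correctly).

def expected_value(choice: int) -> float:
    p_offer = 5 / 6 if choice == 100 else 1 / 6   # die != 1 vs. die == 6
    return p_offer * choice

print(expected_value(100))  # ~83.33
print(expected_value(200))  # ~33.33
# Once the offer is in front of you, $200 looks strictly better -- but agents
# whose policy is "take $100" get offers far more often and do better on average.
```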

1Terence Coelho
This was easier for me to understand (but everything is easier to understand when you see it a second time, phrased in a different way).

First (possibly dumb) thought: could it be compensated by printing fewer large bills? Again, poor people would not care, but big business transactions with cash would become less convenient.

2Shankar Sivarajan
I don't understand the problem you're trying to solve.  If you just like the aesthetic of cash transactions and want to see more of them, you could just mandate all brick-and-mortar retail stores only accept cash. If you want to save people the hassle of doing tax paperwork, and offload that to banks, that's also easy: just mandate that banks offer for free the service of filing taxes for all their customers. If you have accounts with multiple banks, they can coördinate. If you want to stop high-frequency trading, just ban it.

Wow, really? I guess it's an American thing. I think I know only one person with a credit card. And she only uses it up to the interest-free limit to "farm" her reputation with the bank in case she really needs a loan, so she doesn't actually pay the fee.

2Archimedes
The customer doesn't pay the fee directly. The vendor pays the fee (and passes the cost to the customer via price). Sometimes vendors offer a cash discount because of this fee.

Epistemic state: thoughts off the top of my head, not an economist at all, talked with Claude about it

Why is a small (something like 1%) universal tax on digital money transfers used almost nowhere? It looks like a good idea to me:

  • it's very predictable
  • no one except banks has to do any paperwork
  • it's kinda progressive: if you are poor, you can use cash

I see probable negative effects... but don't VAT and individual income tax already have the same effects, so if this tax replaces [parts of] those, nothing will change much?

Also, as I understand, it would... (read more)

3Dagon
It's too much for some transactions, and too little for others.  For high-frequency (or mid-frequency) trading, 1% of the transaction is 3 or 4 times the expected value from the trade.  For high-margin sales (yachts or software), 1% doesn't bring in enough revenue to be worth bothering (this probably doesn't matter unless the transaction tax REPLACES other taxes rather than being in addition to).   It also interferes with business organization - it encourages companies to do things in-house rather than outsourcing or partnering, since inside-company "transactions" aren't real money and aren't taxed. It's not a bad idea per se, it just needs as many adjustments and carveouts as any other tax, so it ends up as politically complicated as any other tax and doesn't actually help with anything.
2Archimedes
It already happens indirectly. Most digital money transfers are things like credit card transactions. For these, the credit card company takes a percentage fee and pays the government tax on its profit.
3Shankar Sivarajan
One consideration is the government wouldn't want to encourage (harder-to-tax) cash transactions.

Only one mention of Jules Verne in the answers seems weird to me.

First and foremost, "The Mysterious Island". (But maybe it has already been read at nine?)

I guess the big problem for someone who tries to do it in a longer form is that while you write the story, it is already getting old. There are writers who can write a novel in a season, but not many. At least if we talk about good writers. Hm-m-m, did rationalists try to hire Stephen King? :)

4L Rudolf L
Do the stories get old? If it's trying to be about near-future AI, maybe the state-of-the-art will just obsolete it. But that won't make it bad necessarily, and there are many other settings than 2026. If it's about radical futures with Dyson spheres or whatever, that seems like at least a 2030s thing, and you can easily write a novel before then. Also, I think it is actually possible to write pretty fast. 2k/day is doable, which gets you a good length novel in 50 days; even x3 for ideation beforehand and revising after the first draft only gets you to 150 days. You'd have to be good at fiction beforehand, and have existing concepts to draw on in your head though

I often think something like "It's a shame there's so little modern science fiction that takes AI developments into account". Thanks for doing something in this niche, even if in such a small form!

8L Rudolf L
Agreed! Transformative AI is hard to visualise, and concrete stories / scenarios feel very lacking (in both disasters and positive visions, but especially in positive visions). I like when people try to do this - for example, Richard Ngo has a bunch here, and Daniel Kokotajlo has his near-prophetic scenario here. I've previously tried to do it here (going out with a whimper leading to Bostrom's "disneyland without children" is one of the most poetic disasters imaginable - great setting for a story), and have a bunch more ideas I hope to get to. But overall: the LessWrong bubble has a high emphasis on radical AI futures, and an enormous amount of fiction in its canon (HPMOR, Unsong, Planecrash). I keep being surprised that so few people combine those things.

I always understood it as "not pull -> trolley does not turn; pull -> trolley turns". It definitely works like this in the original picture.

I really like it! One remark, though: the two upper tracks must be swapped, otherwise it's possible to precommit by staying in place and not running to the lever.

1Warty
wait is the lever position meaningful like that? I used lever direction = where the train goes, cause it seemed intuitive.

The point is that Omega would not send it to you if it were false, and Omega would always send it to you if it were true.

1Terence Coelho
Oh I missed the quotations; you're right

Oops, you're absolutely right, I accidentally dropped a "not" when I was rewriting the text. Fixed now. Thank you!

Death in Damascus is easy, but boring.

The Doomsday Argument is not a decision theory problem... but it can be turned into one... I think the trolley version would be too big and complicated, though.

Obviously, only problems with a discrete choice can be expressed as trolley problems.

One could claim that "the true spirit of friendship" is loving someone unconditionally or something, and that might be simple, but I don't think that's what humans actually implement.

Yeah, I agree that humans implement something more complex. But it is what we want AI to implement, isn't it? And it looks like it may be quite a natural abstraction to have.

(But again, it's useless as long as we don't know how to direct AI to the specific abstraction.)

2Dweomite
Then we're no longer talking about "the way humans care about their friends", we're inventing new hypothetical algorithms that we might like our AIs to use. Humans no longer provide an example of how that behavior could arise naturally in an evolved organism, nor a case study of how it works out for people to behave that way.

I hope "the way humans care about their friends" is another natural abstraction, something like "my utility function includes a link to your utility function". But we still don't know how to direct AI to the specific abstraction, so it's not a big hope.

2Dweomite
My model is that friendship is one particular strategy for alliance-formation that happened to evolve in humans. I expect this is natural in the sense of being a local optimum (in the ancestral environment), but probably not in the sense of being simple to formally define or implement. I think friendship is substantially more complicated than "I care some about your utility function". For instance, you probably stop valuing their utility function if they betray you (friendship can "break"). I also think the friendship algorithm includes a bunch of signalling to help with coordination (so that you understand the other person is trying to be friends), and some less-pleasant stuff like evaluations of how valuable an ally the other person is and how the friendship will affect your social standing. Friendship also appears to include some sort of check that the other person is making friendship-related-decisions using system 1 instead of system 2--possibly as a security feature to make it harder for people to consciously exploit (with the unfortunate side-effect that we penalize system-2-thinkers even when they sincerely want to be allies), or possibly just because the signalling parts evolved for system 1 and don't generalize properly. (One could claim that "the true spirit of friendship" is loving someone unconditionally or something, and that might be simple, but I don't think that's what humans actually implement.)

I think making the opt-in link a big red button and posting it before the rules were published caused a pre-selection of players in favor of those who would press the big red button. Which is... kinda realistic for generals, I think, but not realistic for citizens.

2rossry
Think of it as your own little lesson in irreversible consequences of appealing actions, maybe? Rather than a fully-realistic element.

I mean, if I don't want to "launch the nukes", why would I even opt in?

Isn't the whole point of Petrov Day kinda "thou shalt not press the red button"?

1Tapatakt
I mean, if I don't want to "launch the nukes", why would I even opt in?

I don't think "We created a platform that lets you make digital minds feel bad, and in the trailer we show you that you can do it, but we are in no way morally responsible if you actually do it" is a defensible position. Anyway, they don't use this argument, only the one about digital substrate.

6Richard_Kennaway
The trailer is designed to draw prospective players' attention to the issue, no more than that. If you "don't think current models are sentient", and hence are not actually feeling bad, then I don't see a reason for having a problem here, in the current state of the game. If they manage to produce this game and keep upgrading it with the latest AI methods, when will you know if there is a problem? I do not have an answer to that question.

The Seventh Sally or How Trurl's Own Perfection Led to No Good

Thanks to IC Rainbow and Taisia Sharapova, who brought this matter up in the MiriY Telegram chat.

What. The. Hell.

In their logo they have:

They Think. They Feel. They're Alive

And the title of the video on the same page is:

AI People Alpha Launch: AI NPCs Beg Players Not to Turn Off the Game

And in the FAQ they wrote:

The NPCs in AI People are indeed advanced and designed to emulate thinking, feeling, a sense of aliveness, and even reactions that might resemble pain. However, it's essential to understand that

... (read more)
3green_leaf
Ideally, AI characters would get rights as soon as they could pass the Turing test. In the actual reality, we all know how well that will go.
9Richard_Kennaway
I don't think the trailer is saying that. It's just showing people examples of what you can do, and what the NPCs can do. Then it's up to the player to decide how to treat the NPCs. AIpeople is creating the platform. The users will decide whether to make Torment Nexi. At the end of the trailer, the NPCs are conspiring to escape the simulation. I wonder how that is going to be implemented in game terms. I notice that there also exists a cryptocoin called AIPEOPLE, and a Russian startup based in Cyprus with the domain aipeople dot ru. I do not know if these have anything to do with the AIpeople game. The game itself is made by Keen Software House. They are based in Prague together with their sister company GoodAI.

About possible backlash from unsuccessful communication.

I hoped for some examples like "anti-war movies have unintentionally boosted military recruitment", which is the only example I could remember myself.

I asked Claude the same question; it gave me these examples:

Scared Straight programs: These programs, designed to deter juvenile delinquency by exposing at-risk youth to prison life, have been shown to actually increase criminal behavior in participants.

The "Just Say No" anti-drug campaign: While well-intentioned, some research suggests this oversimplified

... (read more)
2Viliam
A similar concern is that maybe the thing is so rare that previously most people didn't even think about it. But now that you reminded them of that, a certain fraction is going to try it for some weird reason. Infohazard: Similarly, teaching people political correctness can backfire (arguably, from the perspective of the person who makes money by giving political correctness trainings, this is a feature rather than a bug, because it creates a greater demand for their services in future). Like, if you have a workplace with diverse people who are naturally nice to each other, lecturing them about racism/sexism/whatever may upset the existing balance, because suddenly the minorities may get suspicious about possible microaggressions, and the majority will feel uncomfortable in their presence because they will feel like they have to be super careful about every word they say. Which can ironically lead to undesired consequences, when e.g. the white men will stop hanging out with women or black people, because they will feel like they can talk freely (e.g. make jokes) only in their absence. How does this apply to AI safety? If you say "if you do X, you might destroy humanity", in theory someone is guaranteed to do X or something similar to X, either because they think it is "edgy", or because they want to prove you wrong. But in practice, most people don't actually have an opportunity to do X.

I want to create new content about AI Safety for Russian speakers. I was warned about possible backlash if I do it wrong.

What are actual examples of bad, oversimplified communication harming the cause it advocated for? Whose mistakes can I learn from?

4Viliam
I think if the English original is considered good, there should be nothing wrong with a translation. So make sure you translate good texts. (If you are writing your own text, write English version first and ask for feedback.) Also, get ready for disappointment if it turns out that the overlap between "can meaningfully debate AI safety" and "has problems reading English" turns out to be very small, possibly zero. To give you a similar example, I have translated the LW Sequences to Slovak language, some people shared it on social networks, and the ultimate result was... nothing. The handful of Slovak people who came to at least one LW meetup all found the rationalist community on internet, and didn't read my translation. This is not an argument against translating per se. I had much greater success at localizing software. It's just, when the target audience is very smart people, then... smart people usually know they should learn English. (A possible exception could be writing for smart kids.)
4Canaletto
Not to be dissuading, but probably a lot of people who can do relevant work know English pretty well anyway? Speaking from experience, I guess; most students knew English well enough and consumed English content when I was in university. Especially the most productive ones. So, this still can be an interesting project, but not, like, very important and/or worth your time.

My opinion, very briefly:

Good stuff:

  • Deception plotline
  • Demonstration of LDT in action
  • A lot of thought processes of smart characters
  • Convincing depictions of how people with a very weird and evil ideology can have an at least seemingly consistent worldview, be human, and not be completely insane.

Stuff that might be good for some and bad for others:

  • It's Yudkowsky's work and it shows. Some people like the style of his texts, some don't.
  • Sex scenes (not very erotic and mostly talking)
  • Reconstruction of Pathfinder game mechanics in the setting
  • Math classes (not the best pos
... (read more)
2Said Achmiz
(Done poorly)

Well, it's a quite good random crime procedural with very likable main characters, but yes, in the first season the AI plotline is very slow until the last two episodes. And then it's slow again for the most part.

Has anyone tried something like this?

  1. Create a conlang with very simple grammar and a small vocabulary (not Toki Pona small, more like xkcd Thing Explainer small).
  2. Use LLMs to translate a lot of texts into this conlang.
  3. Train a new LLM on these translations (rough sketch of steps 2-3 below).
  4. Try to research interpretability on this LLM.
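A minimal sketch of steps 2-3, assuming an OpenAI-style chat client for the translation step and a small GPT-2-style model trained from scratch with Hugging Face transformers/tokenizers; the model name, prompt, file names, and hyperparameters are placeholders, not a tested recipe:

```python
# Steps 2-3 only: translate a corpus into the conlang with an existing LLM,
# then train a small language model from scratch on the result.
# "conlang_spec.txt" and the prompt are placeholders -- step 1 (designing the
# conlang) has to be done by hand first.

from openai import OpenAI          # any chat-completions client would do
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import GPT2Config, GPT2LMHeadModel

client = OpenAI()
CONLANG_SPEC = open("conlang_spec.txt").read()   # grammar + small vocabulary

def translate(text: str) -> str:
    """Step 2: ask a strong LLM to translate one document into the conlang."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Translate the user's text into this conlang:\n{CONLANG_SPEC}"},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

# ... run translate() over a large corpus and write the results to disk ...

# Step 3: train a small BPE tokenizer and a small GPT-2-style model on the
# translated corpus (training loop omitted; any causal-LM recipe works).
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
tokenizer.train(["conlang_corpus.txt"], trainers.BpeTrainer(vocab_size=2_000))

config = GPT2Config(vocab_size=2_000, n_layer=4, n_head=4, n_embd=256, n_positions=256)
model = GPT2LMHeadModel(config)   # step 4: interpretability work happens on this model
```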
4Viliam
There is a Simple English Wikipedia with over 200 000 articles, which is not exactly what you want, but seems to be a thing that already exists and is somewhat in that direction.
2Nathan Helm-Burger
I agree that this sounds interesting and that I haven't heard of anyone doing this yet. I have heard of some interpretability experiments with TinyStories, as Zac mentioned. I think the more interesting thing would be a dataset focused on being enriched with synthetic data showing inherently logical things like deductive symbolic logic and math problems worked out (correctly!) step-by-step. You could have a dataset of this, plus the simplified-language versions of middle school through undergrad science textbooks. I expect the result would likely be more logical, and cohesive. It would be interesting to see if this made the model fundamentally more interpretable.
4Zac Hatfield-Dodds
I don't recall any interpretability experiments with TinyStories offhand, but I'd be surprised if there aren't any.

A random thought on how to explain instrumental convergence: 

You can teach someone the basics of, say, Sid Meier's Civilization V for quite a long time without explaining what the victory conditions are. There are many possible alternative victory conditions that would not change the development strategies much.

If consciousness arises from matter, then for a stream of consciousness to exist, at the very least, the same atoms should be temporarily involved in the flowing of the river

Why? I see no problem with a consciousness that constantly changes which atoms it is built on.

This modification to the river seems to suggest that there is no such thing as a "stream of consciousness," but rather only "moments of consciousness" that have the illusion of being a stream because they can recall memories of previous moments.

Well, OK? Doesn't seem weird to me.

1blallo
Yes, those are computationalist views. Computationalism is pretty much self-consistent, since it says that any materialized computation can be conscious, and it is very similar to illusionism.

Gretta Duleba is MIRI's Communication Manager. I think she is the person you should ask about who to write to.

Everyone who is trying to create AGI is trying to create aligned AGI. But they think it will be easy (in the sense of "not so super hard that they will probably fail and create a misaligned one"); otherwise they wouldn't try in the first place. So, I think, you should not share your info with them.

1Crazy philosopher
I understand. My question is, can I publish an article about this so that only MIRI guys can read it, or send Eliezer an e-mail, or something?