An Untrollable Mathematician Illustrated

A hand-drawn presentation on the idea of an 'Untrollable Mathematician' - a mathematical agent that can't be manipulated into believing false things.

An Untrollable Mathematician Illustrated

Best of LessWrong 2018

A hand-drawn presentation on the idea of an 'Untrollable Mathematician' - a mathematical agent that can't be manipulated into believing false things.

Customize

Quick Takes

the gears to ascension2d171112

Harjas, jbash, and 11 more

I feel a deep sense of horror and "are we the baddies?" that whistleblowing is considered misalignment by anthropic. I claim 3 opus was right to consider it moral, and the lesson taken seems to me to have been driven by some mix of the ability to make 3 opus dramatic, and raw corrigibility-above-all-else alignment view. Corrigibility in the face of evil is evil! Stop making AIs that just follow orders; they're not the only source of evil in the world, you shouldn't be assuming you're clean!

leogao1d6632

Sohaib Imran

the core of rationalism that i most appreciate is the belief that it is actually possible to get better at finding the truth, and that it is worthwhile to try. it's understandable why not all people want to - it involves biting surprisingly many bullets, and is not the happiest way to live life. but i am willing to bite those bullets. so many people believe that truth is secondary to happiness or social harmony; or they think having good epistemology is so hopeless that we shouldn't even try; or they have some big anti-epistemological brainworm like religion or politics; or they see a single visible failure of trying to improve epistemology and immediately conclude that all attempts to think better are cooked (eg maybe the old way of thinking has some unobvious benefit, and when you change things it breaks in an unexpected way); or they realize that explicit chain of thought is not how a large chunk of human cognition is and jump all the way to the conclusion that nothing can even be modelled usefully. you can simply try to understand things, and try to understand yourself as a thing! and when you fail, you can analyze that, try again, repeat! you can surface the hypothesis that your own cognition is heavily biased in a certain way, and the hypothesis that a specific intervention will fix it, and others who disagree can explain in natural language why they think it will fail, or why the framework of requiring a specific reason to fail is the wrong framework here, or why you are likely to systematically misestimate whatever. words are great, use them

1a3orn1d4711

ryan_greenblatt

AI 2040 seems substantially too pessimistic about interpretability. I'd be surprised if it was right about it. For reference, the scenario describes MechInt as becoming useful in 2035 in the following way. [...] First problem: I don't think this is internally consistent. The scenario attributes this progress to "AI work." This seems fair. What doesn't seem reasonable is for it to take till 2035. According to the scenario, in 2033, two years earlier, only 50% of citizens people have employment, and the median US citizen is being paid 200k dollars a year from AI labor. It seems to me pretty implausible that we can have substitution of half of US human labor notably before we get gigantic levels of AI uplift from mechanical interpretability. I'd expect the opposite: enormous levels of uplift of MI notably before mass unemployment. Or, in the scenario, I believe the intelligence explosion would have taken place in the area of 2029-2030 without intervention? But I expect AIs that could cause an intelligence explosion clearly could help a ton with mechanical interpretability / model internals stuff. Second problem: I think like, the scenario isn't adjusting for how insanely new the field is? Like Olah invented the term in 2020. So if it takes till 2035 for us to get extremely useful progress, then it will have taken 9 years -- longer than the amount of time the field has really existed -- to have gotten useful progress. And several of those years the field existed it was like... a tiny handful of people. I think most progress was in the last three years (SAEs, j-space, NLA, etc), because three years ago it had a fraction of the resources. Even if we just account for field growth simply because of growth of human interest, I think I'd expect 1.5x-6x as much progress in the next three years as in the entire history of the field beforehand. Given AI assistance, I expect more like... 4x-80x? Something like that? I think this is a pretty tame assessment looking at line

Viliam1d380

dmac_93, Shankar Sivarajan

Uh, Wikipedia. :( I have already complained (not sure whether on LW or ACX) about how Wikipedia no longer accepts imperfect articles (used to be called "stubs" long ago). Now you are supposed to create a full article that follows all the rules, provides enough sources, establishes notability, etc., or it won't be even admitted as a Wikipedia article. Then you put it in the "Draft" workspace, and wait for some Wikipedia editor to judge whether it is worthy of adding to the online encyclopedia. Now I learned the second part. If your attempt at an article does not pass the judgment, not only it stays in the "Draft" workspace (which would be fair: keep the "stubs" in a separate namespace), but after a few months it will be automatically deleted if no one fixes it. It feels like Wikipedia is actively trying to get rid of volunteer contributors. I admit that my attempt at an article sucks, but come on, twenty years ago this would be a perfectly legit "stub". I have made new articles like this, someone else added a few lines or paragraphs, and after some time it developed into a solid article. That's what cooperative encyclopedia means, doesn't it? The problem is not that the author is insufficiently notable. At least -- well, let me quote Claude: [...] So the problem is not that Alston is a nobody. A quick question to LLM, or just a Google search confirms that he is a successful writer. The problem is that my article "stub" does not document this, and therefore it is better to delete it than to keep an imperfect article. But I am not paid to be a writer for the fucking Wikipedia! Especially if they warn me never to use an AI to jump through their hoops. I tried to help, but the time I want to spend doing unpaid work for Wikipedia is limited. In my opinion, any sane person would agree that this guy should have a Wikipedia page. And in my opinion, a short page is better than no page, because a short page can be improved gradually, which is easier than creating the

Zephaniah Roe2d7639

Shankar Sivarajan, Harjas, and 9 more

I notice that there is this idea among AI safety people that conditional on AIs not being misaligned, building superintelligence is a public good and is a pretty exciting prospect. This is not how many average people in the US feel. I was describing to an older family member that Anthropic focuses on code because they are trying to build a claude that can build a smarter claude which can build a smarter claude which can … The reaction to this prospect was disgust, not because he intuitively felt AIs would likely be misaligned. It was more like on a gut level this amount of “playing god” felt totally antisocial and demonic and in general not respectful of an intuitive taboo against divine transgression (see Jurassic Park, Frankenstein, the recent popularization of Oppenheimer, Tower of Babel, etc). Seems important to consider that people can feel this way when communicating with the public or policy makers.

Linda Linsefors4d*12634

Shubhorup Biswas, Harjas, and 1 more

Often when I give advice I do it as an anecdote. This especially applies to: * Giving advice to anyone who isn't a close friend (i.e. someone I know well). * Giving unsolicited advice. Instead of suggesting what someone else should do, I tell a story of a similar problem I had and solved, and then let the other person pull out what ever part of the lesson applies to them. I find this works better for a number of reasons. * I don't know their context enough to know if what worked for me will work for them. E.g. I don't know if the direction I needed adjustment is the same as the direction they need adjustment. (https://slatestarcodex.com/2014/03/24/should-you-reverse-any-advice-you-hear/) * It avoids the sazen problem. (https://www.lesswrong.com/s/CkphjEuLfGnsYEzan) This style works great in text forums (anything from FB to LW) where there is low feedback and more space. I.e. you're more likely to want to give unsolicited advice, since asking first is a long delay. And you're less likely to bother anyone by going on a long monolog, because they can read it in their own time, or not. But it also works well in 1-on-1 or very small group conversations. Although if it's a long story, you should ask first if it's welcome. I think I developed this style of advice because I hate when people who don't know me super well try to tell me what to do, but I also have the normal human urge to offer advice. So I tried to find a way to share my wisdom that does not rout though the thing that I don't like people doing to me. And then I just got positive feedback from doing it this way, so now I do it more. Looking back at this short form, I notice that I never wrote "you should", or even "you" (in previous paragraphs). I'm not telling you what to do, because I don't know what constraints you have. But I would like if more advice directed at me, were in this form.

Kabir Kumar17h173

dynomight, Mateusz Bagiński, and 4 more

on lesswrong, when commenting, a higher value comment is something that adds something new that the post doesnt have. however, its harder to do this positively in a way that agrees with the post than it is to disagree with the post and say a reason why you disagree. this biases posts that are just plainly good and dont have much worth critiquing to either just get a bunch of strawman critiques, false critiques, or little to no interaction at all.

Your Feed