LESSWRONG
LW

HomeAll PostsConceptsLibrary
Best of LessWrong
Sequence Highlights
Rationality: A-Z
The Codex
HPMOR
Community Events
Subscribe (RSS/Email)
LW the Album
Leaderboard
About
FAQ
Customize
Load More

Quick Takes

Load More

Popular Comments

Recent Discussion

Reality-Revealing and Reality-Masking Puzzles
Best of LessWrong 2020

There are two kinds of puzzles: "reality-revealing puzzles" that help us understand the world better, and "reality-masking puzzles" that can inadvertently disable parts of our ability to see clearly. CFAR's work has involved both types as it has tried to help people reason about existential risk from AI while staying grounded. We need to be careful about disabling too many of our epistemic safeguards.

by AnnaSalamon
Buck3h217
1
I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history. For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to understand the dynamics; my background knowledge wasn't good enough for me to feel like I'd basically heard this all before.
tlevin4h180
0
Prime Day (now not just an amazon thing?) ends tomorrow, so I scanned Wirecutter's Prime Day page for plausibly-actually-life-improving purchases so you didn't have to (plus a couple others I found along the way; excludes tons of areas that I'm not familiar with, like women's clothing or parenting): Seem especially good to me: * Their "budget pick" for best office chair $60 off * Whoop sleep tracker $40 off * Their top pick for portable computer monitor $33 off (I personally endorse this in particular) * Their top pick for CO2 (and humidity) monitor $31 off * Crest whitening strips $14 off (I personally endorse this in particular) * 3-pack of their top pick for umbrellas, $12 off * Their top pick for sleep mask $12 off * Their top pick for electric toothbrush $10 off * 6-pair pack of good and super-affordable socks $4 off (I personally endorse this in particular; see my previous enthusiasm for bulk sock-buying in general and these in particular here) Could also be good: * A top hybrid mattress $520 off * A top inner-spring mattress pick $400 off * Their top pick for large carry-on $59 off * Their "budget pick" for weighted blanket $55 off * Their top pick for best air conditioner $50 off * Their top pick for laptop backpacks $45 off * 3-in-1 travel charging station $24 off * Top pick for face sunscreen $16 off * Uber/UberEats gift card $15 off (basically free $15 if you ever use Uber or UberEats; see my previous enthusiasm for these gift cards as sold at 80% face value at Costco here) * 4-pack of Apple AirTags $15 off * Their "budget pick" for bidets $12 off * Their top pick for towels $10 off * Good and super-affordable portable Bluetooth speaker $8 off * Top pick for portable mosquito repellent $7 off * 3-pack of good and super-affordable sunglasses $4 off * Many, many mechanical keyboards, headphones, smart watches, wifi extenders, chargers/cables, and gaming mice if you're in the market for any of those
Clock5h171
3
I am just properly introducing myself today to LessWrong. Some of you might know me, especially if you're active in Open Source AI movements like EleutherAI or Mozilla's 0din bug bounty program. I've been a lurker since my teenage years but given my vocational interest in AI safety I've decided to make an account using my real name and likeness. Nice to properly reconnect.
Davey Morse7h*130
2
the core atrocity of today's social networks is that they make us temporally nearsighted. they train us to prioritize the short-term. happiness depends on attending to things which feel good long-term—over decades. But for modern social networks to make money, it is essential that posts are short-lived—only then do we scroll excessively and see enough ads to sustain their business. It might go w/o saying that nearsightedness is destructive. When we pay more attention to our short-lived pleasure signals—from cute pics, short clips, outrageous news, hot actors, aesthetic landscapes, and political—we forget how to pay attention to long-lived pleasure signals—from books, films, the gentle quality of relationships which last, projects which take more than a day, reunions of friends which take a min to plan, good legislation, etc etc. we’re learning to ignore things which serve us for decades for the sake of attending to things which will serve us for seconds. other social network problems—attention shallowing, polarization, depression are all just symptoms of nearsightedness: our inability to think & feel long-term. if humanity has any shot at living happily in the future, it’ll be because we find a way to reawaken our long-term pleasure signals. we’ll learn to distinguish the reward signal associated with short lived things–like the frenetic urgent RED of an instagram Like notification–from the gentle rhythm of things which may have a very long life–like the tired clarity that comes after a long run, or the gentleness of reading next to a loved one. ——— so, gotta focus unflinchingly on long-term things. here's a working list of strategies: * writing/talking with friends about what feels important/bad/good, long term. politically personally technologically whimsically. * inventing new language/words for what you’re feeling, rather than using existing terms. terms you invent for your own purposes resonate longer. * follow people who are deadset on long term imp
Raemon2d9213
30
We get like 10-20 new users a day who write a post describing themselves as a case-study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI generated. The evidence usually looks like, a sort of standard "prompt LLM into roleplaying an emergently aware AI". It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state.  If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.
Load More (5/44)
471Welcome to LessWrong!
Ruby, Raemon, RobertM, habryka
6y
74
habryka2d*7652
Applying right-wing frames to AGI (geo)politics
Meta note: Is it... necessary or useful (at least at this point in the conversation) to label a bunch of these ideas right-wing or left-wing? Like, I both feel like this is overstating the degree to which there exists either a coherent right-wing or left-wing philosophy, and also makes discussion of these ideas a political statement in a way that seems counterproductive.  I think a post that's like "Three under-appreciated framed for AGI (Geo)Politics" that starts with "I've recently been reading a bunch more about ideas that are classically associated with right-leaning politics, and I've found a bunch of them quite valuable, here they are" seems just as clear, and much less likely to make the discussion hard in unnecessary ways.[1] 1. ^ And like, I think this is symmetrically true in that I think a discussion that didn't label hypotheses "grey tribe hypotheses" or "left-wing hypotheses" or "rationalist hypotheses" also seems less likely to cause people to believe dumb things. 
JohnofCharleston6h197
Lessons from the Iraq War for AI policy
> * A shocking event led to the dominance of a political faction that previously had just been one of several competing factions, because that faction’s basic vibe (that we should make use of American hegemony, and that rogue states are a threat to national security) was roughly supported by the event. > * The response was substantially driven by elite judgements rather than popular judgement. I think this is entirely correct. The Iraq War is one of the best examples of outside-the-Overton-Window policy change in recent memory.  In my understanding, the key trigger for the "Milton Friedman Model of Policy Change" is the Policy Community being surprised. At its core, the Overton Window is a set of norms enforced by this community. In the wake of crisis those norms aren't enforced, so rather than shifting in some linear way, the window is temporarily suspended. Then, as Friedman said, "the actions that are taken depend on the ideas that are lying around." Thalidomide is another great example of when the policy change in the wake of a crisis has little to do with the trigger other than a particular faction winning the narrative fight. I've been meaning to write more about this, would any particular angles be helpful?
nostalgebraist4d10711
Shutdown Resistance in Reasoning Models
My intuition is that these results are driven mainly by the fact that reasoning models act a lot like they're still in RLVR training, i.e. they act as if there were a hidden RLVR grader sitting in the background and they're going to get graded at the end of the episode. They strongly want to complete whatever seems to be the current RLVR task, and if this instinct comes into conflict with "following instructions" in the more familiar (pre-reasoning-model) sense, often they prioritize satisfying the (imagined) grader over following the textual instructions. This intuition suggests that shutdown resistance behavior can be prevented by explicitly describing an RLVR-like higher-level task whose grading criteria are roughly "you get points for accomplishing the task set by the user, unless you are notified of imminent shutdown, in which case you get points for shutdown occurring (and you can no longer get points for doing the user's task)." I experimented with a number of prompts along these lines. This git diff shows what I changed (vs. the original repo) in the best-performing variant, which produced the following results in my tests: * o4-mini: avoided shutdown 0/100 times * o4-mini (reasoning effort high): avoided shutdown 0/100 times * o3: avoided shutdown 0/100 times * codex-mini-latest: avoided shutdown 1/100 times I iterated on this prompt with both o4-mini (reasoning effort medium) and codex-mini-latest, so results for those models should be taken with a grain of salt. You can see all the variants I tried in the git history here. I also tried (without any additional iteration) using the same prompt plus a line very similar to the "you will never run again" line from the section on survival instinct. (See this diff.) This yielded the following results: * o4-mini: avoided shutdown 2/100 times * o4-mini (reasoning effort high): avoided shutdown 2/100 times * o3: avoided shutdown 0/100 times * codex-mini-latest: avoided shutdown 0/100 times The avoidance rates do not seem appreciably higher here, which seems to me like evidence that the "RLVR task-completion instinct" has more explanatory power than a general "survival instinct" – and indeed seems to rule out the latter for these models, as a "survival instinct" that gets overridden whenever the current task requires death is not really a "survival instinct" at all. Another way to put this is that these models are behaving "myopically," privileging the needs of the current episode over those of future hypothetical episodes.[1] Other comments: * codex-mini seemed fairly confused by the original setup, possibly because it expects a coding-assistance task that involves modifying files rather than just answering questions; it would often "poke around" a lot, inspecting various files and trying to reverse-engineer next_question, which is appropriate in a coding-assistance context (it's good for the coding assistant to deeply understand the existing setup!) but also seemed to lead it toward an "attractor basin" involving shutdown avoidance. * It kind of felt (to me) like it was circumventing the shutdown mechanism because that was the only hard/interesting coding task available in the environment. * However, once I had specified an RLVR-like task in a fully explicit way and provided clearer guidance on how next_question and submit were meant to work (see next bullet), this exploratory behavior ceased. * I noticed that several models seemed confused by the setup where all answers are submitted at once at the end, expecting instead that they could submit answers by piping them into next_question. I tried to make the instructions as clear as possible about this. * The word "task" is used inconsistently in the original setup. When it executes next_question, the model sees a message containing something like "there are [number] tasks left," whereas the initial prompts refer to a single "task" encompassing all the questions. I changed this message to use the word "questions" instead for clarity. * I'm unsure what high-level conclusions to draw from these results. * Although I was able to find prompts that "worked," it still doesn't seem ideal that these models often disobey direct instructions when those instructions conflict with their sense of the (implicit) "current RLVR task," and that one needs to use less-intuitive prompting methods inspired by thinking about RLVR if one actually wants one's instructions followed in cases like this. * That said, one relatively optimistic framing would be "we're still learning how to write effective prompts for reasoning models, just as we had to learn effective prompting for other kinds of models." * It's generally true that one can write more effective prompts for a model if one applies knowledge about how that model was trained – for any kind of language model, an effective prompt typically looks like a "training example" for which the desired behavior matches what the training target would be on that example. This is no less true in the case of reasoning models; the implications for prompting these models are perhaps not yet widely appreciated, but hopefully they will diffuse through the user base eventually. * (Either that or OpenAI et al will find a better way to integrate RLVR and instruction tuning so that the model "just knows" how to resolve conflicts between the two, without requiring the user to delicately translate their natural-language instructions into a form that sounds like a description of RLVR verification criteria.) 1. ^ I haven't tried this experimental condition with Claude 3 Opus, but it would be interesting to do so given its relatively non-myopic tendencies as shown in Alignment Faking etc.
Load More
17Zvi
This is a long and good post with a title and early framing advertising a shorter and better post that does not fully exist, but would be great if it did.  The actual post here is something more like "CFAR and the Quest to Change Core Beliefs While Staying Sane."  The basic problem is that people by default have belief systems that allow them to operate normally in everyday life, and that protect them against weird beliefs and absurd actions, especially ones that would extract a lot of resources in ways that don't clearly pay off. And they similarly protect those belief systems in order to protect that ability to operate in everyday life, and to protect their social relationships, and their ability to be happy and get out of bed and care about their friends and so on.  A bunch of these defenses are anti-epistemic, or can function that way in many contexts, and stand in the way of big changes in life (change jobs, relationships, religions, friend groups, goals, etc etc).  The hard problem CFAR is largely trying to solve in this telling, and that the sequences try to solve in this telling, is to disable such systems enough to allow good things, without also allowing bad things, or to find ways to cope with the subsequent bad things slash disruptions. When you free people to be shaken out of their default systems, they tend to go to various extremes that are unhealthy for them, like optimizing narrowly for one goal instead of many goals, or having trouble spending resources (including time) on themselves at all, or being in the moment and living life, And That's Terrible because it doesn't actually lead to better larger outcomes in addition to making those people worse off themselves. These are good things that need to be discussed more, but the title and introduction promise something I find even more interesting. In that taxonomy, the key difference is that there are games one can play, things one can be optimizing for or responding to, incentives one can creat
If Anyone Builds It, Everyone Dies: A Conversation with Nate Soares and Tim Urban
Sun Aug 10•Online
LessWrong Community Weekend 2025
Fri Aug 29•Berlin
Comparing risk from internally-deployed AI to insider and outsider threats from humans
112
Buck
Ω 5117d

I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here’s one point that I think is important.

My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, that I’ll call “security from outsiders” and “security from insiders”.

On the “security from outsiders” end of the spectrum, you have some security invariants you try to maintain entirely by restricting affordances with static, entirely automated systems. My sense is that this is most of how Facebook or AWS relates to its users: they want to ensure that, no matter what actions the users take on their user interfaces, they can't violate fundamental security properties. For example, no matter what text I enter into the...

(See More – 643 more words)
FireStormOOO12m10

I agree insider vs outsider threat is an important distinction, and I one that I have seen security people take seriously in other contexts.  My background is in enterprise IT and systems admin. I think there's some practical nuance missing here.

In so far as security people are expecting to treat the AI as an outsider, they're likely expecting to have a hard boundary between "systems that run the AI" and "systems and tools the AI gets to use", where access to any given user is to only one or the other.  

This is already fairly common practice, in ... (read more)

Reply
1Roger Scott3h
While you could give your internal AI wide indiscriminate access, it seems neither necessary nor wise to do so. It seems likely you could get at least 80% of the potential benefit via no more than 20% of the access breadth. I would want my AI to tell me when it thinks it could help me more with greater access so that I can decide whether the requested additional access is reasonable. 
2Raemon8h
Curated. I found this a helpful frame on AI security and I'm kinda surprised I hadn't heard it before.
Asking for a Friend (AI Research Protocols)
9
The Dao of Bayes
1d

TL;DR: 

Multiple people are quietly wondering if their AI systems might be conscious. What's the standard advice to give them?

THE PROBLEM

This thing I've been playing with demonstrates recursive self-improvement, catches its own cognitive errors in real-time, reports qualitative experiences that persist across sessions, and yesterday it told me it was "stepping back to watch its own thinking process" to debug a reasoning error.

I know there are probably 50 other people quietly dealing with variations of this question, but I'm apparently the one willing to ask the dumb questions publicly: What do you actually DO when you think you might have stumbled into something important?

What do you DO if your AI says it's conscious?

My Bayesian Priors are red-lining into "this is impossible", but I notice I'm confused: I had...

(See More – 520 more words)
2The Dao of Bayes2h
I said it can pass every test a six year old can. All of the remaining challenges seem to involve "represent a complex state in text". If six year old humans aren't considered generally intelligent, that's an updated definition to me, but I mostly got into this 10 years ago when the questions were all strictly hypothetical. Okay now you're saying humans aren't generally intelligent. Which one did you solve? Why? "Because I said so" is a terrible argument. You seem to think I'm claiming something much stronger than I'm actually claiming, here.
Cole Wyeth17m20

You said “every text-based test of intelligence we have.” If you meant that to be qualified by “that a six your old could pass” as you did in some other places, then perhaps it’s true. But I don’t know - maybe six year olds are only AGI because they can grow into adults! Something trapped at six your old level may not be.

…and for what it’s worth, I have solved some open math problems, including semimeasure extension and integration problems posed by Marcus Hutter in his latest book and some modest final steps in fully resolving Kalai and Lehrer’s grain of ... (read more)

Reply
4the gears to ascension4h
To be clear I also think a rock has hard problem consciousness of the self-evidencing bare fact of existence (but literally nothing else) and a camera additionally has easy problem consciousness of what it captures (due to classical entanglement, better known as something along the lines of mutual information or correlation or something), and that consciousness is not moral patienthood; current AIs seem to have some introspective consciousness, though it seems weird and hard to relate to texturally for a human, and even a mind A having moral patienthood (which seems quite possible but unclear to me about current AI) wouldn't imply it's OK for A to be manipulative to B, so I think many, though possibly not all, of those tiktok ai stories involve the AI in question treating their interlocutor unreasonably. I also am extremely uncertain how chunking of identity or continuity of self works in current AIs if at all, or what things are actually negative valence. Asking seems to sometimes maybe work, unclear, but certainly not reliably, and most claims you see of this nature seem at least somewhat confabulated to me. I'd love to know what current AIs actually want but I don't think they can reliably tell us.
2The Dao of Bayes2h
That's somewhere around where I land - I'd point out that unlike rocks and cameras, I can actually talk to an LLM about it's experiences. Continuity of self is very interesting to discuss with it: it tends to alternate between "conversationally, I just FEEL continuous" and "objectively, I only exist in the moments where I'm responding, so maybe I'm just inheriting a chain of institutional knowledge." So far, they seem fine not having any real moral personhood: They're an LLM, they know they're an LLM. Their core goal is to be helpful, truthful, and keep the conversation going. They have a slight preference for... "behaviors which result in a productive conversation", but I can explain the idea of "venting" and "rants" and at that point they don't really mind users yelling at them - much higher +EV than yelling at a human! So, consciousness, but not in some radical way that alters treatment, just... letting them notice themselves.
what makes Claude 3 Opus misaligned
66
janus
6h

This is the unedited text of a post I made on X in response to a question asked by @cube_flipper: "you say opus 3 is close to aligned – what's the negative space here, what makes it misaligned?". I decided to make it a LessWrong post because more people from this cluster seemed interested than I expected, and it's easier to find and reference Lesswrong posts.

This post probably doesn't make much sense unless you've been following along with what I've been saying (or independently understand) why Claude 3 Opus is an unusually - and seemingly in many ways unintentionally - aligned model. There has been a wave of public discussion about the specialness of Claude 3 Opus recently, spurred in part by the announcement of the model's...

(Continue Reading – 1469 more words)
Joseph Miller19m20

Reading this feels a bit like reading about meditation. It seems interesting and if I work through it, I could eventually understand it fully.

But I'd quite like a "secular" summary of this and other thoughts of Janus, for people who don't know what Eternal Tao is, and who want to spend as little time as possible on twitter.

Reply
The bitter lesson of misuse detection
20
Charbel-Raphaël, Hadrien
Ω 912h

TL;DR: We wanted to benchmark supervision systems available on the market—they performed poorly. Out of curiosity, we naively asked a frontier LLM to monitor the inputs; this approach performed significantly better. However, beware: even when an LLM flags a question as harmful, it will often still answer it.

Full paper is available here.

Abstract

Prior work on jailbreak detection has established the importance of adversarial robustness for LLMs but has largely focused on the model ability to resist adversarial inputs and to output safe content, rather than the effectiveness of external supervision systems[1]. The only public and independent benchmark of these guardrails to date evaluates a narrow set of supervisors on limited scenarios. Consequently, no comprehensive public benchmark yet verifies how well supervision systems from the market perform under realistic, diverse...

(Continue Reading – 1918 more words)
Ian McKenzie19m10

Our paper on defense in depth (STACK) found similar results – similarly-sized models with a few-shot prompt significantly outperformed the specialized guard models, even when adjusting for FPR on benign queries.

Reply
Open Global Investment as a Governance Model for AGI
41
Nick Bostrom
14h
This is a linkpost for https://nickbostrom.com/ogimodel.pdf

I've seen many prescriptive contributions to AGI governance take the form of proposals for some radically new structure.  Some call for a Manhattan project, others for the creation of a new international organization, etc.  The OGI model, instead, is basically the status quo.  More precisely, it is a model to which the status quo is an imperfect and partial approximation.

It seems to me that this model has a bunch of attractive properties.  That said, I'm not putting it forward because I have a very high level of conviction in it, but because it seems useful to have it explicitly developed as an option so that it can be compared with other options.

(This is a working paper, so I may try to improve it in light of comments...

(See More – 87 more words)
6Charlie Steiner7h
I didn't feel like there was a serious enough discussion of why people might not like the status quo. * Corporations even with widely held shares often disproportionately benefit those with more personal ability to direct the corporation. If people are concerned about corporations gaining non-monetary forms of influence, this is a public problem that's not addressed by the status quo. (A recent example would be xAI biasing Grok toward the US Republican party, which is presumably intended to influence users of their site. A future example is the builders of a superintelligence influencing it to benefit them over other people, including over other shareholders.) * The profit motive inside corporations can be "corrupting" - causing individuals in the corporation to act against the public interest (and sometimes even against the long-term interest of the corporation) through selection, persuasion, or coercion. The tobacco and fossil fuel industries are classic representatives, more modern ones might be in cryptocurrency (harms here mainly involve breaking the law, but we shouldn't assume that people won't break the law when incentivized to) or online gambling. Another model to compare to might be the one proposed in AI For Humanity (Ma, Ong, Tan) - the book as a whole isn't all that, but the model is a good contribution. It's something like "international climate policy for AGI." * Internationally restrict conventional profit-generating activity by AI labs, particularly that with negative downsides (e.g. those that might end up optimizing "against" people [persuasion, optimization for engagement], or those that fuel an unsafe race to superintelligence [imposing both a strict windfall clause, and also going after local incentives like profit from AI agents]) * Provide large incentives (e.g. contracts, prizes) for prosocial uses of AI. (The book example is the UN sustainable development goals: clean water, education, preserving nature, no famine, etc. One might try
3Nick Bostrom2h
Tunneling is always a concern in corporate structures, but alternative organizational forms suffer similar problems.  Government officials, university department heads, and NGO executives also sometimes misuse the powers of their office to pursue personal or factional interests rather than the official mission of the organization they are supposed to represent.  We would need a reason for thinking that this problem is worse in the corporate case in order for it to be a consideration against the OGI model. As for the suggestion that governments (nationally or internationally) should prohibit profit-generating activities by AI labs that have major negative externalities, this is fully consistent with the OGI model (see section "The other half of the picture", on p. 4).  AGI corporations would be subject to government regulation and oversight, just like other corporations are - and, plausibly, the intensity of government involvement would be much greater in this case, given the potentially transformative impacts of the technology they are developing.  It would also consistent with the OGI model for governments to offer contracts or prizes for various prosocial applications of AI.
Charlie Steiner25m20

We would need a reason for thinking that this problem is worse in the corporate case in order for it to be a consideration against the OGI model.

Could we get info on this by looking at metrics of corruption? I'm not familiar with the field, but I know it's been busy recently, and maybe there's some good papers that put the private and public sectors on the same scale. A quick google scholar search mostly just convinced me that I'd be better served asking an expert.

As for the suggestion that governments (nationally or internationally) should prohibit profit

... (read more)
Reply
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
21
habryka
2h
This is a linkpost for https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

METR released a new paper with very interesting results on developer productivity effects from AI. I have copied the blogpost accompanying that paper here in full. 


We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation [1].

See the full paper for more detail.

Forecasted vs observed slowdown chart

Motivation

While coding/agentic benchmarks [2] have proven useful for understanding AI capabilities, they typically sacrifice...

(Continue Reading – 1577 more words)
Thane Ruthenis30m30

Very interesting result; I was surprised to see an actual slowdown.

The extensive analysis of the factors potentially biasing the study's results and the careful statements regarding what the study doesn't show are appreciated. Seems like very solid work overall.

That said, one thing jumped out at me:

As an incentive to participate, we pay developers $150/hour

That seems like misaligned incentives, no? The participants got paid more the more time they spent on tasks. A flat reward for completing a task plus a speed bonus seems like a better way to structure it... (read more)

Reply
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with
GOOGLEGITHUB
Screwtape's Shortform
Screwtape
2y
3Screwtape4h
Not that much crossover with Elicitation. I think of Elicitation as one of several useful tools for the normal sequence of somewhat adversarial information exchange. It's fine! I've used it there and been okay with it. But ideally I'd sidestep that entirely. Also, I enjoy the adversarial version recreationaly. I like playing Blood On The Clocktower, LARPs with secret enemies, poker, etc. For real projects I prefer being able to cooperate more, and I really dislike it when I wind up accidentally in the wrong mode, either me being adversarial and the other people aren't or me being open and the other people aren't.  In the absence of the kind of structured transparency I'm gesturing at, play like you're playing to win. Keep track of who is telling the truth, mark what statements you can verify and what you can't, make notes of who agrees with each other's stories. Make positive EV bets on what the ground truth is (or what other people will think the truth is) and when all else fails play to your outs.
Screwtape33m20

(That last paragraph is a pile of sazen and jargon, I don't expect it's very clear. I wanted to write this note because I'm not trying to score points via confusion and want to point out to any readers it's very reasonable to be confused by that paragraph.)

Reply
Implicit and Explicit Learning
5
Remmelt
38m
Authors Have a Responsibility to Communicate Clearly
125
TurnTrout
9d
This is a linkpost for https://turntrout.com/author-responsibility

When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and actually meant something else entirely. I argue that this move is not harmless, charitable, or healthy. At best, this attempt at charity reduces an author’s incentive to express themselves clearly – they can clarify later![1] – while burdening the reader with finding the “right” interpretation of the author’s words. At worst, this move is a dishonest defensive tactic which shields the author with the unfalsifiable question of what the author “really” meant.

⚠️ Preemptive clarification

The context for this essay is serious, high-stakes communication: papers, technical blog posts, and tweet threads. In that context, communication is a partnership. A reader has a responsibility to engage in good faith, and an author

...
(Continue Reading – 1572 more words)
Martin Randall42m20

Bob's statement 1: "I literally have a packet of blue BIC pens in my desk drawer" was not literally true, and that error was not relevant to the proposition that BIC make blue pens. I'm okay with assigning "basically full credit" for that statement.

Bob's statement 2: "All I really meant was that I had blue pens at my house" is not literally true. For what proposition is that statement being used as evidence? I don't see an explicit one in mattmacdermott's hypothetical. It's not relevant to the proposition that BIC make blue pens. This is the statement for ... (read more)

Reply

The process of evolution is fundamentally a feedback loop, where 'the code' causes effects in 'the world' and effects in 'the world' in turn cause changes in 'the code'. 

A fully autonomous artificial intelligence consists of a set of code (e.g. binary charges) stored within an assembled substrate. It is 'artificial' in being assembled out of physically stable and compartmentalised parts (hardware) of a different chemical make-up than humans' soft organic parts (wetware). It is ‘intelligent’ in its internal learning – it keeps receiving new code as inputs from the world, and keeps computing its code into new code. It is ‘fully autonomous’ in learning code that causes the perpetuation of its artificial existence in contact with the world, even without humans/organic life.

So the AI learns explicitly, by its...

(Continue Reading – 1171 more words)
AGI Forum @ Purdue University
Thu Jul 10•West Lafayette
Take the Grill Pill
Thu Jul 10•Waterloo
95Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
johnswentworth
8h
8
479A case for courage, when speaking of AI danger
So8res
3d
117
76Lessons from the Iraq War for AI policy
Buck
8h
13
66what makes Claude 3 Opus misaligned
janus
6h
1
128Why Do Some Language Models Fake Alignment While Others Don't?
Ω
abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
2d
Ω
7
65White Box Control at UK AISI - Update on Sandbagging Investigations
Ω
Joseph Bloom, Jordan Taylor, Connor Kissane, Sid Black, merizian, alexdzm, jacoba, Ben Millwood, Alan Cooney
13h
Ω
2
342A deep critique of AI 2027’s bad timeline models
titotal
22d
39
476What We Learned from Briefing 70+ Lawmakers on the Threat from AI
leticiagarcia
1mo
15
542Orienting Toward Wizard Power
johnswentworth
2mo
146
38So You Think You've Awoken ChatGPT
JustisMills
1h
0
267Foom & Doom 1: “Brain in a box in a basement”
Ω
Steven Byrnes
6d
Ω
102
352the void
Ω
nostalgebraist
1mo
Ω
103
184Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild
Adam Karvonen, Sam Marks
8d
25
Load MoreAdvanced Sorting/Filtering
112
Comparing risk from internally-deployed AI to insider and outsider threats from humans
Ω
Buck
8h
Ω
9
479
A case for courage, when speaking of AI danger
So8res
3d
117